Topics

In the GloVe objective function, the weighting function f(X_ij) should have the following properties:

  • f(0) = 0, which ensures that the log X_ij term in the cost function does not produce undefined values when X_ij = 0
  • If f is viewed as a continuous function, it should vanish as x → 0 fast enough that lim_{x→0} f(x) log² x is finite
  • f(x) should be non-decreasing so that rare co-occurrences are not over-weighted
    • Example: suppose “the” and “cat” co-occur 1000 times, “quantum” and “physics” 50 times, and “purple” and “banana” 2 times. A decreasing f, where f(1000) < f(2), would make the common pair “the cat” contribute less to the learning process than the rare pair “purple banana”
  • Similarly, f(x) should be relatively small for large values of x, so that frequent co-occurrences are not over-weighted
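
The properties above can be checked numerically on the three example pairs. The following is only a minimal sketch: `decreasing_weight` is a hypothetical counter-example invented for illustration, and `clipped_power_weight` is one candidate that is non-decreasing and capped at 1. The function names and the defaults `x_max = 100` and `alpha = 0.75` are assumptions made for this example.

```python
import math


def decreasing_weight(x):
    """Hypothetical counter-example: the weight shrinks as the count grows."""
    return 1.0 / x


def clipped_power_weight(x, x_max=100.0, alpha=0.75):
    """Candidate weight: (x / x_max)^alpha below x_max, capped at 1 above it.

    x_max and alpha are assumed defaults chosen for this illustration.
    """
    if x <= 0:
        return 0.0  # property 1: f(0) = 0
    return min(1.0, (x / x_max) ** alpha)


# Co-occurrence counts from the example above.
counts = {"the cat": 1000, "quantum physics": 50, "purple banana": 2}

for pair, x in counts.items():
    print(f"{pair:16s} decreasing={decreasing_weight(x):.4f} "
          f"clipped={clipped_power_weight(x):.4f}")

# Property 2 check: f(x) * log(x)^2 should stay finite (here it shrinks) as x -> 0.
for x in (1e-2, 1e-4, 1e-8):
    print(f"x={x:g}  f(x)*log(x)^2={clipped_power_weight(x) * math.log(x) ** 2:.6g}")
```

With the decreasing weight, “purple banana” receives the largest weight and “the cat” the smallest, which is the over-weighting of rare pairs described above. With the clipped candidate, the weights grow with the count but never exceed 1, so very frequent pairs are not over-weighted either.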

The function below, an extension of the clipped power-law function, satisfies the above properties and is used in GloVe.