
Building on the GloVe interpretation through co-occurrence ratios, the objective function can be derived as follows:

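Start with the desired property, where $P_{ik}$ is the probability that word $k$ appears in the context of word $i$, $w_i$ and $w_j$ are word vectors, and $\tilde{w}_k$ is a context vector:

$$F(w_i, w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}}$$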

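Choose a scalar function $F$:

  • We want $F$ to be a scalar function, since the ratio $P_{ik}/P_{jk}$ is a scalar
  • A reasonable choice is to apply $F$ to a dot product: $F\big((w_i - w_j)^\top \tilde{w}_k\big) = \frac{P_{ik}}{P_{jk}}$
  • The dot product captures similarity, and the difference $w_i - w_j$ compares the two words being contrasted against the context word $k$
  • Note that this is a heuristic choice that worked well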

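Determine the form of $F$:

  • Switching the indices $i$ and $j$ should give the reciprocal: $F\big((w_j - w_i)^\top \tilde{w}_k\big) = \frac{P_{jk}}{P_{ik}}$
    • The probabilities become inverted and the vector difference changes sign. Multiplying the original equation by the one where we swap $i$ and $j$ results in $F(x)\,F(-x) = 1$, where $x = (w_i - w_j)^\top \tilde{w}_k$
  • The exponential function satisfies this: $e^{x}\,e^{-x} = 1$
  • Therefore, choose $F = \exp$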
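The two properties used above can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly drawn vectors; the dimension `d` and the names `w_i`, `w_j`, `w_k` are illustrative stand-ins for two word vectors and one context vector, not values from any trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5  # hypothetical embedding dimension, chosen only for illustration
w_i, w_j, w_k = rng.normal(size=(3, d))  # two word vectors and one context vector

x = (w_i - w_j) @ w_k        # argument of F in the original equation
swapped = (w_j - w_i) @ w_k  # argument of F after swapping i and j (equals -x)

# With F = exp, the difference inside F turns into a ratio outside it.
lhs = np.exp(x)
ratio = np.exp(w_i @ w_k) / np.exp(w_j @ w_k)

# Multiplying the original and the swapped equation should give 1.
product = np.exp(x) * np.exp(swapped)

print(np.isclose(lhs, ratio), np.isclose(product, 1.0))
```

Both checks hold for any choice of vectors, because they rely only on the identity exp(a - b) = exp(a) / exp(b).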

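This gives us:

$$\exp\big((w_i - w_j)^\top \tilde{w}_k\big) = \frac{\exp(w_i^\top \tilde{w}_k)}{\exp(w_j^\top \tilde{w}_k)} = \frac{P_{ik}}{P_{jk}}$$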

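Assume $\exp(w_i^\top \tilde{w}_k) = c\,P_{ik}$, where $c$ is a constant (it cancels in the ratio). Take the logarithm and substitute $P_{ik} = X_{ik}/X_i$, with $X_{ik}$ the co-occurrence count of words $i$ and $k$ and $X_i = \sum_k X_{ik}$:

$$w_i^\top \tilde{w}_k = \log X_{ik} - \log X_i + \log c$$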
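As a small illustration of this step, the sketch below builds a toy co-occurrence matrix and checks that taking the logarithm of a constant times the co-occurrence probability splits into three terms: the log count, minus the log row total, plus the log of the constant. The counts in `X` and the constant `c` are made up for the example.

```python
import numpy as np

# Hypothetical counts: X[i, k] = number of times word k appears near word i.
X = np.array([[10.0, 2.0, 4.0],
              [3.0, 8.0, 1.0],
              [5.0, 1.0, 6.0]])
X_i = X.sum(axis=1, keepdims=True)  # row totals, one per word i
P = X / X_i                         # co-occurrence probabilities P[i, k]

c = 2.5  # arbitrary positive constant, chosen only for illustration

# log(c * P_ik) = log c + log X_ik - log X_i
lhs = np.log(c * P)
rhs = np.log(c) + np.log(X) - np.log(X_i)

# The constant cancels in the ratio of probabilities for words 0 and 1.
ratio_with_c = (c * P[0]) / (c * P[1])
ratio_plain = P[0] / P[1]

print(np.allclose(lhs, rhs), np.allclose(ratio_with_c, ratio_plain))
```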

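Introduce a bias term $b_i$ to replace $\log X_i - \log c$, which does not depend on $k$:

$$w_i^\top \tilde{w}_k + b_i = \log X_{ik}$$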

This expression is not symmetric w.r.t. $i$ and $k$, due to the $b_i$ term. To balance it, we add another bias term $\tilde{b}_k$ for the context word:

$$w_i^\top \tilde{w}_k + b_i + \tilde{b}_k = \log X_{ik}$$

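The GloVe objective function is obtained by measuring the squared error of this approximation with weights:

$$J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$$

where $f$ is a weighting function in GloVe and $V$ is the vocabulary size.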
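The weighted squared error can be sketched in a few lines of NumPy. This is an illustrative sketch, not a training implementation: the function names `weight` and `glove_loss` are hypothetical, and the weighting function shown (a power law capped at 1, with `x_max = 100` and `alpha = 0.75`) is the form reported in the GloVe paper rather than something stated in this text.

```python
import numpy as np

def weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x): (x / x_max)^alpha below x_max, 1 at or above it."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(W, W_ctx, b, b_ctx, X):
    """Weighted squared error over all pairs with nonzero co-occurrence."""
    mask = X > 0
    log_X = np.log(np.where(mask, X, 1.0))  # placeholder 1.0 avoids log(0)
    pred = W @ W_ctx.T + b[:, None] + b_ctx[None, :]
    err = pred - log_X
    return float(np.sum(np.where(mask, weight(X) * err ** 2, 0.0)))

# Toy usage: a rank-one count matrix X = u v^T can be fit by the biases alone,
# since log X_ij = log u_i + log v_j.
u = np.array([2.0, 5.0, 1.0])
v = np.array([3.0, 4.0, 6.0])
X = np.outer(u, v)
V, d = X.shape[0], 4  # vocabulary size and a hypothetical embedding dimension

W = np.zeros((V, d))
W_ctx = np.zeros((V, d))
exact = glove_loss(W, W_ctx, np.log(u), np.log(v), X)  # zero up to rounding

rng = np.random.default_rng(0)
random_loss = glove_loss(rng.normal(size=(V, d)), rng.normal(size=(V, d)),
                         np.zeros(V), np.zeros(V), X)
print(exact, random_loss)
```

Pairs with a zero count are masked out, which matches the fact that the weighting function vanishes at zero and keeps the logarithm well defined.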

This objective function aims to learn word vectors that capture the relationships between words as expressed by their co-occurrence statistics in the corpus.