Building on the GloVe interpretation through co-occurrence ratios, the objective function can be derived as follows:
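Start with the desired property, where $w_i$ and $w_j$ are word vectors, $\tilde{w}_k$ is a context word vector, and $P_{ik}$ is the probability that word $k$ appears in the context of word $i$:
$$F(w_i, w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}}$$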
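Choose a scalar function $F$:
- We want $F$ to be a scalar function since the ratio $P_{ik}/P_{jk}$ is a scalar
- A reasonable choice is to take a dot product as its argument: $F\big((w_i - w_j)^\top \tilde{w}_k\big) = \frac{P_{ik}}{P_{jk}}$
- The dot product captures similarity, and the difference $w_i - w_j$ compares the two words $i$ and $j$ against the same context word $k$
- Note that this is a heuristic choice that worked well in practice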
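Determine the form of $F$:
- Switching indices $i$ and $j$ should give the reciprocal: $F\big((w_j - w_i)^\top \tilde{w}_k\big) = \frac{P_{jk}}{P_{ik}}$
- The probabilities become inverted and the vector difference becomes negative. Multiplying the original equation by the one with $i$ and $j$ swapped gives $F(x)\,F(-x) = 1$, where $x = (w_i - w_j)^\top \tilde{w}_k$
- The exponential function satisfies this: $e^{x} e^{-x} = 1$
- Therefore, choose $F(x) = e^{x}$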
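The reciprocal property can be checked numerically. The sketch below assumes small random vectors as stand-ins for $w_i$, $w_j$, and $\tilde{w}_k$ (the dimension `d` and the seed are arbitrary choices, not values from the derivation). It confirms that with the exponential as $F$, swapping $i$ and $j$ inverts the value, and that the exponential of a difference factors into a ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5  # hypothetical embedding dimension, chosen only for illustration

# Random stand-ins for the word vectors w_i, w_j and the context vector w~_k
w_i, w_j, w_k = rng.normal(size=(3, d))

x = (w_i - w_j) @ w_k          # argument of F in the original equation
x_swapped = (w_j - w_i) @ w_k  # argument after swapping i and j (equals -x)

F = np.exp
product = F(x) * F(x_swapped)             # F(x) * F(-x), should be 1 up to rounding
ratio_form = F(w_i @ w_k) / F(w_j @ w_k)  # exp of a difference written as a ratio

print(product, F(x), ratio_form)
```

Any function with $F(x)F(-x) = 1$ would pass the first check; the exponential is the choice that also turns the vector difference into a ratio of two per-word terms.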
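This gives us:
$$\exp\big((w_i - w_j)^\top \tilde{w}_k\big) = \frac{\exp(w_i^\top \tilde{w}_k)}{\exp(w_j^\top \tilde{w}_k)} = \frac{P_{ik}}{P_{jk}}$$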
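Assume $\exp(w_i^\top \tilde{w}_k) = c\,P_{ik}$, where $c$ is a constant. Take the logarithm and substitute $P_{ik} = X_{ik} / X_i$, where $X_{ik}$ counts how often word $k$ occurs in the context of word $i$ and $X_i = \sum_k X_{ik}$:
$$w_i^\top \tilde{w}_k = \log X_{ik} - \log X_i + \log c$$
Introduce a bias term $b_i$ to replace $\log X_i$, which does not depend on $k$ (the constant $\log c$ is absorbed into the bias as well):
$$w_i^\top \tilde{w}_k + b_i = \log X_{ik}$$
This expression is not symmetric w.r.t. $i$ and $k$, due to the $b_i$ term. So to balance it out, we add another bias term $\tilde{b}_k$:
$$w_i^\top \tilde{w}_k + b_i + \tilde{b}_k = \log X_{ik}$$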
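A small numeric illustration of why the row-dependent term breaks symmetry. The sketch assumes a made-up symmetric co-occurrence matrix `X` with no zero entries; the numbers are invented for illustration only. The target $\log X_{ik}$ is symmetric in $i$ and $k$, while $\log P_{ik} = \log X_{ik} - \log X_i$ is not, because the subtracted term depends only on the row.

```python
import numpy as np

# Toy symmetric co-occurrence counts (made-up numbers, all nonzero)
X = np.array([[4.0, 2.0, 1.0],
              [2.0, 6.0, 3.0],
              [1.0, 3.0, 8.0]])

X_i = X.sum(axis=1, keepdims=True)  # row totals X_i = sum_k X_ik
P = X / X_i                         # P_ik = X_ik / X_i

log_P = np.log(P)
decomposed = np.log(X) - np.log(X_i)  # log X_ik - log X_i

identity_holds = np.allclose(log_P, decomposed)
log_X_symmetric = np.allclose(np.log(X), np.log(X).T)
log_P_symmetric = np.allclose(log_P, log_P.T)

print(identity_holds, log_X_symmetric, log_P_symmetric)
```

Since the right-hand side $\log X_{ik}$ treats $i$ and $k$ alike, a model with a bias on only one side cannot do the same, which is the motivation for the second bias term.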
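The GloVe objective function is obtained by measuring the squared error of this approximation with weights:
$$J = \sum_{i,k=1}^{V} f(X_{ik}) \left( w_i^\top \tilde{w}_k + b_i + \tilde{b}_k - \log X_{ik} \right)^2$$
Where $f$ is a weighting function in GloVe and $V$ is the vocabulary size.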
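The weighted squared error can be sketched in NumPy as follows. This is a minimal illustration, not a training implementation. The weighting function uses the form and defaults reported in the GloVe paper ($f(x) = (x / x_{\max})^{\alpha}$ below $x_{\max}$ and $1$ otherwise, with $x_{\max} = 100$ and $\alpha = 0.75$), which this section does not state, so treat them as assumptions. The function names `weight` and `glove_loss`, the toy matrix, and the random initialisation are hypothetical.

```python
import numpy as np

def weight(x, x_max=100.0, alpha=0.75):
    """Assumed weighting f(x): (x / x_max)^alpha below x_max, 1 otherwise; f(0) = 0."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(X, W, W_ctx, b, b_ctx):
    """Weighted squared error between w_i . w~_k + b_i + b~_k and log X_ik."""
    mask = X > 0                            # zero counts contribute nothing
    log_X = np.log(np.where(mask, X, 1.0))  # placeholder 1.0 avoids log(0)
    pred = W @ W_ctx.T + b[:, None] + b_ctx[None, :]
    err = pred - log_X
    return float(np.sum(weight(X) * mask * err ** 2))

# Toy example: V = 3 words, d = 2 dimensions, made-up counts
rng = np.random.default_rng(0)
V, d = 3, 2
X = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 5.0],
              [1.0, 5.0, 0.0]])
W = rng.normal(scale=0.1, size=(V, d))      # word vectors w_i
W_ctx = rng.normal(scale=0.1, size=(V, d))  # context vectors w~_k
b = np.zeros(V)                             # biases b_i
b_ctx = np.zeros(V)                         # biases b~_k

loss = glove_loss(X, W, W_ctx, b, b_ctx)
print(loss)
```

Because $f(0) = 0$, pairs that never co-occur drop out of the sum, so the undefined $\log 0$ never affects the loss. The explicit mask makes that visible in the code.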
This objective function aims to learn word vectors that capture the relationships between words as expressed by their co-occurrence statistics in the corpus.