derive the GloVe objective function

Topics

GloVe

loss functions

Building on the GloVe interpretation through co-occurrence ratios, the objective function can be derived as follows:

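Start with the desired property, where $v_{i}$ is the vector of center word $i$, $u_{j}$ and $u_{k}$ are the vectors of context words $j$ and $k$, and $p_{ij}$ is the probability that word $j$ appears in the context of word $i$:

$$f(u_{j}, u_{k}, v_{i}) \approx \frac{p_{ij}}{p_{ik}}$$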

Choose a scalar function $f$:

- We want $f$ to be a scalar function since the ratio is a scalar
- A reasonable choice is to apply $f$ to a dot product: $f(u_{j}, u_{k}, v_{i}) = f((u_{j} - u_{k})^{\top} v_{i})$
- The dot product captures similarity, and the difference $(u_{j} - u_{k})$ compares the two context words
- Note that this is a heuristic choice that has worked well in practice

Determine the form of $f$:

- Switching indices $j$ and $k$ should give the reciprocal, so with $x = (u_{j} - u_{k})^{\top} v_{i}$ we need $f(x) f(-x) = 1$
  - Swapping $j$ and $k$ inverts the probability ratio and negates the vector difference. Multiplying the original equation by the swapped one gives $f(x) f(-x) = 1$
- The exponential function satisfies this: $\exp(x) \exp(-x) = 1$
- Therefore, choose $f(x) = \exp(x)$

This gives us:

$$f(u_{j}, u_{k}, v_{i}) = \frac{\exp(u_{j}^{\top} v_{i})}{\exp(u_{k}^{\top} v_{i})} \approx \frac{p_{ij}}{p_{ik}}$$
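The two properties used so far can be checked numerically. The following is a minimal sketch with random stand-in vectors; the dimension `d` and the seed are arbitrary assumptions, not values from the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                     # embedding dimension (arbitrary)
u_j, u_k, v_i = rng.normal(size=(3, d))   # random stand-ins for the three vectors

x = (u_j - u_k) @ v_i                     # argument passed to f
ratio = np.exp(u_j @ v_i) / np.exp(u_k @ v_i)

# exp of the dot-product difference equals the ratio of exponentials
assert np.isclose(np.exp(x), ratio)
# swapping j and k negates x and inverts the ratio, so f(x) f(-x) = 1
assert np.isclose(np.exp(x) * np.exp(-x), 1.0)
```

Both assertions hold for any choice of vectors, because $\exp$ turns sums into products.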

Assume $\exp(u_{j}^{\top} v_{i}) \approx \alpha p_{ij}$, where $\alpha$ is a constant. Take the logarithm and substitute $p_{ij} = x_{ij} / x_{i}$:

$$u_{j}^{\top} v_{i} \approx \log \alpha + \log x_{ij} - \log x_{i}$$

Introduce a bias term $b_{i}$ to replace $-\log \alpha + \log x_{i}$:

$$u_{j}^{\top} v_{i} + b_{i} \approx \log x_{ij}$$
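The rearrangement behind the bias term can be written out explicitly. This only restates the two equations above, with the underbrace marking the terms that $b_{i}$ absorbs. It works because $\alpha$ is a constant and $x_{i}$ depends only on $i$:

$$
\begin{aligned}
u_{j}^{\top} v_{i} &\approx \log \alpha + \log x_{ij} - \log x_{i} \\
u_{j}^{\top} v_{i} + \underbrace{(\log x_{i} - \log \alpha)}_{b_{i}} &\approx \log x_{ij}
\end{aligned}
$$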

This expression is not symmetric in $i$ and $j$ because of the $b_{i}$ term. To restore the symmetry, add a second bias term $c_{j}$:

$$u_{j}^{\top} v_{i} + b_{i} + c_{j} \approx \log x_{ij}$$

The GloVe objective function is obtained by measuring the squared error of this approximation with weights:

$$J = \sum_{i, j = 1}^{V} f(x_{ij}) \left( u_{j}^{\top} v_{i} + b_{i} + c_{j} - \log x_{ij} \right)^{2}$$

where $f(x_{ij})$ is GloVe's weighting function (not the function $f$ chosen earlier in the derivation) and $V$ is the vocabulary size.
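The objective can be sketched in NumPy as follows. The note does not spell out the weighting function, so this sketch assumes the clipped power-law form commonly used with GloVe; the defaults `x_max=100` and `power=0.75`, and all function and argument names, are assumptions for illustration. Entries with $x_{ij} = 0$ are skipped, since $\log x_{ij}$ is undefined there and the assumed weight is zero.

```python
import numpy as np

def glove_weight(x, x_max=100.0, power=0.75):
    """Assumed weighting f(x): (x / x_max)**power below x_max, 1 otherwise."""
    return np.where(x < x_max, (x / x_max) ** power, 1.0)

def glove_loss(X, u_vecs, v_vecs, b, c):
    """Weighted squared error J over the nonzero co-occurrence counts.

    X:      (n, n) counts x_ij (row i = center word, column j = context word)
    u_vecs: (n, d) context vectors u_j
    v_vecs: (n, d) center vectors v_i
    b, c:   (n,) bias terms b_i and c_j
    """
    i, j = np.nonzero(X)                  # skip x_ij = 0
    x = X[i, j].astype(float)
    # u_j^T v_i + b_i + c_j for every retained (i, j) pair
    pred = np.sum(u_vecs[j] * v_vecs[i], axis=1) + b[i] + c[j]
    return np.sum(glove_weight(x) * (pred - np.log(x)) ** 2)

# Usage on a toy 3-word vocabulary with random parameters
rng = np.random.default_rng(0)
X_toy = np.array([[0, 2, 1],
                  [2, 0, 5],
                  [1, 5, 0]])
n, d = X_toy.shape[0], 2
loss = glove_loss(X_toy, rng.normal(size=(n, d)), rng.normal(size=(n, d)),
                  np.zeros(n), np.zeros(n))
print(f"toy loss: {loss:.4f}")
```

In training, `loss` would be minimized with respect to the vectors and biases; the sketch only evaluates it.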

This objective function aims to learn word vectors that capture the relationships between words as expressed by their co-occurrence statistics in the corpus.

Altamash Khan
