
Continuous Bag of Words (CBOW) is an algorithm from the word2vec family for learning fixed-size embeddings for the words in a vocabulary. The modelling is very similar to standard skip-gram; the only difference is how we feed the data: skip-gram predicts the context words from the center word, while CBOW predicts the center word from its context.

The design choice is to average the embeddings of the context words and compute the dot product between this average (call it avg_context) and the embedding of every word in our vocabulary V, giving us a score vector of size |V|.
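To make this step concrete, here is a minimal sketch in PyTorch; the tensor names and sizes (context_embeds, vocab_embeds, a window of 4 context words) are illustrative assumptions, not from the original:

```python
import torch

vocab_size, embed_dim = 10, 4          # toy sizes, chosen for illustration

# Embeddings of 2m = 4 context words (one row per context word)
context_embeds = torch.randn(4, embed_dim)

# One embedding per vocabulary word, stacked as a (V, d) matrix
vocab_embeds = torch.randn(vocab_size, embed_dim)

avg_context = context_embeds.mean(dim=0)   # average the context embeddings: (d,)
scores = vocab_embeds @ avg_context        # dot product with every word: (V,)
```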

Dot-product scores have no fixed range, so we apply softmax to turn this |V|-sized vector into a probability distribution, with values between 0 and 1 that sum to 1.
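Continuing the sketch above, softmax normalizes the raw scores:

```python
probs = torch.softmax(scores, dim=0)  # each value in (0, 1)
print(probs.sum())                    # sums to 1, e.g. tensor(1.0000)
```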

Goal: the probability assigned to the center word should be higher than that of every other word.

If we use a neural network for modelling our “black-box”:

  • Our model can use embedding or linear layers to give us word vectors
  • For window size m, we can slide a window over our sentences and feed (context_1, context_2, ..., context_2m, center) word tuples to the model. For example, the input (2, 4, 0) indicates that the words with ids 2 and 4 (the 3rd and 5th words, 0-indexed) are context words and the word with id 0 (the 1st) is the center word; a concrete sliding-window implementation appears in the sketch after this list
  • We average the embeddings of the context words, compute our probability vector, and pick out the value at the center word's index (index 0 in the example above, say 0.3). We want to maximize this probability; in implementation, we minimize its negative log by using cross-entropy loss with 0 as the target class index
  • Use gradient descent and backpropagation to train on this loss and refine our embeddings
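
Putting the pieces together, below is a minimal end-to-end sketch in PyTorch under illustrative assumptions: a toy sentence of word ids, window size m = 2, an nn.Embedding layer for the input-side vectors, and a bias-free nn.Linear whose weight matrix plays the role of the output-side vocabulary embeddings. All names and hyperparameters here are mine, not from the original:

```python
import torch
import torch.nn as nn

# Toy corpus: one sentence, already tokenized and mapped to integer word ids
sentence = [0, 1, 2, 3, 4, 5, 6]
vocab_size, embed_dim, m = 7, 16, 2   # m = window size on each side

# Sliding window: build (context ids, center id) training pairs
pairs = []
for i in range(m, len(sentence) - m):
    context = sentence[i - m:i] + sentence[i + 1:i + m + 1]  # 2m context words
    pairs.append((context, sentence[i]))

class CBOW(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)         # input-side word vectors
        self.out = nn.Linear(embed_dim, vocab_size, bias=False)  # output-side vectors

    def forward(self, context_ids):
        avg_context = self.embed(context_ids).mean(dim=0)  # average context embeddings
        return self.out(avg_context)                       # scores over the vocabulary

model = CBOW(vocab_size, embed_dim)
loss_fn = nn.CrossEntropyLoss()   # applies log-softmax + negative log likelihood
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

for epoch in range(100):
    for context, center in pairs:
        scores = model(torch.tensor(context))
        # Target is the center word's index; the loss is -log p(center | context)
        loss = loss_fn(scores.unsqueeze(0), torch.tensor([center]))
        optimizer.zero_grad()
        loss.backward()       # backprop through the averaging and embeddings
        optimizer.step()      # gradient descent refines the word vectors
```

After training, the rows of model.embed.weight are the learned word vectors.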