Continuous Bag of Words (CBOW) is an algorithm from the word2vec family for learning fixed-size embeddings for the words in a vocabulary. The modelling is very similar to standard skip-gram; the only difference is in how we feed the data.
The design choice is to average out the embeddings for the context words and compute dot products between this `avg_context` and all words in our vocab, giving us a vector of size V.
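A minimal sketch of this averaging-and-scoring step in plain NumPy (the toy sizes, the embedding matrix `E`, and the ids in `context_ids` are all made-up placeholders, not from the original):

```python
import numpy as np

V, d = 10, 4                               # toy vocab size and embedding dim
rng = np.random.default_rng(0)
E = rng.normal(size=(V, d))                # one embedding row per vocab word

context_ids = [2, 4]                       # ids of the context words
avg_context = E[context_ids].mean(axis=0)  # average the context embeddings

scores = E @ avg_context                   # dot product with every vocab word
print(scores.shape)                        # (10,) -- one score per word
```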
Dot product values have no fixed range, so let’s apply softmax to turn this V-sized vector into a probability distribution with values between 0 and 1.
Goal: the value corresponding to the center word should be higher than the others.
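Written out, calling the score vector $s$ (my notation, not from the original), softmax gives

$$p_i = \frac{e^{s_i}}{\sum_{j=1}^{V} e^{s_j}}, \quad i = 1, \dots, V,$$

so every $p_i$ lands in $(0, 1)$, the entries sum to 1, and the goal is for the $p_i$ of the center word to come out largest.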
If we use a neural network for modelling our “black-box”:
- Our model can use embedding or linear layers to give us word vectors
- For window size m, we can do a sliding window over our sentences and give `(context_1, context_2, ..., context_2m, center)` word tuples as input to the model. Input `(2, 4, 0)` indicates the 3rd and 5th words being context words and the 1st being the center word (a window helper is sketched after this list)
- We get the average embedding of the context words, calculate our probability vector, and pick out the value at index 0, e.g. `0.3`. We want to maximize this, i.e. minimize `-0.3`. During implementation, we use cross-entropy loss with `0` as the target class index
- Use gradient descent and backprop to train over this loss and refine our embeddings (a full model sketch follows below)
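To make the sliding-window step concrete, here is a hypothetical helper (the name `cbow_windows` and the toy ids are mine, not from the original):

```python
def cbow_windows(token_ids, m):
    """Yield (context, center) pairs: the 2m context ids around each center."""
    for i in range(m, len(token_ids) - m):
        context = token_ids[i - m:i] + token_ids[i + 1:i + m + 1]
        yield context, token_ids[i]

# e.g. a 5-token sentence encoded as ids, with window size m=1
for context, center in cbow_windows([0, 1, 2, 3, 4], m=1):
    print(context, center)   # [0, 2] 1 / [1, 3] 2 / [2, 4] 3
```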
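And a minimal end-to-end sketch of such a model in PyTorch, under the same assumptions (placeholder sizes, a random batch standing in for real window tuples). `nn.CrossEntropyLoss` applies the softmax and negative log internally, so the model returns raw scores:

```python
import torch
import torch.nn as nn

V, d = 10_000, 100                 # placeholder vocab size and embedding dim

class CBOW(nn.Module):
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)           # word vectors we refine
        self.out = nn.Linear(dim, vocab_size, bias=False)  # scores over the vocab

    def forward(self, context):                       # context: (batch, 2m) word ids
        avg_context = self.emb(context).mean(dim=1)   # (batch, d)
        return self.out(avg_context)                  # (batch, V) raw scores

model = CBOW(V, d)
loss_fn = nn.CrossEntropyLoss()    # softmax + negative log; target is a class index
opt = torch.optim.SGD(model.parameters(), lr=0.05)

context = torch.randint(0, V, (32, 4))   # fake batch of context tuples, m=2
center = torch.randint(0, V, (32,))      # center word id = target class index

opt.zero_grad()
loss = loss_fn(model(context), center)   # cross-entropy over the V scores
loss.backward()                          # backprop through averaging and embeddings
opt.step()                               # one gradient descent step
```

The `nn.Linear` layer here plays the role of a second set of word vectors; either it or `self.emb` can be taken as the final embeddings.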