In the standard skip-gram model, for each center word, the model must decide which word in the entire vocabulary is the correct context word. This mirrors an ordinary classification task: the classes are the words in the vocabulary, and exactly one of them (the context word) is correct. The model applies a softmax function to assign a probability to every word in the vocabulary, and the word with the highest probability is taken as the model’s prediction, just as a classifier scores each class and picks the one with the highest score.
In this sense, standard skip-gram is a multi-class classification task: the goal is to pick the one correct context word from among all the words in the vocabulary.
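To make the classification view concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a reference implementation: the five-word vocabulary, the 3-dimensional random embeddings, and the helper names `softmax` and `context_probs` are all hypothetical. It also assumes the common formulation in which each word has a separate center vector and context vector, and the score of a candidate context word is the dot product of the two.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_probs(center_idx, center_vecs, context_vecs):
    """Probability of each vocabulary word being the context word."""
    v = center_vecs[center_idx]
    # One score per vocabulary word: dot product with the center vector.
    scores = [sum(a * b for a, b in zip(v, u)) for u in context_vecs]
    return softmax(scores)

# Hypothetical toy setup (assumption): 5 words, 3-dimensional embeddings.
vocab = ["the", "cat", "sat", "on", "mat"]
dim = 3
rng = random.Random(0)
center_vecs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in vocab]
context_vecs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in vocab]

# Classify: one probability per class (word), then pick the most likely one.
center = vocab.index("cat")
probs = context_probs(center, center_vecs, context_vecs)
predicted = vocab[max(range(len(vocab)), key=lambda i: probs[i])]

# Cross-entropy loss for a single (center, true context) training pair.
true_context = vocab.index("sat")
loss = -math.log(probs[true_context])

print(predicted, round(loss, 3))
```

Because the embeddings here are random and untrained, the predicted word is arbitrary. The point is the shape of the computation: every word in the vocabulary gets a score, the softmax normalizes over all of them, and the loss depends only on the probability assigned to the true context word.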