
It’s like a traditional n-gram model, with one key difference: the words it pairs up do not have to form a contiguous sequence, so words in between can be skipped.
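To make the contrast with ordinary n-grams concrete, here is a minimal sketch. The function names `ngrams` and `k_skip_bigrams` are hypothetical, and it assumes the common "k-skip-bigram" formulation: ordered word pairs with at most `k` words skipped between them. With `k = 0` it reduces to plain contiguous bigrams.

```python
def ngrams(tokens, n):
    """Contiguous n-grams: every run of n adjacent tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def k_skip_bigrams(tokens, k):
    """Ordered word pairs with at most k words skipped between them.

    k = 0 gives ordinary contiguous bigrams; larger k adds pairs with gaps.
    """
    pairs = []
    for i in range(len(tokens)):
        # j covers the next 1..k+1 positions, i.e. 0..k skipped words
        for j in range(i + 1, min(i + k + 2, len(tokens))):
            pairs.append((tokens[i], tokens[j]))
    return pairs


tokens = "the quick brown fox jumps".split()
print(ngrams(tokens, 2))
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox'), ('fox', 'jumps')]
print(k_skip_bigrams(tokens, 2))
# also includes gapped pairs such as ('quick', 'fox') and ('quick', 'jumps')
```

The skip version keeps every contiguous bigram and adds the gapped pairs on top, which is why it sees more word relationships from the same sentence.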

For example, given the sentence “The quick brown fox jumps”, a Skip-gram model might learn from pairs like:

  • (quick → fox) - skipping brown
  • (quick → jumps) - skipping brown and fox
  • (brown → the) - skipping quick

This skipping behavior lets the model relate words that appear near each other but not side by side, capturing a wider context than contiguous n-grams alone.
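The pairs listed above can be reproduced with a simple context window, sketched below. The function `skipgram_pairs` and the window size of 3 are assumptions for illustration: a window of at least 3 is needed for (quick → jumps), and, as in typical window-based Skip-gram training, the window also emits adjacent pairs such as (quick → brown).

```python
def skipgram_pairs(tokens, window):
    """(center, context) pairs for every context word within `window`
    positions of the center word, on either side."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)               # leftmost context position
        hi = min(len(tokens), i + window + 1)  # one past the rightmost
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


tokens = "the quick brown fox jumps".split()
pairs = skipgram_pairs(tokens, window=3)

# The three example pairs from the list all appear in the output.
for pair in [("quick", "fox"), ("quick", "jumps"), ("brown", "the")]:
    print(pair, pair in pairs)
```

Because the window looks in both directions, a pair like (brown → the) points backward in the sentence, unlike the forward-only n-gram view.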