Topics

  • During chunking stage, context is destroyed
  • While embedding models excel at capturing semantic relationships, they can miss crucial exact matches
    • E.g. “Error code TS-999” in a technical support database
    • Use BM-25 to tackle this
  • Plain word embeddings often lack contrastive information
    • Failing to distinguish between “I love apples” and “I used to love apples” since both convey a similar semantic meaning.
  • Embeddings represent sentences in a relatively low-dimensional space which makes it challenging to encode all relevant information accurately, especially for longer documents or queries. This is one of the main issues with embedding based search