
Late interaction finds a sweet spot between no-interaction and all-to-all interaction modelling.

In no-interaction models, or bi-encoders, the model takes either the query or the document and produces a single embedding for it. The relevance score is the cosine similarity between the query embedding and the document embedding. In all-to-all interaction models, or cross-encoders, the model is trained to score a query-document pair directly, and internally every query token interacts with every document token.
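A minimal sketch of no-interaction (bi-encoder) scoring, assuming the encoder has already produced one vector per text. The vectors are made-up values, not outputs of any real model. A cross-encoder cannot be reduced to a formula like this, because the model itself must run on every query-document pair.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bi_encoder_score(query_vec: Vector, doc_vec: Vector) -> float:
    """No-interaction scoring: one embedding per text, compared exactly once."""
    return cosine(query_vec, doc_vec)


# Hypothetical single-vector embeddings standing in for encoder outputs.
query_vec = [1.0, 0.0, 1.0]
doc_vec = [1.0, 1.0, 0.0]
print(round(bi_encoder_score(query_vec, doc_vec), 3))  # → 0.5
```

Everything the model knows about each text has to fit into that one vector before the comparison happens.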

In late interaction, the model encodes the query and the document separately into context-aware embeddings (note the plural: one embedding per token rather than one per text). These query and document embeddings then interact through a lightweight scoring step, rather than a full cross-encoder pass, to produce the similarity score.
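One concrete choice for that scoring step is the "MaxSim" operator used by ColBERT: each query token embedding is matched to its most similar document token embedding, and these maxima are summed. The sketch below assumes the per-token embeddings are already computed; the numbers are invented for illustration.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """ColBERT-style late interaction: for every query token, take its best
    cosine match among the document tokens, then sum over query tokens."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    total = 0.0
    for q_tok in q:
        total += max(sum(a * b for a, b in zip(q_tok, d_tok)) for d_tok in d)
    return total


# Hypothetical per-token embeddings: 2 query tokens, 3 document tokens.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]]
print(round(maxsim_score(query_tokens, doc_tokens), 3))  # → 1.8
```

The first query token finds an exact match (1.0) and the second finds its closest document token (0.8), so the interaction is token-level even though query and document were encoded independently.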

Since the query-document interaction happens late, only after embeddings have been computed separately for the query and document terms, this approach is called late interaction.

  • Example: ColBERT
  • Advantages:
    • Expressiveness via query-document interaction
    • Computational benefits: document representations can be computed offline
    • Avoids the information bottleneck of a single embedding
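To make the offline-representation advantage concrete, here is a minimal sketch of the offline/online split. `toy_encode` is a hypothetical stand-in for a contextual encoder (it just hashes each token into a one-hot vector), and the two-document corpus is invented. The point is only the workflow: document token embeddings are computed once and cached, and at query time only the query is encoded. A real system would also need an efficient index over the stored vectors, which this sketch omits.

```python
from typing import Dict, List

Vector = List[float]


def toy_encode(text: str, dim: int = 8) -> List[Vector]:
    """Hypothetical encoder: one one-hot vector per whitespace token, bucketed
    by the sum of character codes. A real late-interaction model would output
    context-aware vectors instead."""
    vectors = []
    for token in text.lower().split():
        v = [0.0] * dim
        v[sum(map(ord, token)) % dim] = 1.0
        vectors.append(v)
    return vectors


def maxsim(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Sum over query tokens of the best dot product with any document token."""
    return sum(
        max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
        for q in query_vecs
    )


# Offline: encode every document once and cache its token embeddings.
corpus = {
    "d1": "late interaction retrieval",
    "d2": "cooking pasta at home",
}
index: Dict[str, List[Vector]] = {
    doc_id: toy_encode(text) for doc_id, text in corpus.items()
}


# Online: only the query is encoded; documents are scored from the cache.
def search(query: str) -> List[str]:
    q_vecs = toy_encode(query)
    scores = {doc_id: maxsim(q_vecs, d_vecs) for doc_id, d_vecs in index.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(search("late interaction"))  # → ['d1', 'd2']
```

A cross-encoder could not reuse `index` this way, since it has to reprocess each document together with every new query.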