Altamash Khan

Did the war forge the spear that remained? No. All it did was identify the spear that wouldn't break

powerful ideas for MoEs

Nov 19, 2024 · 1 min read

Topics

  • mixture of experts
  • Instead of token-level routing, routing at the sentence/task level would be powerful and would let us extract sub-networks that serve specific tasks
    • The extracted model would be smaller, so inference would be faster
  • knowledge distillation: distill the MoE into a dense model
  • aggregation of experts: merge the weights of the experts into one dense layer
  • Extreme quantization, such as that seen in 1-bit LLMs
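The "aggregation of experts" idea above can be sketched minimally. This is an illustrative assumption, not a method from the note: each expert is treated as a single linear layer, and the experts are collapsed into one dense matrix by averaging their weights, weighted by how often the router selected each expert (all names and shapes here are hypothetical).

```python
import numpy as np

def merge_experts(expert_weights, usage_counts):
    """Collapse an MoE layer's experts into one dense weight matrix
    by averaging expert weights, weighted by routing frequency.
    Illustrative sketch only -- real MoE layers have more structure."""
    stacked = np.stack(expert_weights)        # (n_experts, d_out, d_in)
    probs = np.asarray(usage_counts, dtype=float)
    probs /= probs.sum()                      # usage counts -> routing frequencies
    # Weighted average over the expert axis yields a single dense matrix.
    return np.tensordot(probs, stacked, axes=1)  # (d_out, d_in)

# Toy example: three 2x2 experts, the second selected most often.
experts = [np.eye(2), 2 * np.eye(2), 3 * np.eye(2)]
merged = merge_experts(experts, usage_counts=[1, 8, 1])
print(merged)  # diagonal is 2.0: dominated by the heavily-used expert
```

A frequency-weighted average is one simple choice; other merging schemes (e.g. per-token soft mixing or SVD-based compression) trade fidelity for density differently.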

Related

  • popular MoE models

Backlinks

  • MoE pros and cons

Created with Quartz v4.4.0 © 2025