Zettels ProblemSets Resume Contact

Altamash Khan

Did the war forge the spear that remained? No. All it did was identify the spear that wouldn't break


powerful ideas for MoEs

Nov 19, 2024, 1 min read

Topics

mixture of experts

- Instead of routing per token, route per sentence or per task. Since every token of a given task then passes through the same experts, this would let us extract the sub-network a task actually uses and serve it on its own.
  - The extracted model would be smaller, and inference faster.
- Knowledge distillation: distil the MoE into a dense model.
- Aggregation of experts: merge the experts' weights.
- Extreme quantization, such as that used in 1-bit LLMs.
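The task-routing idea can be sketched in a few lines. This is a minimal illustration under assumptions not in the note: the router is a plain linear scorer over a hypothetical `task_embedding`, it picks the top-k experts once per task, and "experts" are toy callables. The helper names (`task_router`, `extract_subnetwork`, `run_subnetwork`) are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def task_router(task_embedding, router_weights, k):
    """Score each expert once per task (not once per token) and keep the top-k.

    router_weights holds one row per expert; returns {expert_index: gate}.
    """
    logits = [sum(w * x for w, x in zip(row, task_embedding)) for row in router_weights]
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    # Renormalize the gates over the selected experts only.
    return {i: probs[i] / total for i in top}

def extract_subnetwork(experts, gates):
    """Keep only the experts this task was routed to; the others need not be loaded."""
    return [(experts[i], gates[i]) for i in sorted(gates)]

def run_subnetwork(subnet, x):
    """Gate-weighted sum of the retained experts' outputs."""
    return sum(g * expert(x) for expert, g in subnet)

# Toy usage: four hypothetical "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
gates = task_router([0.5, 2.0], router_weights, k=2)
subnet = extract_subnetwork(experts, gates)
y = run_subnetwork(subnet, 1.0)
```

Because the gates are fixed for the whole task, the two retained experts form a self-contained smaller model. With per-token routing this extraction is not possible in general, since any token may be sent to any expert.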
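For the distillation idea, the note does not say which loss to use. A common choice is the soft-target recipe of Hinton et al.: match the student's temperature-softened output distribution to the teacher's, blended with ordinary cross-entropy on the true label. The sketch below assumes the MoE is the teacher and the dense model is the student; `temperature` and `alpha` are illustrative defaults, not tuned values.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Blend of soft-target KL (teacher -> student) and hard-label cross-entropy.

    teacher_logits come from the MoE, student_logits from the dense model.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on temperature-softened distributions.
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_teacher, p_student))
    # Scale by T^2 so the soft term's gradients stay comparable across temperatures.
    soft = kl * temperature ** 2
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard

teacher = [1.0, 2.0, 3.0]
matched = distillation_loss([1.0, 2.0, 3.0], teacher, label=2)
mismatched = distillation_loss([3.0, 2.0, 1.0], teacher, label=2)
```

When the student reproduces the teacher exactly, the KL term vanishes and only the hard-label term remains; a student that disagrees with the teacher pays for both.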
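The simplest form of "merge the experts' weights" is a (weighted) average of expert matrices that share a shape. The sketch below assumes that form; the `coeffs` argument is a hypothetical hook, for example per-expert routing frequencies. Plain averaging ignores that hidden units of different experts need not line up with each other, so treat it as a baseline rather than a recommended merge.

```python
def merge_experts(expert_weights, coeffs=None):
    """Weighted average of expert weight matrices that share one shape.

    coeffs defaults to uniform; any non-negative per-expert weights are normalized to sum to 1.
    """
    n = len(expert_weights)
    if coeffs is None:
        coeffs = [1.0] * n
    total = float(sum(coeffs))
    coeffs = [c / total for c in coeffs]
    rows, cols = len(expert_weights[0]), len(expert_weights[0][0])
    return [
        [sum(c * w[r][col] for c, w in zip(coeffs, expert_weights)) for col in range(cols)]
        for r in range(rows)
    ]

e1 = [[1.0, 2.0], [3.0, 4.0]]
e2 = [[3.0, 4.0], [5.0, 6.0]]
uniform = merge_experts([e1, e2])
weighted = merge_experts([e1, e2], coeffs=[3, 1])
```

Here `uniform` is the element-wise mean of the two experts, while `weighted` leans three-to-one toward `e1`.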
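For extreme quantization, a minimal 1-bit sketch roughly following the weight binarization described for the original BitNet: centre the weights on their mean, keep only the sign, and store one full-precision scale (the mean absolute value) per tensor. This is an assumption about which scheme is meant. BitNet b1.58, for instance, uses ternary weights {-1, 0, +1} instead. In an MoE it would be applied to each expert's weight matrices; here a flat list stands in for one tensor.

```python
def binarize(weights):
    """1-bit quantization of a flat weight list.

    Centre on the mean, keep only the sign (+1/-1), and keep one
    full-precision scale (mean absolute value) for the whole tensor.
    """
    n = len(weights)
    mean = sum(weights) / n
    scale = sum(abs(w) for w in weights) / n
    signs = [1 if w - mean > 0 else -1 for w in weights]
    return signs, scale

def pack_bits(signs):
    """Store each sign as one bit of an integer (bit i set means +1)."""
    return sum(1 << i for i, s in enumerate(signs) if s > 0)

def dequantize(signs, scale):
    """Approximate reconstruction used at inference time."""
    return [scale * s for s in signs]

w = [0.5, -0.25, 0.25, -0.5]
signs, scale = binarize(w)
packed = pack_bits(signs)
approx = dequantize(signs, scale)
```

Each weight now costs one bit plus a shared scale, which is where the memory saving comes from; the price is that every weight collapses to plus or minus the same magnitude.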

Related

popular MoE models

Backlinks

MoE pros and cons

Created with Quartz v4.4.0 © 2025
