Topics
- Instead of token routing, if we do sentence/task routing, it would be powerful and allow us to extract sub-networks that can be used to serve specific tasks
- Extracted model will be smaller, and inference will be faster
- knowledge distillation: distil into a dense model
- aggregation of experts: merge weights of experts
- Extreme quantization, such as ones seen in 1-bit LLMs