Pros
- Allows pretraining with far less compute, so a much larger model or dataset can be trained on the same compute budget as a dense model
- Faster inference than a dense model with the same total number of parameters, since only a few experts are active per token (see the routing sketch after this list)
- Useful in high-throughput scenarios with many machines/cores, since experts can be sharded across devices (expert parallelism)
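
To make the "faster inference" point concrete, here is a minimal top-k routing sketch in PyTorch. The `TinyMoELayer` class, its layer sizes, and the 8-expert/top-2 configuration are made up for illustration; the point is simply that each token passes through only `top_k` of the `n_experts` expert FFNs, so per-token compute is roughly `top_k / n_experts` of a dense layer holding the same total expert capacity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        logits = self.router(x)                                # (n_tokens, n_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)  # pick top_k experts per token
        weights = F.softmax(weights, dim=-1)                   # normalize the selected scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                       # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64]); each token used only 2 of the 8 experts
```
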
Cons
- Even though the number of active parameters per token is lower, all experts still have to be loaded into memory, because we do not know in advance which experts a token will be routed to. This results in high VRAM usage (see the back-of-the-envelope sketch after this list)
- Fine-tuning MoEs is difficult; sparse models have historically been more prone to overfitting during fine-tuning than their dense counterparts
- Knowing what each expert learns isn't very useful in practice, since experts tend not to specialize in clean, human-interpretable topics
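
To see why memory stays high even though per-token compute drops, here is a back-of-the-envelope sketch. The layer sizes (d_model = 4096, d_ff = 14336, 8 experts, top-2 routing) are illustrative assumptions, not taken from any specific model:

```python
# Illustrative sizes only; a real model has many such layers plus attention, embeddings, etc.
d_model, d_ff = 4096, 14336
n_experts, top_k = 8, 2

expert_params = 2 * d_model * d_ff            # up- and down-projection of one FFN expert
resident_params = n_experts * expert_params   # every expert must sit in VRAM
active_params = top_k * expert_params         # but only top_k experts run per token

print(f"FFN params resident in VRAM per layer: {resident_params / 1e6:.0f}M")  # ~940M
print(f"FFN params used per token per layer:   {active_params / 1e6:.0f}M")    # ~235M
```

VRAM scales with the resident count, while per-token FLOPs scale with the active count; that gap is exactly what this con is pointing at.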