Altamash Khan

Did the war forge the spear that remained? No. All it did was identify the spear that wouldn't break

MoE pros and cons

Nov 18, 2024 · 1 min read

Topics

  • mixture of experts

Pros

  • Allows pretraining with less compute than a dense model with the same total parameter count
  • Faster inference, since only a fraction of the parameters are active per token
  • Useful in high-throughput scenarios with many machines/cores
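The compute savings above come from the gap between total and *active* parameters. A minimal sketch of the arithmetic, using hypothetical dimensions (the sizes and the `moe_param_counts` helper are illustrative, not taken from any particular model):

```python
# Illustrative arithmetic: why MoE trains and serves with less compute.
# All dimensions here are hypothetical examples.

def moe_param_counts(d_model, d_ff, n_experts, top_k):
    """Per-layer FFN parameters: total (stored) vs. active (used per token)."""
    expert_params = 2 * d_model * d_ff  # up-projection + down-projection
    total = n_experts * expert_params   # every expert must be stored
    active = top_k * expert_params      # only the top-k experts run per token
    return total, active

total, active = moe_param_counts(d_model=4096, d_ff=14336, n_experts=8, top_k=2)
print(f"total FFN params/layer:  {total/1e6:.0f}M")
print(f"active FFN params/layer: {active/1e6:.0f}M ({active/total:.0%} of total)")
```

With 8 experts and top-2 routing, each token only pays for 25% of the stored FFN parameters, which is where the "less compute" and "faster inference" claims come from.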

Cons

  • Even though the number of active parameters is lower, all experts have to be loaded in memory, since we do not know in advance which expert each token will be routed to. This results in high VRAM usage
  • MoE fine-tuning is difficult
  • Knowing what each expert in an MoE learns isn’t very helpful in practice
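The VRAM con above follows directly from how routing works: the expert choice is made per token at runtime, so every expert's weights must already be resident. A minimal NumPy sketch with top-1 routing (all shapes and the random weights are hypothetical):

```python
import numpy as np

# Minimal MoE layer sketch with top-1 routing. It illustrates the memory
# cost: all expert weights are allocated up front, because the router
# decides per token, at runtime, which expert to use.

rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 4, 5

# Every expert's weight matrix must be resident in memory.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

tokens = rng.standard_normal((n_tokens, d_model))
choice = (tokens @ router).argmax(axis=1)  # top-1 expert index per token

# Each token uses only one expert, but which one was unknowable in advance.
out = np.stack([tokens[i] @ experts[choice[i]] for i in range(n_tokens)])
print("expert chosen per token:", choice)
print("output shape:", out.shape)
```

Only one of the four weight matrices touches each token, yet none of them could have been left out of memory.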

Related

  • powerful ideas for MoEs


Created with Quartz v4.4.0 © 2025
