Topics
A capacity factor greater than 1 gives each expert a buffer above its evenly distributed share of tokens, which accommodates imbalances in token assignment. If an expert has already reached its capacity and another token is routed to it, we can choose one of the following:
- drop this token
- send it to the next expert
- pass it on to the next layer without expert processing (aka token overflow)
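To make the capacity limit concrete, here is a minimal sketch in plain Python. It assumes the commonly used definition of expert capacity as the even share (tokens divided by experts) multiplied by the capacity factor. Rounding up is an assumption here, since implementations differ on rounding. The names `expert_capacity` and `route_with_capacity` are hypothetical and used only for illustration. Tokens that find their expert full are simply collected in a list, and any of the options above could then be applied to them.

```python
import math


def expert_capacity(num_tokens, num_experts, capacity_factor):
    """Slots per expert: the even share scaled by the capacity factor.

    Rounding up is an assumption; real implementations may round differently.
    """
    return math.ceil(num_tokens / num_experts * capacity_factor)


def route_with_capacity(assignments, num_experts, capacity_factor):
    """Fill experts in token order; tokens beyond capacity overflow.

    assignments[i] is the expert the router picked for token i.
    Returns (kept, overflowed): kept maps expert -> token indices it accepted,
    overflowed lists the token indices that found their expert full.
    """
    capacity = expert_capacity(len(assignments), num_experts, capacity_factor)
    kept = {expert: [] for expert in range(num_experts)}
    overflowed = []
    for token, expert in enumerate(assignments):
        if len(kept[expert]) < capacity:
            kept[expert].append(token)
        else:
            overflowed.append(token)
    return kept, overflowed


# 8 tokens, 4 experts, with the router skewed toward expert 0.
assignments = [0, 0, 0, 1, 0, 2, 3, 0]
for cf in (1.0, 1.25, 2.0):
    kept, overflowed = route_with_capacity(assignments, 4, cf)
    print(cf, expert_capacity(len(assignments), 4, cf), overflowed)
# 1.0 2 [2, 4, 7]
# 1.25 3 [4, 7]
# 2.0 4 [7]
```

In this toy example, raising the capacity factor shrinks the overflow list, but every expert reserves the larger number of slots even though experts 1 to 3 each receive only one token.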
Tip
A capacity factor that is too high wastes compute; one that is too low causes a lot of token overflow. Common values fall in the range [1, 1.25].