Topics
High precision (32-bit floating point) is often wasteful, so researchers try to convert everything to 16-bit floats (fp16) or bfloat16. In some cases, however, such as the Switch Transformer MoE, the model doesn't converge and training becomes unstable.
To fix this, one can keep parts of the model in high precision while casting the rest to low precision for speed, aka selective precision. A good example is the token-routing part of Switch Transformers, where a softmax is computed over the expert logits. Since exponentiation is sensitive to even minute changes in its inputs, keeping the router in high precision is important. The rest of the model can be cast to bfloat16, keeping both speed and training stability intact.
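
Below is a minimal sketch of this selective-precision idea in PyTorch, assuming a simplified router: the gating weights and the rest of the model stay in bfloat16, and only the softmax over expert logits is upcast to float32. The `Router` class, its shapes, and parameter names are illustrative assumptions, not the Switch Transformer reference implementation.

```python
import torch
import torch.nn as nn

class Router(nn.Module):
    """Illustrative token router: weights in bfloat16, softmax in float32."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        # Router weights can live in bfloat16 like the rest of the model.
        self.gate = nn.Linear(d_model, num_experts, bias=False, dtype=torch.bfloat16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) in bfloat16
        logits = self.gate(x)
        # Upcast only the softmax input: exponentiation amplifies small
        # rounding errors, so compute routing probabilities in float32.
        probs = torch.softmax(logits.float(), dim=-1)
        # Cast back to bfloat16 for the rest of the (fast) forward pass.
        return probs.to(torch.bfloat16)

# Usage: route a batch of bfloat16 token embeddings across 8 experts.
router = Router(d_model=512, num_experts=8)
tokens = torch.randn(2, 16, 512, dtype=torch.bfloat16)
routing_probs = router(tokens)  # probabilities computed stably in float32
```

The design choice is that only the numerically sensitive step (the exponentiation inside the softmax) pays the float32 cost; everything around it keeps the memory and speed benefits of bfloat16.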