Activations are quantized to a specified bit-width (8-bit, in the case of BitNet b1.58) using absmax per-token quantization. This scales the activations into the range [-128, 127] for an 8-bit bit-width. The quantization formula is:

$$\tilde{x} = \operatorname{clamp}\!\left(\operatorname{round}\!\left(\frac{127}{\gamma}\, x\right),\; -128,\; 127\right), \qquad \gamma = \max_i |x_i|$$

where $\gamma$ is the largest absolute value in the row, so the formula is applied independently per token.

Breaking down each component:

  1. The scale factor, a scalar per row, is computed by dividing 127 by the maximum absolute value in that row: $s = 127 / \gamma$
  2. Multiply each row by its scale factor, round the results, and clamp (clip) them to [-128, 127], as in the formula above, to form the quantized matrix
  3. Finally, dequantize by dividing each quantized row by its scale factor

The process preserves per-row precision while ensuring all values fit within the signed 8-bit integer range.
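To make the three steps concrete, here is a minimal PyTorch sketch of per-token absmax quantization and dequantization. The function names and the `eps` guard against all-zero rows are my own illustrative choices, not taken from the BitNet code:

```python
import torch

def absmax_quantize(x: torch.Tensor, eps: float = 1e-5):
    """Per-token (per-row) absmax quantization to signed 8-bit.

    x: (tokens, features) activation matrix.
    Returns the int8 tensor and the per-row scale used to quantize.
    """
    # gamma: largest absolute value in each row; eps guards all-zero rows
    gamma = x.abs().max(dim=-1, keepdim=True).values.clamp(min=eps)
    scale = 127.0 / gamma                        # step 1: one scalar per row
    x_q = (x * scale).round().clamp(-128, 127)   # step 2: round, then clip
    return x_q.to(torch.int8), scale

def absmax_dequantize(x_q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # step 3: undo the scaling; rounding error is the only information lost
    return x_q.float() / scale

x = torch.randn(4, 8)                  # 4 tokens, 8 features
x_q, scale = absmax_quantize(x)
x_hat = absmax_dequantize(x_q, scale)
print((x - x_hat).abs().max())         # worst-case error is 0.5 / scale per row
```

Because the scale is recomputed for every row, a token with small activations gets just as fine a grid as a token with large ones, which is the point of doing this per token rather than per tensor.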

Note

We apply layer normalization (LN) before quantizing the activations to maintain the variance of the output:
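Here, LN is the standard per-token normalization ($\epsilon$ is a small constant for numerical stability):

$$\mathrm{LN}(x) = \frac{x - \mathbb{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}}, \qquad \tilde{x} = \operatorname{Quant}\big(\mathrm{LN}(x)\big)$$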