In machine learning, it is common practice to take the natural log of the objective function to simplify taking derivatives. For example, softmax assigns the following probability to class $i$ given a vector of logits $z$:

$$p_i = \mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
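As a minimal sketch (assuming NumPy and a made-up 1-D vector of logits), the formula translates directly into code:

```python
import numpy as np

def softmax(z):
    """Softmax probability for each logit: exp(z_i) / sum_j exp(z_j)."""
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

logits = np.array([2.0, 1.0, 0.1])  # toy logits
print(softmax(logits))  # probabilities summing to 1, roughly [0.659 0.242 0.099]
```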
Taking the log simplifies the function:

$$\log p_i = z_i - \log \sum_{j} e^{z_j}$$
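Continuing the NumPy sketch above, the simplified form gives the same values as applying the log after the softmax, just with less work:

```python
import numpy as np

def log_softmax(z):
    """Log-softmax: log p_i = z_i - log(sum_j exp(z_j))."""
    return z - np.log(np.exp(z).sum())

logits = np.array([2.0, 1.0, 0.1])
print(np.log(np.exp(logits) / np.exp(logits).sum()))  # log applied after softmax
print(log_softmax(logits))                            # simplified form, same values
```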
Working with this simplified form makes the derivatives easier to compute later. Also, the natural log is a monotonically increasing function, so the maximum of the original probability function occurs at the same point as the maximum of the log probability function, i.e.

$$\arg\max_i \, p_i = \arg\max_i \, \log p_i$$
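A quick check of that claim on toy logits (values chosen arbitrarily for illustration):

```python
import numpy as np

logits = np.array([0.5, 3.2, -1.0, 2.7])
probs = np.exp(logits) / np.exp(logits).sum()
log_probs = np.log(probs)

# Because log is monotonically increasing, both pick the same winning class.
assert np.argmax(probs) == np.argmax(log_probs)  # index 1 in both cases
```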
LogSoftmax is also more numerically stable: exponentiating large logits can overflow, and taking the log of a probability that has underflowed to zero is undefined, whereas the log-sum-exp form works with the logits directly and handles extreme input values gracefully. Another advantage is that it plays nicely with cross-entropy loss. The cross-entropy loss involves taking the negative log of the probability assigned to the correct class; if we use LogSoftmax instead of softmax, that logarithm is already applied, so the computation becomes more straightforward and efficient.
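To illustrate the interplay with cross-entropy loss, here is a PyTorch sketch with toy logits and targets. PyTorch's CrossEntropyLoss combines LogSoftmax and NLLLoss internally, so the two paths below produce the same loss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)            # batch of 4 examples, 3 classes (toy values)
targets = torch.tensor([0, 2, 1, 0])  # toy class labels

# Path 1: apply LogSoftmax explicitly, then the negative log-likelihood loss.
log_probs = nn.LogSoftmax(dim=1)(logits)
loss_a = nn.NLLLoss()(log_probs, targets)

# Path 2: CrossEntropyLoss fuses both steps.
loss_b = nn.CrossEntropyLoss()(logits, targets)

print(torch.isclose(loss_a, loss_b))  # tensor(True)
```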