In machine learning, it is common practice to take the natural log of the objective function to simplify taking derivatives. For example, softmax assigns the following probability to class $i$ given a vector of logits $z$:

$$p_i = \mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
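As a minimal sketch (assuming NumPy and a made-up 1-D vector of logits), the formula translates directly into code:

```python
import numpy as np

def softmax(z):
    """Softmax probability for each logit: exp(z_i) / sum_j exp(z_j)."""
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

logits = np.array([2.0, 1.0, 0.1])  # toy logits
print(softmax(logits))  # probabilities summing to 1, roughly [0.659 0.242 0.099]
```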
Taking the log simplifies the function:

$$\log p_i = z_i - \log \sum_{j} e^{z_j}$$
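Continuing the NumPy sketch above, the simplified form gives the same values as applying the log after the softmax, just with less work:

```python
import numpy as np

def log_softmax(z):
    """Log-softmax: log p_i = z_i - log(sum_j exp(z_j))."""
    return z - np.log(np.exp(z).sum())

logits = np.array([2.0, 1.0, 0.1])
print(np.log(np.exp(logits) / np.exp(logits).sum()))  # log applied after softmax
print(log_softmax(logits))                            # simplified form, same values
```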
Working with this simplified form makes the derivatives easier to compute later. Also, the natural log is a monotonically increasing function, so the maximum of the original probability function occurs at the same point as the maximum of the log probability function, i.e.

$$\arg\max_i \, p_i = \arg\max_i \, \log p_i$$
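A quick check of that claim on toy logits (values chosen arbitrarily for illustration):

```python
import numpy as np

logits = np.array([0.5, 3.2, -1.0, 2.7])
probs = np.exp(logits) / np.exp(logits).sum()
log_probs = np.log(probs)

# Because log is monotonically increasing, both pick the same winning class.
assert np.argmax(probs) == np.argmax(log_probs)  # index 1 in both cases
```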
LogSoftmax is also more numerically stable: exponentiating large logits can overflow, and taking the log of a probability that has underflowed to zero is undefined, whereas the log-sum-exp form works with the logits directly and handles extreme input values gracefully. Another advantage is that it plays nicely with cross-entropy loss. The cross-entropy loss involves taking the negative log of the probability assigned to the correct class; if we use LogSoftmax instead of softmax, that logarithm is already applied, so the computation becomes more straightforward and efficient.
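To illustrate the interplay with cross-entropy loss, here is a PyTorch sketch with toy logits and targets. PyTorch's CrossEntropyLoss combines LogSoftmax and NLLLoss internally, so the two paths below produce the same loss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 3)            # batch of 4 examples, 3 classes (toy values)
targets = torch.tensor([0, 2, 1, 0])  # toy class labels

# Path 1: apply LogSoftmax explicitly, then the negative log-likelihood loss.
log_probs = nn.LogSoftmax(dim=1)(logits)
loss_a = nn.NLLLoss()(log_probs, targets)

# Path 2: CrossEntropyLoss fuses both steps.
loss_b = nn.CrossEntropyLoss()(logits, targets)

print(torch.isclose(loss_a, loss_b))  # tensor(True)
```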