
If we apply log and softmax as separate operations, the softmax output can underflow to zero, and taking the log of zero yields negative infinity. torch.log_softmax computes both in a single, numerically stable step:

import torch

x = torch.tensor([-500.0, 0.0])

torch.log(torch.softmax(x, dim=0)) # tensor([-inf, 0.])
torch.log_softmax(x, dim=0) # tensor([-500., 0.])

The numerical instability stems from the exp and log operations being applied separately: exp underflows to zero for large negative inputs, and log of that zero produces -inf:

torch.log(torch.exp(x)) # tensor([-inf, 0.])
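
To see the underflow in isolation: exp(-500) is roughly 7e-218, far below the smallest positive float32 value (about 1.4e-45), so it rounds to exactly zero.

torch.exp(torch.tensor(-500.0)) # tensor(0.)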

In the above example, we expect log and exp to cancel out and return x, but we actually get [-inf, 0.].
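
Stable implementations, including torch.log_softmax, avoid this with the standard log-sum-exp trick: since softmax is invariant to adding a constant, we can subtract the maximum before exponentiating, so the largest term becomes exp(0) = 1 and the sum never collapses to zero. A minimal sketch (the helper name stable_log_softmax is ours, for illustration only):

def stable_log_softmax(x, dim=0):
    # Subtract the max so the largest exponent is exp(0) = 1:
    # nothing overflows, and terms that still underflow (like exp(-500))
    # are negligible next to that 1, so the sum stays accurate.
    m = x.max(dim=dim, keepdim=True).values
    shifted = x - m
    return shifted - shifted.exp().sum(dim=dim, keepdim=True).log()

stable_log_softmax(x) # tensor([-500., 0.]), matches torch.log_softmax

This works because log_softmax(x) = (x - m) - log(sum(exp(x - m))) holds for any constant m; choosing m = max(x) keeps every exponent in a safe range.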