
The outer summation over the N samples (or mini-batch) can be either a sum (as above) or an average; this is controlled by the reduction parameter in PyTorch. The formula applies to both multi-class and multi-label classification, but PyTorch provides separate losses: nn.CrossEntropyLoss is used for multi-class classification since it applies softmax internally, while nn.BCEWithLogitsLoss is used for multi-label classification since it applies sigmoid internally.
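
As a quick illustration (a minimal sketch with made-up tensor shapes, not taken from the original text), the snippet below contrasts the two losses and shows where the reduction argument enters:

```python
import torch
import torch.nn as nn

# Multi-class classification: nn.CrossEntropyLoss applies log-softmax internally,
# so it expects raw logits of shape (N, C) and integer class indices of shape (N,).
# reduction="mean" (the default) averages over the batch; reduction="sum" sums instead.
ce_loss = nn.CrossEntropyLoss(reduction="mean")
logits = torch.randn(4, 3)            # N=4 samples, C=3 classes (raw scores, no softmax)
targets = torch.tensor([0, 2, 1, 2])  # one class index per sample
print(ce_loss(logits, targets))

# Multi-label classification: nn.BCEWithLogitsLoss applies sigmoid internally,
# so it expects raw logits and a float target of the same shape (N, C),
# where each entry is 0 or 1 and a sample may belong to several classes at once.
bce_loss = nn.BCEWithLogitsLoss(reduction="sum")
ml_logits = torch.randn(4, 3)
ml_targets = torch.tensor([[1., 0., 1.],
                           [0., 1., 0.],
                           [1., 1., 0.],
                           [0., 0., 1.]])
print(bce_loss(ml_logits, ml_targets))
```

In both cases the model should output raw logits; applying softmax or sigmoid yourself before these losses would double-apply the activation.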