Mainly because the network is shallow: it consists only of an embedding lookup and a linear layer, so there is no need for a non-linearity between layers. The loss function also applies a non-linearity to the logits (e.g., sigmoid for skip-gram with negative sampling, softmax for cross-entropy).
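To make this concrete, here is a minimal NumPy sketch of the skip-gram negative-sampling loss for a single training pair. The names (`sgns_loss`, `in_emb`, `out_emb`), the sizes, and the random initialization are illustrative assumptions, not taken from any particular implementation. The point is that the logits are plain dot products, and the only non-linearity is the sigmoid inside the loss.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(center_vec, context_vec, negative_vecs):
    """Skip-gram negative-sampling loss for one (center, context) pair."""
    pos_logit = center_vec @ context_vec      # linear: a single dot product
    neg_logits = negative_vecs @ center_vec   # linear: one logit per negative sample
    # The sigmoid below is the only non-linearity, and it lives in the loss.
    return -np.log(sigmoid(pos_logit)) - np.sum(np.log(sigmoid(-neg_logits)))

# Hypothetical toy setup: 10-word vocabulary, 4-dimensional embeddings.
rng = np.random.default_rng(0)
vocab_size, dim = 10, 4
in_emb = rng.normal(scale=0.1, size=(vocab_size, dim))   # embedding layer
out_emb = rng.normal(scale=0.1, size=(vocab_size, dim))  # linear output layer

center, context, negatives = 1, 2, [5, 7, 8]
loss = sgns_loss(in_emb[center], out_emb[context], out_emb[negatives])
print(loss)
```

Swapping the sigmoid terms for a softmax over all output logits would give the cross-entropy variant; the layers themselves stay linear in both cases.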