
The principle of Maximum Likelihood Estimation (MLE) is a general method for estimating the parameters of any statistical model. Given a dataset and a model, the best parameters are those that maximize the probability (likelihood) of observing the data under that model.

Assume the training examples $(x^{(i)}, y^{(i)})$, $i = 1, \ldots, n$, are independent and identically distributed. For binary classification, assume the target $y \in \{0, 1\}$ follows a Bernoulli distribution with parameter $h_\theta(x)$:

$$P(y = 1 \mid x; \theta) = h_\theta(x), \qquad P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$$

The probability mass function for a single example $(x, y)$ is

$$P(y \mid x; \theta) = h_\theta(x)^{y} \, \bigl(1 - h_\theta(x)\bigr)^{1 - y}$$

The likelihood of the entire dataset is the product of the individual probabilities:

$$L(\theta) = \prod_{i=1}^{n} P\bigl(y^{(i)} \mid x^{(i)}; \theta\bigr) = \prod_{i=1}^{n} h_\theta\bigl(x^{(i)}\bigr)^{y^{(i)}} \bigl(1 - h_\theta\bigl(x^{(i)}\bigr)\bigr)^{1 - y^{(i)}}$$
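The product over per-example Bernoulli probabilities can be sketched numerically as follows (the arrays `y` and `h` are illustrative, not from the text; `h` plays the role of the model's predicted probability $P(y = 1 \mid x)$):

```python
import numpy as np

# Illustrative binary targets and predicted probabilities P(y = 1 | x).
y = np.array([1, 0, 1, 1, 0])
h = np.array([0.9, 0.2, 0.8, 0.7, 0.1])

# Per-example Bernoulli probability h^y * (1 - h)^(1 - y),
# then the likelihood of the whole dataset as their product.
per_example = h**y * (1 - h)**(1 - y)
likelihood = np.prod(per_example)
print(likelihood)
```

Note that each factor rewards confident, correct predictions: an example with $y = 1$ contributes $h$, and one with $y = 0$ contributes $1 - h$.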

To maximize the likelihood $L(\theta)$, it is numerically more stable and computationally easier to maximize the log-likelihood $\ell(\theta)$, since the logarithm turns the product into a sum:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \Bigl[ y^{(i)} \log h_\theta\bigl(x^{(i)}\bigr) + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta\bigl(x^{(i)}\bigr)\bigr) \Bigr]$$

Maximizing $\ell(\theta)$ is equivalent to minimizing $-\ell(\theta)$. In an optimization setting, we can even divide by $n$ to get an "average loss":

$$J(\theta) = -\frac{1}{n}\,\ell(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \Bigl[ y^{(i)} \log h_\theta\bigl(x^{(i)}\bigr) + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta\bigl(x^{(i)}\bigr)\bigr) \Bigr]$$
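The average loss above can be sketched as a small function; the name `binary_cross_entropy` and the `eps` clipping guard are my own additions (the guard avoids `log(0)` when a predicted probability saturates at exactly 0 or 1):

```python
import numpy as np

def binary_cross_entropy(y, h, eps=1e-12):
    """Negative mean log-likelihood of Bernoulli targets y
    given predicted probabilities h (the 'average loss')."""
    h = np.clip(h, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Illustrative data: binary targets and predicted probabilities.
y = np.array([1, 0, 1, 1, 0])
h = np.array([0.9, 0.2, 0.8, 0.7, 0.1])
print(binary_cross_entropy(y, h))
```

Working with the sum of logs rather than the raw product also sidesteps floating-point underflow: a product of thousands of probabilities below 1 quickly rounds to zero, while the corresponding sum of logs stays well within range.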

Note

Observe that this MLE formulation for binary classification yields exactly the cost function of the logistic regression optimization problem: with $h_\theta(x) = \sigma(\theta^\top x)$, minimizing the negative average log-likelihood $J(\theta)$ is the same as minimizing the binary cross-entropy loss.
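The connection can be sketched end to end: fit logistic regression by gradient descent on the negative average log-likelihood. This is a minimal illustration on synthetic data, with all names (`true_w`, learning rate, iteration count) chosen for the example rather than taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: labels drawn from a Bernoulli with p = sigmoid(x . true_w).
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -3.0])
p = 1 / (1 + np.exp(-(X @ true_w)))
y = (p > rng.uniform(size=200)).astype(float)

# Gradient descent on J(w) = -(1/n) * log-likelihood.
w = np.zeros(2)
lr = 0.5
for _ in range(500):
    h = 1 / (1 + np.exp(-(X @ w)))   # sigmoid(w^T x)
    grad = X.T @ (h - y) / len(y)    # gradient of the average loss
    w -= lr * grad

print(w)  # should roughly recover the direction and signs of true_w
```

Because $J(\theta)$ is convex for logistic regression, plain gradient descent converges to the maximum-likelihood estimate; the recovered weights approximate `true_w` up to sampling noise.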