Topics
Logistic Regression: supervised learning algorithm primarily for binary classification problems. Goal is to predict the probability that an input sample belongs to a particular class (commonly labeled 1 or 0).
Note
Despite ‘regression’ in its name, it predicts the probability of a categorical outcome, not a continuous value. Core idea: model the probability as a sigmoid transformation of a linear combination of the input features:

$$P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b)$$

- $\mathbf{x}$: input feature vector
- $\mathbf{w}$: weight vector learned during training
- $b$: bias term learned during training
- $\sigma(z) = \frac{1}{1 + e^{-z}}$: the sigmoid function

The sigmoid maps any real number to a value between 0 and 1, making it suitable for representing probabilities. The term $z = \mathbf{w}^\top \mathbf{x} + b$ is a linear score, or evidence, for the positive class. A large positive $z$ results in a probability close to 1, a large negative $z$ in a probability close to 0, and $z = 0$ results in $P(y = 1 \mid \mathbf{x}) = 0.5$.
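A minimal NumPy sketch of the sigmoid to make these properties concrete (the function name and test values here are illustrative, not from the notes):

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large positive score -> probability near 1; large negative -> near 0;
# a score of exactly 0 -> probability 0.5.
print(sigmoid(10.0))   # ~0.99995
print(sigmoid(-10.0))  # ~0.00005
print(sigmoid(0.0))    # 0.5
```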
Logistic Regression models a linear relationship between the input features and the log-odds of the positive outcome. The odds of an event are:

$$\text{odds} = \frac{P(y = 1 \mid \mathbf{x})}{1 - P(y = 1 \mid \mathbf{x})}$$

Taking the natural logarithm gives the log-odds:

$$\log \frac{P(y = 1 \mid \mathbf{x})}{1 - P(y = 1 \mid \mathbf{x})} = \mathbf{w}^\top \mathbf{x} + b$$

This equation shows that the linear model is modeling the log-odds, not the probability directly. Solving for $P(y = 1 \mid \mathbf{x})$ recovers the sigmoid form.
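To make that last step explicit (a standard derivation, writing $p$ for $P(y = 1 \mid \mathbf{x})$): exponentiate both sides and solve for $p$,

$$\frac{p}{1-p} = e^{\mathbf{w}^\top \mathbf{x} + b} \quad\Rightarrow\quad p = \frac{e^{\mathbf{w}^\top \mathbf{x} + b}}{1 + e^{\mathbf{w}^\top \mathbf{x} + b}} = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}} = \sigma(\mathbf{w}^\top \mathbf{x} + b)$$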
Decision Boundary: To make a class prediction (0 or 1), a threshold is applied to the predicted probability. Commonly, if $P(y = 1 \mid \mathbf{x}) \ge 0.5$, predict class 1; otherwise, predict class 0. Since $\sigma(z) = 0.5$ exactly when $z = 0$, the decision boundary is defined by $\mathbf{w}^\top \mathbf{x} + b = 0$. This equation represents a hyperplane in the feature space, making Logistic Regression a linear classifier.
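A short sketch of the thresholding step (the arrays `X`, `w`, `b` and the toy points are placeholders chosen for illustration):

```python
import numpy as np

def predict(X, w, b, threshold=0.5):
    """Label each row of X as 1 if P(y=1|x) >= threshold, else 0."""
    probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return (probs >= threshold).astype(int)

# Points on opposite sides of the hyperplane w.x + b = 0 get opposite labels.
w = np.array([1.0, -1.0])
b = 0.0
X = np.array([[2.0, 1.0],   # w.x + b =  1 -> class 1
              [1.0, 2.0]])  # w.x + b = -1 -> class 0
print(predict(X, w, b))  # [1 0]
```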
Training: The parameters $\mathbf{w}$ and $b$ are typically learned by maximum likelihood estimation (MLE), which is equivalent to minimizing the cross-entropy loss between the predicted probabilities and the true labels. MLE sets up the optimization problem; since there is no closed-form solution, the estimates are obtained numerically, and an iterative optimization algorithm such as gradient descent is a common choice.
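On a training set of $m$ examples with $\hat{p}_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + b)$, the average cross-entropy (negative log-likelihood) is

$$\mathcal{L}(\mathbf{w}, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log \hat{p}_i + (1 - y_i) \log(1 - \hat{p}_i) \right]$$

A minimal gradient-descent sketch of this training loop follows; the learning rate, iteration count, and tiny dataset are illustrative assumptions, not values from the notes:

```python
import numpy as np

def train_logistic_regression(X, y, lr=0.1, n_iters=1000):
    """Fit w and b by gradient descent on the average cross-entropy loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        # Forward pass: predicted probabilities via the sigmoid.
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        # Gradients of the average cross-entropy w.r.t. w and b.
        error = p - y
        grad_w = X.T @ error / n_samples
        grad_b = error.mean()
        # Gradient-descent update.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy example: one feature, label 1 whenever x > 0.
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = train_logistic_regression(X, y)
print(w, b)  # the learned weight should come out positive
```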