
Gradient descent is an iterative optimization algorithm that finds a minimum of a function by repeatedly moving in the direction opposite to the gradient. Applied to the logistic regression optimization problem, the algorithm proceeds as follows:

  • Initialize parameters: Start with an initial guess for the parameter vector $\theta^{(0)}$. This is often a vector of zeros or small random values.
  • Iteratively update parameters: In each iteration $t$, update the parameters from $\theta^{(t)}$ to $\theta^{(t+1)}$ using the rule $\theta^{(t+1)} = \theta^{(t)} - \eta \, \nabla J(\theta^{(t)})$, where $\theta^{(t)}$ is the vector of parameters at iteration $t$, and $\eta$ (eta) is the learning rate (or step length), a small positive value that determines the size of the step taken in the direction of the negative gradient. $\nabla J(\theta^{(t)})$ is the gradient of the cost function $J$ evaluated at $\theta^{(t)}$, i.e. the vector of partial derivatives of $J$ with respect to each parameter.
  • Repeat until convergence: Continue computing the gradient and updating the parameters for a predetermined number of iterations, or until the change in the parameters or in the cost function between iterations is smaller than a specified tolerance (a loop sketch with such a stopping check follows this list).
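A minimal sketch of that loop, assuming a generic gradient function; the name gradient_descent, the tolerance value, and the small quadratic example are illustrative assumptions, not part of the original notes:

import numpy as np

def gradient_descent(grad, theta0, eta=0.1, tol=1e-6, max_iters=10_000):
    # Generic gradient descent: stop after max_iters or when the update step is tiny
    theta = np.asarray(theta0, dtype=float)
    for t in range(max_iters):
        step = eta * grad(theta)
        theta = theta - step
        if np.linalg.norm(step) < tol:  # parameters barely changed: treat as converged
            break
    return theta

# Illustrative use: minimize f(theta) = ||theta - 3||^2, whose gradient is 2 * (theta - 3)
theta_star = gradient_descent(lambda th: 2.0 * (th - 3.0), np.zeros(2))
print(theta_star)  # approximately [3., 3.]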

Calculate the Gradient

To perform the update, you need to calculate the partial derivative of the cost function with respect to each parameter $\theta_j$. The cost function is the average negative log-likelihood over the $m$ training examples:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Bigl[ y_i \log h_\theta(x_i) + (1 - y_i) \log\bigl(1 - h_\theta(x_i)\bigr) \Bigr]$$

where $h_\theta(x_i) = \sigma(\theta^T x_i)$ is the predicted probability for example $i$.
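For reference, a minimal NumPy sketch of this cost, assuming X, y, theta are arrays as in the code further below; the name compute_cost and the eps clipping constant are illustrative choices, not from the original notes:

import numpy as np

def compute_cost(theta, X, y):
    # J(theta) = -(1/m) * sum( y * log(h) + (1 - y) * log(1 - h) ), with h = sigmoid(X @ theta)
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X.dot(theta)))
    eps = 1e-12  # keeps the logs finite if h is numerically 0 or 1
    return -(1.0 / m) * np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))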

For a single observation $(x, y)$ in logistic regression:

  • Log-likelihood term: $\ell(\theta) = y \log h + (1 - y) \log(1 - h)$, where $h = h_\theta(x) = \sigma(\theta^T x)$
  • Sigmoid function: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$
  • Derivative of the sigmoid: $\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)$
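The derivative identity is easy to sanity-check numerically. The sketch below compares a central finite difference against $\sigma(z)(1-\sigma(z))$; the test point z = 0.7 and the step size are arbitrary choices for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z, step = 0.7, 1e-6
numeric = (sigmoid(z + step) - sigmoid(z - step)) / (2 * step)  # central finite difference
analytic = sigmoid(z) * (1 - sigmoid(z))
print(numeric, analytic)  # both approximately 0.2217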

Using the chain rule:

  1. Let $z = \theta^T x$, so that $h = \sigma(z)$ and $\dfrac{\partial \ell}{\partial \theta_j} = \dfrac{\partial \ell}{\partial h} \cdot \dfrac{\partial h}{\partial z} \cdot \dfrac{\partial z}{\partial \theta_j}$

Breaking it down:

$$\frac{\partial \ell}{\partial h} = \frac{y}{h} - \frac{1 - y}{1 - h}, \qquad \frac{\partial h}{\partial z} = h(1 - h), \qquad \frac{\partial z}{\partial \theta_j} = x_j$$

Combining terms:

$$\frac{\partial \ell}{\partial \theta_j} = \left( \frac{y}{h} - \frac{1 - y}{1 - h} \right) h(1 - h)\, x_j = (y - h)\, x_j$$

Since the cost $J$ is the negative average log-likelihood, each example contributes $(h - y)\, x_j$ to its derivative.

For the full gradient (all $m$ examples):

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x_i) - y_i \bigr)\, x_{ij}, \qquad \text{or in vector form} \quad \nabla J(\theta) = \frac{1}{m} X^T (h - y)$$

where $h$ is the vector of predicted probabilities.

This derivative represents the average error (predicted probability minus actual outcome) scaled by the corresponding feature value across all training examples.

import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function, as defined in LR_003: maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Assume X (m, n+1), y (m,), theta (n+1,), learning rate eta;
# the extra column comes from merging the weights w and bias b into a single theta
# via a column of ones in X
m = len(y)
z = X.dot(theta)                   # linear scores, shape (m,)
h = sigmoid(z)                     # predicted probabilities
gradient = (1/m) * X.T.dot(h - y)  # (1/m) * X^T (h - y)
theta = theta - eta * gradient     # one gradient descent update step
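As an end-to-end sanity check of the derivation and the vectorized expression above, the sketch below compares the analytic gradient $(1/m)\,X^T(h - y)$ against central finite differences of the cost on a small synthetic dataset; the data, the random seed, and the helper names are illustrative, not part of the original notes:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # average negative log-likelihood, same cost as defined earlier
    h = sigmoid(X.dot(theta))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

rng = np.random.default_rng(0)
m, n = 50, 3
X = np.c_[np.ones(m), rng.normal(size=(m, n))]  # bias column of ones plus n features
y = (rng.random(m) < 0.5).astype(float)         # arbitrary 0/1 labels
theta = rng.normal(scale=0.1, size=n + 1)

analytic = (1 / m) * X.T.dot(sigmoid(X.dot(theta)) - y)

eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(n + 1):
    e = np.zeros_like(theta)
    e[j] = eps
    numeric[j] = (cost(theta + e, X, y) - cost(theta - e, X, y)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # should be tiny, around 1e-9 or less

Close agreement between the two confirms that the vectorized code matches the derivation; a large discrepancy would point to a bug in one or the other.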