Topics

SVM aims to find a hyperplane that maximizes the margin between classes. The standard formulation is a convex quadratic programming (QP) problem:

Minimize $\frac{1}{2}\|w\|^2$

subject to $y_i(w \cdot x_i + b) \ge 1$ for all data points $(x_i, y_i)$ (for hard margin SVM). Solving this QP can be done with various methods:
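One route is an off-the-shelf QP/SMO solver. As a minimal sketch (not from the original notes), scikit-learn's SVC wraps LIBSVM, which solves the dual QP with an SMO-type method; the toy data and the large C used to approximate a hard margin are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable toy data (illustrative only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6)   # a very large C approximates hard-margin behavior
clf.fit(X, y)
print(clf.coef_, clf.intercept_)    # learned w and b
```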

Using GD/SGD means minimizing a different, but equivalent, objective function for the soft margin SVM. The constrained primal problem:

Minimize:

$\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$

subject to $y_i(w \cdot x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i$.

This is equivalent to minimizing the regularized hinge loss objective (unconstrained):

$J(w, b) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i(w \cdot x_i + b)\bigr)$
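A minimal sketch of this unconstrained objective, assuming NumPy; the function name svm_objective and the default C are illustrative choices, not from the original notes.

```python
import numpy as np

def svm_objective(w, b, X, y, C=1.0):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b))."""
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * np.dot(w, w) + C * hinge.sum()
```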

This objective function is convex. Therefore, GD/SGD can find the global minimum:

  1. Initialize $w$ and $b$ (e.g., to zeros)
  2. Repeat until convergence:
    1. Compute the gradient of the objective function w.r.t. $w$ and $b$
    2. Update parameters: $w \leftarrow w - \eta\,\frac{\partial J}{\partial w}$, $b \leftarrow b - \eta\,\frac{\partial J}{\partial b}$, where $\eta$ is the learning rate (see the loop sketch after this list)
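A minimal batch-GD sketch of these steps, assuming NumPy; names such as train_svm_gd, lr, and n_iters are illustrative, not from the original notes. The gradient expression inside the loop is the one derived in the next section.

```python
import numpy as np

def train_svm_gd(X, y, C=1.0, lr=0.01, n_iters=1000):
    n, d = X.shape
    w = np.zeros(d)                          # step 1: initialize w and b to zeros
    b = 0.0
    for _ in range(n_iters):                 # step 2: repeat (fixed iteration budget here)
        f = y * (X @ w + b)                  # margins y_i (w . x_i + b)
        v = f < 1.0                          # margin-violating points
        grad_w = w - C * (y[v][:, None] * X[v]).sum(axis=0)
        grad_b = -C * y[v].sum()
        w -= lr * grad_w                     # step 2.2: gradient-descent updates
        b -= lr * grad_b
    return w, b
```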

Gradient Calculation

The objective is the sum of the regularization term $\frac{1}{2}\|w\|^2$ and the sum of hinge losses across the dataset, $C \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i(w \cdot x_i + b)\bigr)$. The gradient of a sum is the sum of the gradients:

  • Gradient of the regularization term $\frac{1}{2}\|w\|^2$: w.r.t. $w$ it is $w$, and w.r.t. $b$ it is $0$
  • Gradient of the hinge loss: for each data point $(x_i, y_i)$, let $f_i = y_i(w \cdot x_i + b)$:
    • If $f_i \ge 1$ (point correctly classified, outside margin): the gradient of $\max(0,\, 1 - f_i)$ is $0$
    • If $f_i < 1$ (point violates margin or is misclassified): the gradient of $C \max(0,\, 1 - f_i)$ w.r.t. $w$ is $-C\, y_i x_i$ and w.r.t. $b$ is $-C\, y_i$

Combined Gradients:

$\frac{\partial J}{\partial w} = w - C \sum_{i:\, f_i < 1} y_i x_i$

$\frac{\partial J}{\partial b} = -C \sum_{i:\, f_i < 1} y_i$
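A sketch of this combined gradient, accumulated point by point to mirror the case analysis above; the helper name svm_gradient is an illustrative choice, assuming NumPy.

```python
import numpy as np

def svm_gradient(w, b, X, y, C=1.0):
    grad_w = w.copy()              # gradient of 0.5*||w||^2 w.r.t. w is w
    grad_b = 0.0                   # the regularization term does not depend on b
    for x_i, y_i in zip(X, y):
        f_i = y_i * (x_i @ w + b)
        if f_i < 1.0:              # margin violated: hinge term contributes
            grad_w -= C * y_i * x_i
            grad_b -= C * y_i
    return grad_w, grad_b
```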

Parameter Updates

For batch GD, compute the gradients as above across the full dataset and update the parameters $w$ and $b$ at the end of each pass, whereas for SGD, update based on the gradient from a single sample at each step:

If $y_i(w \cdot x_i + b) \ge 1$:

  • $w \leftarrow w - \eta\, w$ (only regularization contributes; $b$ is unchanged)

Else ($y_i(w \cdot x_i + b) < 1$, margin violated):

Update:

  • $w \leftarrow w - \eta\,(w - C\, y_i x_i)$
  • $b \leftarrow b + \eta\, C\, y_i$
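A per-sample SGD sketch of these update rules, assuming NumPy; names such as train_svm_sgd, lr, and n_epochs are illustrative, not from the original notes.

```python
import numpy as np

def train_svm_sgd(X, y, C=1.0, lr=0.01, n_epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):            # one sample per update
            if y[i] * (X[i] @ w + b) >= 1:           # margin satisfied
                w -= lr * w                          # only regularization contributes; b unchanged
            else:                                    # margin violated
                w -= lr * (w - C * y[i] * X[i])
                b += lr * C * y[i]
    return w, b
```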