Support Vector Machines (SVMs) find the optimal hyperplane that separates data points into different classes. The key idea is to maximize the distance between the hyperplane and the closest data points; this distance is called the margin. A larger margin means better generalization to unseen data.
The separating hyperplane is defined by $w^\top x + b = 0$. Data points are classified based on the sign of $w^\top x + b$. Additionally, to ensure a margin, we introduce canonical hyperplanes (margin boundaries) at $w^\top x + b = +1$ and $w^\top x + b = -1$. The data points closest to the separating hyperplane, called support vectors, lie exactly on these boundaries.
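As a quick illustration of the decision rule, the sketch below classifies a point by the sign of $w^\top x + b$; the values of `w`, `b`, and `x` are made up for the example (in practice $w$ and $b$ come from training).

```python
import numpy as np

# Illustrative parameters; in practice they are learned from data.
w = np.array([2.0, -1.0])
b = 0.5

x = np.array([1.0, 3.0])   # a point to classify
score = w @ x + b          # value of w^T x + b
label = np.sign(score)     # +1 or -1, i.e. which side of the hyperplane
print(score, label)        # -0.5, -1.0
```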
Note
Choosing a different value, like $2$ or any $c > 0$, would result in the same optimal separating hyperplane but with scaled parameters ($w$ and $b$ would be scaled by $c$). Setting it to $1$ is simply a standardized normalization that simplifies the mathematical formulation of the optimization problem.
Objective
The geometric distance between the $w^\top x + b = +1$ and $w^\top x + b = -1$ hyperplanes is:

$$
\frac{2}{\|w\|}
$$
Maximizing this distance is the objective, since a wider margin tends to generalize better to unseen data.
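One way to see where this expression comes from: take a point $x_+$ on the $+1$ boundary and the closest point $x_-$ on the $-1$ boundary, so that $x_+ - x_- = \lambda w$ for some scalar $\lambda$. Then:

$$
\begin{aligned}
w^\top x_+ + b = +1, \quad w^\top x_- + b = -1
\;\;&\Rightarrow\;\; w^\top (x_+ - x_-) = 2 \\
w^\top (\lambda w) = 2
\;\;&\Rightarrow\;\; \lambda = \frac{2}{\|w\|^2} \\
\text{margin} = \|x_+ - x_-\| = |\lambda|\,\|w\|
\;\;&= \frac{2}{\|w\|}
\end{aligned}
$$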
Why minimize $\frac{1}{2}\|w\|^2$
Maximizing $\frac{2}{\|w\|}$ is equivalent to minimizing $\|w\|$, which is the same as minimizing $\|w\|^2$ (squaring a positive value doesn't change the location of the minimum). The factor $\frac{1}{2}$ is added for mathematical convenience: the gradient of $\frac{1}{2}\|w\|^2$ is simply $w$, which simplifies derivative calculations, especially when using Lagrange multipliers for optimization.
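Concretely, the quadratic form has a clean gradient, whereas the norm itself does not:

$$
\nabla_w \left( \tfrac{1}{2}\|w\|^2 \right)
= \nabla_w \left( \tfrac{1}{2} w^\top w \right)
= w,
\qquad
\nabla_w \|w\| = \frac{w}{\|w\|} \quad (\text{undefined at } w = 0).
$$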
This minimization is performed subject to constraints: for each training point $(x_i, y_i)$ with label $y_i \in \{-1, +1\}$, the constraint ensures it is on the correct side of the margin. Using the canonical representation where the functional margin is $1$, the constraint is $y_i(w^\top x_i + b) \ge 1$.
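Putting the objective and constraints together, the hard-margin primal problem over $n$ training points is:

$$
\min_{w,\,b} \; \tfrac{1}{2}\|w\|^2
\quad \text{subject to} \quad
y_i \left( w^\top x_i + b \right) \ge 1, \qquad i = 1, \dots, n.
$$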
For the soft-margin SVM, which allows some points to be misclassified or to violate the margin, slack variables $\xi_i \ge 0$ are introduced. The constraint becomes $y_i(w^\top x_i + b) \ge 1 - \xi_i$, and a penalty term $C \sum_i \xi_i$ is added to the objective function. The resulting objective is called the soft-margin objective.
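With the slack variables and penalty term, the soft-margin primal problem becomes:

$$
\min_{w,\,b,\,\xi} \; \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w^\top x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \qquad i = 1, \dots, n.
$$

Here $C$ controls the trade-off: a large $C$ penalizes margin violations heavily, while a small $C$ tolerates more violations in exchange for a wider margin.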
So basically, maximize the margin → minimize the cost function defined above subject to some constraints. Observe that this is a convex quadratic programming (QP) problem, and there are a few ways to solve it:
- typically solved by formulating and solving its dual problem
- specialized algorithms like Sequential Minimal Optimization (SMO) are effective
- gradient descent or SGD on the primal formulation (see the sketch after this list)
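A minimal sketch of the last option, assuming NumPy, labels in $\{-1, +1\}$, and the soft-margin (hinge loss) objective; the function name, learning rate, and update schedule are illustrative rather than a reference implementation.

```python
import numpy as np

def linear_svm_sgd(X, y, C=1.0, lr=0.01, epochs=100, seed=0):
    """Illustrative SGD on the primal soft-margin objective
    (1/2)||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b)).
    Assumes y contains labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:
                # Hinge loss active: subgradient includes the data term.
                grad_w, grad_b = w - C * y[i] * X[i], -C * y[i]
            else:
                # Only the regularizer contributes.
                grad_w, grad_b = w, 0.0
            # Note: the regularizer gradient is applied at every per-sample
            # step here; other schedules (e.g. Pegasos) scale it differently.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy usage on two linearly separable clusters.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(2, 1, (20, 2)), rng.normal(-2, 1, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
w, b = linear_svm_sgd(X, y)
print(np.mean(np.sign(X @ w + b) == y))  # training accuracy, should be near 1.0
```

In practice, library implementations (e.g. the SMO-style solver behind scikit-learn's `SVC`, or `SGDClassifier` with hinge loss for the primal route) are preferred over a hand-rolled loop like this.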