The kernel trick is a technique used in machine learning that allows algorithms designed for linear data to handle non-linear data. It does this by implicitly mapping the data into a higher-dimensional “feature space” where it becomes linearly separable.
Explicitly computing this mapping and working in the high-dimensional space can be computationally expensive or impossible, especially if the space is very high or infinite dimensional. Let’s illustrate this with an example: suppose $x \in \mathbb{R}^n$, and let $\phi(x)$ be the vector that contains all the monomials of $x$ with degree $\le d$:

$$\phi(x) = \left(1,\; x_1,\; \dots,\; x_n,\; x_1^2,\; x_1 x_2,\; \dots,\; x_n^d\right)$$

This vector has $\binom{n+d}{d}$ entries, which blows up quickly as $n$ and $d$ grow.
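To see how fast the explicit representation grows, here is a minimal sketch (the helper name `num_monomials` is ours) that counts the entries of $\phi(x)$ using the standard stars-and-bars formula:

```python
from math import comb

def num_monomials(n: int, d: int) -> int:
    # Monomials of degree <= d in n variables: C(n + d, d).
    return comb(n + d, d)

# The explicit feature vector phi(x) grows combinatorially with n and d.
for n in (10, 100, 1000):
    print(f"n={n}, d=3 -> {num_monomials(n, 3):,} features")
```

Already at $n = 1000$ and $d = 3$ the explicit feature vector has over 160 million entries, so materializing $\phi(x)$ is a non-starter.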
The “trick” is that many linear machine learning algorithms (such as the SVM dual formulation and PCA) only require computing dot products (inner products) between data points. The kernel trick defines a function, called a kernel function $K(x, z)$, that directly computes the dot product of the data points after they have been mapped to the higher-dimensional space (so we never need to transform the points with $\phi$), i.e.,

$$K(x, z) = \langle \phi(x), \phi(z) \rangle = \phi(x)^T \phi(z)$$
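As a concrete check, take the homogeneous degree-2 map $\phi(x) = (x_i x_j)_{i,j}$, for which $\phi(x)^T \phi(z) = (x^T z)^2$. The sketch below (NumPy assumed available) verifies this identity numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
x, z = rng.normal(size=5), rng.normal(size=5)

# Explicit degree-2 feature map: all ordered products x_i * x_j (25 dims).
def phi(v):
    return np.outer(v, v).ravel()

lhs = phi(x) @ phi(z)        # dot product computed in the 25-dim feature space
rhs = (x @ z) ** 2           # kernel K(x, z) = (x^T z)^2, computed in 5 dims
print(np.isclose(lhs, rhs))  # True
```

The kernel side costs one 5-dimensional dot product; the explicit side costs a 25-dimensional one, and the gap widens rapidly with input dimension and degree.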
By replacing the original dot product with $K(x, z)$ in the algorithm, the model behaves as if it were operating in the high-dimensional feature space without ever needing to explicitly compute the coordinates in that space or the mapping function $\phi$. This avoids the computational burden of explicit feature mapping.
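For instance, here is a minimal sketch using scikit-learn (assumed installed) on data that is not linearly separable; swapping the kernel argument is all it takes to move the SVM into the implicit feature space:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Non-linearly separable data: two concentric circles.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM cannot separate the rings; an RBF kernel can.
linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print("linear accuracy:", linear_svm.score(X, y))
print("rbf accuracy:   ", rbf_svm.score(X, y))
```

The RBF kernel corresponds to an infinite-dimensional feature space, yet training never computes those features; the SVM dual only ever evaluates $K(x_i, x_j)$.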
Note
A function $K$ is a valid kernel if and only if it corresponds to a dot product in some feature space, i.e., it satisfies Mercer’s condition: for any finite set of points, the Gram matrix $K_{ij} = K(x_i, x_j)$ must be symmetric and positive semi-definite.
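One practical consequence: for any finite sample, a valid kernel’s Gram matrix has no negative eigenvalues. A quick numerical sanity check for the RBF kernel (the bandwidth of 2 here is an arbitrary choice), sketched with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

# Gram matrix of the RBF kernel K(x, z) = exp(-||x - z||^2 / 2).
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / 2)

# A valid kernel yields a symmetric positive semi-definite Gram matrix:
# all eigenvalues are >= 0 (up to floating-point round-off).
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)  # True
```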