A mathematical inquiry into kernel functions and their use.
Kernel functions are a staple of machine learning, used in a wide array of algorithms, from support vector machines and principal component analysis to convolutional neural networks.
However, these algorithms are often taught in isolation, which can obscure the role kernel functions play and make them seem rather arbitrary. It is therefore useful to look at kernel functions on their own.
What are kernel functions?
Kernel functions are a generalization of the vector dot product.
Recall that the dot product between two vectors $\mathbf{x}$ and $\mathbf{y}$, both in $\mathbb{R}^n$, is given by:

$$\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i$$

Suppose we have a function $\phi: \mathbb{R}^n \to \mathbb{R}^m$ mapping inputs into a feature space. Then the dot product of $\phi(\mathbf{x})$ and $\phi(\mathbf{y})$ in the feature space is $\phi(\mathbf{x}) \cdot \phi(\mathbf{y})$, the value of which is in $\mathbb{R}$. A corresponding kernel function is a function $k$ satisfying

$$k(\mathbf{x}, \mathbf{y}) = \phi(\mathbf{x}) \cdot \phi(\mathbf{y})$$
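To make this concrete, here is a small sketch (my own illustrative example, not from the text) using the degree-2 polynomial kernel $k(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y})^2$, whose feature map $\phi$ for 2-D inputs can be written out explicitly:

```python
import numpy as np

def phi(v):
    # Explicit feature map for the degree-2 polynomial kernel on 2-D input:
    # phi(x1, x2) = (x1^2, sqrt(2)*x1*x2, x2^2), chosen so that
    # phi(x) . phi(y) == (x . y)^2
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, y):
    # The corresponding kernel, evaluated directly in the input space
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(y))  # dot product in the 3-D feature space
implicit = poly_kernel(x, y)       # same value, computed in the 2-D input space
assert np.isclose(explicit, implicit)  # both equal (1*3 + 2*4)^2 = 121
```

Both routes give the same number, but the kernel never materializes the feature vectors.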
Why kernel functions?
Kernel functions can implicitly operate in a high-dimensional feature space without ever visiting that space. As it turns out, it is often far cheaper to evaluate the kernel in the original low-dimensional input space than to compute the dot product explicitly in the feature space.
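A quick sketch of just how large the gap can be (the kernel choice and dimensions here are my own illustrative assumptions): for the polynomial kernel $k(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^d$ over $n$-dimensional inputs, the implicit feature space has $\binom{n+d}{d}$ dimensions, yet evaluating the kernel needs only one $n$-dimensional dot product:

```python
import numpy as np
from math import comb

n, d = 100, 5  # illustrative input dimension and polynomial degree

# Dimension of the explicit feature space for the kernel (x . y + 1)^d
implicit_dim = comb(n + d, d)
print(implicit_dim)  # 96560646 -- roughly 96.5 million dimensions

rng = np.random.default_rng(0)
x = rng.random(n)
y = rng.random(n)

# The kernel value itself costs only n multiplications plus one power
k = (np.dot(x, y) + 1) ** d
```

Explicitly mapping each point into ~96.5 million dimensions would be prohibitive; the kernel sidesteps it entirely.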
For example, consider the data on the left.
(Thanks to Wikimedia for the picture.)
Ordinarily, we would not be able to separate this data using a linear SVM. However, by applying a suitable kernel, we effectively transform the data into what is depicted on the right, which is linearly separable and can therefore be trivially segmented by a linear SVM.
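This idea can be sketched in a few lines with scikit-learn. The dataset and parameters below are illustrative choices on my part (concentric circles stand in for the figure's data), not from the text:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the input space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)  # struggles on this data
rbf = SVC(kernel="rbf").fit(X, y)        # kernel trick: implicitly maps to a
                                         # space where the rings separate

print(linear.score(X, y))  # well below 1.0
print(rbf.score(X, y))     # near-perfect on this easy dataset
```

The classifier itself is still linear; only the space it operates in has changed.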
Kernel functions provide a computationally cheap way of computing the dot product of two vectors in a high-dimensional feature space without ever explicitly visiting that space. When used in the kernel trick, they enable linear classifiers to make non-linear classifications.