Introduction
A high-performance SVM classifier will likely need thousands of support vectors, and the resulting high complexity of classification prevents their use in many practical applications, with large numbers of training samples or large numbers of features in the input space. To handle this, several approximations to the RBF kernel (and similar kernels) have been devised. Typically, these take the form of a function z that maps a single vector to a vector of higher dimensionality, approximating the kernel.
where Phi is the implicit mapping embedded in the RBF kernel.
Implementation
Scikit-learn has already implemented Fourier transform and Nystroem approximation techniques using RBFSampler and Nystroem classes accordingly.
import matplotlib.pyplot as plt import numpy as np from sklearn import datasets, svm, pipeline from sklearn.kernel_approximation import (RBFSampler, Nystroem) # The digits dataset digits = datasets.load_digits(n_class=9) # To apply an classifier on this data, we need to flatten the image, to # turn the data in a (samples, feature) matrix: n_samples = len(digits.data) data = digits.data / 16. data -= data.mean(axis=0) # We learn the digits on the first half of the digits data_train, targets_train = data[:n_samples / 2], digits.target[:n_samples / 2] # Now predict the value of the digit on the second half: data_test, targets_test = data[n_samples / 2:], digits.target[n_samples / 2:] # Create a classifier: a support vector classifier kernel_svm = svm.SVC(gamma=.2) linear_svm = svm.LinearSVC() # create pipeline from kernel approximation # and linear svm feature_map_fourier = RBFSampler(gamma=.2, random_state=1) feature_map_nystroem = Nystroem(gamma=.2, random_state=1) fourier_approx_svm = pipeline.Pipeline([("feature_map", feature_map_fourier), ("svm", svm.LinearSVC())]) nystroem_approx_svm = pipeline.Pipeline([("feature_map", feature_map_nystroem), ("svm", svm.LinearSVC())]) # fit and predict using linear and kernel svm: kernel_svm.fit(data_train, targets_train) kernel_svm_score = kernel_svm.score(data_test, targets_test) linear_svm.fit(data_train, targets_train) linear_svm_score = linear_svm.score(data_test, targets_test) sample_sizes = 30 * np.arange(1, 10) fourier_scores = [] nystroem_scores = [] fourier_times = [] nystroem_times = [] for D in sample_sizes: fourier_approx_svm.set_params(feature_map__n_components=D) nystroem_approx_svm.set_params(feature_map__n_components=D) nystroem_approx_svm.fit(data_train, targets_train) fourier_approx_svm.fit(data_train, targets_train) fourier_score = fourier_approx_svm.score(data_test, targets_test) nystroem_score = nystroem_approx_svm.score(data_test, targets_test) nystroem_scores.append(nystroem_score) fourier_scores.append(fourier_score) # plot the results: accuracy = plt.figure(figsize=(8, 8)) accuracy = plt.subplot() accuracy.plot(sample_sizes, nystroem_scores, label="Nystroem approx. kernel") accuracy.plot(sample_sizes, fourier_scores, label="Fourier approx. kernel") accuracy.plot([sample_sizes[0], sample_sizes[-1]], [linear_svm_score, linear_svm_score], label="linear svm") accuracy.plot([sample_sizes[0], sample_sizes[-1]], [kernel_svm_score, kernel_svm_score], label="rbf svm") # legends and labels accuracy.set_title("Classification accuracy") accuracy.set_xlim(sample_sizes[0], sample_sizes[-1]) accuracy.set_ylim(np.min(fourier_scores), 1) accuracy.set_ylabel("Classification accuracy") accuracy.legend(loc='best')
Take a look at score plot of different techniques, the more features we use the closer the approximation to original kernel. And since the whole point of approximation was to use it on large number of features, this is exactly what we want to achieve.
Conclusion
SVM kernel approximation can dramatically speed up the prediction process with minimal accuracy loss, enabling the usage of SVM in production resource limited systems.