Wednesday, January 27, 2016

SVM kernel approximation with Python


Introduction

A high-performance SVM classifier will likely need thousands of support vectors, and the resulting high complexity of classification prevents their use in many practical applications, with large numbers of training samples or large numbers of features in the input space. To handle this, several approximations to the RBF kernel (and similar kernels) have been devised. Typically, these take the form of a function z that maps a single vector to a vector of higher dimensionality, approximating the kernel.

where Phi is the implicit mapping embedded in the RBF kernel.

Implementation

Scikit-learn has already implemented Fourier transform and Nystroem approximation techniques using RBFSampler and Nystroem classes accordingly.

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, svm, pipeline
from sklearn.kernel_approximation import (RBFSampler, Nystroem)
# The digits dataset
digits = datasets.load_digits(n_class=9)

# To apply an classifier on this data, we need to flatten the image, to
# turn the data in a (samples, feature) matrix:
n_samples = len(digits.data)
data = digits.data / 16.
data -= data.mean(axis=0)

# We learn the digits on the first half of the digits
data_train, targets_train = data[:n_samples / 2], digits.target[:n_samples / 2]

# Now predict the value of the digit on the second half:
data_test, targets_test = data[n_samples / 2:], digits.target[n_samples / 2:]

# Create a classifier: a support vector classifier
kernel_svm = svm.SVC(gamma=.2)
linear_svm = svm.LinearSVC()

# create pipeline from kernel approximation
# and linear svm
feature_map_fourier = RBFSampler(gamma=.2, random_state=1)
feature_map_nystroem = Nystroem(gamma=.2, random_state=1)
fourier_approx_svm = pipeline.Pipeline([("feature_map", feature_map_fourier),
                                        ("svm", svm.LinearSVC())])

nystroem_approx_svm = pipeline.Pipeline([("feature_map", feature_map_nystroem),
                                        ("svm", svm.LinearSVC())])

# fit and predict using linear and kernel svm:
kernel_svm.fit(data_train, targets_train)
kernel_svm_score = kernel_svm.score(data_test, targets_test)

linear_svm.fit(data_train, targets_train)
linear_svm_score = linear_svm.score(data_test, targets_test)

sample_sizes = 30 * np.arange(1, 10)
fourier_scores = []
nystroem_scores = []
fourier_times = []
nystroem_times = []

for D in sample_sizes:
    fourier_approx_svm.set_params(feature_map__n_components=D)
    nystroem_approx_svm.set_params(feature_map__n_components=D)
    
    nystroem_approx_svm.fit(data_train, targets_train)
    fourier_approx_svm.fit(data_train, targets_train)

    fourier_score = fourier_approx_svm.score(data_test, targets_test)
    nystroem_score = nystroem_approx_svm.score(data_test, targets_test)
    nystroem_scores.append(nystroem_score)
    fourier_scores.append(fourier_score)

# plot the results:
accuracy = plt.figure(figsize=(8, 8))
accuracy = plt.subplot()
accuracy.plot(sample_sizes, nystroem_scores, label="Nystroem approx. kernel")
accuracy.plot(sample_sizes, fourier_scores, label="Fourier approx. kernel")
accuracy.plot([sample_sizes[0], sample_sizes[-1]],
              [linear_svm_score, linear_svm_score], label="linear svm")
accuracy.plot([sample_sizes[0], sample_sizes[-1]],
              [kernel_svm_score, kernel_svm_score], label="rbf svm")

# legends and labels
accuracy.set_title("Classification accuracy")
accuracy.set_xlim(sample_sizes[0], sample_sizes[-1])
accuracy.set_ylim(np.min(fourier_scores), 1)
accuracy.set_ylabel("Classification accuracy")
accuracy.legend(loc='best')

Take a look at score plot of different techniques, the more features we use the closer the approximation to original kernel. And since the whole point of approximation was to use it on large number of features, this is exactly what we want to achieve.

Conclusion

SVM kernel approximation can dramatically speed up the prediction process with minimal accuracy loss, enabling the usage of SVM in production resource limited systems.