Support Vector Machine

Submitted by: Submitted by

Views: 259

Words: 1714

Pages: 7

Category: Literature

Date Submitted: 06/14/2013 11:24 AM

Report This Essay

SVM Example

Dan Ventura March 12, 2009

Abstract We try to give a helpful simple example that demonstrates a linear SVM and then extend the example to a simple non-linear case to illustrate the use of mapping functions and kernels.

1

Introduction

Many learning models make use of the idea that any learning problem can be made easy with the right set of features. The trick, of course, is discovering that “right set of features”, which in general is a very difficult thing to do. SVMs are another attempt at a model that does this. The idea behind SVMs is to make use of a (nonlinear) mapping function Φ that transforms data in input space to data in feature space in such a way as to render a problem linearly separable. The SVM then automatically discovers the optimal separating hyperplane (which, when mapped back into input space via Φ−1 , can be a complex decision surface). SVMs are rather interesting in that they enjoy both a sound theoretical basis as well as state-of-the-art success in real-world applications. To illustrate the basic ideas, we will begin with a linear SVM (that is, a model that assumes the data is linearly separable). We will then expand the example to the nonlinear case to demonstrate the role of the mapping function Φ, and finally we will explain the idea of a kernel and how it allows SVMs to make use of high-dimensional feature spaces while remaining tractable.

2

Linear Example – when Φ is trivial

2

Suppose we are given the following positively labeled data points in 3 1 , 3 −1 , 6 1 , 6 −1

2

:

and the following negatively labeled data points in 1 0 , 0 1 , 0 −1 1 ,

(see Figure 1):

−1 0

Figure 1: Sample data points in 2 . Blue diamonds are positive examples and red squares are negative examples. We would like to discover a simple SVM that accurately discriminates the two classes. Since the data is linearly separable, we can use a linear SVM (that is, one whose mapping function Φ() is the identity function). By...