Support Vector Machines (SVMs) are powerful supervised learning algorithms used for both classification and regression. SVMs find the optimal decision boundary (hyperplane) that maximises the margin between classes, making them particularly effective for high-dimensional data and problems with clear class separation.
Imagine you have two classes of data points on a 2D plane. There are infinitely many lines that could separate them. An SVM finds the line that maximises the margin — the distance between the decision boundary and the nearest data points from each class.
| Concept | Description |
|---|---|
| Hyperplane | The decision boundary that separates the classes (a line in 2D, a plane in 3D, a hyperplane in higher dimensions) |
| Margin | The distance between the hyperplane and the nearest data points from each class |
| Support Vectors | The data points closest to the hyperplane — they "support" and define the margin |
| Maximum Margin | The goal of SVM is to find the hyperplane with the largest margin |
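In standard SVM notation (a sketch of the usual formulation, with weight vector $w$, bias $b$, and labels $y_i \in \{-1, +1\}$), the hyperplane and the hard-margin training problem are:

$$w^\top x + b = 0, \qquad \text{margin width} = \frac{2}{\lVert w \rVert}$$

$$\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 \ \text{ for all } i$$

Maximising the margin $2/\lVert w \rVert$ is the same as minimising $\lVert w \rVert^2$, which makes this a convex quadratic programme.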
Tip: Only the support vectors matter for defining the decision boundary. Removing any non-support-vector point from the training set would not change the model.
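You can check this directly: train on the full dataset, then retrain on the support vectors alone and compare the learned hyperplanes. A minimal sketch (the dataset and C value here are illustrative choices):

```python
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Two well-separated blobs so the support vectors are easy to isolate
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)

svm = SVC(kernel='linear', C=1.0).fit(X, y)

# Retrain using only the support vectors
sv_idx = svm.support_  # indices of the support vectors in X
svm_sv_only = SVC(kernel='linear', C=1.0).fit(X[sv_idx], y[sv_idx])

# Both models should learn (essentially) the same hyperplane
print(svm.coef_, svm.intercept_)
print(svm_sv_only.coef_, svm_sv_only.intercept_)
```

The coefficients should agree up to small numerical differences, because the non-support points never enter the solution.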
A linear SVM uses a straight line (or hyperplane) as its decision boundary. It comes in two variants, depending on how strictly separation is enforced:
| Type | Description | When to Use |
|---|---|---|
| Hard Margin | No misclassifications allowed — every point must be on the correct side | Data is perfectly separable and has no noise |
| Soft Margin | Allows some misclassifications using a penalty parameter C | Real-world data with noise and overlap |
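In the soft-margin formulation, each training point gets a slack variable $\xi_i \ge 0$ measuring how far it violates the margin, and the objective penalises the total slack:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0$$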
The parameter C controls the trade-off between a wide margin and misclassification:
| C Value | Effect |
|---|---|
| Small C | Wide margin, more misclassifications allowed — simpler model, may underfit |
| Large C | Narrow margin, fewer misclassifications — complex model, may overfit |
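A minimal sketch of this trade-off (the dataset, noise level, and C values are illustrative choices): on noisy data, a smaller C leaves more points on or inside the margin, so more of them become support vectors.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Noisy 2D data: flip_y adds ~10% label noise, so perfect separation is impossible
X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           flip_y=0.1, random_state=0)

for C in [0.01, 1.0, 100.0]:
    model = SVC(kernel='linear', C=C).fit(X, y)
    print(f"C={C:>6}: {model.n_support_.sum()} support vectors")
```

Exact counts depend on the data, but the number of support vectors should fall as C grows and the margin narrows. The complete example below walks through a full train-and-evaluate workflow with a linear SVM: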
```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Generate a 2-feature synthetic dataset
X, y = make_classification(n_samples=500, n_features=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a linear SVM
svm_linear = SVC(kernel='linear', C=1.0, random_state=42)
svm_linear.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = svm_linear.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"Number of support vectors: {sum(svm_linear.n_support_)}")
print(classification_report(y_test, y_pred))
```
Real-world data is often not linearly separable. The kernel trick maps data into a higher-dimensional space where a linear separation becomes possible — without actually computing the transformation.
| Kernel | Description | Use Case |
|---|---|---|
| Linear | No transformation — finds a straight hyperplane | Linearly separable data |
| RBF (Radial Basis Function) | Maps to infinite-dimensional space, creates flexible boundaries | Most common default, general-purpose |
| Polynomial | Maps to polynomial feature space | Structured non-linear patterns |
| Sigmoid | Similar to a neural network activation | Rarely used in practice |
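A minimal sketch comparing kernels on data a straight line cannot separate (make_moons and the noise level are illustrative choices):

```python
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Two interleaving half-moons: not separable by a straight line
X, y = make_moons(n_samples=400, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

for kernel in ['linear', 'poly', 'rbf']:
    model = SVC(kernel=kernel, random_state=42).fit(X_train, y_train)
    print(f"{kernel:>6}: test accuracy = {model.score(X_test, y_test):.4f}")
```

The RBF kernel should clearly outperform the linear one here, since the boundary between the two half-moons is curved.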
The gamma parameter (used by the RBF, polynomial, and sigmoid kernels) controls how far the influence of a single training example reaches:

| Gamma Value | Effect |
|---|---|
| Small gamma | Far reach, smoother and simpler boundary (may underfit) |
| Large gamma | Short reach, tighter and more complex boundary (may overfit) |
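A minimal sketch of the effect (dataset and gamma values are illustrative; training accuracy is shown alongside cross-validated accuracy to make overfitting visible):

```python
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score

X, y = make_moons(n_samples=400, noise=0.2, random_state=42)

for gamma in [0.01, 1.0, 100.0]:
    model = SVC(kernel='rbf', gamma=gamma)
    train_acc = model.fit(X, y).score(X, y)             # accuracy on the training set
    cv_acc = cross_val_score(model, X, y, cv=5).mean()  # 5-fold cross-validation
    print(f"gamma={gamma:>6}: train={train_acc:.3f}, cross-val={cv_acc:.3f}")
```

Typically the large-gamma model fits the training set almost perfectly while its cross-validated accuracy drops, which is the signature of overfitting.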