Support Vector Machines (SVMs) are powerful supervised learning algorithms used for both classification and regression. SVMs find the optimal decision boundary (hyperplane) that maximises the margin between classes, making them particularly effective for high-dimensional data and problems with clear class separation.
Imagine you have two classes of data points on a 2D plane. There are infinitely many lines that could separate them. An SVM finds the line that maximises the margin — the distance between the decision boundary and the nearest data points from each class.
| Concept | Description |
|---|---|
| Hyperplane | The decision boundary that separates the classes (a line in 2D, a plane in 3D, a hyperplane in higher dimensions) |
| Margin | The distance between the hyperplane and the nearest data points from each class |
| Support Vectors | The data points closest to the hyperplane — they "support" and define the margin |
| Maximum Margin | The goal of SVM is to find the hyperplane with the largest margin |
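In standard SVM notation (a sketch of the usual formulation, with weight vector $w$, bias $b$, and labels $y_i \in \{-1, +1\}$), the hyperplane and the hard-margin training problem are:

$$w^\top x + b = 0, \qquad \text{margin width} = \frac{2}{\lVert w \rVert}$$

$$\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 \ \text{ for all } i$$

Maximising the margin $2/\lVert w \rVert$ is the same as minimising $\lVert w \rVert^2$, which makes this a convex quadratic programme.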
Tip: Only the support vectors matter for defining the decision boundary. Removing any non-support-vector point from the training set would not change the model.
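You can check this directly: train on the full dataset, then retrain on the support vectors alone and compare the learned hyperplanes. A minimal sketch (the dataset and C value here are illustrative choices):

```python
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Two well-separated blobs so the support vectors are easy to isolate
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)

svm = SVC(kernel='linear', C=1.0).fit(X, y)

# Retrain using only the support vectors
sv_idx = svm.support_  # indices of the support vectors in X
svm_sv_only = SVC(kernel='linear', C=1.0).fit(X[sv_idx], y[sv_idx])

# Both models should learn (essentially) the same hyperplane
print(svm.coef_, svm.intercept_)
print(svm_sv_only.coef_, svm_sv_only.intercept_)
```

The coefficients should agree up to small numerical differences, because the non-support points never enter the solution.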
A linear SVM uses a straight line (or hyperplane) as its decision boundary. It comes in two variants, depending on how strictly separation is enforced:
| Type | Description | When to Use |
|---|---|---|
| Hard Margin | No misclassifications allowed — every point must be on the correct side | Data is perfectly separable and has no noise |
| Soft Margin | Allows some misclassifications using a penalty parameter C | Real-world data with noise and overlap |
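In the soft-margin formulation, each training point gets a slack variable $\xi_i \ge 0$ measuring how far it violates the margin, and the objective penalises the total slack:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0$$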
The parameter C controls the trade-off between a wide margin and misclassification:
| C Value | Effect |
|---|---|
| Small C | Wide margin, more misclassifications allowed — simpler model, may underfit |
| Large C | Narrow margin, fewer misclassifications — complex model, may overfit |
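A minimal sketch of this trade-off (the dataset, noise level, and C values are illustrative choices): on noisy data, a smaller C leaves more points on or inside the margin, so more of them become support vectors.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Noisy 2D data: flip_y adds ~10% label noise, so perfect separation is impossible
X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           flip_y=0.1, random_state=0)

for C in [0.01, 1.0, 100.0]:
    model = SVC(kernel='linear', C=C).fit(X, y)
    print(f"C={C:>6}: {model.n_support_.sum()} support vectors")
```

Exact counts depend on the data, but the number of support vectors should fall as C grows and the margin narrows. The complete example below walks through a full train-and-evaluate workflow with a linear SVM: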
```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Generate a 2-feature synthetic dataset
X, y = make_classification(n_samples=500, n_features=2,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a linear SVM
svm_linear = SVC(kernel='linear', C=1.0, random_state=42)
svm_linear.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = svm_linear.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"Number of support vectors: {sum(svm_linear.n_support_)}")
print(classification_report(y_test, y_pred))
```
Real-world data is often not linearly separable. The kernel trick maps data into a higher-dimensional space where a linear separation becomes possible — without actually computing the transformation.
| Kernel | Description | Use Case |
|---|---|---|
| Linear | No transformation — finds a straight hyperplane | Linearly separable data |
| RBF (Radial Basis Function) | Maps to infinite-dimensional space, creates flexible boundaries | Most common default, general-purpose |
| Polynomial | Maps to polynomial feature space | Structured non-linear patterns |
| Sigmoid | Similar to a neural network activation | Rarely used in practice |
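A minimal sketch comparing kernels on data a straight line cannot separate (make_moons and the noise level are illustrative choices):

```python
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Two interleaving half-moons: not separable by a straight line
X, y = make_moons(n_samples=400, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

for kernel in ['linear', 'poly', 'rbf']:
    model = SVC(kernel=kernel, random_state=42).fit(X_train, y_train)
    print(f"{kernel:>6}: test accuracy = {model.score(X_test, y_test):.4f}")
```

The RBF kernel should clearly outperform the linear one here, since the boundary between the two half-moons is curved.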
The gamma parameter (used by the RBF, polynomial, and sigmoid kernels) controls how far the influence of a single training example reaches:

| Gamma Value | Effect |
|---|---|
| Small gamma | Far reach, smoother and simpler boundary (may underfit) |
| Large gamma | Short reach, tighter and more complex boundary (may overfit) |
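A minimal sketch of the effect (dataset and gamma values are illustrative; training accuracy is shown alongside cross-validated accuracy to make overfitting visible):

```python
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score

X, y = make_moons(n_samples=400, noise=0.2, random_state=42)

for gamma in [0.01, 1.0, 100.0]:
    model = SVC(kernel='rbf', gamma=gamma)
    train_acc = model.fit(X, y).score(X, y)             # accuracy on the training set
    cv_acc = cross_val_score(model, X, y, cv=5).mean()  # 5-fold cross-validation
    print(f"gamma={gamma:>6}: train={train_acc:.3f}, cross-val={cv_acc:.3f}")
```

Typically the large-gamma model fits the training set almost perfectly while its cross-validated accuracy drops, which is the signature of overfitting.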