Supervised learning is the most common and widely used type of machine learning. In supervised learning, the algorithm is trained on labelled data — a dataset where each example has both input features and a known correct output (the label). The goal is to learn a mapping function that can predict the output for new, unseen inputs.
The supervised learning process follows these steps: split the labelled data into training and test sets, fit a model on the training set, then predict on the held-out test set.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# X = features, y = labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression()         # Any scikit-learn estimator works here
model.fit(X_train, y_train)          # Train
predictions = model.predict(X_test)  # Predict
```
Classification is the task of predicting a discrete category or class label. The output belongs to one of a finite set of classes.
In binary classification, the output has exactly two classes: spam/not-spam, positive/negative, yes/no.
| Example Problem | Class 0 | Class 1 |
|---|---|---|
| Email filtering | Not spam | Spam |
| Medical diagnosis | Healthy | Diseased |
| Fraud detection | Legitimate | Fraudulent |
| Sentiment analysis | Negative | Positive |
In multiclass classification, the output has three or more classes, such as classifying images of animals into cat, dog, bird, etc.
| Algorithm | Strengths | Weaknesses |
|---|---|---|
| Logistic Regression | Simple, interpretable, fast | Assumes linear decision boundary |
| Decision Trees | Interpretable, handles non-linear data | Prone to overfitting |
| Random Forest | Robust, handles missing data | Less interpretable, slower |
| Support Vector Machine | Effective in high dimensions | Slow on large datasets |
| K-Nearest Neighbours | Simple, no training phase | Slow at prediction time, sensitive to scale |
| Naive Bayes | Fast, works well with text data | Assumes feature independence |
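As a sketch of the multiclass case, here is how one of the algorithms above (K-Nearest Neighbours) might be applied to scikit-learn's three-class iris dataset. The hyperparameter choice of `n_neighbors=5` is illustrative, not tuned:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris has three classes: setosa, versicolor, virginica
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# KNN has no real training phase: it stores the data and
# votes among the nearest neighbours at prediction time
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(f"Test accuracy: {knn.score(X_test, y_test):.2f}")
```

Note the table's caveat about scale sensitivity: the iris features happen to share similar ranges, but on mixed-scale data you would normally standardise the features first.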
For example, training and evaluating a Random Forest classifier on the breast-cancer dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Load dataset
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Train a Random Forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Evaluate
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
```
| Metric | Description |
|---|---|
| Accuracy | Proportion of correct predictions out of all predictions |
| Precision | Of all positive predictions, how many were actually positive |
| Recall (Sensitivity) | Of all actual positives, how many were correctly predicted |
| F1 Score | Harmonic mean of precision and recall |
| ROC AUC | Area under the Receiver Operating Characteristic curve |
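Each of these metrics is available as a function in `sklearn.metrics`. A minimal sketch with hand-made labels (the arrays below are illustrative, not drawn from any dataset above):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true   = [0, 1, 1, 0, 1, 0, 1, 1]   # actual labels
y_pred   = [0, 1, 0, 0, 1, 1, 1, 1]   # predicted labels
y_scores = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]  # predicted probabilities

print(accuracy_score(y_true, y_pred))   # correct / total = 6/8 = 0.75
print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 4/5 = 0.8
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 4/5 = 0.8
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_scores))  # needs scores/probabilities, not hard labels
```

Note that ROC AUC is computed from the predicted probabilities (or decision scores), whereas the other four are computed from the hard class predictions.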
Regression is the task of predicting a continuous numerical value. The output can be any real number.
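A minimal regression sketch using scikit-learn's `LinearRegression` on synthetic data (the underlying relationship y ≈ 3x + 5 and the noise level are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 5 plus Gaussian noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + 5 + rng.normal(0, 1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

reg = LinearRegression().fit(X_train, y_train)
print(f"slope: {reg.coef_[0]:.2f}, intercept: {reg.intercept_:.2f}")
print(f"R^2 on test set: {reg.score(X_test, y_test):.3f}")
```

The fitted slope and intercept should land close to the true values of 3 and 5, and the R² score (the default `score` for regressors) should be near 1 because the noise is small relative to the signal.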