Supervised learning is the most common and widely used type of machine learning. In supervised learning, the algorithm is trained on labelled data — a dataset where each example has both input features and a known correct output (the label). The goal is to learn a mapping function that can predict the output for new, unseen inputs.
The supervised learning process follows these steps: split the labelled data into training and test sets, fit a model on the training set, then predict on the held-out test set.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# X = features, y = labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression()         # Any scikit-learn estimator works here
model.fit(X_train, y_train)          # Train
predictions = model.predict(X_test)  # Predict
```
Classification is the task of predicting a discrete category or class label. The output belongs to one of a finite set of classes.
In binary classification, the output has exactly two classes: spam/not-spam, positive/negative, yes/no.
| Example Problem | Class 0 | Class 1 |
|---|---|---|
| Email filtering | Not spam | Spam |
| Medical diagnosis | Healthy | Diseased |
| Fraud detection | Legitimate | Fraudulent |
| Sentiment analysis | Negative | Positive |
In multiclass classification, the output has three or more classes, such as classifying images of animals into cat, dog, bird, etc.
| Algorithm | Strengths | Weaknesses |
|---|---|---|
| Logistic Regression | Simple, interpretable, fast | Assumes linear decision boundary |
| Decision Trees | Interpretable, handles non-linear data | Prone to overfitting |
| Random Forest | Robust, handles missing data | Less interpretable, slower |
| Support Vector Machine | Effective in high dimensions | Slow on large datasets |
| K-Nearest Neighbours | Simple, no training phase | Slow at prediction time, sensitive to scale |
| Naive Bayes | Fast, works well with text data | Assumes feature independence |
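As a sketch of the multiclass case, here is how one of the algorithms above (K-Nearest Neighbours) might be applied to scikit-learn's three-class iris dataset. The hyperparameter choice of `n_neighbors=5` is illustrative, not tuned:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris has three classes: setosa, versicolor, virginica
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# KNN has no real training phase: it stores the data and
# votes among the nearest neighbours at prediction time
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(f"Test accuracy: {knn.score(X_test, y_test):.2f}")
```

Note the table's caveat about scale sensitivity: the iris features happen to share similar ranges, but on mixed-scale data you would normally standardise the features first.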
For example, training and evaluating a Random Forest classifier on the breast-cancer dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Load dataset
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Train a Random Forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Evaluate
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
```
| Metric | Description |
|---|---|
| Accuracy | Proportion of correct predictions out of all predictions |
| Precision | Of all positive predictions, how many were actually positive |
| Recall (Sensitivity) | Of all actual positives, how many were correctly predicted |
| F1 Score | Harmonic mean of precision and recall |
| ROC AUC | Area under the Receiver Operating Characteristic curve |
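Each of these metrics is available as a function in `sklearn.metrics`. A minimal sketch with hand-made labels (the arrays below are illustrative, not drawn from any dataset above):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true   = [0, 1, 1, 0, 1, 0, 1, 1]   # actual labels
y_pred   = [0, 1, 0, 0, 1, 1, 1, 1]   # predicted labels
y_scores = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]  # predicted probabilities

print(accuracy_score(y_true, y_pred))   # correct / total = 6/8 = 0.75
print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 4/5 = 0.8
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 4/5 = 0.8
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_scores))  # needs scores/probabilities, not hard labels
```

Note that ROC AUC is computed from the predicted probabilities (or decision scores), whereas the other four are computed from the hard class predictions.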
Regression is the task of predicting a continuous numerical value. The output can be any real number.
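A minimal regression sketch using scikit-learn's `LinearRegression` on synthetic data (the underlying relationship y ≈ 3x + 5 and the noise level are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 5 plus Gaussian noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + 5 + rng.normal(0, 1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

reg = LinearRegression().fit(X_train, y_train)
print(f"slope: {reg.coef_[0]:.2f}, intercept: {reg.intercept_:.2f}")
print(f"R^2 on test set: {reg.score(X_test, y_test):.3f}")
```

The fitted slope and intercept should land close to the true values of 3 and 5, and the R² score (the default `score` for regressors) should be near 1 because the noise is small relative to the signal.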