Unsupervised learning is a type of machine learning where the algorithm learns from unlabelled data — there are no predefined correct answers. The goal is to discover hidden patterns, structures, or groupings in the data without human guidance on what the output should be.
| Aspect | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Data | Labelled (X, y) | Unlabelled (X only) |
| Goal | Predict known outputs | Discover hidden structure |
| Evaluation | Compare predictions to known labels | Domain knowledge, visual inspection, internal metrics |
| Examples | Spam detection, price prediction | Customer segmentation, anomaly detection |
Clustering groups similar data points together based on their features. Points within the same cluster are more similar to each other than to points in other clusters.
K-Means is the most popular clustering algorithm. It partitions data into k clusters by iteratively assigning points to the nearest cluster centre and updating the centres.
How K-Means works:

1. Choose k initial cluster centres (centroids).
2. Assign each point to its nearest centroid.
3. Recompute each centroid as the mean of the points assigned to it.
4. Repeat steps 2–3 until the assignments stop changing (or a maximum number of iterations is reached).
```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate sample data
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Fit K-Means
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
kmeans.fit(X)

# Plot results
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis', s=30)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c='red', marker='X', s=200, label='Centroids')
plt.legend()
plt.title('K-Means Clustering')
plt.show()
```
A common way to choose k is the elbow method: run K-Means for a range of k values and plot the inertia (the sum of squared distances from each point to its cluster centre):

```python
inertias = []
K_range = range(1, 11)
for k in K_range:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(X)
    inertias.append(km.inertia_)

plt.plot(K_range, inertias, 'bo-')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Inertia')
plt.title('Elbow Method')
plt.show()
```
The "elbow" in the plot — where adding more clusters stops significantly reducing inertia — suggests the optimal k.
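When no clear elbow appears, an internal metric such as the silhouette score (one of the evaluation approaches mentioned earlier) can back up the choice of k. A minimal sketch using scikit-learn's `silhouette_score` on the same `make_blobs` data as above:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Silhouette score ranges from -1 to 1; higher means better-separated
# clusters. It needs at least 2 clusters, so start the range at k=2.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(f"Best k by silhouette: {best_k}")
```

Unlike inertia, which always decreases as k grows, the silhouette score peaks at a specific k, so it can be maximised directly.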
| Algorithm | Strengths | Weaknesses |
|---|---|---|
| K-Means | Fast, scalable, simple | Requires choosing k, assumes spherical clusters |
| DBSCAN | Finds arbitrarily shaped clusters, detects outliers | Sensitive to density parameters |
| Hierarchical | Produces a dendrogram, no need to prespecify k | Slow on large datasets |
| Gaussian Mixture | Soft clustering (probabilities), flexible shapes | Assumes Gaussian distributions |
| Mean Shift | No need to specify k, finds arbitrary shapes | Computationally expensive |
```python
from sklearn.cluster import DBSCAN

# DBSCAN does not require specifying k
dbscan = DBSCAN(eps=0.5, min_samples=5)
labels = dbscan.fit_predict(X)

# Label -1 indicates noise / outlier points
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = list(labels).count(-1)
print(f"Clusters found: {n_clusters}, Noise points: {n_noise}")
```
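The table above lists Gaussian Mixtures as a soft-clustering option: instead of a hard label, each point gets a probability of belonging to each component. A minimal sketch on the same `make_blobs` data (using 4 components to mirror the earlier K-Means example; that count is an assumption, not something GMM infers):

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Soft clustering: predict_proba returns a probability per component,
# so each row of `probs` sums to 1
gmm = GaussianMixture(n_components=4, random_state=42)
gmm.fit(X)
probs = gmm.predict_proba(X)  # shape (300, 4)

print(probs[0].round(3))
```

`gmm.predict(X)` still yields hard labels when needed; the probabilities are useful for flagging points that sit between clusters.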
Dimensionality reduction transforms high-dimensional data into a lower-dimensional representation while preserving as much meaningful structure as possible.
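PCA (principal component analysis) is the classic example. A minimal sketch, assuming scikit-learn's `PCA` applied to the 4-feature Iris dataset (the dataset choice is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples, 4 features

# Project the 4-D data onto the 2 directions of greatest variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)  # (150, 2)
# Fraction of the original variance captured by each component
print(pca.explained_variance_ratio_)
```

The 2-D projection can then be plotted or fed into a clustering algorithm, with `explained_variance_ratio_` indicating how much structure the reduction preserved.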