Linear regression and logistic regression are two of the most fundamental algorithms in machine learning. Despite both carrying "regression" in their names, they solve different tasks: linear regression predicts continuous values (regression), while logistic regression predicts discrete categories (classification).
Linear regression models the relationship between one or more input features and a continuous target variable by fitting a straight line (or hyperplane in multiple dimensions) through the data.
With one feature, the model is:
y = w * x + b
Where:

- `y` is the predicted value
- `x` is the input feature
- `w` is the weight (the slope of the line)
- `b` is the bias (the intercept)
With multiple features, the model becomes:
y = w1 * x1 + w2 * x2 + ... + wn * xn + b
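In code, the multi-feature model is just a dot product plus a bias. A minimal sketch with made-up weights and features (all the values here are illustrative, not from the lesson):

```python
import numpy as np

# Hypothetical weights, features, and bias, purely for illustration
w = np.array([0.5, -1.2, 3.0])  # w1, w2, w3
x = np.array([2.0, 0.7, 1.5])   # x1, x2, x3
b = 0.25

y = w @ x + b  # w1*x1 + w2*x2 + w3*x3 + b
print(y)       # 4.91
```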
Linear regression finds the line of best fit by minimising the sum of squared residuals (Ordinary Least Squares — OLS).
| Concept | Description |
|---|---|
| Residual | Difference between the actual value and the predicted value |
| OLS | Minimises the sum of squared residuals |
| Normal Equation | Closed-form solution: w = (X^T X)^-1 X^T y (sketched below) |
| Gradient Descent | Iterative optimisation that updates the weights step by step to reduce the error (also sketched below) |
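To make the two approaches concrete, here is a minimal NumPy sketch that fits a one-feature model both ways. The synthetic data, learning rate, and iteration count are illustrative choices, not values from the lesson:

```python
import numpy as np

# Synthetic data: y = 3x + 2 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1, size=100)

# Append a column of ones so the intercept b is learned as a weight
Xb = np.hstack([X, np.ones((len(X), 1))])

# Normal equation: w = (X^T X)^-1 X^T y (solve is more stable than inverting)
w_closed = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Gradient descent on the mean squared error
w = np.zeros(2)
lr = 0.01
for _ in range(5000):
    grad = (2 / len(y)) * Xb.T @ (Xb @ w - y)  # gradient of the MSE
    w -= lr * grad

print("normal equation:", w_closed)  # close to [3, 2]
print("gradient descent:", w)        # should match the closed form
```

The scikit-learn example below does the same thing on a real dataset.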
```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import fetch_california_housing
import numpy as np

# Load data
data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Train
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.4f}")
print(f"R2: {r2_score(y_test, y_pred):.4f}")

# Coefficients
for name, coef in zip(data.feature_names, model.coef_):
    print(f"  {name}: {coef:.4f}")
print(f"  Intercept: {model.intercept_:.4f}")
```
Linear regression rests on several assumptions:

| Assumption | Description |
|---|---|
| Linearity | The relationship between features and target is linear |
| Independence | Observations are independent of each other |
| Homoscedasticity | Constant variance of residuals |
| Normality | Residuals are normally distributed |
| No multicollinearity | Features are not highly correlated with each other (a quick check is sketched below) |
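One way to screen for multicollinearity is a pairwise correlation matrix of the features. A minimal sketch reusing `X_train` and `data` from the example above (the 0.8 threshold is an arbitrary illustrative cut-off):

```python
import numpy as np

# Pairwise feature correlations; |r| close to 1 between two different
# features is a warning sign for multicollinearity
corr = np.corrcoef(X_train, rowvar=False)
for i, j in np.argwhere(np.triu(np.abs(corr) > 0.8, k=1)):
    print(data.feature_names[i], data.feature_names[j], f"{corr[i, j]:.2f}")
```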
Tip: Violations of these assumptions can lead to unreliable coefficient estimates. Always check residual plots.
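A residual plot is quick to produce. This sketch reuses `y_test` and `y_pred` from the evaluation code above and assumes matplotlib is installed:

```python
import matplotlib.pyplot as plt

# Residuals vs predictions: a shapeless cloud around zero is what you want.
# A curve hints at non-linearity; a funnel hints at heteroscedasticity.
residuals = y_test - y_pred
plt.scatter(y_pred, residuals, s=5, alpha=0.3)
plt.axhline(0, color="red")
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.show()
```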
When features are numerous or correlated, linear regression can overfit. Regularisation adds a penalty to the loss function to constrain the weights.
Ridge regression (L2 regularisation) adds the sum of squared weights as a penalty: loss = sum of squared residuals + alpha * (w1^2 + ... + wn^2). It shrinks coefficients towards zero but never sets them exactly to zero.
```python
from sklearn.linear_model import Ridge

ridge = Ridge(alpha=1.0)  # alpha controls the penalty strength
ridge.fit(X_train, y_train)
```
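In practice, alpha is usually chosen by cross-validation rather than fixed by hand. A minimal sketch with scikit-learn's `RidgeCV` (the alpha grid here is illustrative):

```python
from sklearn.linear_model import RidgeCV

# Try a small grid of penalty strengths; keep the best by cross-validation
ridge_cv = RidgeCV(alphas=[0.1, 1.0, 10.0])
ridge_cv.fit(X_train, y_train)
print(f"Best alpha: {ridge_cv.alpha_}")
```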
Lasso regression (L1 regularisation) adds the sum of absolute weights as a penalty: loss = sum of squared residuals + alpha * (|w1| + ... + |wn|). It can drive coefficients to exactly zero, effectively performing feature selection.
```python
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=0.01)
lasso.fit(X_train, y_train)

# Features with zero coefficients are effectively removed
n_used = np.sum(lasso.coef_ != 0)
print(f"Features used: {n_used} / {X_train.shape[1]}")
```