This lesson covers the foundational building blocks of neural networks — from the single neuron (perceptron) through multi-layer architectures, activation functions, and the backpropagation algorithm that makes learning possible.
The perceptron is the simplest form of a neural network — a single artificial neuron that takes multiple inputs, multiplies each by a weight, sums them, adds a bias, and passes the result through an activation function.
output = activation(w1*x1 + w2*x2 + ... + wn*xn + b)
Where:
- x1, x2, ..., xn are the input features
- w1, w2, ..., wn are the learnable weights
- b is the bias term
- activation is the activation function

import numpy as np
class Perceptron:
    def __init__(self, n_features, lr=0.01):
        self.weights = np.zeros(n_features)  # one weight per input feature
        self.bias = 0.0
        self.lr = lr  # learning rate

    def predict(self, x):
        # Weighted sum plus bias, then a hard threshold (step) activation
        z = np.dot(x, self.weights) + self.bias
        return 1 if z >= 0 else 0

    def train(self, X, y, epochs=100):
        # Perceptron learning rule: nudge weights toward the correct label
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                pred = self.predict(xi)
                error = yi - pred
                self.weights += self.lr * error * xi
                self.bias += self.lr * error
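As a quick sanity check, here is a minimal usage sketch for the class above, trained on the AND truth table (the arrays and hyperparameters are illustrative choices, not part of the lesson's reference code):

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # AND truth table

p = Perceptron(n_features=2, lr=0.1)
p.train(X, y, epochs=50)
print([p.predict(xi) for xi in X])  # should print [0, 0, 0, 1]

Because AND is linearly separable, the perceptron convergence theorem guarantees the loop above settles on a correct decision boundary after a finite number of updates.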
The perceptron can only learn linearly separable functions. It famously cannot solve the XOR problem, which motivated the development of multi-layer networks.
| Problem | Linearly Separable? | Perceptron Can Solve? |
|---|---|---|
| AND | Yes | Yes |
| OR | Yes | Yes |
| XOR | No | No |
| NAND | Yes | Yes |
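To see the limitation concretely, here is a sketch that reuses the Perceptron class from above on the XOR truth table (variable names are illustrative). No matter how many epochs you run, at least one of the four points stays misclassified, because no single line can separate the two classes:

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])  # XOR truth table

p = Perceptron(n_features=2, lr=0.1)
p.train(X_xor, y_xor, epochs=1000)
print([p.predict(xi) for xi in X_xor])  # never matches [0, 1, 1, 0]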
A Multi-Layer Perceptron (MLP) extends the single perceptron by stacking neurons into layers. An MLP has:
- an input layer that receives the raw features,
- one or more hidden layers of neurons, and
- an output layer that produces the prediction.
The addition of hidden layers with non-linear activation functions allows an MLP to approximate any continuous function to arbitrary accuracy (the Universal Approximation Theorem).
Input Layer Hidden Layer 1 Hidden Layer 2 Output Layer
[x1] ──────► [h1] ──────────► [h4] ──────────► [y1]
[x2] ──────► [h2] ──────────► [h5] ──────────► [y2]
[x3] ──────► [h3] ──────────► [h6]
Activation functions introduce non-linearity into the network. Without them, stacking layers would produce only a linear transformation, no matter how many layers you add.
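A quick way to convince yourself of this is a minimal NumPy sketch with made-up weight matrices: two stacked linear layers with no activation in between collapse into one linear layer.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Two stacked linear layers with no activation in between...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...equal a single linear layer with combined weights and bias.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True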
| Function | Formula | Range | Typical Use |
|---|---|---|---|
| Sigmoid | 1 / (1 + e^(-x)) | (0, 1) | Binary classification output |
| Tanh | (e^x - e^(-x)) / (e^x + e^(-x)) | (-1, 1) | Hidden layers (older networks) |
| ReLU | max(0, x) | [0, infinity) | Default hidden layer activation |
| Leaky ReLU | x if x > 0, else alpha * x | (-infinity, infinity) | Avoids "dying ReLU" problem |
| ELU | x if x > 0, else alpha * (e^x - 1) | (-alpha, infinity) | Smooth alternative to ReLU |
| GELU | x * Phi(x) | (-0.17, infinity) | Transformer hidden layers |
| Softmax | e^(xi) / sum(e^(xj)) | (0, 1) per class | Multi-class output layer |
Tip: For most hidden layers, start with ReLU. Use Softmax for multi-class outputs and Sigmoid for binary outputs. If you observe dead neurons, try Leaky ReLU or ELU.
import torch.nn as nn
# Common activation functions
relu = nn.ReLU()
sigmoid = nn.Sigmoid()
tanh = nn.Tanh()
leaky_relu = nn.LeakyReLU(negative_slope=0.01)
softmax = nn.Softmax(dim=1)
gelu = nn.GELU()
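To get a feel for their ranges, you might apply the modules defined above to a small tensor (the values here are purely illustrative):

import torch

x = torch.tensor([[-2.0, -0.5, 0.0, 0.5, 2.0]])
print(relu(x))      # negatives clipped to 0
print(sigmoid(x))   # squashed into (0, 1)
print(tanh(x))      # squashed into (-1, 1)
print(softmax(x))   # each row sums to 1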
The forward pass is the process of computing the output of the network given an input. Data flows from the input layer through each hidden layer to the output layer.
For a single hidden layer:
z1 = W1 * x + b1 # Linear transformation
a1 = activation(z1) # Non-linear activation
z2 = W2 * a1 + b2 # Linear transformation
output = activation(z2) # Output activation (e.g., sigmoid or softmax, depending on the task)
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)   # z1 = W1 * x + b1
        self.relu = nn.ReLU()                         # a1 = ReLU(z1)
        self.fc2 = nn.Linear(hidden_dim, output_dim)  # z2 = W2 * a1 + b2

    def forward(self, x):
        z1 = self.fc1(x)
        a1 = self.relu(z1)
        output = self.fc2(a1)  # raw logits; the output activation is often folded into the loss
        return output
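A quick smoke test of the module above (the dimensions and batch size here are arbitrary choices for illustration):

model = TwoLayerNet(input_dim=3, hidden_dim=6, output_dim=2)
batch = torch.randn(4, 3)  # batch of 4 examples, 3 features each
logits = model(batch)
print(logits.shape)        # torch.Size([4, 2])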