This lesson covers the foundational building blocks of neural networks — from the single neuron (perceptron) through multi-layer architectures, activation functions, and the backpropagation algorithm that makes learning possible.
The perceptron is the simplest form of a neural network — a single artificial neuron that takes multiple inputs, multiplies each by a weight, sums them, adds a bias, and passes the result through an activation function.
output = activation(w1*x1 + w2*x2 + ... + wn*xn + b)
Where:
- x1, x2, ..., xn are the input features
- w1, w2, ..., wn are the learnable weights
- b is the bias term
- activation is the activation function

import numpy as np
class Perceptron:
    def __init__(self, n_features, lr=0.01):
        self.weights = np.zeros(n_features)  # one weight per input feature
        self.bias = 0.0
        self.lr = lr  # learning rate

    def predict(self, x):
        # Weighted sum plus bias, then a hard threshold (step) activation
        z = np.dot(x, self.weights) + self.bias
        return 1 if z >= 0 else 0

    def train(self, X, y, epochs=100):
        # Perceptron learning rule: nudge weights toward the correct label
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                pred = self.predict(xi)
                error = yi - pred
                self.weights += self.lr * error * xi
                self.bias += self.lr * error
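As a quick sanity check, here is a minimal usage sketch for the class above, trained on the AND truth table (the arrays and hyperparameters are illustrative choices, not part of the lesson's reference code):

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # AND truth table

p = Perceptron(n_features=2, lr=0.1)
p.train(X, y, epochs=50)
print([p.predict(xi) for xi in X])  # should print [0, 0, 0, 1]

Because AND is linearly separable, the perceptron convergence theorem guarantees the loop above settles on a correct decision boundary after a finite number of updates.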
The perceptron can only learn linearly separable functions. It famously cannot solve the XOR problem, which motivated the development of multi-layer networks.
| Problem | Linearly Separable? | Perceptron Can Solve? |
|---|---|---|
| AND | Yes | Yes |
| OR | Yes | Yes |
| XOR | No | No |
| NAND | Yes | Yes |
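To see the limitation concretely, here is a sketch that reuses the Perceptron class from above on the XOR truth table (variable names are illustrative). No matter how many epochs you run, at least one of the four points stays misclassified, because no single line can separate the two classes:

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])  # XOR truth table

p = Perceptron(n_features=2, lr=0.1)
p.train(X_xor, y_xor, epochs=1000)
print([p.predict(xi) for xi in X_xor])  # never matches [0, 1, 1, 0]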
A Multi-Layer Perceptron (MLP) extends the single perceptron by stacking neurons into layers. An MLP has:
- an input layer that receives the raw features,
- one or more hidden layers of neurons, and
- an output layer that produces the prediction.
The addition of hidden layers with non-linear activation functions allows an MLP to approximate any continuous function to arbitrary accuracy (the Universal Approximation Theorem).
Input Layer Hidden Layer 1 Hidden Layer 2 Output Layer
[x1] ──────► [h1] ──────────► [h4] ──────────► [y1]
[x2] ──────► [h2] ──────────► [h5] ──────────► [y2]
[x3] ──────► [h3] ──────────► [h6]
Activation functions introduce non-linearity into the network. Without them, stacking layers would produce only a linear transformation, no matter how many layers you add.
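A quick way to convince yourself of this is a minimal NumPy sketch with made-up weight matrices: two stacked linear layers with no activation in between collapse into one linear layer.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Two stacked linear layers with no activation in between...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...equal a single linear layer with combined weights and bias.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True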
| Function | Formula | Range | Typical Use |
|---|---|---|---|
| Sigmoid | 1 / (1 + e^(-x)) | (0, 1) | Binary classification output |
| Tanh | (e^x - e^(-x)) / (e^x + e^(-x)) | (-1, 1) | Hidden layers (older networks) |
| ReLU | max(0, x) | [0, infinity) | Default hidden layer activation |
| Leaky ReLU | x if x > 0, else alpha * x | (-infinity, infinity) | Avoids "dying ReLU" problem |
| ELU | x if x > 0, else alpha * (e^x - 1) | (-alpha, infinity) | Smooth alternative to ReLU |
| GELU | x * Phi(x) | (-0.17, infinity) | Transformer hidden layers |
| Softmax | e^(xi) / sum(e^(xj)) | (0, 1) per class | Multi-class output layer |
Tip: For most hidden layers, start with ReLU. Use Softmax for multi-class outputs and Sigmoid for binary outputs. If you observe dead neurons, try Leaky ReLU or ELU.
import torch.nn as nn
# Common activation functions
relu = nn.ReLU()
sigmoid = nn.Sigmoid()
tanh = nn.Tanh()
leaky_relu = nn.LeakyReLU(negative_slope=0.01)
softmax = nn.Softmax(dim=1)
gelu = nn.GELU()
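To get a feel for their ranges, you might apply the modules defined above to a small tensor (the values here are purely illustrative):

import torch

x = torch.tensor([[-2.0, -0.5, 0.0, 0.5, 2.0]])
print(relu(x))      # negatives clipped to 0
print(sigmoid(x))   # squashed into (0, 1)
print(tanh(x))      # squashed into (-1, 1)
print(softmax(x))   # each row sums to 1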
The forward pass is the process of computing the output of the network given an input. Data flows from the input layer through each hidden layer to the output layer.
For a single hidden layer:
z1 = W1 * x + b1 # Linear transformation
a1 = activation(z1) # Non-linear activation
z2 = W2 * a1 + b2 # Linear transformation
output = activation(z2) # Output activation (e.g., sigmoid or softmax, depending on the task)
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)   # z1 = W1 * x + b1
        self.relu = nn.ReLU()                         # a1 = ReLU(z1)
        self.fc2 = nn.Linear(hidden_dim, output_dim)  # z2 = W2 * a1 + b2

    def forward(self, x):
        z1 = self.fc1(x)
        a1 = self.relu(z1)
        output = self.fc2(a1)  # raw logits; the output activation is often folded into the loss
        return output
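A quick smoke test of the module above (the dimensions and batch size here are arbitrary choices for illustration):

model = TwoLayerNet(input_dim=3, hidden_dim=6, output_dim=2)
batch = torch.randn(4, 3)  # batch of 4 examples, 3 features each
logits = model(batch)
print(logits.shape)        # torch.Size([4, 2])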