Neural networks are machine learning models inspired by the structure of the biological brain. They consist of layers of interconnected neurons (nodes) that learn to transform inputs into outputs by adjusting their internal weights during training. Neural networks are the foundation of deep learning and power many modern AI applications — from image recognition and language translation to game playing and drug discovery.
A biological neuron receives signals through dendrites, processes them in the cell body, and transmits the output through the axon to other neurons. An artificial neuron mimics this:
| Biological | Artificial |
|---|---|
| Dendrites (inputs) | Input features (x1, x2, ...) |
| Synaptic weights | Learnable weights (w1, w2, ...) |
| Cell body (processing) | Weighted sum + activation function |
| Axon (output) | Output value |
The perceptron is the simplest neural network — a single neuron that computes a weighted sum of inputs and applies a step function:
output = step(w1*x1 + w2*x2 + ... + wn*xn + b)

Where the step function returns 1 if the weighted sum is non-negative and 0 otherwise.
import numpy as np

def perceptron(X, weights, bias):
    """A simple perceptron: weighted sum of inputs followed by a step function."""
    linear_output = np.dot(X, weights) + bias
    return (linear_output >= 0).astype(int)

# Example: AND gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])  # expected AND outputs
weights = np.array([0.5, 0.5])
bias = -0.7

predictions = perceptron(X, weights, bias)
print(f"AND gate predictions: {predictions}")
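With these weights, only the input [1, 1] produces a weighted sum (1.0) large enough to overcome the bias of -0.7, so the printed predictions are [0 0 0 1], matching the AND truth table stored in y.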
A single perceptron can only learn linearly separable functions. It cannot solve the XOR problem (where the classes are not separable by a straight line). This limitation motivated the development of multi-layer networks.
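To see what stacking layers buys you, here is a small illustration (not part of the lesson's own code) that reuses the perceptron and inputs defined above with hand-picked weights: XOR can be built as AND(OR(x1, x2), NAND(x1, x2)), which requires two layers of neurons.

```python
# A single perceptron cannot fit XOR, but two layers of the same perceptron can:
# XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)).
# The weights below are hand-picked for illustration, not learned.
h1 = perceptron(X, np.array([0.5, 0.5]), -0.2)    # first hidden neuron: OR gate
h2 = perceptron(X, np.array([-0.5, -0.5]), 0.7)   # second hidden neuron: NAND gate
hidden = np.column_stack([h1, h2])                # hidden-layer outputs become new inputs
xor_pred = perceptron(hidden, np.array([0.5, 0.5]), -0.7)  # output neuron: AND gate
print(f"XOR predictions: {xor_pred}")             # [0 1 1 0]
```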
A multi-layer neural network (also called a multi-layer perceptron or MLP) has:
| Layer | Role | Typical Size |
|---|---|---|
| Input | Receives features | Number of features |
| Hidden | Learns intermediate representations | 32, 64, 128, 256, or more neurons |
| Output | Produces predictions | 1 neuron (regression/binary) or n neurons (multi-class) |
Each neuron computes a weighted sum of its inputs plus a bias, then applies an activation function:

output = activation(w1*x1 + w2*x2 + ... + wn*xn + b)
The output of one layer becomes the input to the next.
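To make the layer-by-layer flow concrete, here is a minimal sketch of a forward pass through a 2-4-1 network in NumPy. The layer sizes, random weights, and the choice of a ReLU hidden layer with a sigmoid output are illustrative assumptions, not values taken from the lesson.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Forward pass through a 2-4-1 network: 2 input features, 4 hidden ReLU neurons,
# 1 sigmoid output neuron. Weights are random here purely for illustration.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

hidden = relu(X @ W1 + b1)          # layer 1: weighted sum + activation
output = sigmoid(hidden @ W2 + b2)  # layer 2: the hidden output becomes the input
print(output.round(3))
```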
Activation functions introduce non-linearity into the network, enabling it to learn complex patterns.
| Function | Formula | Range | Use Case |
|---|---|---|---|
| Sigmoid | 1 / (1 + e^(-z)) | (0, 1) | Binary classification output |
| Tanh | (e^z - e^(-z)) / (e^z + e^(-z)) | (-1, 1) | Hidden layers (centred output) |
| ReLU | max(0, z) | [0, infinity) | Most popular for hidden layers |
| Leaky ReLU | max(0.01z, z) | (-infinity, infinity) | Avoids "dying ReLU" problem |
| Softmax | e^(zi) / sum(e^(zj)) | (0, 1), sums to 1 | Multi-class classification output |
Tip: ReLU is the default choice for hidden layers. Use sigmoid for binary output and softmax for multi-class output.
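For reference, here is a sketch of how each activation in the table can be implemented in NumPy; the sample input vector is arbitrary and chosen only to show the output ranges.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def softmax(z):
    # subtract the max for numerical stability; the result still sums to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("sigmoid:   ", sigmoid(z).round(3))
print("tanh:      ", tanh(z).round(3))
print("relu:      ", relu(z))
print("leaky relu:", leaky_relu(z))
print("softmax:   ", softmax(z).round(3))
```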
Neural networks learn through backpropagation combined with gradient descent: the prediction error is propagated backwards through the network to compute the gradient of the loss with respect to each weight, and each weight is then nudged a small step against its gradient.
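As a minimal sketch of that idea in the simplest possible case, here is a single sigmoid neuron trained on the AND gate with plain gradient descent. The cross-entropy loss, learning rate, and iteration count are illustrative choices, not values from the lesson.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative mini-example: train one sigmoid neuron on the AND gate
# with gradient descent on the cross-entropy loss.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 0.0, 0.0, 1.0])

w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(2000):
    pred = sigmoid(X @ w + b)        # forward pass
    grad_z = pred - y                # gradient of cross-entropy w.r.t. the pre-activation
    w -= lr * X.T @ grad_z / len(y)  # gradient descent step on the weights
    b -= lr * grad_z.mean()          # gradient descent step on the bias

print(sigmoid(X @ w + b).round(2))   # predictions move towards [0, 0, 0, 1]
```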