Query Optimisation

Writing correct SQL is only the first step. Writing fast SQL requires understanding how PostgreSQL executes your queries. This lesson covers execution plans, scan types, join algorithms, statistics, vacuuming, and common anti-patterns.

Reading Execution Plans

Every query goes through the planner/optimiser before execution. Use EXPLAIN ANALYZE to see the actual execution plan:

EXPLAIN ANALYZE
SELECT c.name, o.total
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.total > 500;

Anatomy of a Plan Node

Hash Join  (cost=50.00..350.00 rows=200 width=36) (actual time=1.2..5.8 rows=185 loops=1)
  Hash Cond: (o.customer_id = c.id)
  -> Seq Scan on orders o  (cost=0.00..250.00 rows=200 width=20) (actual time=0.01..3.2 rows=200 loops=1)
       Filter: (total > 500)
       Rows Removed by Filter: 9800
  -> Hash  (cost=30.00..30.00 rows=1000 width=20) (actual time=0.5..0.5 rows=1000 loops=1)
       -> Seq Scan on customers c  (cost=0.00..30.00 rows=1000 width=20)

Field	Meaning
cost=start..total	Estimated startup and total cost (arbitrary units)
rows	Estimated number of output rows
actual time	Real wall-clock time (milliseconds)
rows (actual)	Actual number of rows produced
loops	Number of times this node was executed
Rows Removed by Filter	Rows scanned but discarded by a filter

Key insight: When estimated rows differ greatly from actual rows, the planner's statistics are stale or misleading.

Scan Types

Sequential Scan (Seq Scan)

Reads every row in the table. Used when:

No useful index exists
The query reads a large percentage of the table (index would be slower)

Index Scan

Uses an index to locate specific rows, then fetches them from the table:

Index Scan using idx_orders_customer_id on orders
  Index Cond: (customer_id = 123)

Index Only Scan

All needed columns are in the index — no table access needed:

Index Only Scan using idx_customers_covering on customers
  Index Cond: (customer_id = 123)
  Heap Fetches: 0

If Heap Fetches is high, the visibility map is stale. Run VACUUM on the table.

Bitmap Index Scan

Two-phase approach: build a bitmap of matching row locations from the index, then read the heap in physical order:

Bitmap Heap Scan on orders
  Recheck Cond: (status = 'active')
  -> Bitmap Index Scan on idx_orders_status
       Index Cond: (status = 'active')

Best when the index matches many rows — too many for an index scan, too few for a sequential scan.

Join Algorithms

PostgreSQL chooses between three join strategies:

Nested Loop

Nested Loop
  -> Seq Scan on departments
  -> Index Scan using idx_emp_dept on employees
       Index Cond: (dept_id = departments.id)

Best for small outer tables with indexed inner tables. PostgreSQL may use this more than you expect when indexes are available.

Hash Join

Hash Join
  Hash Cond: (orders.customer_id = customers.id)
  -> Seq Scan on orders
  -> Hash
       -> Seq Scan on customers

Builds a hash table from the smaller table, then probes it for each row of the larger table. Best for medium-to-large equi-joins.

Merge Join

Merge Join
  Merge Cond: (a.id = b.ref_id)
  -> Sort on a
  -> Sort on b

Sorts both inputs and merges them. Best when both sides are already sorted (e.g. by index) or for very large datasets.

Statistics and the Planner

The planner uses statistics to estimate row counts and choose the best plan:

-- View statistics for a table
SELECT
  attname,
  n_distinct,
  most_common_vals,
  most_common_freqs,
  correlation
FROM pg_stats
WHERE tablename = 'orders'
  AND attname = 'status';

Query Optimisation

Query Optimisation

Reading Execution Plans

Anatomy of a Plan Node

Scan Types

Sequential Scan (Seq Scan)

Index Scan

Index Only Scan

Bitmap Index Scan

Join Algorithms

Nested Loop

Hash Join

Merge Join

Statistics and the Planner

More in Database