You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
Writing correct SQL is only the first step. Writing fast SQL requires understanding how PostgreSQL executes your queries. This lesson covers execution plans, scan types, join algorithms, statistics, vacuuming, and common anti-patterns.
Every query goes through the planner/optimiser before execution. Use EXPLAIN ANALYZE to see the actual execution plan:
EXPLAIN ANALYZE
SELECT c.name, o.total
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.total > 500;
Hash Join (cost=50.00..350.00 rows=200 width=36) (actual time=1.2..5.8 rows=185 loops=1)
Hash Cond: (o.customer_id = c.id)
-> Seq Scan on orders o (cost=0.00..250.00 rows=200 width=20) (actual time=0.01..3.2 rows=200 loops=1)
Filter: (total > 500)
Rows Removed by Filter: 9800
-> Hash (cost=30.00..30.00 rows=1000 width=20) (actual time=0.5..0.5 rows=1000 loops=1)
-> Seq Scan on customers c (cost=0.00..30.00 rows=1000 width=20)
| Field | Meaning |
|---|---|
| cost=start..total | Estimated startup and total cost (arbitrary units) |
| rows | Estimated number of output rows |
| actual time | Real wall-clock time (milliseconds) |
| rows (actual) | Actual number of rows produced |
| loops | Number of times this node was executed |
| Rows Removed by Filter | Rows scanned but discarded by a filter |
Key insight: When estimated rows differ greatly from actual rows, the planner's statistics are stale or misleading.
Reads every row in the table. Used when:
Uses an index to locate specific rows, then fetches them from the table:
Index Scan using idx_orders_customer_id on orders
Index Cond: (customer_id = 123)
All needed columns are in the index — no table access needed:
Index Only Scan using idx_customers_covering on customers
Index Cond: (customer_id = 123)
Heap Fetches: 0
If Heap Fetches is high, the visibility map is stale. Run
VACUUMon the table.
Two-phase approach: build a bitmap of matching row locations from the index, then read the heap in physical order:
Bitmap Heap Scan on orders
Recheck Cond: (status = 'active')
-> Bitmap Index Scan on idx_orders_status
Index Cond: (status = 'active')
Best when the index matches many rows — too many for an index scan, too few for a sequential scan.
PostgreSQL chooses between three join strategies:
Nested Loop
-> Seq Scan on departments
-> Index Scan using idx_emp_dept on employees
Index Cond: (dept_id = departments.id)
Best for small outer tables with indexed inner tables. PostgreSQL may use this more than you expect when indexes are available.
Hash Join
Hash Cond: (orders.customer_id = customers.id)
-> Seq Scan on orders
-> Hash
-> Seq Scan on customers
Builds a hash table from the smaller table, then probes it for each row of the larger table. Best for medium-to-large equi-joins.
Merge Join
Merge Cond: (a.id = b.ref_id)
-> Sort on a
-> Sort on b
Sorts both inputs and merges them. Best when both sides are already sorted (e.g. by index) or for very large datasets.
The planner uses statistics to estimate row counts and choose the best plan:
-- View statistics for a table
SELECT
attname,
n_distinct,
most_common_vals,
most_common_freqs,
correlation
FROM pg_stats
WHERE tablename = 'orders'
AND attname = 'status';
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.