Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service. It enables you to run complex analytical queries against structured and semi-structured data using standard SQL. Redshift is optimised for online analytical processing (OLAP) — aggregating, joining, and summarising billions of rows — rather than the transactional (OLTP) workloads served by RDS or Aurora.
| Characteristic | OLTP (RDS, Aurora, DynamoDB) | OLAP (Redshift) |
|---|---|---|
| Query type | Short, simple (INSERT, SELECT by key) | Long, complex (GROUP BY, JOIN, window functions) |
| Data volume | Gigabytes to low terabytes | Terabytes to petabytes |
| Users | Application servers (thousands of concurrent transactions) | Analysts, BI tools (tens to hundreds of concurrent queries) |
| Schema design | Normalised (3NF) | Denormalised (star / snowflake schema) |
| Optimised for | Low-latency reads and writes | High-throughput scans and aggregations |
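To make the contrast concrete, here is a sketch of the two query shapes (table and column names are hypothetical):

```sql
-- OLTP-style: fetch a single row by primary key (RDS / Aurora territory)
SELECT * FROM orders WHERE order_id = 1001;

-- OLAP-style: scan and aggregate many rows, then rank (Redshift territory)
SELECT region,
       DATE_TRUNC('month', order_date) AS month,
       SUM(amount) AS revenue,
       RANK() OVER (PARTITION BY region
                    ORDER BY SUM(amount) DESC) AS revenue_rank
FROM   orders
GROUP  BY region, DATE_TRUNC('month', order_date);
```

The first query touches one row via an index; the second scans the whole table, which is exactly the access pattern columnar storage accelerates.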
A Redshift cluster consists of a leader node, which parses queries, builds execution plans, and aggregates results, and one or more compute nodes. Each compute node is divided into slices that process their portion of the data in parallel:
┌──────────────────────────────────────┐
│ Leader Node │
│ (query parsing, planning, agg) │
└──────────┬──────────┬────────────────┘
│ │
┌──────┴───┐ ┌────┴─────┐
│ Compute │ │ Compute │
│ Node 1 │ │ Node 2 │
│ Slice A │ │ Slice C │
│ Slice B │ │ Slice D │
└──────────┘ └──────────┘
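You can see the slice-to-node mapping on a running cluster by querying the `STV_SLICES` system view:

```sql
-- Each row is one slice; larger node types carry more slices per node
SELECT node, slice
FROM   stv_slices
ORDER  BY node, slice;
```

On a two-node cluster of a small node type you would see something like nodes 0 and 1 each owning two slices, matching the diagram above.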
Redshift offers three node type families:
| Type | Family | Storage | Use Case |
|---|---|---|---|
| RA3 | ra3.xlplus, ra3.4xlarge, ra3.16xlarge | Managed storage (S3-backed) | Most workloads; scales compute and storage independently |
| DC2 | dc2.large, dc2.8xlarge | Local SSD | Workloads < 1 TB; lowest cost for small datasets |
| DS2 | ds2.xlarge, ds2.8xlarge | Local HDD | Legacy; replaced by RA3 |
Recommendation: Use RA3 nodes for new clusters. They decouple compute from storage, so you can scale and pay for each independently. Hot data is cached on local SSD; cold data lives in S3.
Unlike row-based databases (MySQL, PostgreSQL), Redshift stores data column by column. This architecture is ideal for analytics: a query reads only the columns it references instead of every full row, and because values in a column are all the same type and often similar, they compress far better than mixed row data. Redshift supports several compression encodings:
| Encoding | Best For |
|---|---|
| AZ64 | Numeric and date/time columns (default for these types) |
| LZO | Long strings, VARCHAR |
| ZSTD | General purpose, good compression ratio |
| RAW | No compression; rarely used |
| BYTEDICT | Low-cardinality strings (e.g., country codes) |
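Encodings can be set explicitly per column at table creation. A hedged sketch (the table and its columns are hypothetical, chosen to pair each column type with the encoding suggested above):

```sql
CREATE TABLE page_views (
    view_id    BIGINT        ENCODE az64,      -- numeric: AZ64
    viewed_at  TIMESTAMP     ENCODE az64,      -- date/time: AZ64
    url        VARCHAR(2048) ENCODE lzo,       -- long strings: LZO
    country    CHAR(2)       ENCODE bytedict,  -- low-cardinality: BYTEDICT
    user_agent VARCHAR(512)  ENCODE zstd       -- general purpose: ZSTD
);
```

In practice you rarely need to hand-pick encodings; the automatic path below usually does well.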
When you load an empty table with COPY, Redshift samples the incoming data and applies optimal encodings automatically.
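A minimal COPY sketch (the bucket path and IAM role ARN are placeholders you would replace with your own):

```sql
-- Load CSV data from S3; on an empty table with no explicit encodings,
-- COPY samples the data and chooses compression automatically
COPY sales
FROM 's3://my-bucket/sales/'                                -- hypothetical bucket
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'  -- hypothetical role
FORMAT AS CSV;
```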
When a query joins two large tables, Redshift may have to shuffle rows between nodes so that matching join keys end up on the same slice. The distribution style determines how a table's rows are spread across slices, and therefore how much data movement a join requires.
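One common pattern is to give a fact table and its largest dimension the same distribution key, so their joins are co-located, while replicating small dimensions to every node. A hedged sketch with hypothetical tables:

```sql
-- Fact and large dimension share a DISTKEY: rows with the same
-- customer_id land on the same slice, so joins need no shuffling
CREATE TABLE customers (
    customer_id BIGINT,
    name        VARCHAR(100)
)
DISTSTYLE KEY DISTKEY (customer_id);

CREATE TABLE orders (
    order_id    BIGINT,
    customer_id BIGINT,        -- join key, matches the customers DISTKEY
    amount      DECIMAL(12, 2)
)
DISTSTYLE KEY DISTKEY (customer_id);

-- Small dimension: a full copy on every node makes all joins local
CREATE TABLE regions (
    region_id SMALLINT,
    name      VARCHAR(50)
)
DISTSTYLE ALL;
```

Redshift also offers DISTSTYLE EVEN (round-robin) and DISTSTYLE AUTO (Redshift picks and adjusts the style as the table grows).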