You are viewing a free preview of this lesson.
Subscribe to unlock all 10 lessons in this course and every other course on LearningBro.
Data analytics is one of GCP's greatest strengths. Google pioneered technologies like MapReduce, Dremel, and Colossus that revolutionised how the world processes data. BigQuery — GCP's serverless data warehouse — is a direct descendant of these innovations.
BigQuery is a serverless, petabyte-scale data warehouse designed for fast SQL analytics over large datasets.
| Feature | Detail |
|---|---|
| Serverless | No infrastructure to manage — no clusters, no tuning |
| Petabyte scale | Analyse massive datasets in seconds |
| Standard SQL | Full ANSI SQL support |
| Columnar storage | Data stored in columns for fast analytical queries |
| Separation of compute and storage | Scale each independently |
| Built-in ML | Create machine learning models with SQL (BigQuery ML) |
A dataset is a container for tables, views, and routines. Datasets belong to a project and are located in a specific region.
Tables hold your data in rows and columns. BigQuery supports:
| Table Type | Description |
|---|---|
| Native tables | Data stored in BigQuery's columnar format |
| External tables | Query data stored in Cloud Storage, Bigtable, or Drive |
| Views | Saved SQL queries that act as virtual tables |
| Materialised views | Precomputed views for faster queries |
Each table has a schema defining columns and their types:
CREATE TABLE my_dataset.orders (
order_id STRING,
customer_id STRING,
total NUMERIC,
created_at TIMESTAMP
)
PARTITION BY DATE(created_at);
You can load data into BigQuery from multiple sources:
| Source | Method |
|---|---|
| Cloud Storage | Load CSV, JSON, Avro, Parquet, ORC files |
| Streaming | Insert rows in real time via the streaming API |
| Dataflow | ETL pipelines for transformation |
| Transfer Service | Scheduled loads from SaaS (Google Ads, YouTube, etc.) |
| Federated queries | Query external data in place without loading |
BigQuery uses standard SQL:
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total) AS total_spent
FROM my_dataset.orders
WHERE created_at >= '2024-01-01'
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 10;
BigQuery processes queries by scanning only the columns referenced in your query (columnar storage). To optimise performance and cost:
Subscribe to continue reading
Get full access to this lesson and all 10 lessons in this course.