Alibaba Cloud offers a comprehensive suite of big data and analytics services. At the centre is MaxCompute (formerly ODPS), a fully managed data warehousing solution that can process massive datasets.
MaxCompute is a serverless, distributed data warehouse that handles petabyte-scale data processing. It is Alibaba Cloud's counterpart to Amazon Redshift, Google BigQuery, and Azure Synapse Analytics.
| Capability | Description |
|---|---|
| SQL queries | Ad-hoc and scheduled analytical queries |
| ETL processing | Transform and clean large datasets |
| Machine learning | Built-in ML algorithms via PAI integration |
| Data storage | Store structured and semi-structured data |
| Graph computation | Process graph-based data models |
Data Sources               MaxCompute               Consumers
┌──────────┐    ┌──────────────────────────────┐    ┌──────────┐
│ OSS      │───▶│  ┌──────┐  ┌──────────────┐  │───▶│ BI Tools │
│ RDS      │───▶│  │Tables│  │ SQL Engine   │  │───▶│ Reports  │
│ Log Svc  │───▶│  │      │  │ MapReduce    │  │───▶│ DataV    │
│ Kafka    │───▶│  └──────┘  │ Spark        │  │───▶│ PAI (ML) │
└──────────┘    │            └──────────────┘  │    └──────────┘
                └──────────────────────────────┘
A project is the basic unit of organisation in MaxCompute:
Tables in MaxCompute are similar to relational database tables:
Partitions divide large tables into smaller segments for efficient querying:
-- Create a partitioned table
CREATE TABLE user_events (
    user_id    STRING,
    event_type STRING,
    event_data STRING
)
PARTITIONED BY (dt STRING, region STRING);

-- Query only one partition
SELECT * FROM user_events WHERE dt = '2024-01-15' AND region = 'cn';
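When a query filters on the partition columns, MaxCompute reads only the matching partitions instead of scanning the whole table (partition pruning). Less data scanned means lower cost and shorter query time.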
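To make the partition workflow more concrete, here is a minimal sketch of how data might be loaded into and read from a single partition of `user_events`. It assumes the table definition shown earlier. The staging table `raw_events` and its `event_date` and `country` columns are hypothetical names used only for illustration.

```sql
-- Sketch only: raw_events, event_date and country are hypothetical names.

-- Register the partition explicitly (does nothing if it already exists).
ALTER TABLE user_events ADD IF NOT EXISTS PARTITION (dt = '2024-01-15', region = 'cn');

-- Load one day of data for one region, replacing the partition's contents.
INSERT OVERWRITE TABLE user_events PARTITION (dt = '2024-01-15', region = 'cn')
SELECT user_id, event_type, event_data
FROM raw_events
WHERE event_date = '2024-01-15' AND country = 'cn';

-- List the partitions that currently exist on the table.
SHOW PARTITIONS user_events;

-- Aggregate within a single partition; both partition columns are filtered,
-- so only that partition needs to be read.
SELECT event_type, COUNT(*) AS event_count
FROM user_events
WHERE dt = '2024-01-15' AND region = 'cn'
GROUP BY event_type;
```

Filtering on every partition column, as the last query does, is what keeps the scan limited to one partition. A filter on `dt` alone would still prune by date but would read every region for that day.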
DataWorks is the integrated data development platform for MaxCompute. It provides: