Alibaba Cloud offers a comprehensive suite of big data and analytics services. At the centre is MaxCompute (formerly ODPS), a fully managed data warehousing solution that can process massive datasets.
MaxCompute is a serverless, distributed data warehouse built for petabyte-scale data processing. It is Alibaba Cloud's counterpart to Amazon Redshift, Google BigQuery, and Azure Synapse Analytics.
| Capability | Description |
|---|---|
| SQL queries | Ad-hoc and scheduled analytical queries |
| ETL processing | Transform and clean large datasets |
| Machine learning | Built-in ML algorithms via PAI integration |
| Data storage | Store structured and semi-structured data |
| Graph computation | Process graph-based data models |
```mermaid
graph LR
    subgraph DS["Data Sources"]
        OSS["OSS"]
        RDS["RDS"]
        Log["Log Svc"]
        Kafka["Kafka"]
    end
    subgraph MC["MaxCompute"]
        Tables["Tables"]
        Eng["SQL Engine / MapReduce / Spark"]
    end
    subgraph CN["Consumers"]
        BI["BI Tools"]
        Rep["Reports"]
        DataV["DataV"]
        PAI["PAI (ML)"]
    end
    OSS --> MC
    RDS --> MC
    Log --> MC
    Kafka --> MC
    MC --> BI
    MC --> Rep
    MC --> DataV
    MC --> PAI
```
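A project is the basic unit of organisation in MaxCompute. It plays a role similar to a database or schema in a traditional database system and is the boundary for isolating users and controlling access.
Tables in MaxCompute are similar to relational database tables: each table has named columns with declared data types.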
Partitions divide large tables into smaller segments for efficient querying:
```sql
-- Create a partitioned table
CREATE TABLE user_events (
    user_id    STRING,
    event_type STRING,
    event_data STRING
)
PARTITIONED BY (dt STRING, region STRING);

-- Query only one partition
SELECT * FROM user_events WHERE dt = '2024-01-15' AND region = 'cn';
```
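When a query filters on the partition columns, MaxCompute reads only the matching partitions instead of the whole table. This reduces the amount of data scanned, which lowers both cost and query time.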
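To make the effect of partition filters concrete, here is a minimal Python sketch of the idea. It is a toy in-memory model, not MaxCompute's storage engine or any Alibaba Cloud API: the `PartitionedTable` class, its methods, and the demo data are all hypothetical names invented for illustration. Rows are grouped by a `(dt, region)` key, mirroring the `PARTITIONED BY (dt STRING, region STRING)` clause, and a query skips every partition whose key does not match the filter.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy model of a table partitioned by (dt, region).

    Hypothetical illustration only: real MaxCompute storage works
    differently, but the pruning idea is the same.
    """

    def __init__(self):
        # Maps a partition key (dt, region) to the rows stored in it.
        self.partitions = defaultdict(list)

    def insert(self, dt, region, row):
        self.partitions[(dt, region)].append(row)

    def total_rows(self):
        return sum(len(rows) for rows in self.partitions.values())

    def query(self, dt=None, region=None):
        """Return (matching_rows, rows_scanned).

        A value of None means "no filter on this partition column".
        Partitions whose key does not match are skipped without
        reading any of their rows.
        """
        matched, scanned = [], 0
        for (p_dt, p_region), rows in self.partitions.items():
            if dt is not None and p_dt != dt:
                continue  # partition pruned: its rows are never read
            if region is not None and p_region != region:
                continue  # partition pruned: its rows are never read
            scanned += len(rows)
            matched.extend(rows)
        return matched, scanned


def build_demo_table():
    """Build 3 days x 2 regions x 100 rows of made-up events."""
    table = PartitionedTable()
    for dt in ("2024-01-14", "2024-01-15", "2024-01-16"):
        for region in ("cn", "sg"):
            for i in range(100):
                row = {"user_id": f"u{i}", "event_type": "click", "event_data": "{}"}
                table.insert(dt, region, row)
    return table


if __name__ == "__main__":
    table = build_demo_table()
    _, full_scan = table.query()
    _, pruned_scan = table.query(dt="2024-01-15", region="cn")
    print(f"rows in table: {table.total_rows()}")
    print(f"rows scanned without a partition filter: {full_scan}")
    print(f"rows scanned with dt and region filters: {pruned_scan}")
```

In this toy dataset, filtering on both partition columns reads 100 of the 600 rows, while a query with no partition filter reads all 600. The same reasoning explains why the `SELECT` above, which fixes both `dt` and `region`, touches a single partition.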
DataWorks is the integrated data development platform for MaxCompute.