Google Cloud Dataflow is a fully managed service for executing Apache Beam data processing pipelines. It handles both batch and streaming workloads, automatically scaling resources to match the volume of data being processed. Dataflow is deeply integrated with Pub/Sub, making it the primary choice for building real-time stream processing pipelines on Google Cloud.
Apache Beam provides a unified API for defining data processing pipelines that can run on multiple execution engines, called runners. Dataflow is one such runner: the pipeline is written once with the Beam SDK, and Dataflow executes it as a fully managed, cloud-native service on Google Cloud.
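The separation between a pipeline definition and the runner that executes it can be sketched in plain Python. This is a toy model under stated assumptions, not the Apache Beam API: `Pipeline`, `InMemoryRunner`, and `ChunkedRunner` are hypothetical names invented for illustration, and a real runner such as Dataflow also handles distribution, scaling, and fault tolerance.

```python
from typing import Callable, Iterable, Iterator, List


class Pipeline:
    """Hypothetical pipeline definition: an ordered list of per-element steps."""

    def __init__(self) -> None:
        self.steps: List[Callable[[Iterable], Iterator]] = []

    def map(self, fn: Callable) -> "Pipeline":
        # Record the step; nothing is executed until a runner runs the pipeline.
        self.steps.append(lambda items: (fn(x) for x in items))
        return self

    def filter(self, pred: Callable) -> "Pipeline":
        self.steps.append(lambda items: (x for x in items if pred(x)))
        return self


class InMemoryRunner:
    """Hypothetical runner: applies every step to the whole input in one pass."""

    def run(self, pipeline: Pipeline, source: Iterable) -> list:
        items = iter(source)
        for step in pipeline.steps:
            items = step(items)
        return list(items)


class ChunkedRunner:
    """Hypothetical runner: splits the input into chunks and processes each one."""

    def __init__(self, chunk_size: int = 2) -> None:
        self.chunk_size = chunk_size

    def run(self, pipeline: Pipeline, source: Iterable) -> list:
        results: list = []
        chunk: list = []
        for item in source:
            chunk.append(item)
            if len(chunk) == self.chunk_size:
                results.extend(InMemoryRunner().run(pipeline, chunk))
                chunk = []
        if chunk:  # process the final, possibly smaller, chunk
            results.extend(InMemoryRunner().run(pipeline, chunk))
        return results


# One pipeline definition, executed by two different runners.
pipeline = Pipeline().map(str.strip).filter(bool).map(str.upper)
lines = [" pubsub ", "", "bigquery", " gcs"]

batch_result = InMemoryRunner().run(pipeline, lines)
chunked_result = ChunkedRunner(chunk_size=2).run(pipeline, lines)
```

Both runners produce the same output from the same definition, which is the property that lets a Beam pipeline move between execution engines without being rewritten. The per-element steps here are deliberately stateless; grouping and windowing are where real runners differ most and are outside the scope of this sketch.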
| Characteristic | Description |
|---|---|
| Fully managed | No cluster provisioning, scaling, or maintenance |
| Unified | Same API for batch and streaming processing |
| Autoscaling | Workers scale up and down based on workload |
| Apache Beam | Open-source SDK in Java, Python, and Go |
| Exactly-once | Exactly-once record processing by default in streaming mode |
| Integrated | Native connectors for Pub/Sub, BigQuery, GCS, Bigtable, and more |