Cloud Run Services & Jobs

Cloud Run offers two distinct workload types: services for handling HTTP requests and events, and jobs for running containers to completion. Understanding when and how to use each is essential for building effective serverless architectures on Google Cloud.

Cloud Run Services in Depth

A Cloud Run service runs your container as one or more instances that listen on a network port and respond to incoming HTTP requests. Each service gets a stable HTTPS URL, automatic TLS certificate management, and built-in load balancing.

Revisions

Every time you deploy a new container image or change a configuration parameter (environment variables, memory, CPU, concurrency), Cloud Run creates a new revision. Revisions are immutable snapshots of your service configuration.

Concept	Description
Revision	An immutable snapshot of the service's container image and configuration
Traffic splitting	Route percentages of traffic to different revisions
Rollback	Route 100% of traffic back to a previous revision

# Deploy a new revision
gcloud run deploy my-service \
  --image europe-west2-docker.pkg.dev/my-project/repo/app:v2 \
  --region europe-west2

# Split traffic: 90% to latest, 10% to previous revision
gcloud run services update-traffic my-service \
  --to-revisions my-service-00001=10,LATEST=90 \
  --region europe-west2

# Roll back: send 100% of traffic to a specific earlier revision
gcloud run services update-traffic my-service \
  --to-revisions my-service-00001=100 \
  --region europe-west2
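
The percentages passed to --to-revisions are meant to add up to 100, so it can help to check them before running the command. The following is a minimal sketch of such a check in POSIX shell. The helper name build_traffic_spec is hypothetical and is not part of gcloud; it only validates the REVISION=PERCENT pairs and joins them with commas.

```shell
#!/bin/sh
# Hypothetical helper (not part of gcloud): check that REVISION=PERCENT pairs
# sum to 100 and print them as one comma-separated value.
build_traffic_spec() {
  total=0
  spec=""
  for pair in "$@"; do
    percent="${pair##*=}"            # text after the last '='
    case "$percent" in
      # Reject empty values, non-digits, and leading zeros such as "08".
      ''|*[!0-9]*|0?*) echo "invalid percentage in '$pair'" >&2; return 1 ;;
    esac
    total=$((total + percent))
    spec="${spec:+$spec,}$pair"      # join pairs with commas
  done
  if [ "$total" -ne 100 ]; then
    echo "percentages sum to $total, expected 100" >&2
    return 1
  fi
  echo "$spec"
}

build_traffic_spec my-service-00001=10 LATEST=90   # prints my-service-00001=10,LATEST=90
```

The printed value can then be passed as the argument to --to-revisions in the update-traffic command shown above.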

Concurrency and Request Handling

Each Cloud Run instance can handle multiple concurrent requests. The concurrency setting controls how many requests a single container instance processes simultaneously.

Setting	Description	Default
concurrency	Max concurrent requests per instance	80
max-instances	Maximum number of container instances	100
min-instances	Minimum warm instances (0 = scale to zero)	0
timeout	Maximum time for a single request	300 seconds

Setting concurrency too high can exhaust an instance's CPU or memory and raise latency, while setting it too low starts more instances than the load needs and raises cost. Profile your application under load to find the right balance.
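
As a starting point for that profiling, a rough steady-state estimate relates request rate, latency, and concurrency: the number of requests in flight is roughly the request rate multiplied by the average latency, and each instance absorbs up to the concurrency setting. The sketch below applies that arithmetic in POSIX shell. The helper name estimate_instances and the sample numbers are illustrative assumptions, and this is not a description of how Cloud Run's autoscaler actually decides.

```shell
#!/bin/sh
# Hypothetical helper: estimate instances needed for a steady load.
# Arguments: requests per second, average latency in milliseconds, concurrency.
estimate_instances() {
  rps=$1
  latency_ms=$2
  concurrency=$3
  # Requests in flight = arrival rate x time in system (Little's law).
  # Values stay in thousandths of a request to keep integer arithmetic.
  in_flight_milli=$((rps * latency_ms))
  per_instance_milli=$((concurrency * 1000))
  # Ceiling division: a partially used instance still counts as one.
  echo $(( (in_flight_milli + per_instance_milli - 1) / per_instance_milli ))
}

# 400 req/s at 500 ms is 200 requests in flight; at concurrency 80 that is 2.5,
# which rounds up to 3.
estimate_instances 400 500 80   # prints 3
```

If the estimate approaches the max-instances value in the table above, either limit may need adjusting.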

CPU Allocation

Cloud Run offers two CPU allocation modes:

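CPU allocated only during request processing: CPU is throttled between requests. This is cheaper for intermittent traffic and suits request/response workloads.
CPU always allocated: CPU is never throttled, even when no request is in flight. Use this for background work that continues outside request handling. An open WebSocket connection counts as an in-flight request, so it does not by itself require this mode.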

# Always allocate CPU on an existing service
gcloud run services update my-service \
  --no-cpu-throttling \
  --region europe-west2
