Working with Databases in Python

Databases are at the heart of data engineering. Whether you are extracting from a source database or loading into a data warehouse, you need to interact with SQL databases efficiently and safely from Python. This lesson covers SQLAlchemy, connection pooling, ORM vs raw SQL, migrations, and bulk operations.

SQLAlchemy: The Standard

SQLAlchemy is one of the most widely used database toolkits for Python. It provides two layers:

Layer	Description	When to Use
Core	SQL expression language	Data engineering, ETL
ORM	Object-Relational Mapping	Application development

Installing

pip install sqlalchemy psycopg2-binary  # PostgreSQL
pip install sqlalchemy pymysql          # MySQL

Creating a Connection

from sqlalchemy import create_engine, text

# PostgreSQL
engine = create_engine(
    "postgresql://user:password@localhost:5432/mydb",
    pool_size=5,           # Number of connections to keep open in the pool
    max_overflow=10,       # Extra connections allowed beyond pool_size
    pool_timeout=30,       # Seconds to wait for a free connection before raising an error
    pool_recycle=1800,     # Recycle connections after 30 minutes
    echo=False,            # Set True to log all SQL
)

# Test the connection
with engine.connect() as conn:
    # Raw SQL strings must be wrapped in text() in SQLAlchemy 2.0
    result = conn.execute(text("SELECT 1"))
    print(result.scalar())  # 1

Connection URL Formats

Database	URL Format
PostgreSQL	`postgresql://user:pass@host:5432/db`
MySQL	`mysql+pymysql://user:pass@host:3306/db`
SQLite	`sqlite:///path/to/database.db`
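As a small illustration of how these URLs break down, the sketch below parses one of the formats above and builds another from its parts. It assumes SQLAlchemy 1.4 or later; the password value is a made-up example, and no database driver or server is needed because nothing connects.

```python
from sqlalchemy.engine.url import URL, make_url

# Parse one of the URL formats from the table into its components.
parsed = make_url("mysql+pymysql://user:pass@host:3306/db")
print(parsed.get_backend_name())   # database family
print(parsed.get_driver_name())    # DBAPI driver
print(parsed.host, parsed.port, parsed.database)

# Build a URL from parts so special characters in the password are escaped for us.
url = URL.create(
    drivername="postgresql+psycopg2",
    username="user",
    password="p@ss/word",  # hypothetical password containing URL-special characters
    host="localhost",
    port=5432,
    database="mydb",
)

# Render a version that is safe to write to logs.
safe_for_logs = url.render_as_string(hide_password=True)
print(safe_for_logs)
```

Building the URL from parts avoids hand-escaping credentials, and the resulting object can be passed to create_engine() in place of a string.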

Connection Pooling

Connection pooling reuses database connections instead of creating a new one for every query.

graph TD
  subgraph without ["Without Pooling"]
    R1["Request"] --> DB1["DB (new conn)"]
    R2["Request"] --> DB2["DB (new conn)"]
  end
  subgraph with_pool ["With Pooling"]
    R3["Request"] --> POOL["Pool"]
    POOL --> DB3["DB"]
    R4["Request"] --> POOL
  end

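SQLAlchemy manages a connection pool automatically when you use create_engine(). For most databases the default is a QueuePool, which is what the pool_size, max_overflow, pool_timeout, and pool_recycle arguments configure.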
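One way to see the reuse in practice is to count how many physical connections the pool actually opens. The sketch below is only an illustration: it uses an in-memory SQLite database as a stand-in for a real server, requests QueuePool explicitly (SQLite does not always default to it), and the `opened` list and listener name are hypothetical helpers.

```python
from sqlalchemy import create_engine, event, text
from sqlalchemy.pool import QueuePool

# In-memory SQLite stands in for a real database server.
engine = create_engine(
    "sqlite://",
    poolclass=QueuePool,
    pool_size=2,
    max_overflow=0,
)

opened = []  # one entry per brand-new DBAPI connection

@event.listens_for(engine, "connect")
def record_new_connection(dbapi_connection, connection_record):
    """Called only when the pool has to open a new physical connection."""
    opened.append(id(dbapi_connection))

# Three sequential checkouts; each `with` block returns its connection to the pool.
for _ in range(3):
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))

print(len(opened))               # physical connections opened
print(engine.pool.checkedin())   # idle connections waiting in the pool
print(engine.pool.checkedout())  # connections currently in use
```

Because every checkout happens after the previous connection was returned, the three requests should share a single physical connection rather than opening three.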

Executing Queries

Using SQLAlchemy Core (Text)

from sqlalchemy import text

# Simple query
with engine.connect() as conn:
    result = conn.execute(text("SELECT * FROM customers WHERE status = :status"), {"status": "active"})
    rows = result.fetchall()
    for row in rows:
        print(row.name, row.email)

# Always use parameterised queries to prevent SQL injection!
# NEVER build SQL with string formatting:
# conn.execute(text(f"SELECT * FROM customers WHERE status = '{status}'"))  # DANGEROUS
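To make the risk concrete, here is a minimal sketch comparing the two approaches against an in-memory SQLite database. The table contents and the `malicious` input are invented for the demonstration; the pattern, not the data, is the point.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # in-memory database for the demo

with engine.begin() as conn:  # begin() commits automatically on success
    conn.execute(text("CREATE TABLE customers (name TEXT, email TEXT, status TEXT)"))
    conn.execute(
        text("INSERT INTO customers (name, email, status) VALUES (:name, :email, :status)"),
        [
            {"name": "Ada", "email": "ada@example.com", "status": "active"},
            {"name": "Bob", "email": "bob@example.com", "status": "inactive"},
        ],
    )

malicious = "active' OR '1'='1"  # hypothetical attacker-controlled input

with engine.connect() as conn:
    # Unsafe: the input is spliced into the SQL text and rewrites the WHERE clause.
    unsafe_rows = conn.execute(
        text(f"SELECT name FROM customers WHERE status = '{malicious}'")
    ).fetchall()

    # Safe: the input is sent as a bound value and compared as an ordinary string.
    safe_rows = conn.execute(
        text("SELECT name FROM customers WHERE status = :status"),
        {"status": malicious},
    ).fetchall()

print(len(unsafe_rows))  # every row leaks
print(len(safe_rows))    # no status matches the literal string
```

The formatted query returns every customer because the injected `OR '1'='1'` is always true, while the parameterised query finds nothing, since no row has that literal status.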

Using SQLAlchemy Core (Expression Language)

from sqlalchemy import MetaData, Table, select

metadata = MetaData()
# Reflect the column definitions of the existing "customers" table from the database
customers = Table("customers", metadata, autoload_with=engine)

# Build a query programmatically
query = (
    select(customers.c.name, customers.c.email)
    .where(customers.c.status == "active")
    .order_by(customers.c.name)
    .limit(100)
)

with engine.connect() as conn:
    result = conn.execute(query)
    rows = result.fetchall()
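The reflected example above needs an existing database. For experimenting without one, the following sketch defines the table explicitly and runs the same query against in-memory SQLite. The column types and sample rows are assumptions made for illustration, not the schema of any real customers table.

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, insert, select
)

engine = create_engine("sqlite://")
metadata = MetaData()

# Explicit definition instead of reflection, so no existing database is required.
customers = Table(
    "customers",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String, nullable=False),
    Column("email", String),
    Column("status", String),
)
metadata.create_all(engine)

with engine.begin() as conn:  # commits on success, rolls back on error
    conn.execute(
        insert(customers),
        [
            {"name": "Zoe", "email": "zoe@example.com", "status": "active"},
            {"name": "Ada", "email": "ada@example.com", "status": "active"},
            {"name": "Bob", "email": "bob@example.com", "status": "inactive"},
        ],
    )

# The same query shape as above: filter, sort, and cap the result size.
query = (
    select(customers.c.name, customers.c.email)
    .where(customers.c.status == "active")
    .order_by(customers.c.name)
    .limit(100)
)

with engine.connect() as conn:
    names = [row.name for row in conn.execute(query)]

print(names)
```

Values such as "active" are passed as bound parameters automatically, so the expression language gives the same injection safety as text() with named parameters.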
