AI APIs are the primary interface for integrating large language models into your applications. This lesson covers how to make API calls using both REST and SDKs, how the chat message format works, how to stream responses, and how to handle errors gracefully.
Most modern LLM APIs use a chat completions interface. You send a list of messages (conversation history) and receive a model-generated response.
| Role | Purpose |
|---|---|
| `system` | Sets the model's behaviour, persona, or instructions |
| `user` | The human user's input |
| `assistant` | The model's previous responses (for context) |
```python
messages = [
    {"role": "system", "content": "You are a concise coding assistant."},
    {"role": "user", "content": "Write a Python function to reverse a string."},
]
```
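The `assistant` role is what lets you carry context across turns: after each response, append the model's reply to the list before sending the next user message. A sketch (the assistant reply here is a hard-coded placeholder, not a real API response):

```python
messages = [
    {"role": "system", "content": "You are a concise coding assistant."},
    {"role": "user", "content": "Write a Python function to reverse a string."},
]

# Placeholder standing in for the model's actual reply.
assistant_reply = "def reverse(s):\n    return s[::-1]"

# Append the assistant turn, then the user's follow-up, so the next
# API call sees the whole conversation history.
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "Now add a docstring."})
```

The API itself is stateless: the model only knows what is in `messages`, so your application is responsible for accumulating the history on every call.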
```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
    max_tokens=256,
)

answer = response.choices[0].message.content
print(answer)
```
| Parameter | Description | Typical Value |
|---|---|---|
| `model` | Which model to use | `gpt-4o-mini` |
| `temperature` | Randomness (0 = deterministic, 2 = very creative) | 0.0–1.0 |
| `max_tokens` | Maximum length of the response, in tokens | 256–4096 |
| `top_p` | Nucleus sampling (alternative to temperature) | 0.9–1.0 |
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ],
)

print(message.content[0].text)
```
Note: Anthropic's API uses a top-level `system` parameter rather than a system message in the `messages` array.
For long responses, streaming delivers tokens as they are generated, improving perceived latency.
```python
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```
```python
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
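For graceful error handling, transient failures such as rate limits are usually wrapped in a retry loop with exponential backoff. A minimal sketch (the helper name `with_retries` is my own; in real code you would catch the SDK's specific exceptions, e.g. rate-limit or connection errors, rather than bare `Exception`):

```python
import random
import time

def with_retries(call, max_attempts=4, base_delay=1.0):
    """Run an API call, retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # narrow this to the SDK's rate-limit/connection errors
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff with a little jitter: ~1s, 2s, 4s, ...
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

Usage is the same for either SDK, e.g. `answer = with_retries(lambda: client.chat.completions.create(...))`.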
You can also use the APIs directly via HTTP if you prefer not to use an SDK:
```python
import requests
import os
```
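The snippet above is cut off; a sketch of how a direct HTTP call likely continues, POSTing to OpenAI's documented chat completions endpoint with a bearer token. The helper names `build_payload` and `chat_completion` and the `max_tokens` default are my own, and `OPENAI_API_KEY` must be set in the environment:

```python
import os
import requests

API_URL = "https://api.openai.com/v1/chat/completions"

def build_payload(messages, model="gpt-4o-mini", max_tokens=256):
    """Assemble the JSON body expected by the chat completions endpoint."""
    return {"model": model, "messages": messages, "max_tokens": max_tokens}

def chat_completion(messages, **kwargs):
    """POST the payload and return the first choice's reply text."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json=build_payload(messages, **kwargs),
        timeout=30,
    )
    resp.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return resp.json()["choices"][0]["message"]["content"]
```

The raw HTTP route is handy in languages without an official SDK, but you give up the SDK's built-in retries and typed responses.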