Tokens, Costs & Rate Limits

Understanding tokens is fundamental to building cost-effective AI applications. This lesson explains how tokenisation works, how to count and estimate costs, and how to work within rate limits.

What Are Tokens?

Tokens are the fundamental units that language models process. A token is not the same as a word — it is a subword unit determined by the model's tokeniser.

Examples of Tokenisation

Text	Approximate Tokens
"Hello"	1 token
"Hello, world!"	4 tokens
"Tokenisation"	2–3 tokens
"supercalifragilistic"	5–6 tokens

Rules of thumb:

~4 characters per token in English
~¾ of a word per token
Code tends to use more tokens per character than natural language
Non-English languages often require more tokens per word
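The rules of thumb above can be turned into a quick estimator for budgeting before a real tokeniser is available. This is a rough sketch only: the function names and default ratios are illustrative assumptions taken from the heuristics above, and the results will drift for code or non-English text.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate from character count (~4 characters per token in English)."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)

def estimate_tokens_from_words(word_count: int, words_per_token: float = 0.75) -> int:
    """Rough estimate from word count (~3/4 of a word per token)."""
    return math.ceil(word_count / words_per_token)

sample = "Understanding tokens is fundamental to building cost-effective AI applications."
print(estimate_tokens(sample))                          # character-based estimate
print(estimate_tokens_from_words(len(sample.split())))  # word-based estimate
```

The two estimates rarely agree exactly; treat either as an order-of-magnitude figure and use a real tokeniser when the count affects billing or context limits.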

How Tokenisers Work

Modern LLMs use Byte Pair Encoding (BPE) or similar subword tokenisation:

Start with individual characters
Iteratively merge the most frequent pairs
Build a vocabulary of common subword units
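The three steps above can be sketched as a toy BPE trainer. This is a simplified illustration, not how any production tokeniser is implemented: real tokenisers work on bytes, handle whitespace and special tokens, and train on very large corpora. The function names here are hypothetical.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def train_bpe(corpus, num_merges):
    """Start from characters, then apply up to `num_merges` greedy merges."""
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # every word is already a single symbol
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words

merges, vocab = train_bpe("low low low lower lowest", num_merges=3)
print(merges)  # learned merge rules, in the order they were applied
print(vocab)   # each word as a tuple of subword symbols, with its frequency
```

The ordered list of merges is what a tokeniser would replay at encoding time to split new text into the same subword units.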

"unbelievable" → ["un", "believ", "able"]     (3 tokens)
"AI"           → ["AI"]                         (1 token)
"GPT-4"        → ["G", "PT", "-", "4"]          (4 tokens)

Counting Tokens

Using tiktoken (OpenAI)

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4o-mini")
tokens = encoding.encode("Hello, how are you?")
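print(f"Token count: {len(tokens)}")      # 6
print(f"Token IDs:   {tokens}")            # six integer IDs; values depend on the encoding (e.g. [9906, 11, 1268, 527, 499, 30] under cl100k_base, while gpt-4o-mini uses o200k_base)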

Using the Anthropic Token Counter

import anthropic

client = anthropic.Anthropic()
result = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(f"Input tokens: {result.input_tokens}")

Understanding Costs

LLM pricing is based on tokens processed, split into input (prompt) tokens and output (completion) tokens.

Typical Pricing (illustrative)

Model	Input (per 1M tokens)	Output (per 1M tokens)
GPT-4o mini	$0.15	$0.60
GPT-4o	$2.50	$10.00
Claude Haiku	$0.25	$1.25
Claude Sonnet	$3.00	$15.00

Cost Estimation Formula

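\[
\text{Total cost} = \frac{\text{input\_tokens}}{10^{6}} \times \text{input\_price} + \frac{\text{output\_tokens}}{10^{6}} \times \text{output\_price}
\]

Here both prices are quoted in dollars per 1M tokens, as in the pricing table above.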

Example Calculation

def estimate_cost(input_tokens, output_tokens,
                  input_price_per_m=0.15, output_price_per_m=0.60):
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost

# Example: 1,000 input tokens, 500 output tokens with GPT-4o mini
cost = estimate_cost(1000, 500)
print(f"Estimated cost: ${cost:.6f}")  # $0.000450
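To see how per-request costs add up, the same arithmetic can be projected over a month and compared across models. The sketch below reuses the illustrative prices from the table above; the dictionary, the function name, and the traffic figures are assumptions for demonstration, not real billing data.

```python
# Illustrative (input, output) prices in dollars per 1M tokens, from the table above.
PRICES_PER_M = {
    "GPT-4o mini": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "Claude Haiku": (0.25, 1.25),
    "Claude Sonnet": (3.00, 15.00),
}

def monthly_cost(model, requests_per_day, input_tokens, output_tokens, days=30):
    """Project monthly spend for a fixed request shape (hypothetical helper)."""
    input_price, output_price = PRICES_PER_M[model]
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_request * requests_per_day * days

# Assumed workload: 1,000 requests/day, each 1,000 input and 500 output tokens.
for model in PRICES_PER_M:
    print(f"{model:<14} ${monthly_cost(model, 1000, 1000, 500):,.2f} per month")
```

Because output tokens are priced several times higher than input tokens in this table, capping response length often reduces cost more than trimming the prompt.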

Rate Limits

API providers impose rate limits to ensure fair usage and system stability.
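A common way to work within rate limits is to retry a rejected request after an exponentially growing delay. The sketch below is a minimal, provider-agnostic illustration: `RateLimitError` is a hypothetical stand-in for whatever exception a given SDK raises on an HTTP 429 response, and the retry count and delays are assumed defaults rather than recommended values.

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) error."""

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call request_fn, retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

In use, `request_fn` would be a zero-argument callable wrapping the actual API call, with the `except` clause adapted to the SDK's own rate-limit exception. The `sleep` parameter exists so the delay can be stubbed out in tests.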
