A single API call is stateless — the model has no memory of previous interactions. To build conversational agents that remember context, you need explicit memory management. This lesson covers conversation history, windowing strategies, summarisation, and persistent memory stores.
LLMs are stateless. Every API call is independent. The model only "remembers" what you include in the messages array.
```
Call 1: "My name is Alice."  →  "Hello, Alice!"
Call 2: "What's my name?"    →  "I don't know your name."   (no context from Call 1)
```
To maintain conversation continuity, you must manage the message history yourself.
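Concretely, continuity is just a matter of what you put in each request's payload. A minimal sketch of what the `messages` arrays look like with and without history (the message contents are illustrative; no API call is made here):

```python
# Two independent requests. The model sees ONLY what each payload contains.
call_1 = [
    {"role": "user", "content": "My name is Alice."},
]

# Without carrying history, the second request has no trace of the first:
call_2_stateless = [
    {"role": "user", "content": "What's my name?"},
]

# With explicit history management, the second request replays the earlier turns:
call_2_with_history = call_1 + [
    {"role": "assistant", "content": "Hello, Alice!"},
    {"role": "user", "content": "What's my name?"},
]

assert "Alice" not in str(call_2_stateless)   # nothing to recall
assert "Alice" in str(call_2_with_history)    # context is present
```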
The simplest approach: keep all messages in a list.
```python
from openai import OpenAI

client = OpenAI()

class ChatBot:
    def __init__(self, system_prompt: str):
        self.messages = [
            {"role": "system", "content": system_prompt}
        ]

    def chat(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=self.messages,
        )
        assistant_msg = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_msg})
        return assistant_msg

bot = ChatBot("You are a helpful maths tutor.")
print(bot.chat("What is 2 + 2?"))       # "4"
print(bot.chat("Now multiply by 10."))  # "40" -- remembers previous context
```
Every model has a maximum context window. As the conversation grows, you will eventually exceed it:
| Model | Context Window |
|---|---|
| GPT-4o mini | 128K tokens |
| GPT-4o | 128K tokens |
| Claude Sonnet | 200K tokens |
| Claude Opus | 200K tokens |
Even with large windows, longer contexts are slower and more expensive.
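Before choosing a trimming strategy, it helps to estimate how many tokens a history consumes. A rough rule of thumb for English text is about four characters per token; this is a heuristic sketch, not an exact count (a real tokeniser such as tiktoken gives the true number):

```python
def approx_tokens(messages: list[dict]) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    chars = sum(len(m["content"]) for m in messages)
    return chars // 4

history = [
    {"role": "system", "content": "You are a helpful maths tutor."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
]
print(approx_tokens(history))  # small now, but grows every turn
```

An estimate like this is enough to decide *when* to trim or summarise, even if it is off by 20% either way.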
Keep only the most recent N messages:
```python
class SlidingWindowChat:
    def __init__(self, system_prompt: str, window_size: int = 20):
        self.system_msg = {"role": "system", "content": system_prompt}
        self.messages: list[dict] = []
        self.window_size = window_size

    def chat(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        # Keep only the last N messages (plus system prompt)
        windowed = [self.system_msg] + self.messages[-self.window_size:]
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=windowed,
        )
        assistant_msg = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_msg})
        return assistant_msg
```
| Window Size | Pros | Cons |
|---|---|---|
| Small (10) | Fast, cheap | Loses context quickly |
| Medium (30) | Good balance | Moderate cost |
| Large (100) | Rich context | Expensive, may hit limits |
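The heart of the windowing strategy is a single list slice. A standalone check of the trimming logic, with hypothetical message contents:

```python
system_msg = {"role": "system", "content": "You are a helpful maths tutor."}
window_size = 4

# Ten alternating user/assistant turns.
messages = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(10)
]

# The system prompt is always kept; only the tail of the history survives.
windowed = [system_msg] + messages[-window_size:]

assert len(windowed) == window_size + 1
assert windowed[0]["role"] == "system"
assert windowed[-1]["content"] == "turn 9"
```

Note that a slice can start mid-exchange (e.g. on an assistant message); most models tolerate this, but it is one reason to pick an even window size.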
When the conversation gets long, summarise older messages:
```python
class SummarisedChat:
    def __init__(self, system_prompt: str, max_messages: int = 20):
        self.system_prompt = system_prompt
        self.summary = ""
        self.messages: list[dict] = []
        self.max_messages = max_messages

    def _summarise(self):
        """Summarise old messages and reset the history."""
        old_messages = self.messages[:self.max_messages // 2]
        text = "\n".join(
            f"{m['role']}: {m['content']}" for m in old_messages
        )
        # Ask the model to fold the old turns into a running summary,
        # then drop those turns from the in-memory history.
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Summarise this conversation concisely:\n\n{text}",
            }],
        )
        self.summary = response.choices[0].message.content
        self.messages = self.messages[self.max_messages // 2:]
```