A single API call is stateless — the model has no memory of previous interactions. To build conversational agents that remember context, you need explicit memory management. This lesson covers conversation history, windowing strategies, summarisation, and persistent memory stores.
LLMs are stateless. Every API call is independent. The model only "remembers" what you include in the messages array.
```
Call 1: "My name is Alice."  →  "Hello, Alice!"
Call 2: "What's my name?"    →  "I don't know your name."   (no context from Call 1)
```
To maintain conversation continuity, you must manage the message history yourself.
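Concretely, continuity is just a matter of what you put in each request's payload. A minimal sketch of what the `messages` arrays look like with and without history (the message contents are illustrative; no API call is made here):

```python
# Two independent requests. The model sees ONLY what each payload contains.
call_1 = [
    {"role": "user", "content": "My name is Alice."},
]

# Without carrying history, the second request has no trace of the first:
call_2_stateless = [
    {"role": "user", "content": "What's my name?"},
]

# With explicit history management, the second request replays the earlier turns:
call_2_with_history = call_1 + [
    {"role": "assistant", "content": "Hello, Alice!"},
    {"role": "user", "content": "What's my name?"},
]

assert "Alice" not in str(call_2_stateless)   # nothing to recall
assert "Alice" in str(call_2_with_history)    # context is present
```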
The simplest approach: keep all messages in a list.
```python
from openai import OpenAI

client = OpenAI()

class ChatBot:
    def __init__(self, system_prompt: str):
        self.messages = [
            {"role": "system", "content": system_prompt}
        ]

    def chat(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=self.messages,
        )
        assistant_msg = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_msg})
        return assistant_msg

bot = ChatBot("You are a helpful maths tutor.")
print(bot.chat("What is 2 + 2?"))       # "4"
print(bot.chat("Now multiply by 10."))  # "40" -- remembers previous context
```
Every model has a maximum context window. As the conversation grows, you will eventually exceed it:
| Model | Context Window |
|---|---|
| GPT-4o mini | 128K tokens |
| GPT-4o | 128K tokens |
| Claude Sonnet | 200K tokens |
| Claude Opus | 200K tokens |
Even with large windows, longer contexts are slower and more expensive.
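Before choosing a trimming strategy, it helps to estimate how many tokens a history consumes. A rough rule of thumb for English text is about four characters per token; this is a heuristic sketch, not an exact count (a real tokeniser such as tiktoken gives the true number):

```python
def approx_tokens(messages: list[dict]) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    chars = sum(len(m["content"]) for m in messages)
    return chars // 4

history = [
    {"role": "system", "content": "You are a helpful maths tutor."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
]
print(approx_tokens(history))  # small now, but grows every turn
```

An estimate like this is enough to decide *when* to trim or summarise, even if it is off by 20% either way.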
Keep only the most recent N messages:
```python
class SlidingWindowChat:
    def __init__(self, system_prompt: str, window_size: int = 20):
        self.system_msg = {"role": "system", "content": system_prompt}
        self.messages: list[dict] = []
        self.window_size = window_size

    def chat(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        # Keep only the last N messages (plus system prompt)
        windowed = [self.system_msg] + self.messages[-self.window_size:]
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=windowed,
        )
        assistant_msg = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": assistant_msg})
        return assistant_msg
```
| Window Size | Pros | Cons |
|---|---|---|
| Small (10) | Fast, cheap | Loses context quickly |
| Medium (30) | Good balance | Moderate cost |
| Large (100) | Rich context | Expensive, may hit limits |
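The heart of the windowing strategy is a single list slice. A standalone check of the trimming logic, with hypothetical message contents:

```python
system_msg = {"role": "system", "content": "You are a helpful maths tutor."}
window_size = 4

# Ten alternating user/assistant turns.
messages = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(10)
]

# The system prompt is always kept; only the tail of the history survives.
windowed = [system_msg] + messages[-window_size:]

assert len(windowed) == window_size + 1
assert windowed[0]["role"] == "system"
assert windowed[-1]["content"] == "turn 9"
```

Note that a slice can start mid-exchange (e.g. on an assistant message); most models tolerate this, but it is one reason to pick an even window size.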
When the conversation gets long, summarise older messages:
```python
class SummarisedChat:
    def __init__(self, system_prompt: str, max_messages: int = 20):
        self.system_prompt = system_prompt
        self.summary = ""
        self.messages: list[dict] = []
        self.max_messages = max_messages

    def _summarise(self):
        """Summarise old messages and reset the history."""
        old_messages = self.messages[:self.max_messages // 2]
        text = "\n".join(
            f"{m['role']}: {m['content']}" for m in old_messages
        )
        # Ask the model to fold the old turns into a running summary,
        # then drop those turns from the in-memory history.
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Summarise this conversation concisely:\n\n{text}",
            }],
        )
        self.summary = response.choices[0].message.content
        self.messages = self.messages[self.max_messages // 2:]
```