You have probably used ChatGPT, Claude, Gemini, or another AI chatbot. You type something in, and it writes back something that sounds remarkably human. It can write essays, explain quantum physics, translate languages, debug code, and compose poetry — often in seconds.
But what is actually happening when you talk to an AI? How does it "know" things? Does it think? Does it understand?
If you want to become good at prompt engineering — the skill of getting AI to produce useful, accurate output — you need a basic understanding of what is going on under the hood. You do not need a PhD in machine learning. You just need a clear mental model of what these systems are and, crucially, what they are not.
At its heart, a large language model (LLM) does one thing: it predicts the next word (or more precisely, the next "token" — a chunk of text that might be a word, part of a word, or a punctuation mark).
Given a sequence of text, the model calculates a probability distribution over all possible next tokens and picks the most likely one (or samples from the distribution to add variety). Then it adds that token to the sequence and repeats the process.
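The predict-append-repeat loop can be sketched with a toy "model". The context-to-distribution table below is hand-written for illustration (a real LLM computes these probabilities with its parameters), but the loop itself — look up a distribution, pick a token greedily or by sampling, append, repeat — is the same mechanism.

```python
import random

# Toy "model": a hand-written probability distribution over next tokens
# for a couple of contexts. Purely illustrative -- a real LLM computes
# these probabilities from billions of parameters.
NEXT_TOKEN_PROBS = {
    ("the", "capital", "of", "france", "is"): {"paris": 0.92, "a": 0.05, "not": 0.03},
    ("the", "capital", "of", "france", "is", "paris"): {".": 0.85, ",": 0.15},
}

def generate(tokens, steps, greedy=True):
    """The core loop: predict a distribution, pick a token, append, repeat."""
    for _ in range(steps):
        probs = NEXT_TOKEN_PROBS.get(tuple(tokens))
        if probs is None:
            break
        if greedy:
            # Always take the single most likely token.
            next_token = max(probs, key=probs.get)
        else:
            # Sample from the distribution to add variety.
            next_token = random.choices(list(probs), weights=list(probs.values()))[0]
        tokens.append(next_token)
    return tokens

print(generate(["the", "capital", "of", "france", "is"], steps=2))
# greedy decoding appends "paris", then "."
```

Greedy decoding always produces the same completion; sampling is what gives real chatbots their variety between regenerations.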
That is it. The entire magic of ChatGPT, Claude, and every other LLM comes down to extremely sophisticated next-token prediction.
If you type: "The capital of France is"
The model has learned from billions of text examples that the most likely next token is "Paris". It does not "know" geography. It has learned statistical patterns about which words tend to follow other words.
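You can see "statistical patterns about which words tend to follow other words" in its simplest possible form by counting word pairs in a tiny made-up corpus. This is nothing like the scale or sophistication of real training, but it makes the point: the result is counts, not stored facts.

```python
from collections import Counter, defaultdict

# A tiny invented "training corpus". Real models see trillions of words.
corpus = (
    "the capital of france is paris . "
    "the capital of japan is tokyo . "
    "paris is the capital of france ."
).split()

# Count which word follows which -- the simplest statistical pattern
# a language model could learn.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# There is no stored fact "Paris is the capital of France" anywhere here;
# there are only counts of what tends to follow what.
print(follows["is"].most_common())
```

When such a model completes "capital of" with "france", it is not recalling geography; "france" simply outnumbers the alternatives in its counts.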
This distinction matters enormously for prompt engineering. When the model gives a wrong answer, it is not because it "forgot" a fact — it is because the statistical patterns in its training data pointed it toward a different completion.
Training an LLM happens in two main phases:
The first phase is pre-training. The model is shown enormous amounts of text — books, websites, articles, code repositories, forums, academic papers. We are talking about hundreds of billions to trillions of words.
During pre-training, the model learns to predict the next token in these texts. It adjusts millions or billions of internal numerical values — called parameters — to get better at this prediction task. Frontier models like GPT-4 and Claude are estimated to have hundreds of billions to over a trillion parameters, though exact figures are not publicly disclosed.
Think of parameters as the "knobs" the model can turn to improve its predictions. More parameters generally means the model can capture more nuanced patterns in language.
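Here is knob-turning in miniature: a model with a single parameter, nudged repeatedly to reduce its prediction error on toy data. The data and learning rate are invented for illustration, but the idea — adjust parameters in the direction that shrinks the error — is the essence of what pre-training does with billions of knobs at once.

```python
# A one-parameter model standing in for billions of parameters:
# predict y = w * x, and nudge the knob w to shrink the prediction error.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relationship: y = 2x

w = 0.0      # the single "knob", starting uninformed
lr = 0.05    # learning rate: how far each nudge moves the knob

for _ in range(200):                # many passes over the data
    for x, y in data:
        error = w * x - y           # how wrong the current prediction is
        w -= lr * error * x         # nudge w to reduce the squared error

print(round(w, 3))                  # converges toward 2.0
```

The model never stores "y is twice x" as a rule; the relationship ends up encoded implicitly in the value of the knob.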
Key insight: The model is not storing a database of facts. It is learning statistical patterns about how language works — which words follow which, how sentences are structured, what kind of answer typically follows what kind of question.
Raw pre-trained models are not very useful as assistants. They might complete your sentence with something irrelevant, offensive, or nonsensical — because they are optimised for text prediction, not helpfulness.
To make them useful, developers move to the second phase: fine-tuning. This typically involves supervised training on example conversations that demonstrate helpful behaviour, followed by reinforcement learning from human feedback (RLHF), where human reviewers rate candidate responses and the model is adjusted to prefer the highly rated ones.
This is why modern AI assistants are generally helpful and polite — they have been trained to behave that way, not because they "want" to be helpful.
You will sometimes hear people refer to models by their parameter count: "a 7 billion parameter model" or "a 70 billion parameter model."
| Model Size | Typical Capability |
|---|---|
| ~1–7 billion parameters | Can handle simple tasks, basic conversation, limited reasoning |
| ~13–70 billion parameters | Good at most tasks, reasonable accuracy, some complex reasoning |
| ~100+ billion parameters | State-of-the-art performance on complex tasks, nuanced understanding of context |
Bigger is not always better for every task. Smaller models are faster and cheaper to run. But for complex, nuanced tasks — like writing a detailed essay or solving a multi-step problem — larger models generally perform significantly better.
Two key terms you should understand:
Training is the process of building the model — feeding it data, adjusting parameters, fine-tuning. This happens once (or periodically) and requires enormous computing resources. Training a frontier model can cost tens of millions of dollars and take weeks on thousands of specialised chips.
Inference is what happens when you use the model — you send a prompt, the model generates a response. This is much cheaper and faster than training, but still requires significant computing power.
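The split between the two phases can be sketched in a few lines, using the same toy one-parameter model as a stand-in. Training loops over a dataset adjusting the parameter; inference is a single forward pass with the parameter frozen.

```python
# Training: repeatedly adjust parameters against a dataset.
# Expensive, done once (or periodically) by the developer.
def train(data, epochs=100, lr=0.05):
    w = 0.0
    for _ in range(epochs):            # many passes over the whole dataset
        for x, y in data:
            w -= lr * (w * x - y) * x  # adjust the parameter
    return w                           # the finished "model"

# Inference: a single forward pass with frozen parameters.
# Cheap by comparison, done every time a user sends a prompt.
def infer(w, x):
    return w * x                       # no learning happens here

w = train([(1.0, 2.0), (2.0, 4.0)])
print(infer(w, 10.0))
```

Note that `infer` never modifies `w` — which is exactly why your chat with an assistant does not, by itself, teach the model anything.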
When you chat with ChatGPT or Claude, you are doing inference. The model is not learning from your conversation (unless you have explicitly opted into a feature that feeds your data back into training, which most providers offer controls for).
Understanding the mechanism helps you understand both the strengths and the weaknesses. The model is extraordinarily good at producing fluent, well-structured text in patterns it has seen before. It is equally capable of producing fluent, well-structured text that is wrong — because plausibility, not truth, is what next-token prediction rewards.
If you think of an LLM as a brilliant expert who knows everything, you will be constantly disappointed and occasionally misled. If you think of it as a powerful text-prediction engine that can be guided to produce useful output, you will use it much more effectively.
Prompt engineering is the art of crafting your input to guide the model's predictions toward the output you actually want. The better you understand the mechanism, the better you can phrase requests so that the completion you want is also the statistically likely one, supply the context the model needs, and anticipate where it is likely to go wrong.
In the next lesson, we will start with the practical fundamentals of writing clear, effective prompts.