You have probably used ChatGPT, Claude, Gemini, or another AI chatbot. You type something in, and it writes back something that sounds remarkably human. It can write essays, explain quantum physics, translate languages, debug code, and compose poetry — often in seconds.
But what is actually happening when you talk to an AI? How does it "know" things? Does it think? Does it understand?
If you want to become good at prompt engineering — the skill of getting AI to produce useful, accurate output — you need a basic understanding of what is going on under the hood. You do not need a PhD in machine learning. You just need a clear mental model of what these systems are and, crucially, what they are not.
At its heart, a large language model (LLM) does one thing: it predicts the next word (or more precisely, the next "token" — a chunk of text that might be a word, part of a word, or a punctuation mark).
Given a sequence of text, the model calculates a probability distribution over all possible next tokens and picks the most likely one (or samples from the distribution to add variety). Then it adds that token to the sequence and repeats the process.
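The predict-append-repeat loop can be sketched with a toy "model". The context-to-distribution table below is hand-written for illustration (a real LLM computes these probabilities with its parameters), but the loop itself — look up a distribution, pick a token greedily or by sampling, append, repeat — is the same mechanism.

```python
import random

# Toy "model": a hand-written probability distribution over next tokens
# for a couple of contexts. Purely illustrative -- a real LLM computes
# these probabilities from billions of parameters.
NEXT_TOKEN_PROBS = {
    ("the", "capital", "of", "france", "is"): {"paris": 0.92, "a": 0.05, "not": 0.03},
    ("the", "capital", "of", "france", "is", "paris"): {".": 0.85, ",": 0.15},
}

def generate(tokens, steps, greedy=True):
    """The core loop: predict a distribution, pick a token, append, repeat."""
    for _ in range(steps):
        probs = NEXT_TOKEN_PROBS.get(tuple(tokens))
        if probs is None:
            break
        if greedy:
            # Always take the single most likely token.
            next_token = max(probs, key=probs.get)
        else:
            # Sample from the distribution to add variety.
            next_token = random.choices(list(probs), weights=list(probs.values()))[0]
        tokens.append(next_token)
    return tokens

print(generate(["the", "capital", "of", "france", "is"], steps=2))
# greedy decoding appends "paris", then "."
```

Greedy decoding always produces the same completion; sampling is what gives real chatbots their variety between regenerations.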
That is it. The entire magic of ChatGPT, Claude, and every other LLM comes down to extremely sophisticated next-token prediction.
If you type: "The capital of France is"
The model has learned from billions of text examples that the most likely next token is "Paris". It does not "know" geography. It has learned statistical patterns about which words tend to follow other words.
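You can see "statistical patterns about which words tend to follow other words" in its simplest possible form by counting word pairs in a tiny made-up corpus. This is nothing like the scale or sophistication of real training, but it makes the point: the result is counts, not stored facts.

```python
from collections import Counter, defaultdict

# A tiny invented "training corpus". Real models see trillions of words.
corpus = (
    "the capital of france is paris . "
    "the capital of japan is tokyo . "
    "paris is the capital of france ."
).split()

# Count which word follows which -- the simplest statistical pattern
# a language model could learn.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# There is no stored fact "Paris is the capital of France" anywhere here;
# there are only counts of what tends to follow what.
print(follows["is"].most_common())
```

When such a model completes "capital of" with "france", it is not recalling geography; "france" simply outnumbers the alternatives in its counts.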
This distinction matters enormously for prompt engineering. When the model gives a wrong answer, it is not because it "forgot" a fact — it is because the statistical patterns in its training data pointed it toward a different completion.
Training an LLM happens in two main phases:
The first phase is pre-training. The model is shown enormous amounts of text — books, websites, articles, code repositories, forums, academic papers. We are talking about hundreds of billions to trillions of words.
During pre-training, the model learns to predict the next token in these texts. It adjusts millions or billions of internal numerical values — called parameters — to get better at this prediction task. Frontier models like GPT-4 and Claude are estimated to have hundreds of billions to over a trillion parameters, though exact figures are not publicly disclosed.
Think of parameters as the "knobs" the model can turn to improve its predictions. More parameters generally means the model can capture more nuanced patterns in language.
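Here is knob-turning in miniature: a model with a single parameter, nudged repeatedly to reduce its prediction error on toy data. The data and learning rate are invented for illustration, but the idea — adjust parameters in the direction that shrinks the error — is the essence of what pre-training does with billions of knobs at once.

```python
# A one-parameter model standing in for billions of parameters:
# predict y = w * x, and nudge the knob w to shrink the prediction error.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relationship: y = 2x

w = 0.0      # the single "knob", starting uninformed
lr = 0.05    # learning rate: how far each nudge moves the knob

for _ in range(200):                # many passes over the data
    for x, y in data:
        error = w * x - y           # how wrong the current prediction is
        w -= lr * error * x         # nudge w to reduce the squared error

print(round(w, 3))                  # converges toward 2.0
```

The model never stores "y is twice x" as a rule; the relationship ends up encoded implicitly in the value of the knob.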
Key insight: The model is not storing a database of facts. It is learning statistical patterns about how language works — which words follow which, how sentences are structured, what kind of answer typically follows what kind of question.
Raw pre-trained models are not very useful as assistants. They might complete your sentence with something irrelevant, offensive, or nonsensical — because they are optimised for text prediction, not helpfulness.
To make them useful, developers move to the second phase: fine-tuning. This typically involves supervised training on example conversations that demonstrate helpful behaviour, followed by reinforcement learning from human feedback (RLHF), where human reviewers rate candidate responses and the model is adjusted to prefer the highly rated ones.
This is why modern AI assistants are generally helpful and polite — they have been trained to behave that way, not because they "want" to be helpful.
You will sometimes hear people refer to models by their parameter count: "a 7 billion parameter model" or "a 70 billion parameter model."
| Model Size | Typical Capability |
|---|---|
| ~1–7 billion parameters | Can handle simple tasks, basic conversation, limited reasoning |
| ~13–70 billion parameters | Good at most tasks, reasonable accuracy, some complex reasoning |
| ~100+ billion parameters | State-of-the-art performance on complex tasks, nuanced understanding of context |
Bigger is not always better for every task. Smaller models are faster and cheaper to run. But for complex, nuanced tasks — like writing a detailed essay or solving a multi-step problem — larger models generally perform significantly better.
Two key terms you should understand:
Training is the process of building the model — feeding it data, adjusting parameters, fine-tuning. This happens once (or periodically) and requires enormous computing resources. Training a frontier model can cost tens of millions of dollars and take weeks on thousands of specialised chips.
Inference is what happens when you use the model — you send a prompt, the model generates a response. This is much cheaper and faster than training, but still requires significant computing power.
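The split between the two phases can be sketched in a few lines, using the same toy one-parameter model as a stand-in. Training loops over a dataset adjusting the parameter; inference is a single forward pass with the parameter frozen.

```python
# Training: repeatedly adjust parameters against a dataset.
# Expensive, done once (or periodically) by the developer.
def train(data, epochs=100, lr=0.05):
    w = 0.0
    for _ in range(epochs):            # many passes over the whole dataset
        for x, y in data:
            w -= lr * (w * x - y) * x  # adjust the parameter
    return w                           # the finished "model"

# Inference: a single forward pass with frozen parameters.
# Cheap by comparison, done every time a user sends a prompt.
def infer(w, x):
    return w * x                       # no learning happens here

w = train([(1.0, 2.0), (2.0, 4.0)])
print(infer(w, 10.0))
```

Note that `infer` never modifies `w` — which is exactly why your chat with an assistant does not, by itself, teach the model anything.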
When you chat with ChatGPT or Claude, you are doing inference. The model is not learning from your conversation (unless you have explicitly opted into a feature that feeds your data back into training, which most providers offer controls for).
Understanding the mechanism helps you understand both the strengths and the weaknesses. The model is extraordinarily good at producing fluent, well-structured text in patterns it has seen before. It is equally capable of producing fluent, well-structured text that is wrong — because plausibility, not truth, is what next-token prediction rewards.
If you think of an LLM as a brilliant expert who knows everything, you will be constantly disappointed and occasionally misled. If you think of it as a powerful text-prediction engine that can be guided to produce useful output, you will use it much more effectively.
Prompt engineering is the art of crafting your input to guide the model's predictions toward the output you actually want. The better you understand the mechanism, the better you can phrase requests so that the completion you want is also the statistically likely one, supply the context the model needs, and anticipate where it is likely to go wrong.
In the next lesson, we will start with the practical fundamentals of writing clear, effective prompts.