Fixed-point binary forces you to choose, once and for all, how many bits sit each side of the point — fine for a known range, hopeless when values span from the astronomically large to the vanishingly small. Floating-point lets the binary point move, so a single fixed-width format can represent both. This lesson covers the mantissa and exponent, normalisation, converting to and from denary, the range-versus-precision trade-off, and the rounding errors floating-point inevitably introduces.
This lesson addresses the H446 1.4.1 Data Types content on real-number representation:
(This is a paraphrase of the specification content, not a verbatim quotation.)
Floating-point is the binary equivalent of scientific notation. In denary we write very large and very small numbers compactly by separating the significant digits from a power of ten:
$3{,}400{,}000 = 3.4 \times 10^{6}, \qquad 0.00056 = 5.6 \times 10^{-4}$
The power of ten effectively slides the decimal point to wherever it is needed, so the same notation handles both extremes. Binary floating-point does the identical thing in base 2, storing a number as:
$N = \text{mantissa} \times 2^{\text{exponent}}$
The mantissa holds the significant bits (this fixes the precision), and the exponent says how far, and in which direction, to slide the binary point (this fixes the magnitude). Because the exponent can be large and positive or large and negative, one fixed-width format spans an enormous range — exactly the flexibility fixed-point lacks.
It is worth making the size of that advantage concrete. Recall from the previous lesson that a 16-bit fixed-point number in, say, 8.8 format can only reach about 127 at the top and resolves steps of about 0.004: a single, narrow window. Now spend those same 16 bits as floating-point with an 8-bit exponent: the exponent alone ranges over roughly −128 to +127, so the magnitude can swing from about $2^{-128}$ (a number with around 38 leading zeros after the decimal point) up to about $2^{127}$ (a 39-digit number). The same number of bits now covers a span fixed-point could never approach. The catch, and the theme of this lesson, is that this range is bought with non-uniform precision: the representable values are dense near zero and increasingly sparse for large magnitudes, because a fixed-width mantissa provides a fixed number of significant figures, not a fixed step size.
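The figures quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up for this example, and it looks at the powers of two alone, ignoring the mantissa.

```python
import math

# 16-bit signed fixed-point in 8.8 format: 8 integer bits, 8 fraction bits.
fixed_max = (2**15 - 1) / 2**8    # largest value, just under 128
fixed_step = 1 / 2**8             # the same step size everywhere

# Magnitudes reachable with an 8-bit two's complement exponent (-128..+127).
largest_power = 2**127
smallest_power = 2.0**-128

# Count the decimal digits of 2^127 and the zeros between the decimal
# point and the first non-zero digit of 2^-128 (valid because 2^-128 is
# not an exact power of ten).
digits_in_largest = len(str(largest_power))
leading_zeros = math.floor(-math.log10(smallest_power))

print(fixed_max, fixed_step)   # 127.99609375 0.00390625
print(digits_in_largest)       # 39
print(leading_zeros)           # 38
```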
A floating-point number is split into two fields, both stored in two's complement so each can be negative:
| Component | Purpose |
|---|---|
| Mantissa | The significant bits of the value; determines precision |
| Exponent | A signed power of 2; determines where the binary point sits, hence the magnitude/range |
Throughout this lesson we use OCR's common teaching format of an 8-bit mantissa and a 4-bit exponent, with the mantissa's binary point taken to lie immediately after its most significant (sign) bit:
| Field | Bits | Leftmost bit |
|---|---|---|
| Mantissa | 8 | sign bit, with binary point understood directly after it |
| Exponent | 4 | sign bit of a two's complement integer |
So the mantissa is read as a signed fixed-point fraction s.bbbbbbb (sign bit, then seven fractional bits), and the exponent is a signed integer in the range −8 to +7.
Two points about this format trip students up and are worth fixing now. First, the binary point in the mantissa is implied, not stored: there is no extra bit for it; everyone simply agrees it sits after the leading bit, so the eight stored bits are interpreted as s.bbbbbbb. Second, the mantissa being two's complement means its leading bit is a genuine sign bit with the negative weight −1 (since the column immediately left of the point is the $-2^{0} = -1$ place in this fraction layout); that is why a normalised negative number reads 1.0… rather than 0.1…. Different textbooks and exam papers occasionally place the point elsewhere or use a different mantissa width, so in an exam always work to the format the question specifies rather than assuming this one. The method is identical whatever the widths; only the place values and the exponent range change.
The procedure is: read the exponent, write the mantissa with its point after the first bit, then slide the point by the exponent — right for a positive exponent, left for a negative one — and finally evaluate the resulting fixed-point number (remembering the mantissa is two's complement, so a leading 1 means negative). A helpful way to remember the direction is that the exponent is the power of two multiplying the mantissa: multiplying by a positive power makes the number bigger, which means moving the point right; multiplying by a negative power makes it smaller, moving the point left. Tie the direction to "bigger or smaller" rather than memorising "right or left" in isolation, and you will not reverse it under pressure.
Mantissa 01101000, exponent 0101.
The exponent $0101_2 = +5$. Write the mantissa with its point after the sign bit:
0.1101000
Slide the point 5 places right (positive exponent):
0.1101000→011010.00
Evaluate as a positive fixed-point value:
16+8+2=26.0
Mantissa 11010000, exponent 0011.
The exponent $0011_2 = +3$. The mantissa starts with 1, so it is a negative two's complement value. Write it with the point after the sign bit:
1.1010000
Slide the point 3 places right:
1.1010000→1101.0000
Now evaluate 1101.0000 as a two's complement fixed-point number — the leading column carries the negative weight −8:
−8+4+1=−3.0
A common slip here is to read the mantissa as if it were unsigned (giving +13); the leading 1 must be treated as the sign throughout.
Mantissa 01100000, exponent 1110.
The exponent 1110 is a negative two's complement value: flip →0001, add 1 →0010=2, so the exponent is −2. Write the mantissa with its point after the sign bit:
0.1100000
A negative exponent slides the point left by 2:
0.1100000→0.0011000
Evaluate the fixed-point fraction:
0.125+0.0625=0.1875
So a negative exponent produces a small fractional value, exactly as $\times 10^{-n}$ does in denary scientific notation. The worked examples together show the whole point of floating-point: a mantissa of the same width gives very different magnitudes depending on the exponent, large or small.
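The decoding procedure can also be sketched in Python as a way of checking hand working. The function names `twos_complement` and `decode` are invented for this illustration. The sketch assumes the lesson's layout (binary point directly after the mantissa's sign bit) and multiplies by $2^{\text{exponent}}$ rather than sliding the point, which gives the same value.

```python
def twos_complement(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":              # leading 1: the top column is negative
        value -= 1 << len(bits)
    return value


def decode(mantissa: str, exponent: str) -> float:
    """Decode mantissa/exponent bit strings into a denary value."""
    # Dividing by 2^(width - 1) puts the point after the sign bit: s.bbbbbbb
    m = twos_complement(mantissa) / (1 << (len(mantissa) - 1))
    e = twos_complement(exponent)
    return m * 2.0 ** e             # same effect as sliding the point e places


print(decode("01101000", "0101"))   # 26.0
print(decode("11010000", "0011"))   # -3.0
print(decode("01100000", "1110"))   # 0.1875
```

The three calls reproduce the three worked examples above.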
The reverse procedure: convert to fixed-point binary, then normalise by sliding the point to just after the leading sign bit, recording how far you slid as the exponent, and finally write the exponent in two's complement. The mental model that prevents most errors is to think of it exactly as you would write a denary number in scientific notation — first get the digits, then decide where the point "really" is, then express that as a power. The only complications binary adds are that the mantissa is two's complement (so negatives need the flip-and-add-1 step) and that the exponent itself is stored in two's complement.
Step 1 — fixed-point binary: 13.5=1101.1.
Step 2: write in normalised form $0.1\ldots \times 2^{e}$. We need the mantissa to read 0.1…, so we place the point at the front: 0.11011. To recover 1101.1 from 0.11011 the point must move 4 places right, so the exponent is +4:
$13.5 = 0.11011_2 \times 2^{4}$
Step 3 — pad the mantissa to 8 bits (zeros on the right do not change the value): 0.1101100→ mantissa =01101100.
Step 4 — exponent in 4-bit two's complement: +4=0100.
Result: mantissa 01101100, exponent 0100
Check by converting back: exponent +4 slides 0.1101100 four right to 1101.100=8+4+1+0.5=13.5. Correct.
Step 1: 9.5=1001.1, so −9.5 is its two's complement. As a fixed-point value we need a negative normalised mantissa starting 1.0…. Working with the magnitude first, $9.5 = 0.10011_2 \times 2^{4}$; negating the mantissa 0.1001100 (flip →1.0110011, add 1 in the least significant place →1.0110100) gives the two's complement mantissa.
Step 2: mantissa =10110100, exponent =0100 (+4).
Check: slide 1.0110100 four right to 10110.100; read in two's complement with the −16 column: −16+4+2+0.5=−9.5. Correct. Negative conversions are fiddly, so always finish with this back-check.
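For checking answers, the denary-to-floating-point direction can be sketched as follows. The helper names `to_bits` and `encode` are hypothetical. Instead of following the by-hand steps literally, the sketch halves or doubles the value until it falls in the normalised range, which gives the same result. It uses exact fractions, rejects zero, and rejects values that do not fit the chosen widths exactly.

```python
from fractions import Fraction


def to_bits(value: int, width: int) -> str:
    """Two's complement bit string of the given width."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def encode(x, m_bits: int = 8, e_bits: int = 4):
    """Return normalised (mantissa, exponent) bit strings for a non-zero x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("zero has no normalised form in this scheme")
    e = 0
    # Normalised mantissas satisfy 0.5 <= m < 1 (positive)
    # or -1 <= m < -0.5 (negative).
    while x >= 1 or x < -1:                          # too big: halve, raise exponent
        x /= 2
        e += 1
    while Fraction(-1, 2) <= x < Fraction(1, 2):     # too small: double, lower exponent
        x *= 2
        e -= 1
    scaled = x * (1 << (m_bits - 1))                 # shift the fraction bits into an integer
    if scaled.denominator != 1:
        raise ValueError("value needs more mantissa bits than are available")
    if not -(1 << (e_bits - 1)) <= e < (1 << (e_bits - 1)):
        raise OverflowError("exponent does not fit in the exponent field")
    return to_bits(int(scaled), m_bits), to_bits(e, e_bits)


print(encode(13.5))    # ('01101100', '0100')
print(encode(-9.5))    # ('10110100', '0100')
```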
A floating-point number is normalised when its mantissa begins with 0.1… (for a positive number) or 1.0… (for a negative number).
In both cases the first two bits differ: the sign bit and the bit after it are opposite. That is the quick test for whether a mantissa is normalised.
Normalisation maximises precision for a fixed number of mantissa bits. Leading zeros in a positive mantissa (or leading ones in a negative one) carry no information — they merely say "the value is small" — yet they occupy mantissa bits that could otherwise hold significant figures. By sliding those redundant leading bits out (and compensating with the exponent) we pack the maximum number of meaningful bits into the mantissa.
| Representation | Normalised? | Why |
|---|---|---|
| $0.1101 \times 2^{3}$ | Yes | Positive, starts 0.1 |
| $0.0011 \times 2^{5}$ | No | Two leading zeros waste two bits of precision |
| $1.0010 \times 2^{4}$ | Yes | Negative, starts 1.0 |
| $1.1100 \times 2^{2}$ | No | Leading ones waste precision |
A second benefit is uniqueness: without normalisation the same value has many representations ($0.0011 \times 2^{5}$ and $0.1100 \times 2^{3}$ are equal), which complicates comparison. Normalisation gives each value a single canonical form.
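The quick test and the uniqueness point can both be checked with a short sketch. The names `is_normalised` and `value` are made up for this example, and the mantissas are written as bit strings with the point assumed after the first bit, so 0.1101 becomes "01101".

```python
from fractions import Fraction


def is_normalised(mantissa: str) -> bool:
    """Quick test: the sign bit and the bit after it must differ."""
    return mantissa[0] != mantissa[1]


def value(mantissa: str, exponent: int) -> Fraction:
    """Exact value of a two's complement mantissa times 2^exponent."""
    raw = int(mantissa, 2)
    if mantissa[0] == "1":                      # negative mantissa
        raw -= 1 << len(mantissa)
    return Fraction(raw, 1 << (len(mantissa) - 1)) * Fraction(2) ** exponent


# The four rows of the table above.
for bits in ["01101", "00011", "10010", "11100"]:
    print(bits, is_normalised(bits))            # True, False, True, False

# Two different bit patterns for the same value (both equal 6).
print(value("00011", 5) == value("01100", 3))   # True
```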
Shift the mantissa left until the first two bits differ (i.e. it reads 0.1… or 1.0…), and decrease the exponent by 1 for each left shift. The reasoning: each left shift of the bits multiplies the mantissa by 2, so to keep the overall value $\text{mantissa} \times 2^{e}$ unchanged the exponent must drop by 1 for every shift. (Right shifts work the opposite way and increase the exponent, but normalisation only ever shifts left, removing redundant leading bits.)
Normalise 0.0010110 with exponent 0110 (+6). The mantissa starts 0.00…, so it is not normalised. Shift the bits left by 2 until it reads 0.1…:
0.0010110→0.1011000
Each left shift decreases the exponent by 1, so subtract 2: 6−2=4=0100.
Normalised: mantissa 01011000, exponent 0100
The value is unchanged, since $0.1011 \times 2^{4} = 0.0010110 \times 2^{6}$, but the mantissa now uses all its bits for significant figures.
Normalise 1.1101000 with exponent 0101 (+5). A normalised negative mantissa must read 1.0… (the first two bits differ). Here it reads 1.1… — the first two bits are the same — so it is not yet normalised. Shift left by 1 so the second bit becomes 0:
1.1101000→1.1010000
That still reads 1.1…; shift left once more:
1.1010000→1.0100000
Now it reads 1.0… and is normalised. Two left shifts means the exponent drops by 2: 5−2=3=0011. So the normalised form is mantissa 10100000, exponent 0011. The test to remember is identical for both signs: keep shifting left until the two leading bits differ, decreasing the exponent each time.
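The shift-and-decrement rule is mechanical enough to sketch in code. The function name `normalise` is hypothetical. The exponent is kept as an ordinary integer, and the sketch does not check that the result still fits a 4-bit exponent field.

```python
def normalise(mantissa: str, exponent: int):
    """Shift left until the two leading bits differ.

    Each left shift doubles the mantissa, so the exponent drops by 1
    to keep the overall value unchanged.
    """
    if "1" not in mantissa:
        raise ValueError("an all-zero mantissa can never be normalised")
    while mantissa[0] == mantissa[1]:
        # Drop the redundant leading bit and pad with 0 on the right.
        # The sign is preserved because the dropped bit equals the new first bit.
        mantissa = mantissa[1:] + "0"
        exponent -= 1
    return mantissa, exponent


print(normalise("00010110", 6))   # ('01011000', 4)
print(normalise("11101000", 5))   # ('10100000', 3)
```

Both calls reproduce the two normalisation examples above: two left shifts each, so the exponent falls by 2.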
The total width is fixed, so every bit given to the mantissa is a bit taken from the exponent and vice versa. This produces the central trade-off of floating-point:
| Give more bits to… | You gain… | You lose… |
|---|---|---|
| Mantissa | Precision — more significant figures, finer steps | Range — the exponent can reach fewer powers of 2 |
| Exponent | Range — far larger and far smaller magnitudes | Precision — fewer significant figures |
| Allocation | Mantissa | Exponent | Precision | Exponent range |
|---|---|---|---|---|
| 12M + 4E | 12 bits | 4 bits | High | −8 to +7 (small) |
| 8M + 8E | 8 bits | 8 bits | Medium | −128 to +127 (large) |
| 4M + 12E | 4 bits | 12 bits | Low | −2048 to +2047 (huge) |
There is no universally "best" split — it depends on the application. Graphics and scientific work usually favour more mantissa bits (accuracy matters and the magnitudes are moderate); representing physical constants across many orders of magnitude favours more exponent bits.
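To put numbers on the trade-off, the sketch below works out the exponent range and the extreme positive normalised values for each 16-bit split in the table. The function name `limits` is invented for this illustration. It assumes the lesson's layout (two's complement fields, point after the mantissa's sign bit) and uses exact fractions because $2^{2047}$ is too large for an ordinary float.

```python
from fractions import Fraction


def limits(m_bits: int, e_bits: int):
    """Exponent range and extreme positive normalised values for a split."""
    e_min = -(1 << (e_bits - 1))
    e_max = (1 << (e_bits - 1)) - 1
    fraction_bits = m_bits - 1                       # bits after the sign bit
    # Largest mantissa is 0.111...1; smallest normalised positive is 0.100...0
    largest = (1 - Fraction(1, 1 << fraction_bits)) * Fraction(2) ** e_max
    smallest = Fraction(1, 2) * Fraction(2) ** e_min
    return e_min, e_max, largest, smallest


for m_bits, e_bits in [(12, 4), (8, 8), (4, 12)]:
    e_min, e_max, largest, smallest = limits(m_bits, e_bits)
    print(f"{m_bits}M + {e_bits}E: exponent {e_min}..{e_max}, "
          f"{m_bits - 1} fraction bits")

# With 12M + 4E the largest value is modest but finely resolved.
print(float(limits(12, 4)[2]))   # 127.9375
```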