So far we have represented whole numbers. But real-world quantities — voltages, temperatures, money, the results of division — are fractional. A finite register cannot hold every real number, so we must choose a scheme that trades range against precision. This lesson develops fixed-point binary fractions and then the far more flexible floating-point representation: mantissa and exponent (both in two's complement), normalisation, two-way conversion between denary and normalised floating point, the range-versus-precision trade-off, absolute and relative error, rounding, and underflow/overflow. Floating point is the single hardest sub-topic in data representation, so every conversion below is shown in full.
This lesson covers the real-number strand of the AQA A-Level Computer Science (7517) Fundamentals of data representation area.
This material links tightly to the rounding-error and precision content elsewhere in the specification and to the two's complement arithmetic from the previous lesson.
In fixed-point representation the binary point sits at a fixed, agreed column. Bits to the left have positive powers of two; bits to the right have negative powers:
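| Power | $2^{3}$ | $2^{2}$ | $2^{1}$ | $2^{0}$ | $2^{-1}$ | $2^{-2}$ | $2^{-3}$ | $2^{-4}$ |
|---|---|---|---|---|---|---|---|---|
| Value | 8 | 4 | 2 | 1 | $\frac{1}{2}$ | $\frac{1}{4}$ | $\frac{1}{8}$ | $\frac{1}{16}$ |
The fractional place values in denary are 0.5, 0.25, 0.125, 0.0625, …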
With the point after 4 integer bits, evaluate 0101.1010:
| Bit | 0 | 1 | 0 | 1 | . | 1 | 0 | 1 | 0 |
|---|---|---|---|---|---|---|---|---|---|
| Value | 8 | 4 | 2 | 1 | . | $\frac{1}{2}$ | $\frac{1}{4}$ | $\frac{1}{8}$ | $\frac{1}{16}$ |
$$4 + 1 + 0.5 + 0.125 = 5.625_{10}$$
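The column-weight evaluation above can be sketched in a few lines of Python. This is a minimal illustration for unsigned fixed-point strings only; the function name `fixed_point_to_denary` is an assumption made for this example, not part of any specification.

```python
def fixed_point_to_denary(bits: str) -> float:
    """Evaluate an unsigned fixed-point binary string such as '0101.1010'."""
    integer_part, _, fraction_part = bits.partition(".")
    value = 0.0
    # Integer columns: weights 2^0, 2^1, ... counted from the point leftwards.
    for position, bit in enumerate(reversed(integer_part)):
        value += int(bit) * 2 ** position
    # Fraction columns: weights 2^-1, 2^-2, ... counted from the point rightwards.
    for position, bit in enumerate(fraction_part, start=1):
        value += int(bit) * 2 ** -position
    return value


print(fixed_point_to_denary("0101.1010"))  # 5.625
```

Every weight is a power of two, so the floating-point sums here are exact for short bit strings like this one.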
Convert $6.75_{10}$. Handle the integer and fractional parts separately. Integer part: $6 = 0110_2$. Fractional part by repeated multiplication by 2, reading the carried integer parts top to bottom:
| Step | Remaining fraction | Carry (bit) |
|---|---|---|
| $0.75 \times 2 = 1.5$ | 0.5 | 1 |
| $0.5 \times 2 = 1.0$ | 0.0 | 1 |
| Stop: the fraction is 0 | | |
So $0.75 = 0.11_2$ and $6.75 = 0110.1100_2$ (padding the fraction field to 4 bits). Check: $4 + 2 + 0.5 + 0.25 = 6.75$. Correct.
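The repeated-multiplication method for the fractional part can be sketched as follows. This is an illustrative helper, not a library routine: the name `fraction_to_binary` and the `max_bits` cut-off are assumptions. It uses Python's `fractions.Fraction` so that no rounding creeps into the working.

```python
from fractions import Fraction


def fraction_to_binary(fraction: Fraction, max_bits: int = 8) -> str:
    """Return the binary digits after the point for 0 <= fraction < 1."""
    bits = []
    while fraction != 0 and len(bits) < max_bits:
        fraction *= 2
        carry = int(fraction)   # the integer part (0 or 1) is the next bit
        bits.append(str(carry))
        fraction -= carry       # keep only the fractional part for the next step
    return "".join(bits)


print(fraction_to_binary(Fraction("0.75")))     # 11
print(fraction_to_binary(Fraction("0.40625")))  # 01101
```

The `max_bits` limit matters for values such as 0.1, whose binary expansion never terminates; the loop then stops after the requested number of bits and the result is a truncated approximation.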
Fixed point is simple and fast, but the position of the point is fixed, so the range and the smallest representable step are both locked. With 4 integer + 4 fraction bits the largest value is just under 16 and the finest resolution is $\frac{1}{16} = 0.0625$. You cannot represent a very large number and a very small one in the same format. Floating point solves this by letting the point move.
To make the limitation concrete, consider the same 8 bits split as fixed-point. The total range and resolution depend entirely on where we fix the point:
| Point position | Largest value | Smallest step (resolution) |
|---|---|---|
| 7 integer, 1 fraction | 127.5 | $\frac{1}{2} = 0.5$ |
| 4 integer, 4 fraction | ≈15.94 | $\frac{1}{16} = 0.0625$ |
| 1 integer, 7 fraction | ≈1.99 | $\frac{1}{128} \approx 0.0078$ |
Every row uses all 8 bits, yet each forces a stark compromise: a wide range (top row) buys coarse resolution, while fine resolution (bottom row) buys a tiny range. With the point fixed you must commit to one row in advance, even though real data may need both large and small magnitudes. Floating point sidesteps this by storing the point's position (the exponent) with each number, so the same format can represent 12,000 and 0.0003 — the freedom that makes it the standard for scientific and general-purpose real arithmetic.
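The figures in the table can be reproduced with a short calculation. The sketch below assumes an unsigned fixed-point format; `fixed_point_limits` is a name chosen for this example.

```python
def fixed_point_limits(integer_bits: int, fraction_bits: int) -> tuple[float, float]:
    """Return (largest value, smallest step) for an unsigned fixed-point format."""
    step = 2.0 ** -fraction_bits           # weight of the right-most column
    largest = 2 ** integer_bits - step     # every bit set to 1
    return largest, step


for integer_bits, fraction_bits in [(7, 1), (4, 4), (1, 7)]:
    largest, step = fixed_point_limits(integer_bits, fraction_bits)
    print(f"{integer_bits} integer, {fraction_bits} fraction: "
          f"largest {largest}, step {step}")
```

The largest value is always one step below the next power of two, which is why moving the point one place right doubles the range and doubles the step size at the same time.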
A floating-point number, like scientific notation in denary ($6.02 \times 10^{23}$), has two parts, a mantissa and an exponent:
$$\text{value} = \text{mantissa} \times 2^{\text{exponent}}$$
The format used in this lesson. Unless a question states otherwise we use a 6-bit mantissa (1 sign bit + 5 fraction bits, point straight after the sign bit) and a 4-bit two's complement exponent. AQA questions always state the bit allocation; read it carefully because the answer depends on it.
So a stored pair like mantissa 0.10110, exponent 0011 means $0.10110_2 \times 2^{3}$: shift the point 3 places right to get $0101.10_2 = 5.5_{10}$.
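Because the mantissa is two's complement, its leading bit is a sign bit with weight $-1$ (i.e. $-2^{0}$, with the point right after it). The remaining bits have weights $2^{-1}, 2^{-2}, \dots$:
| Weight | $-2^{0}$ | $2^{-1}$ | $2^{-2}$ | $2^{-3}$ | $2^{-4}$ | $2^{-5}$ |
|---|---|---|---|---|---|---|
| Value | $-1$ | $\frac{1}{2}$ | $\frac{1}{4}$ | $\frac{1}{8}$ | $\frac{1}{16}$ | $\frac{1}{32}$ |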
A positive mantissa therefore begins 0.1... and a negative mantissa begins 1.0... once normalised — a fact we use constantly.
A number can be written in floating-point form in many ways ($0.011_2 \times 2^{2} = 0.11_2 \times 2^{1} = 1.1_2 \times 2^{0}$, all equal to 1.5). Normalisation picks the one form that uses the mantissa bits most efficiently, giving maximum precision.
The rule for a two's complement mantissa:
- A positive normalised mantissa begins 0.1 (bit pattern 01).
- A negative normalised mantissa begins 1.0 (bit pattern 10).
In both cases the two most-significant bits differ. The idea: shift the point so the first significant bit sits just after the sign, wasting no leading bits that carry no information. Each shift left of the point increases the exponent by 1; each shift right decreases it by 1 (you are multiplying or dividing the mantissa by 2 and compensating in the exponent so the value is unchanged).
Exam Tip: "Normalise this number" almost always means shift until the first two mantissa bits differ, adjusting the exponent to keep the value the same. Show the shift and the matching exponent change explicitly — both attract marks.
A common short question gives a mantissa and asks whether it is normalised. Apply the two-bits-differ rule by inspection:
| Mantissa | First two bits | Normalised? | Why |
|---|---|---|---|
| 010110 | 01 | Yes | positive, bits differ |
| 001011 | 00 | No | leading 0s waste precision — shift left, decrease exponent |
| 101101 | 10 | Yes | negative, bits differ |
| 110100 | 11 | No | leading 1s waste precision — shift left, decrease exponent |
The rule of thumb: an un-normalised mantissa always begins with two identical bits (00 or 11). To normalise it you shift the mantissa left until the first two bits differ, decreasing the exponent by 1 for each shift (because each left shift doubles the mantissa, so the exponent must drop to keep the value constant). For example 001011 with exponent 5 becomes 010110 with exponent 4 after one left shift — same value, now normalised, one more significant bit retained.
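The two-bits-differ check and the shift-left procedure can be sketched as below. This is a minimal illustration on bit strings; `is_normalised` and `normalise` are names chosen for this example, and an all-zero mantissa is treated as a special case because it can never be normalised.

```python
def is_normalised(mantissa: str) -> bool:
    """A two's complement mantissa is normalised when its first two bits differ."""
    return mantissa[0] != mantissa[1]


def normalise(mantissa: str, exponent: int) -> tuple[str, int]:
    """Shift left until the first two bits differ, lowering the exponent each time."""
    if set(mantissa) == {"0"}:
        return mantissa, exponent          # zero has no normalised form
    while mantissa[0] == mantissa[1]:
        # The dropped leading bit equals the next one, so the sign is preserved.
        mantissa = mantissa[1:] + "0"
        exponent -= 1                      # compensate for doubling the mantissa
    return mantissa, exponent


print(normalise("001011", 5))  # ('010110', 4)
print(normalise("110100", 0))  # ('101000', -1)
```

Only redundant sign-extension bits are shifted out, so the represented value is unchanged at every step.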
Convert $9.5_{10}$ to normalised floating point.
Step 1: convert to fixed-point binary. $9 = 1001_2$ and $0.5 = 0.1_2$, so $9.5 = 1001.1_2$.
Step 2: write in the form $\text{mantissa} \times 2^{e}$ with the point after the sign bit. Move the point to just after the leading (sign) position. Currently the point is after 4 integer bits; to reach the form 0.10011 we move it 4 places left, so the exponent is $+4$:
$$9.5 = 1001.1_2 = 0.10011_2 \times 2^{4}$$
Step 3: write the mantissa with its sign bit (positive → leading 0) and the exponent in two's complement. Mantissa (6 bits) $= 0.10011$; exponent $4 = 0100$ in 4-bit two's complement. Check normalisation: first two mantissa bits are 01 ✓ (positive, normalised).
mantissa 010011, exponent 0100
Convert $0.40625_{10}$ to normalised floating point.
Step 1: fixed-point binary. Repeated multiplication: $0.40625 \times 2 = 0.8125$ (carry 0); $0.8125 \times 2 = 1.625$ (carry 1); $0.625 \times 2 = 1.25$ (carry 1); $0.25 \times 2 = 0.5$ (carry 0); $0.5 \times 2 = 1.0$ (carry 1). Reading the carries top to bottom: $0.40625 = 0.01101_2$. (Check: $\frac{1}{4} + \frac{1}{8} + \frac{1}{32} = 0.25 + 0.125 + 0.03125 = 0.40625$ ✓.)
Step 2: normalise. The first significant bit is in the $2^{-2}$ column, so we must shift the point one place right to make the pattern 0.1101, decreasing the exponent by 1 to $-1$:
$$0.40625 = 0.01101_2 \times 2^{0} = 0.1101_2 \times 2^{-1}$$
Step 3: encode. Mantissa $= 0.11010$ (positive, pattern 01 ✓); exponent $-1 = 1111$ in 4-bit two's complement.
mantissa 011010, exponent 1111
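The encoding steps for a positive value can be sketched in code. The example below assumes this lesson's format (6-bit mantissa, 4-bit exponent, both two's complement) and only accepts positive values that fit the mantissa exactly; `to_twos_complement` and `encode_positive` are hypothetical helper names.

```python
from fractions import Fraction


def to_twos_complement(value: int, bits: int) -> str:
    """Return value as a two's complement bit string of the given width."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def encode_positive(value: Fraction, mantissa_bits: int = 6,
                    exponent_bits: int = 4) -> tuple[str, str]:
    """Encode a positive value as (mantissa, exponent) bit strings."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    exponent = 0
    # Normalised positive mantissas lie in the range 0.5 <= m < 1.
    while value >= 1:
        value /= 2
        exponent += 1
    while value < Fraction(1, 2):
        value *= 2
        exponent -= 1
    scaled = value * 2 ** (mantissa_bits - 1)   # move the fraction bits into an integer
    if scaled.denominator != 1:
        raise ValueError("value needs more mantissa bits than are available")
    if not -(2 ** (exponent_bits - 1)) <= exponent < 2 ** (exponent_bits - 1):
        raise ValueError("exponent out of range (overflow or underflow)")
    return (to_twos_complement(int(scaled), mantissa_bits),
            to_twos_complement(exponent, exponent_bits))


print(encode_positive(Fraction("9.5")))      # ('010011', '0100')
print(encode_positive(Fraction("0.40625")))  # ('011010', '1111')
```

Raising an error when the value does not fit keeps the sketch honest: a real format would round or truncate at that point instead.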
Start from the positive mantissa 0.10011 with exponent 4, and negate the mantissa in two's complement (the exponent is unchanged). Invert and add one to 010011:
| Step | Mantissa bits |
|---|---|
| + mantissa | 0 1 0 0 1 1 |
| invert | 1 0 1 1 0 0 |
| add 1 | 1 0 1 1 0 1 |
So $-9.5$ has mantissa 101101, exponent 0100. Check normalisation: first two bits 10 ✓ (negative, normalised). Decoding back with the column weights $-1, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{32}$, the bits 1 0 1 1 0 1 give $-1 + \frac{1}{4} + \frac{1}{8} + \frac{1}{32} = -1 + 0.25 + 0.125 + 0.03125 = -0.59375$, and $-0.59375 \times 2^{4} = -9.5$ ✓. This is exactly the negation of the positive mantissa 0.10011, whose value is 0.59375. A common slip is to misread which columns hold the 1s: taking $\frac{1}{16}$ in place of $\frac{1}{8}$ gives $-0.65625$ and hence $-10.5$, which is why you must always decode your answer to verify.
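The invert-and-add-one negation, together with the decode-to-verify habit, can be sketched like this. Both function names are illustrative assumptions, and the mantissa is taken to have its binary point straight after the sign bit, as in this lesson.

```python
def negate_twos_complement(bits: str) -> str:
    """Negate a two's complement bit string: invert every bit, then add one."""
    width = len(bits)
    mask = (1 << width) - 1
    inverted = int(bits, 2) ^ mask
    return format((inverted + 1) & mask, f"0{width}b")


def mantissa_value(bits: str) -> float:
    """Value of a two's complement mantissa with the point after the sign bit."""
    raw = int(bits, 2)
    if bits[0] == "1":
        raw -= 1 << len(bits)              # the leading column has negative weight
    return raw / 2 ** (len(bits) - 1)


negative = negate_twos_complement("010011")
print(negative)                            # 101101
print(mantissa_value(negative) * 2 ** 4)   # -9.5
```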
Decode mantissa 010110, exponent 0011.
Step 1: value of the exponent. $0011_2 = +3$.
Step 2: value of the mantissa (two's complement, point after sign bit). Leading bit 0 → positive: $0 \cdot (-1) + \frac{1}{2} + 0 + \frac{1}{8} + \frac{1}{16} + 0 = 0.5 + 0.125 + 0.0625 = 0.6875$.
Step 3: apply the exponent (shift the point 3 places right):
$$0.6875 \times 2^{3} = 0.6875 \times 8 = 5.5_{10}$$
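Now a negative example: mantissa 100110, exponent 0010.
Mantissa value (leading 1 → the $-1$ column contributes; the other 1s sit in the $\frac{1}{8}$ and $\frac{1}{16}$ columns): $-1 + \frac{1}{8} + \frac{1}{16} = -1 + 0.125 + 0.0625 = -0.8125$. Exponent $= +2$. So:
$$-0.8125 \times 2^{2} = -0.8125 \times 4 = -3.25_{10}$$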
Exam Tip: Decode the exponent first, then the mantissa as a two's complement fraction, then shift. Writing the mantissa column weights ($-1, \frac{1}{2}, \frac{1}{4}, \dots$) above the bits prevents the most common slip: forgetting that the leading column is negative.
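The decode order in the tip (exponent, then mantissa, then shift) maps directly onto a short function. This is a sketch assuming both fields are two's complement bit strings with the mantissa's point straight after its sign bit; `twos_complement_to_int` and `decode` are names invented for the example.

```python
def twos_complement_to_int(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    raw = int(bits, 2)
    if bits[0] == "1":
        raw -= 1 << len(bits)              # leading bit carries negative weight
    return raw


def decode(mantissa: str, exponent: str) -> float:
    """Decode (mantissa, exponent) bit strings to a denary value."""
    e = twos_complement_to_int(exponent)
    # Dividing by 2^(n-1) places the binary point straight after the sign bit.
    m = twos_complement_to_int(mantissa) / 2 ** (len(mantissa) - 1)
    return m * 2 ** e


print(decode("010110", "0011"))  # 5.5
print(decode("101101", "0100"))  # -9.5
print(decode("011010", "1111"))  # 0.40625
```

Running your hand-worked answers through a checker like this is a quick way to catch a misread column.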
A floating-point format has a fixed total of bits to split between mantissa and exponent, and this split is a direct trade-off:
graph LR
A["Fixed total bits"] --> B["More exponent bits"]
A --> C["More mantissa bits"]
B --> D["Larger range (very big and very small)"]
B --> E["Lower precision (fewer significant figures)"]
C --> F["Higher precision (more significant figures)"]
C --> G["Smaller range"]
You cannot improve both at once with a fixed word length — increasing one necessarily shrinks the other. This is why real formats (IEEE 754 single precision: 23-bit mantissa, 8-bit exponent; double precision: 52-bit mantissa, 11-bit exponent) choose the split deliberately for their use-case.
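To see the trade-off numerically, the sketch below computes the largest and the smallest positive normalised value for a given split. It assumes the simplified format of this lesson (two's complement mantissa and exponent), not the IEEE 754 layout, and `format_limits` is a hypothetical name.

```python
from fractions import Fraction


def format_limits(mantissa_bits: int, exponent_bits: int) -> tuple[Fraction, Fraction]:
    """Return (largest positive, smallest positive normalised) value."""
    max_exponent = 2 ** (exponent_bits - 1) - 1
    min_exponent = -(2 ** (exponent_bits - 1))
    # Largest positive mantissa is 0.111...1; smallest normalised one is 0.100...0.
    largest_mantissa = 1 - Fraction(1, 2 ** (mantissa_bits - 1))
    largest = largest_mantissa * Fraction(2) ** max_exponent
    smallest = Fraction(1, 2) * Fraction(2) ** min_exponent
    return largest, smallest


largest, smallest = format_limits(6, 4)
print(largest, smallest)  # 124 1/512
```

Swapping the same 10 bits to a 4-bit mantissa and 6-bit exponent pushes the largest value beyond a billion, but leaves only 3 fraction bits of precision in the mantissa.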
Because most real numbers cannot be represented exactly in a finite mantissa, the stored value is an approximation. We quantify the error two ways.
Suppose a format can only store $\frac{1}{3}$ as the rounded value 0.333251953125 (a particular 12-bit mantissa). The true value is $0.33333\ldots$
$$\text{absolute error} = \left|0.333251953125 - 0.333333\ldots\right| \approx 0.0000814$$
$$\text{relative error} = \frac{0.0000814}{0.333333\ldots} \approx 0.000244 = 0.0244\%$$
The same absolute error matters far more for a small value than a large one, which is exactly why relative error is the more meaningful measure for floating point: a given mantissa width gives roughly constant relative precision across the whole range, because normalisation always packs the same number of significant bits behind the point.
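The two error measures are easy to compute exactly with rational arithmetic. The sketch below reuses the stored value from the example; the function names are illustrative.

```python
from fractions import Fraction


def absolute_error(stored: Fraction, true_value: Fraction) -> Fraction:
    """Size of the difference between the stored and the true value."""
    return abs(stored - true_value)


def relative_error(stored: Fraction, true_value: Fraction) -> Fraction:
    """Absolute error expressed as a fraction of the true value."""
    return absolute_error(stored, true_value) / abs(true_value)


true_value = Fraction(1, 3)
stored = Fraction("0.333251953125")

print(f"{float(absolute_error(stored, true_value)):.7f}")   # 0.0000814
print(f"{float(relative_error(stored, true_value)):.4%}")   # 0.0244%
```

The exact relative error here is 1/4096, i.e. $2^{-12}$, which shows how the error is tied directly to the number of mantissa bits available.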
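When a value needs more fraction bits than the mantissa provides, the surplus bits are discarded. Truncation simply drops them, so the stored value is never greater than the true value: for a positive number this biases the result toward zero, while for a negative two's complement mantissa the stored value moves slightly away from zero. Rounding to nearest chooses the closer representable value (smaller average error). Rounding can occasionally increase the magnitude enough to require re-normalisation.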
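The difference between the two policies can be sketched for a positive value stored with a limited number of fraction bits. This is an illustration only: `truncate` and `round_nearest` are names chosen here, and ties are rounded upwards as a simplifying assumption.

```python
import math
from fractions import Fraction


def truncate(value: Fraction, fraction_bits: int) -> Fraction:
    """Drop every bit beyond the given number of fraction bits."""
    scale = 2 ** fraction_bits
    return Fraction(math.floor(value * scale), scale)


def round_nearest(value: Fraction, fraction_bits: int) -> Fraction:
    """Round to the nearest representable value (ties rounded up)."""
    scale = 2 ** fraction_bits
    return Fraction(math.floor(value * scale + Fraction(1, 2)), scale)


value = Fraction("0.3")
print(truncate(value, 5))       # 9/32
print(round_nearest(value, 5))  # 5/16
```

With 5 fraction bits, 0.3 truncates to 9/32 (error 0.01875) but rounds to 10/32 = 5/16 (error 0.0125). Rounding 0.99 gives exactly 1, which no longer fits the 0.xxxxx pattern, so the mantissa would have to be re-normalised and the exponent increased.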