Floating-Point Representation

This lesson covers how computers represent real numbers using floating-point representation. You need to understand the mantissa and exponent, normalisation, converting between floating-point and denary, and the trade-off between precision and range.

Why Floating-Point?

Fixed-point binary puts the binary point in one fixed position, so for a given number of bits you must choose between range (more bits before the point) and precision (more bits after it). Floating-point lets the binary point "float" to different positions, so the same number of bits can represent both very large and very small numbers.

This is similar to scientific notation in denary:

3,400,000 = 3.4 x 10^6
0.00056 = 5.6 x 10^-4

In binary floating-point:

A number is stored as a mantissa and an exponent, and its value is mantissa x 2^exponent

Structure of a Floating-Point Number

A floating-point number consists of two parts:

Component	Purpose
Mantissa	Stores the significant digits (precision) of the number
Exponent	Stores the power of 2 (determines where the binary point sits)

Both the mantissa and exponent are stored in two's complement so that negative values can be represented.

Example Format: 8-bit Mantissa, 4-bit Exponent

[M M M M M M M M] [E E E E]
 ^                 ^
 mantissa          exponent
 (sign bit)        (sign bit, two's complement)
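Because both fields are two's complement and the mantissa's binary point sits just after its sign bit, the value can also be computed arithmetically as mantissa x 2^exponent. The sketch below is a minimal illustration, assuming the 8-bit mantissa / 4-bit exponent format above with bit patterns given as strings; `twos_complement_to_int` and `decode` are hypothetical helper names, not part of any library.

```python
from fractions import Fraction

def twos_complement_to_int(bits: str) -> int:
    """Interpret a bit string such as '0101' as a two's complement integer."""
    value = int(bits, 2)
    # A leading 1 means the MSB is worth -2**(n-1) instead of +2**(n-1),
    # which is the same as subtracting 2**n from the unsigned value.
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value

def decode(mantissa: str, exponent: str) -> Fraction:
    """Value of a mantissa/exponent pair, computed as mantissa x 2^exponent."""
    # The binary point sits just after the mantissa's sign bit, so the mantissa
    # equals its two's complement integer value divided by 2**(len - 1).
    m = Fraction(twos_complement_to_int(mantissa), 2 ** (len(mantissa) - 1))
    e = twos_complement_to_int(exponent)
    return m * Fraction(2) ** e
```

For example, `decode("01101000", "0101")` gives 26 and `decode("11010000", "0011")` gives -3, the same results as the worked examples in the next section. `Fraction` keeps the arithmetic exact.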

Converting Floating-Point to Denary

Steps

Identify the mantissa and exponent bits
Convert the exponent from two's complement to denary
Place the binary point after the first bit of the mantissa
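Shift the binary point right (positive exponent) or left (negative exponent) by the number of places given by the exponent; when shifting left, fill the new places on the left with copies of the sign bit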
Calculate the denary value from the resulting fixed-point binary
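The five steps can be followed almost literally on bit strings. This is a minimal sketch, assuming the mantissa and exponent arrive as strings of '0'/'1' characters; `float_to_denary` and `twos_complement_to_int` are hypothetical names used only for illustration, and `Fraction` keeps the result exact.

```python
from fractions import Fraction

def twos_complement_to_int(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value

def float_to_denary(mantissa: str, exponent: str) -> Fraction:
    # Step 2: convert the exponent from two's complement to denary.
    e = twos_complement_to_int(exponent)
    # Step 3: the binary point starts after the first bit of the mantissa.
    # Step 4: a positive exponent moves it right, a negative one moves it left.
    bits = mantissa
    point = 1 + e
    if point < 1:
        # Shifting left past the sign bit: copy the sign bit into the new places.
        bits = bits[0] * (1 - point) + bits
        point = 1
    elif point > len(bits):
        # Shifting right past the last bit: pad with zeros on the right.
        bits = bits + "0" * (point - len(bits))
    int_part, frac_part = bits[:point], bits[point:]
    # Step 5: integer part read as two's complement; fractional bits are
    # worth 1/2, 1/4, 1/8, ...
    value = Fraction(twos_complement_to_int(int_part))
    for i, bit in enumerate(frac_part, start=1):
        value += Fraction(int(bit), 2 ** i)
    return value
```

With the worked example below, `float_to_denary("01101000", "0101")` returns `Fraction(26, 1)`. A negative exponent such as 1110 (-2) copies the sign bit into the new leading places, so 01101000 with exponent 1110 gives 13/64.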

Worked Example

Mantissa: 01101000, Exponent: 0101 (i.e., +5)

Step 1: Write the mantissa with the binary point after its first bit (the MSB, which is the sign bit): 0.1101000

Step 2: Exponent = 0101 = +5

Step 3: Shift binary point 5 places right: 0.1101000 -> 011010.00

Step 4: Calculate denary: 16 + 8 + 2 = 26.0

Worked Example: Negative Mantissa

Mantissa: 11010000, Exponent: 0011 (+3)

Step 1: Mantissa with binary point: 1.1010000

Step 2: Shift right by 3: 1101.0000

Step 3: This is a two's complement number. The MSB is 1, so it is negative; with four bits before the point, the MSB has a place value of -8. Using place values: -8 + 4 + 1 = -3.0
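Step 4 of the first worked example and Step 3 of this one both read a two's complement fixed-point number from its place values, where only the leftmost column is negative. A minimal sketch of that calculation, assuming the number is written as a string with a '.' for the binary point; `fixed_point_value` is a hypothetical name.

```python
from fractions import Fraction

def fixed_point_value(bits: str) -> Fraction:
    """Value of a two's complement fixed-point string such as '1101.0000'."""
    int_part, _, frac_part = bits.partition(".")
    n = len(int_part)
    # Integer columns: the MSB is worth -2**(n-1); the rest are 2**(n-2), ..., 2, 1.
    weights = [Fraction(-(2 ** (n - 1)))]
    weights += [Fraction(2 ** k) for k in range(n - 2, -1, -1)]
    # Fractional columns are worth 1/2, 1/4, 1/8, ...
    weights += [Fraction(1, 2 ** k) for k in range(1, len(frac_part) + 1)]
    digits = int_part + frac_part
    return sum((w * int(d) for w, d in zip(weights, digits)), Fraction(0))

print(fixed_point_value("1101.0000"))  # -8 + 4 + 1, prints -3
print(fixed_point_value("011010.00"))  # 16 + 8 + 2, prints 26
```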

Converting Denary to Floating-Point

Steps

Convert the denary number to a fixed-point binary number
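Move the binary point so the number is in normalised form: for a positive number, place it immediately before the first 1 and put a 0 sign bit in front (0.1...)
Count how many places the point moved; this is the exponent (moving the point left gives a positive exponent, moving it right gives a negative exponent)
The sign bit followed by the bits after the point, padded with 0s on the right to the mantissa's width, is the mantissa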
Convert the exponent to two's complement binary
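The steps above can be sketched for positive numbers as follows. This is an illustration only, assuming the 8-bit mantissa / 4-bit exponent format and a positive input; negative numbers (which need a two's complement mantissa of the form 1.0...) are not handled, and `to_fixed_point` and `denary_to_float` are hypothetical names. Values that cannot be stored exactly raise `ValueError` rather than being rounded.

```python
from fractions import Fraction

def to_fixed_point(x: Fraction, max_frac_bits: int = 16) -> str:
    """Step 1: a positive value as fixed-point binary, e.g. 13.5 -> '1101.1'."""
    whole = int(x)  # int() truncates, which is the floor for positive x
    frac = x - whole
    frac_str = ""
    # Repeatedly double the fractional part; each integer carry is the next bit.
    while frac and len(frac_str) < max_frac_bits:
        frac *= 2
        bit = int(frac)
        frac_str += str(bit)
        frac -= bit
    if frac:
        raise ValueError("no exact binary form within max_frac_bits")
    return bin(whole)[2:] + "." + frac_str

def denary_to_float(x, mantissa_bits: int = 8, exponent_bits: int = 4) -> tuple[str, str]:
    """Return (mantissa, exponent) bit strings for a positive value x."""
    x = Fraction(x)
    if x <= 0:
        raise ValueError("this sketch only handles positive numbers")
    int_str, _, frac_str = to_fixed_point(x).partition(".")
    digits = int_str + frac_str  # every bit, with the point removed
    first_one = digits.index("1")
    # Steps 2-3: the point must end up just before the first 1. It currently
    # sits after len(int_str) digits, so the signed distance it moves left
    # is the exponent (negative if it actually moves right).
    exponent = len(int_str) - first_one
    # Step 4: mantissa = sign bit 0, then the significant bits, padded with 0s.
    mantissa = ("0" + digits[first_one:].rstrip("0")).ljust(mantissa_bits, "0")
    if len(mantissa) > mantissa_bits:
        raise ValueError("value needs more mantissa bits than are available")
    # Step 5: exponent in two's complement (masking wraps negatives correctly).
    limit = 1 << (exponent_bits - 1)
    if not -limit <= exponent < limit:
        raise ValueError("exponent out of range")
    return mantissa, format(exponent & ((1 << exponent_bits) - 1), f"0{exponent_bits}b")
```

`denary_to_float(13.5)` returns `('01101100', '0100')`, matching the worked example below, and `denary_to_float(26)` recovers the mantissa 01101000 and exponent 0101 from the first conversion example.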

Worked Example: Convert 13.5 to Floating-Point (8-bit mantissa, 4-bit exponent)

Step 1: Convert 13.5 to fixed-point binary: 13.5 = 8 + 4 + 1 + 0.5 = 1101.1

Step 2: Normalise. A positive mantissa must take the form 0.1..., so move the binary point 4 places left: 1101.1 -> 0.11011. Moving the point 4 places left means 13.5 = 0.11011 x 2^4, so the exponent is +4.

Step 3: Pad the mantissa to 8 bits (the sign bit plus 7 bits after the point): 0.1101100, stored as 01101100

Step 4: Convert the exponent to 4-bit two's complement: +4 = 0100

Result: Mantissa = 01101100, Exponent = 0100

Check: shifting 0.1101100 four places right gives 1101.100 = 8 + 4 + 1 + 0.5 = 13.5
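As a cross-check, Python's standard `math.frexp` returns the same kind of normalised pair for a float: a fraction m with 0.5 <= |m| < 1 and an integer exponent e such that x = m x 2^e. For a positive number that matches the 0.1... form, so the mantissa bits can be read off by scaling m by 2^7. A minimal sketch, assuming positive inputs and the 8/4-bit format; `normalised_parts` is a hypothetical helper. For negative numbers frexp's normalisation differs from the two's complement 1.0... form (for example at -0.5), so they are rejected here.

```python
import math

def normalised_parts(x: float, mantissa_bits: int = 8, exponent_bits: int = 4) -> tuple[str, str]:
    """Cross-check a positive value's normalised mantissa and exponent via math.frexp."""
    if x <= 0:
        raise ValueError("frexp's normalisation only matches 0.1... for positive x")
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= m < 1
    scaled = m * 2 ** (mantissa_bits - 1)  # the bits after the point, as a number
    if not scaled.is_integer():
        raise ValueError("value needs more mantissa bits than are available")
    limit = 1 << (exponent_bits - 1)
    if not -limit <= e < limit:
        raise ValueError("exponent out of range")
    mantissa = format(int(scaled), f"0{mantissa_bits}b")  # leading 0 is the sign bit
    exponent = format(e & ((1 << exponent_bits) - 1), f"0{exponent_bits}b")
    return mantissa, exponent

print(math.frexp(13.5))        # (0.84375, 4)
print(normalised_parts(13.5))  # ('01101100', '0100')
```

Here 0.84375 is 0.11011 in binary, so frexp confirms both the mantissa and the exponent of +4.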

Normalisation

A floating-point number is normalised when the first two bits of the mantissa differ, so the mantissa begins with:

0.1... for positive numbers
1.0... for negative numbers (two's complement)
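The rule that the first two bits must differ can be checked directly, and an unnormalised mantissa can be fixed by shifting it left. A minimal sketch, assuming bit-string mantissas and an exponent held as a plain integer (range checks on the 4-bit exponent are omitted); `is_normalised` and `normalise` are hypothetical names.

```python
def is_normalised(mantissa: str) -> bool:
    """True when the first two bits differ: 0.1... (positive) or 1.0... (negative)."""
    return mantissa[0] != mantissa[1]

def normalise(mantissa: str, exponent: int) -> tuple[str, int]:
    """Shift the mantissa left until normalised, lowering the exponent each time."""
    if "1" not in mantissa:
        return mantissa, exponent  # zero has no normalised form
    while not is_normalised(mantissa):
        # Dropping a redundant leading bit doubles the mantissa, so the
        # exponent falls by 1 to keep mantissa x 2^exponent unchanged.
        mantissa = mantissa[1:] + "0"
        exponent -= 1
    return mantissa, exponent

print(normalise("11010000", 3))  # ('10100000', 2)
```

The mantissa 11010000 from the negative worked example is not normalised: `normalise("11010000", 3)` gives `('10100000', 2)`, and 1.0100000 x 2^2 = -0.75 x 4 = -3, the same value.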
