OCR A-Level Computer Science: Data Representation — Complete Revision Guide (H446)
Everything a computer stores or processes is ultimately a pattern of bits, and data representation is the topic that explains how those bits come to mean a number, a character, an image, a sound or a secret message. It is the most calculation-heavy area of OCR A-Level Computer Science (H446): once you can convert fluently between binary, denary and hexadecimal, add and subtract binary numbers, represent negatives with two's complement, normalise a floating-point value, and reason about how images and audio are sampled and compressed, a large bank of reliably scorable marks opens up. This module rewards practised technique over memorised facts.
In the H446 specification this material draws together two areas: data types and representation (module 1.4.1) and the exchanging and protecting of data, including compression, encryption and error checking. It is examined in Component 01: Computer Systems, where the questions are dominated by "show your working" conversions and arithmetic, alongside explanation items on how sampling parameters affect file size and quality, why a particular compression method suits a particular data type, and how encryption and error-detection schemes work. Accuracy and clear working are everything here — a correct method with one arithmetic slip still earns method marks, but only if the working is laid out.
This is Course 3 of 11 on the LearningBro OCR A-Level Computer Science learning path. The course, Data Representation, opens with number systems and binary arithmetic, develops two's complement, fixed-point and floating-point representations, then covers how characters, images and sound are encoded, before closing on compression, encryption and error detection. It builds on the hardware of Processors & Hardware and underpins the logic-circuit arithmetic of Boolean Algebra & Logic.
Guide Overview
The Data Representation course is built as ten lessons that move from number systems and binary arithmetic through signed and fractional representations, into the encoding of characters, images and sound, then close on compression, encryption and error detection.
- Number Systems
- Binary Arithmetic
- Two's Complement and Fixed Point
- Floating Point
- Character Encoding
- Image Representation
- Sound Representation
- Data Compression
- Encryption
- Error Detection
Number Systems
The number systems lesson establishes the three bases H446 works in — denary (base 10), binary (base 2) and hexadecimal (base 16) — and the conversions between them. Binary is how the hardware stores data; hexadecimal is a compact human-readable shorthand in which each hex digit maps exactly onto four binary bits (a nibble), which is why memory addresses, colour codes and machine code are so often written in hex.
The conversions to drill are denary to binary (repeated division by two, or subtracting place values), binary to denary (summing the place values of the set bits), and the binary-to-hex grouping in nibbles that makes hex conversion almost instant. The key fact that pays off repeatedly is the place-value table — 128, 64, 32, 16, 8, 4, 2, 1 for an 8-bit number — and the relationship that n bits represent 2^n distinct values. This vocabulary of bases is the substrate for every later lesson in the module, and the nibble-to-hex mapping reappears directly in the colour codes of image representation.
| Denary | Binary (4-bit) | Hex |
|---|---|---|
| 0 | 0000 | 0 |
| 5 | 0101 | 5 |
| 10 | 1010 | A |
| 15 | 1111 | F |
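The three conversions can be sketched in Python as below. This is a minimal illustration that assumes unsigned values and an 8-bit default width. The function names are hypothetical, and Python's built-ins (`bin()`, `int(s, 2)`, `hex()`) would each do the same job in one call.

```python
def denary_to_binary(n, bits=8):
    """Convert a non-negative integer to binary by repeated division by two."""
    if n < 0 or n >= 2 ** bits:
        raise ValueError(f"{n} does not fit in {bits} unsigned bits")
    digits = []
    for _ in range(bits):
        digits.append(str(n % 2))  # each remainder is the next bit, least significant first
        n //= 2
    return "".join(reversed(digits))


def binary_to_denary(bit_string):
    """Sum the place values (powers of two) of the set bits."""
    total = 0
    for position, bit in enumerate(reversed(bit_string)):
        if bit == "1":
            total += 2 ** position
    return total


def binary_to_hex(bit_string):
    """Group the bits into nibbles from the right; each nibble becomes one hex digit."""
    hex_digits = "0123456789ABCDEF"
    padded = bit_string.zfill((len(bit_string) + 3) // 4 * 4)
    nibbles = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(hex_digits[binary_to_denary(nibble)] for nibble in nibbles)


print(denary_to_binary(181))         # 10110101
print(binary_to_denary("10110101"))  # 181
print(binary_to_hex("10110101"))     # B5 (1011 -> B, 0101 -> 5)
```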
Binary Arithmetic
The binary arithmetic lesson develops addition and shifting on binary numbers. Binary addition follows the same column method as denary, with the rules that 1 + 1 = 10 (write 0, carry 1) and 1 + 1 + 1 = 11 (write 1, carry 1). The examinable hazard is overflow: when the result of adding two numbers needs more bits than the register holds, the carry out of the most significant bit is lost and the stored answer is wrong — a condition the processor must detect and flag.
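A minimal sketch of column addition with overflow detection is shown below. It assumes unsigned values held in a fixed-width register, modelled as a bit string; the function name `add_binary` is hypothetical.

```python
def add_binary(a, b):
    """Column-add two equal-width unsigned binary strings.

    Returns (result, overflow): the result keeps only the register width, and
    overflow is True when a carry out of the most significant bit was lost.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same width")
    result = []
    carry = 0
    # Work from the least significant (rightmost) column to the left.
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column = int(bit_a) + int(bit_b) + carry  # 0, 1, 2 or 3
        result.append(str(column % 2))            # bit written in this column
        carry = column // 2                       # 1 + 1 = 10 and 1 + 1 + 1 = 11 both carry 1
    return "".join(reversed(result)), carry == 1


print(add_binary("01101001", "00101100"))  # ('10010101', False): 105 + 44 = 149
print(add_binary("11001000", "01100100"))  # ('00101100', True): 200 + 100 does not fit in 8 bits
```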
The lesson also covers binary shifts. A logical shift left by one place multiplies an unsigned value by two, provided no 1 is shifted out of the left-hand end (a zero enters at the right). A shift right by one place divides by two: bits fall off the right, a logical right shift brings a zero in at the left, and an arithmetic right shift copies the sign bit so that a two's complement value keeps its sign. Being able to state the multiply/divide effect of a shift, and to recognise that a right shift rounds the result down whenever a 1 is lost off the right-hand end, is a frequent short-answer requirement. These operations connect forward to the hardware that performs them, such as the adders built in Boolean Algebra & Logic.
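The three kinds of shift can be sketched on bit strings as follows. This is an illustration under the assumption of a fixed 8-bit register; the function names are hypothetical.

```python
def logical_shift_left(bits, places=1):
    """Bits fall off the left; zeros enter at the right (multiply by 2 per place)."""
    return (bits + "0" * places)[places:]


def logical_shift_right(bits, places=1):
    """Bits fall off the right; zeros enter at the left (unsigned divide, rounding down)."""
    return ("0" * places + bits)[:len(bits)]


def arithmetic_shift_right(bits, places=1):
    """Copies of the sign bit enter at the left, so a two's complement value keeps its sign."""
    return (bits[0] * places + bits)[:len(bits)]


print(logical_shift_left("00010110"))      # 00101100: 22 becomes 44
print(logical_shift_right("00010111"))     # 00001011: 23 becomes 11, the lost 1 causes the rounding
print(arithmetic_shift_right("11110100"))  # 11111010: -12 becomes -6
```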
Two's Complement and Fixed Point
The two's complement and fixed point lesson develops how negative numbers and fractions are stored. Two's complement is the standard scheme for signed integers: the most significant bit carries a negative place value, so an 8-bit two's complement number represents values from -128 to +127. To negate a number you invert every bit and add one; to subtract, you add the two's complement of the number being subtracted. The advantages examiners expect are that there is a single representation of zero and that addition and subtraction use the same circuitry — which is exactly why the hardware adders of the Boolean module work for signed values without modification.
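A minimal sketch of negation, interpretation and subtraction in two's complement is given below, assuming values stored as fixed-width bit strings (8 bits in the examples). The function names are hypothetical.

```python
def negate(bit_string):
    """Invert every bit, then add one (any carry out of the MSB is discarded)."""
    width = len(bit_string)
    inverted = "".join("1" if bit == "0" else "0" for bit in bit_string)
    return format((int(inverted, 2) + 1) % 2 ** width, f"0{width}b")


def to_denary(bit_string):
    """The MSB carries a negative place value (-128 for 8 bits); the rest are positive."""
    width = len(bit_string)
    msb_value = -int(bit_string[0]) * 2 ** (width - 1)
    return msb_value + int(bit_string[1:] or "0", 2)


def subtract(a, b):
    """a - b computed as a + (two's complement of b), keeping only the register width."""
    width = len(a)
    return format((int(a, 2) + int(negate(b), 2)) % 2 ** width, f"0{width}b")


print(negate("00000101"), to_denary(negate("00000101")))  # 11111011 -5
print(to_denary("10000000"), to_denary("01111111"))       # -128 127
print(subtract("00001100", "00000101"))                   # 00000111: 12 - 5 = 7
print(negate("00000000"))                                 # 00000000: only one zero
```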
Fixed-point binary extends binary to fractions by placing an implied binary point at a fixed position, with bits to its right carrying place values of 1/2, 1/4, 1/8 and so on. Fixed point is simple and gives consistent precision across its range, but its range and precision are both limited by the fixed split between the integer and fractional parts — the limitation that motivates the floating-point representation covered next. Practise converting denary fractions to fixed-point binary and back, and negating values in two's complement, until both are automatic.
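Conversion to and from fixed point can be sketched as below. For simplicity this assumes an unsigned format with 4 integer bits and 4 fractional bits, and it truncates any fraction that needs more bits; a signed version would apply the two's complement steps from the previous lesson. The function names are hypothetical.

```python
def denary_to_fixed_point(value, int_bits=4, frac_bits=4):
    """Unsigned fixed point: fractional bits are found by repeatedly doubling the fraction."""
    if value < 0 or value >= 2 ** int_bits:
        raise ValueError("value out of range for this format")
    whole = int(value)
    fraction = value - whole
    int_part = format(whole, f"0{int_bits}b")
    frac_part = ""
    for _ in range(frac_bits):
        fraction *= 2
        bit = int(fraction)  # 1 if doubling reached or passed 1
        frac_part += str(bit)
        fraction -= bit
    return f"{int_part}.{frac_part}"  # remaining fraction (if any) is truncated


def fixed_point_to_denary(pattern):
    """Add the place values 1/2, 1/4, 1/8, ... of the set fractional bits."""
    int_part, frac_part = pattern.split(".")
    value = float(int(int_part, 2)) if int_part else 0.0
    for i, bit in enumerate(frac_part, start=1):
        if bit == "1":
            value += 2 ** -i
    return value


print(denary_to_fixed_point(5.625))        # 0101.1010
print(fixed_point_to_denary("0101.1010"))  # 5.625
print(denary_to_fixed_point(0.1))          # 0000.0001: 0.1 cannot be stored exactly
```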
Floating Point
The floating point lesson develops the representation that trades some precision for a far wider range, mirroring scientific notation in binary. A floating-point number is stored as a mantissa (the significant digits, a signed fraction in two's complement) and an exponent (a signed power of two, also in two's complement) that says where the binary point sits. The value is the mantissa multiplied by two raised to the exponent.
The central skill is normalisation: adjusting the mantissa and exponent so the mantissa's most significant bits are the standard form (for a positive number, 0.1...; for a negative number in two's complement, 1.0...), which maximises the precision available in the fixed number of mantissa bits. The examinable trade-off, which carries marks in extended answers, is that allocating more bits to the mantissa increases precision but reduces the range, while allocating more bits to the exponent increases the range but reduces precision for a fixed total word length. Be ready to convert a denary value to a normalised floating-point pattern, convert a pattern back to denary, and explain the precision/range trade-off in context.
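Decoding and normalisation can be sketched as follows. The sketch assumes an 8-bit two's complement mantissa with the binary point immediately after its sign bit and a 4-bit two's complement exponent; those widths and the function names are illustrative assumptions, and `normalise` does not check that the new exponent still fits its bits.

```python
def twos_value(bits):
    """Signed integer value of a two's complement bit string."""
    return int(bits, 2) - (2 ** len(bits) if bits[0] == "1" else 0)


def float_to_denary(mantissa, exponent):
    """Value = mantissa x 2 ** exponent, with the binary point after the mantissa's sign bit."""
    m = twos_value(mantissa) / 2 ** (len(mantissa) - 1)
    return m * 2 ** twos_value(exponent)


def is_normalised(mantissa):
    """Normalised mantissas start 0.1 (positive) or 1.0 (negative)."""
    return mantissa[0] != mantissa[1]


def normalise(mantissa, exponent):
    """Shift the mantissa left and decrement the exponent until the first two bits differ."""
    if set(mantissa) == {"0"}:
        return mantissa, exponent  # zero has no normalised form
    while mantissa[0] == mantissa[1]:
        mantissa = mantissa[1:] + "0"
        exponent -= 1
    return mantissa, exponent


print(float_to_denary("01101000", "0011"))  # 6.5: 0.1101 x 2^3
print(float_to_denary("10100000", "0010"))  # -3.0: -0.75 x 2^2
print(normalise("00110100", 4))             # ('01101000', 3): same value, more precision kept
```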
Character Encoding
The character encoding lesson develops how text is represented as numbers through agreed character sets. ASCII in its common form uses 7 bits to encode 128 characters (the upper- and lower-case letters, digits, punctuation and control codes), which is sufficient for English text but cannot cover the world's writing systems. Unicode was introduced to give every character in every script a unique code point, stored using encodings such as UTF-8, a variable-width scheme that is backward-compatible with ASCII for the first 128 code points, so that a vastly larger character set can be represented.
Two facts recur in questions. First, the ordering of codes is deliberate: the digits are contiguous and each alphabet is contiguous, so arithmetic on character codes works predictably, and in ASCII a fixed gap of 32 separates each upper-case letter from its lower-case partner ('A' is 65, 'a' is 97). Second, the trade-off between ASCII and Unicode is range versus storage: Unicode represents far more characters, but a character may take more than one byte. The recurring exam pattern is to compute the storage for a string given the bits per character, or to explain why Unicode was needed.
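The points above can be checked with Python's built-in `ord()`, `chr()` and `str.encode()`. The helper names are hypothetical, and `storage_bits` assumes a fixed-width character set and ignores any metadata.

```python
def storage_bits(text, bits_per_char):
    """Storage for a string in a fixed-width character set."""
    return len(text) * bits_per_char


def ascii_to_upper(ch):
    """Contiguous alphabets: each lower-case letter sits 32 above its upper-case partner."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - 32)
    return ch


def digit_value(ch):
    """Digits are contiguous, so subtracting the code of '0' gives the digit's value."""
    return ord(ch) - ord("0")


print(ord("A"), ord("a"), ord("0"))           # 65 97 48
print(ascii_to_upper("g"), digit_value("7"))  # G 7
print(storage_bits("HELLO WORLD", 7))         # 77 bits in 7-bit ASCII

# UTF-8 is variable width: ASCII characters take 1 byte, other code points take 2 to 4.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(hex(ord(ch)), len(ch.encode("utf-8")))
```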
Image Representation
The image representation lesson develops how bitmap images are stored as a grid of pixels, each pixel a binary colour value. Three properties are examined. Resolution is the number of pixels (often given as width by height); colour depth (or bit depth) is the number of bits used per pixel, which fixes the number of distinct colours as 2 raised to the colour depth; and metadata is the additional stored information (dimensions, colour depth, format) needed to reconstruct the image correctly.
The standard calculation is image file size: number of pixels multiplied by colour depth in bits, converted to bytes. Increasing resolution or colour depth improves image quality but increases file size proportionally — the trade-off that motivates the compression covered later. The lesson links the four-bits-per-hex-digit fact from the number systems lesson to hexadecimal colour codes, where a 24-bit colour is written as six hex digits (two each for red, green and blue). Practise the file-size calculation and the colour-depth-to-colour-count relationship until both are automatic.
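The file-size and colour calculations can be sketched as below. The function names are hypothetical, and the size deliberately excludes metadata, as exam calculations usually do.

```python
def image_size_bytes(width, height, colour_depth_bits):
    """Number of pixels x bits per pixel, converted to bytes (metadata ignored)."""
    return width * height * colour_depth_bits / 8


def colour_count(colour_depth_bits):
    """n bits per pixel give 2 ** n distinct colours."""
    return 2 ** colour_depth_bits


def hex_colour_to_rgb(code):
    """Split a 24-bit colour such as '#FF8000' into its red, green and blue bytes."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


print(image_size_bytes(800, 600, 24))  # 1440000.0 bytes
print(colour_count(8))                 # 256
print(hex_colour_to_rgb("#FF8000"))    # (255, 128, 0)
```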
Sound Representation
The sound representation lesson develops how a continuous analogue sound wave is captured as discrete binary samples. Sampling measures the amplitude of the wave at regular intervals; the sample rate (samples per second, in hertz) sets how often this happens, and the sample resolution (bit depth, bits per sample) sets how precisely each amplitude is recorded. A higher sample rate captures higher-frequency detail and a higher sample resolution captures amplitude more accurately, both improving fidelity at the cost of a larger file.
The calculation to master is sound file size: sample rate multiplied by sample resolution multiplied by duration in seconds (and by the number of channels, two for stereo), which gives a size in bits that is divided by 8 for bytes. The conceptual point examiners reward is the link between sampling parameters and the faithfulness of the reconstruction: too low a sample rate loses high-frequency content, and too low a resolution introduces quantisation error. As with images, the quality-versus-size trade-off here is the direct motivation for the compression techniques in the next lesson.
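The sound file-size formula translates directly into code. This is a minimal sketch with hypothetical function names; it ignores headers and metadata.

```python
def sound_size_bytes(sample_rate_hz, resolution_bits, duration_s, channels=1):
    """Sample rate x sample resolution x duration x channels gives bits; divide by 8 for bytes."""
    bits = sample_rate_hz * resolution_bits * duration_s * channels
    return bits / 8


def amplitude_levels(resolution_bits):
    """Each sample can take one of 2 ** resolution_bits amplitude values."""
    return 2 ** resolution_bits


print(sound_size_bytes(44_100, 16, 60, channels=2))  # 10584000.0 bytes: one minute of CD-quality stereo
print(sound_size_bytes(22_050, 16, 60, channels=2))  # 5292000.0 bytes: halving the rate halves the size
print(amplitude_levels(16))                          # 65536
```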
Data Compression
The data compression lesson develops the two families of technique for reducing file size, and crucially when each is appropriate. Lossless compression reduces file size without discarding any information, so the original is perfectly reconstructable; H446 examines two methods. Run-length encoding (RLE) replaces runs of repeated values with a single value and a count, which works well on data with long runs (such as simple graphics) but can enlarge data with little repetition. Dictionary coding replaces recurring patterns with shorter codes held in a dictionary, suiting text with repeated words or phrases.
Lossy compression achieves much greater size reductions by permanently discarding information judged least perceptible — appropriate for photographs, audio and video where a perfect reconstruction is unnecessary, but unacceptable for text, program code or any data where every bit matters. The examinable skill is selection and justification: lossless for text and executables where fidelity is essential; lossy for media where the size saving outweighs imperceptible quality loss; RLE for runs, dictionary coding for repeated patterns. Be ready to work a small RLE example by hand and to argue which method fits a given file type.
| Method | Lossless? | Best for | Weakness |
|---|---|---|---|
| Run-length encoding | Yes | Long runs of repeats (simple images) | Enlarges low-repetition data |
| Dictionary coding | Yes | Text with repeated patterns | Dictionary overhead |
| Lossy | No | Photos, audio, video | Irreversible quality loss |
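Run-length encoding, as described above, can be sketched with (symbol, count) pairs. The pair format and function names are assumptions for illustration; real formats pack the counts into bytes in various ways.

```python
def rle_encode(data):
    """Replace each run of repeated symbols with a (symbol, count) pair."""
    runs = []
    for symbol in data:
        if runs and runs[-1][0] == symbol:
            runs[-1] = (symbol, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((symbol, 1))              # start a new run
    return runs


def rle_decode(runs):
    """Lossless: expanding every pair reproduces the original exactly."""
    return "".join(symbol * count for symbol, count in runs)


print(rle_encode("WWWWWBBBWW"))  # [('W', 5), ('B', 3), ('W', 2)]
print(rle_encode("ABCD"))        # four pairs: larger than the original four symbols
```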
Encryption
The encryption lesson develops how data is protected from unauthorised reading, distinguishing the two cryptographic models. Symmetric encryption uses a single shared key for both encryption and decryption; it is fast and efficient but requires the key to be exchanged securely, which is itself the hard problem. Asymmetric (public-key) encryption uses a mathematically linked key pair — a public key that anyone may use to encrypt, and a private key kept secret that alone can decrypt — which solves the key-distribution problem because the public key can be shared openly.
The lesson also introduces the headline application: secure communication over a network, the foundation for protocols revisited in Networks. It distinguishes encryption from hashing, which is easy to confuse with it: encryption scrambles data so that only a party holding the right key can read it, whereas hashing produces a fixed-length fingerprint that cannot be reversed. The examinable points are the symmetric/asymmetric distinction, why asymmetric encryption solves key exchange, and the appropriate use of each.
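The contrast between the two models, and between encryption and hashing, can be illustrated with toy code. Both ciphers below are deliberately insecure teaching sketches: an XOR cipher stands in for a symmetric algorithm, and a textbook RSA key pair built from tiny primes stands in for asymmetric encryption. Only `hashlib.sha256` is a real-world algorithm here; the function names are hypothetical.

```python
import hashlib


def xor_cipher(data, key):
    """Toy symmetric cipher: XOR each byte with the key, repeating the key as needed.

    Running it again with the same key restores the plaintext, so one shared key
    both encrypts and decrypts. Not secure.
    """
    return bytes(byte ^ key[i % len(key)] for i, byte in enumerate(data))


# Toy asymmetric (RSA-style) key pair from tiny textbook primes: insecure, illustration only.
p, q = 61, 53
n = p * q                # modulus, part of both keys
phi = (p - 1) * (q - 1)  # used only to derive the private exponent
e = 17                   # public exponent: (e, n) can be published openly
d = pow(e, -1, phi)      # private exponent: (d, n) is kept secret (needs Python 3.8+)


def encrypt_with_public_key(m):
    return pow(m, e, n)


def decrypt_with_private_key(c):
    return pow(c, d, n)


secret = xor_cipher(b"MEET AT 9", b"K3Y")
print(xor_cipher(secret, b"K3Y"))                             # b'MEET AT 9'
print(decrypt_with_private_key(encrypt_with_public_key(65)))  # 65

# Hashing: a fixed-length digest whatever the input size, with no key and no way back.
print(len(hashlib.sha256(b"a").hexdigest()), len(hashlib.sha256(b"a" * 10_000).hexdigest()))  # 64 64
```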
Error Detection
The error detection lesson develops the techniques that catch corruption introduced when data is transmitted or stored. A parity bit adds a single bit set to make the total number of 1s even (even parity) or odd (odd parity); a single-bit error flips the count and is detected, though two simultaneous errors cancel and pass undetected. A checksum computes a value from the data block, sends it alongside, and recomputes it on receipt; a mismatch signals corruption. A check digit (such as those in barcodes and ISBNs) is a digit calculated from the others to validate manually entered data.
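The three schemes can be sketched as below. The sketch assumes even parity with the parity bit appended on the right; the checksum is one simple scheme (sum of bytes modulo 256) chosen for illustration, and the ISBN-13 rule (weights alternating 1 and 3) is used as the check-digit example. Function names are hypothetical.

```python
def even_parity_bit(data_bits):
    """Return the bit that makes the total number of 1s even."""
    return "1" if data_bits.count("1") % 2 == 1 else "0"


def passes_even_parity(received_bits):
    """Data plus parity bit is accepted if its count of 1s is even."""
    return received_bits.count("1") % 2 == 0


def simple_checksum(block):
    """One simple checksum scheme: the sum of the byte values modulo 256."""
    return sum(block) % 256


def isbn13_check_digit(first_twelve):
    """Weights alternate 1, 3; the check digit tops the total up to a multiple of 10."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first_twelve))
    return (10 - total % 10) % 10


sent = "1011001" + even_parity_bit("1011001")
print(sent, passes_even_parity(sent))      # 10110010 True
print(passes_even_parity("10110011"))      # False: a single flipped bit is detected
print(passes_even_parity("11110011"))      # True: two flipped bits cancel and pass undetected
print(simple_checksum(b"HELLO"))           # 116
print(isbn13_check_digit("978030640615"))  # 7
```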
The lesson also covers majority voting and the use of parity blocks to locate as well as detect an error, and it connects error checking to the wider data-exchange theme alongside the encryption lesson. The examinable distinction is between schemes that merely detect an error and those that can also locate or correct one, and the limitations of each — single parity detecting only odd numbers of bit errors being the classic example. Being able to compute a parity bit or a simple check digit and explain what the scheme will and will not catch is the core of the marks.
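A parity block that locates as well as detects a single-bit error can be sketched as follows. It assumes even parity, with a parity bit on the end of each row and a final parity row covering every column; the layout and function names are illustrative assumptions.

```python
def even_parity(bits):
    return "1" if bits.count("1") % 2 else "0"


def make_parity_block(rows):
    """Append an even-parity bit to each row, then add a parity row for every column."""
    block = [row + even_parity(row) for row in rows]
    columns = ["".join(row[c] for row in block) for c in range(len(block[0]))]
    block.append("".join(even_parity(col) for col in columns))
    return block


def locate_error(block):
    """Return (row, column) of a single flipped bit, or None if no single error is found."""
    bad_rows = [r for r, row in enumerate(block) if row.count("1") % 2]
    bad_cols = [c for c in range(len(block[0]))
                if sum(row[c] == "1" for row in block) % 2]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    return None


def correct_single_error(block):
    """Flip the bit where the failing row and the failing column intersect."""
    location = locate_error(block)
    fixed = list(block)
    if location is not None:
        r, c = location
        fixed[r] = fixed[r][:c] + ("0" if fixed[r][c] == "1" else "1") + fixed[r][c + 1:]
    return fixed


block = make_parity_block(["1011", "0110", "1100"])
received = list(block)
received[1] = "01000"                           # one bit flipped in transit
print(locate_error(received))                   # (1, 2)
print(correct_single_error(received) == block)  # True
```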
How to Revise Data Representation
Data representation rewards drilling over reading more than any other H446 module, because almost every mark is earned by executing a technique correctly with clear working. Build a personal worksheet that cycles through the core conversions and calculations — denary/binary/hex conversion, binary addition with overflow, two's complement negation and subtraction, fixed-point and floating-point conversion with normalisation, and the file-size calculations for images and sound — and work a few of each every revision session until the methods are automatic and the working lays itself out without thought.
For the conceptual half of the module — character sets, compression, encryption and error detection — anchor each on its examinable distinction and trade-off: ASCII versus Unicode (range versus storage), lossless versus lossy and RLE versus dictionary coding (fidelity versus size, by data type), symmetric versus asymmetric encryption (speed versus key distribution), and detect-only versus locate/correct error schemes. A one-line discriminator plus a one-line "use it when" for each is enough to handle the justification questions reliably.
Start at the Data Representation course and work through all ten lessons in order, from number systems to error detection. Once the conversions and calculations are fluent, the binary arithmetic feeds straight into the hardware that performs it in Boolean Algebra & Logic and the encryption and error-checking content connects forward to Networks on the OCR A-Level Computer Science path.