Sound in the physical world is a smooth, continuously varying pressure wave — an analogue signal that takes infinitely many values over time. A computer can store only discrete binary patterns, so before any sound can be recorded, edited or streamed it must be converted into numbers. That conversion, analogue-to-digital conversion, and its reverse, are governed by two parameters whose meaning and trade-offs sit at the heart of this topic: sample rate and sample resolution (bit depth). This lesson develops the sampling process precisely — including the Nyquist theorem that tells us how fast we must sample — works the sound-file-size calculation end to end, and contrasts sampled audio with the radically different event-based approach of MIDI.
This lesson covers the sound-representation strand of the AQA A-Level Computer Science (7517) Fundamentals of data representation area:
These ideas mirror image sampling and connect to compression and to units of information.
An analogue signal varies continuously: at every instant it has a precise value, and between any two values there are infinitely many intermediate ones. A microphone's electrical output, mirroring air-pressure variations, is analogue. A digital signal is discrete: it is defined only at particular instants and can take only a finite set of values (ultimately binary patterns).
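The mismatch is fundamental: a computer cannot store a true continuum, so capturing analogue sound digitally necessarily involves approximation in two dimensions at once, in time (the signal is measured only at discrete instants) and in amplitude (each measurement is rounded to one of a finite set of values).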
Understanding sound representation is really understanding these two approximations and the parameters that control them.
There are good reasons to accept this approximation rather than keep sound analogue. A digital representation can be copied perfectly any number of times (each copy is just the same numbers), is robust to noise (a slightly degraded bit pattern can still be read as the correct values, whereas an analogue signal accumulates hiss and distortion with every copy or transmission), can be processed, edited and compressed by software, and can be stored and transmitted on the same digital infrastructure as all other data. Analogue formats (vinyl, magnetic tape) capture the waveform continuously and so can in principle hold infinite detail, but they degrade physically over time and with each copy. The whole edifice of modern audio — streaming, editing, error-checking, compression — rests on first converting sound to numbers, which is why the analogue-to-digital step is so foundational.
Sampling is the process of measuring the amplitude of the analogue signal at regular, fixed time intervals and recording each measurement as a number. Each measurement is a sample.
The sample rate (or sampling frequency) is the number of samples taken per second, measured in hertz (Hz). CD-quality audio uses 44 100 samples per second, written 44.1 kHz. The sample rate and the interval between samples are reciprocals:
$$\text{sample interval} = \frac{1}{\text{sample rate}}, \qquad \text{e.g. } \frac{1}{44{,}100} \approx 22.7\ \mu\text{s between samples}$$
A higher sample rate captures more detail of the waveform and therefore reproduces higher frequencies more faithfully — but produces more samples and so a larger file. This is the first of the two central trade-offs.
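As a minimal sketch of the sampling step, the Python below measures a sine wave at regular intervals. The function names `sample_interval` and `sample_sine` and the 1 kHz test tone are illustrative assumptions, not part of the specification.

```python
import math

def sample_interval(sample_rate_hz):
    """Time between consecutive samples, in seconds (reciprocal of the rate)."""
    return 1.0 / sample_rate_hz

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Measure a unit-amplitude sine wave at regular, fixed time intervals."""
    dt = sample_interval(sample_rate_hz)
    return [math.sin(2 * math.pi * freq_hz * k * dt) for k in range(n_samples)]

interval_us = sample_interval(44_100) * 1e6
print(f"{interval_us:.1f} microseconds between samples")  # 22.7

# Hypothetical 1 kHz tone, first five samples at the CD sample rate
samples = sample_sine(1_000, 44_100, 5)
print(samples)
```

Doubling the sample rate in this sketch halves the interval and doubles the number of values produced for the same duration, which is the storage cost described above.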
flowchart LR
A["Continuous analogue<br/>sound wave"] --> B["Sample at regular<br/>intervals (sample rate)"]
B --> C["Quantise each sample to<br/>nearest level (bit depth)"]
C --> D["Store sequence of<br/>binary sample values"]
D --> E["DAC reconstructs an<br/>analogue wave for the speaker"]
How fast is fast enough? The Nyquist theorem (also called the Nyquist–Shannon sampling theorem) gives the precise answer:
To capture a signal faithfully, the sample rate must be at least twice the highest frequency present in the signal.
$$\text{sample rate} \ge 2 \times f_{\max}$$
The threshold $2f_{\max}$ is the Nyquist rate, and half the sample rate is the Nyquist frequency, the highest frequency a given sample rate can faithfully represent.
This explains the 44.1 kHz CD standard exactly. Human hearing extends to roughly 20 kHz, so to capture the full audible range we need a sample rate of at least 2×20,000=40,000 Hz; 44.1 kHz sits just above this with a margin for filtering.
If a signal does contain frequencies above the Nyquist frequency and we sample too slowly, those high frequencies are not simply lost — they masquerade as lower frequencies that were never there, a corruption called aliasing. (This is the same effect that makes a fast-spinning wheel appear to rotate slowly or backwards on film, where the frame rate is the "sample rate".) To prevent it, real ADCs apply a low-pass anti-aliasing filter to remove frequencies above the Nyquist frequency before sampling.
Exam Tip: State Nyquist as an inequality with the factor of two: sample rate $\ge 2f_{\max}$. A common application question gives the highest frequency and asks for the minimum sample rate (double it), or gives the sample rate and asks for the highest faithfully captured frequency (halve it). Mention aliasing as the consequence of breaching it for the evaluation marks.
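The two exam calculations, and the aliasing effect itself, can be sketched in a few lines of Python. The helper names are hypothetical, and the 30 kHz tone is an assumed example of a frequency above the Nyquist frequency of a 44.1 kHz system. The alias formula used here applies to a single pure tone.

```python
import math

def min_sample_rate(f_max_hz):
    """Nyquist rate: twice the highest frequency present."""
    return 2 * f_max_hz

def nyquist_frequency(sample_rate_hz):
    """Highest frequency a given sample rate can faithfully represent."""
    return sample_rate_hz / 2

def alias_frequency(f_hz, sample_rate_hz):
    """Apparent frequency of a pure tone of f_hz after sampling."""
    return abs(f_hz - sample_rate_hz * round(f_hz / sample_rate_hz))

print(min_sample_rate(20_000))          # 40000
print(nyquist_frequency(44_100))        # 22050.0
print(alias_frequency(30_000, 44_100))  # 14100

# The samples of a 30 kHz cosine are indistinguishable from those of a
# 14.1 kHz cosine when both are sampled at 44.1 kHz.
fs = 44_100
same = all(
    math.isclose(
        math.cos(2 * math.pi * 30_000 * k / fs),
        math.cos(2 * math.pi * 14_100 * k / fs),
        abs_tol=1e-9,
    )
    for k in range(100)
)
print(same)  # True
```

Because the two sets of samples match, nothing after the sampling stage can tell the tones apart, which is why the anti-aliasing filter has to act before sampling.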
The sample resolution (also called bit depth) is the number of bits used to store each sample. Because $n$ bits encode $2^n$ patterns, the bit depth fixes how many distinct amplitude levels a sample can take:
$$\text{number of amplitude levels} = 2^{\text{bit depth}}$$
| Bit depth | Amplitude levels | Typical use |
|---|---|---|
| 8 | $2^8 = 256$ | low-quality / telephony-era audio |
| 16 | $2^{16} = 65{,}536$ | CD-quality audio |
| 24 | $2^{24} \approx 16.7$ million | professional studio recording |
A higher bit depth lets each sample be rounded to a finer amplitude level, reducing the rounding error called quantisation error (the difference between the true analogue amplitude and the nearest available level). With only 256 levels (8-bit), the rounding is coarse and audible as a low-level hiss or graininess called quantisation noise; with 65 536 levels (16-bit) the error is imperceptible for most music. So bit depth controls amplitude accuracy, just as sample rate controls time/frequency accuracy — two independent dials, both costing storage. This is the second central trade-off.
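Suppose a sample's true analogue amplitude is 0.734 of full scale and we use a 3-bit depth (only $2^3 = 8$ levels: 0.000, 0.143, 0.286, …, 1.000 in steps of $\frac{1}{7} \approx 0.143$). The nearest level to 0.734 is 0.714 ($= \frac{5}{7}$), so we store that. The quantisation error is $|0.734 - 0.714| = 0.020$. Adding one bit to reach a 4-bit depth (16 levels, step $\frac{1}{15} \approx 0.067$) gives a nearest level of 0.733 ($= \frac{11}{15}$) and an error of about 0.001. This particular sample happens to lie very close to a 4-bit level, so its error falls by far more than half; the general guarantee concerns the worst case, which is half a step. Each extra bit roughly halves the step size and so roughly halves the worst-case quantisation error, which is the formal reason "more bits = more accurate".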
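The worked example can be checked with a short Python sketch. It assumes the same simplified model as the example (amplitudes between 0 and 1, with $2^n$ evenly spaced levels including both ends); the function name `quantise` is hypothetical, and real converters typically use signed values.

```python
def quantise(amplitude, bit_depth):
    """Round an amplitude in [0, 1] to the nearest of 2**bit_depth levels.

    Returns the stored integer code and the amplitude that code represents.
    """
    top = 2 ** bit_depth - 1          # highest code, e.g. 7 for 3 bits
    code = round(amplitude * top)     # nearest level index
    return code, code / top

for bits in (3, 4, 8, 16):
    code, level = quantise(0.734, bits)
    error = abs(0.734 - level)
    print(f"{bits:2d} bits: code {code}, level {level:.4f}, error {error:.5f}")

# 3 bits: code 5, level 0.7143, error about 0.020
# 4 bits: code 11, level 0.7333
```

Only the integer code is stored in the file; the level is recovered on playback by dividing by the highest code.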
The hardware that performs sampling and quantising is the analogue-to-digital converter (ADC); the reverse is the digital-to-analogue converter (DAC).
Recording (ADC): a microphone converts sound pressure to a continuous voltage; the ADC samples that voltage at the chosen sample rate and quantises each sample to the nearest level expressible in the chosen bit depth, outputting a stream of binary numbers stored in the file. The full recording chain is therefore: sound wave → microphone (pressure → voltage) → anti-aliasing filter (remove frequencies above the Nyquist frequency) → sample → quantise → binary stream. Each stage is a deliberate engineering step, and the two adjustable parameters — sample rate and bit depth — are chosen here, fixing the file's fidelity and size before a single byte is stored.
Playback (DAC): the DAC reads the stored numbers and converts each back to a voltage, producing a stepped ("staircase") waveform — because the stored values are discrete, the raw output jumps from one sample level to the next rather than varying smoothly. A low-pass reconstruction filter then smooths these steps into a continuous analogue signal, an amplifier boosts it, and the loudspeaker recreates the pressure wave for the listener. A higher sample rate and bit depth make the reconstructed wave a closer match to the original: more samples mean smaller time-steps in the staircase, and more bits mean finer amplitude-steps, so the smoothed result hugs the original waveform more tightly. The whole point of the reconstruction filter is that, provided the sampling obeyed Nyquist, the original waveform can be recovered faithfully from the samples — the discrete stored data genuinely is enough to rebuild the continuous sound within the captured frequency range.
It is worth stressing that ADC and DAC are exact inverses in role but neither is perfectly lossless in practice: sampling discards everything between sample instants, and quantising rounds every amplitude. The art of audio engineering is choosing parameters generous enough that what is discarded falls below the threshold of human perception, so the reconstructed sound is indistinguishable from the original even though it is not bit-for-bit identical to the analogue source.
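A rough sketch of the round trip makes the "not perfectly lossless" point concrete. It assumes signed integer codes symmetric about zero and a hypothetical 440 Hz test tone, and it models only sampling and quantising; the anti-aliasing and reconstruction filters are left out.

```python
import math

def adc(signal, sample_rate_hz, duration_s, bit_depth):
    """Sample signal(t), with values in [-1, 1], and quantise to signed integers."""
    max_code = 2 ** (bit_depth - 1) - 1      # 32767 for 16 bits
    n = round(sample_rate_hz * duration_s)   # total number of samples
    return [round(signal(k / sample_rate_hz) * max_code) for k in range(n)]

def dac(codes, bit_depth):
    """Convert stored integer codes back to amplitudes in [-1, 1]."""
    max_code = 2 ** (bit_depth - 1) - 1
    return [c / max_code for c in codes]

def tone(t):
    """Hypothetical test input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)

codes = adc(tone, 44_100, 0.01, 16)
restored = dac(codes, 16)

# Largest difference between the true amplitude and the restored one,
# measured at the sample instants only.
worst = max(abs(tone(k / 44_100) - v) for k, v in enumerate(restored))
print(len(codes), worst)
```

In this model the worst error at the sample instants never exceeds half a quantisation step, which for 16 bits is far below anything audible, yet it is not zero.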
The size of an uncompressed sampled-audio file follows directly from its parameters. Every sample takes the same number of bits, samples arrive at a fixed rate for a fixed duration, and stereo or multi-channel audio stores one such stream per channel:
$$\text{file size (bits)} = \text{sample rate} \times \text{duration} \times \text{bit depth} \times \text{channels}$$
Divide by 8 for bytes, then by 1024 per binary-prefix step. (The number of channels is 1 for mono, 2 for stereo.)
Sample rate 44 100 Hz, duration 10 s, bit depth 16, channels 1.
Step 1 — total samples: 44,100×10=441,000 samples.
Step 2 — total bits (× bit depth × channels): 441,000×16×1=7,056,000 bits.
Step 3 — bytes (÷8): 7,056,000÷8=882,000 bytes.
Step 4 — KiB (÷1024): 882,000÷1024≈861.3 KiB.
Sample rate 44 100 Hz, duration 3×60=180 s, bit depth 16, channels 2.
Step 1 — bits:
44,100×180×16×2=254,016,000 bits
Step 2 — bytes (÷8):
254,016,000÷8=31,752,000 bytes
Step 3 — mebibytes (÷ $2^{20} = 1{,}048{,}576$):
$$\frac{31{,}752{,}000}{1{,}048{,}576} \approx 30.3\ \text{MiB}$$
So a single uncompressed three-minute stereo track is about 30 MiB — which is precisely why audio is normally stored compressed (MP3, AAC), the subject of the compression lesson.
Exam Tip: Convert minutes to seconds first (a frequent slip), and remember to multiply by the number of channels — stereo doubles the size. As with images, the product is in bits, so divide by 8 for bytes. Show every factor in the formula explicitly; the method marks are for rate × duration × depth × channels.
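Both worked examples can be reproduced with a small Python helper. The function name `audio_file_size_bits` is an illustrative assumption; the arithmetic is the file-size formula as stated.

```python
def audio_file_size_bits(sample_rate_hz, duration_s, bit_depth, channels):
    """Uncompressed size in bits: rate x duration x depth x channels."""
    return sample_rate_hz * duration_s * bit_depth * channels

# 10 seconds of mono CD-quality audio
bits = audio_file_size_bits(44_100, 10, 16, 1)
print(bits, bits // 8, round(bits / 8 / 1024, 1))
# 7056000 882000 861.3

# Three minutes of stereo CD-quality audio (minutes converted to seconds first)
bits = audio_file_size_bits(44_100, 3 * 60, 16, 2)
print(bits, bits // 8, round(bits / 8 / 1024 ** 2, 1))
# 254016000 31752000 30.3
```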
A 1 MiB budget must hold 30 seconds of mono 16-bit audio — what sample rate is possible? Rearrange:
$$1\ \text{MiB} = 8 \times 2^{20} = 8{,}388{,}608\ \text{bits}$$
$$\text{sample rate} = \frac{8{,}388{,}608}{30 \times 16 \times 1} \approx 17{,}476\ \text{Hz}$$
By Nyquist that captures frequencies up to about 8.7 kHz — adequate for speech but not full-range music, neatly tying the size formula back to the Nyquist limit.
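The rearrangement can be sketched in Python as follows. The function name `max_sample_rate` is hypothetical, and the budget is assumed to be given in bytes.

```python
def max_sample_rate(budget_bytes, duration_s, bit_depth, channels):
    """Highest sample rate (Hz) whose uncompressed audio fits in the budget."""
    return (budget_bytes * 8) / (duration_s * bit_depth * channels)

rate = max_sample_rate(1024 ** 2, 30, 16, 1)  # 1 MiB, 30 s, 16-bit, mono
print(round(rate))      # 17476
print(round(rate / 2))  # 8738, the Nyquist frequency in Hz
```

In practice the result would be rounded down to a sample rate the hardware supports, so the file stays within the budget.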
For streaming and transmission it is often more useful to express the demand as a bit rate — the number of bits per second — rather than a total size. The bit rate drops the duration from the formula, leaving:
$$\text{bit rate (bits/s)} = \text{sample rate} \times \text{bit depth} \times \text{channels}$$
For CD-quality stereo audio that is 44,100×16×2=1,411,200 bits per second — about 1.41 Mbit/s. This single figure tells you the bandwidth a network link must sustain to stream the audio uncompressed in real time, and multiplying it by the duration in seconds returns the total file size. It also makes the value of compression vivid: a 128 kbit/s MP3 stream carries the "same" music at roughly one-eleventh of the raw bit rate, which is exactly why compressed formats made streaming over typical home connections practical. Expressing audio as a bit rate connects the representation directly to the networking topic, where every link has a finite capacity measured in bits per second.
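A brief Python sketch ties these figures together. The function name `bit_rate` is an assumption for illustration, and the 128 kbit/s figure is taken as 128,000 bits per second.

```python
def bit_rate(sample_rate_hz, bit_depth, channels):
    """Uncompressed bit rate in bits per second."""
    return sample_rate_hz * bit_depth * channels

cd = bit_rate(44_100, 16, 2)
print(cd)                      # 1411200 bits per second
print(cd * 180 // 8)           # 31752000 bytes for a three-minute track
print(round(cd / 128_000, 1))  # 11.0, the ratio to a 128 kbit/s stream
```

Multiplying the bit rate by the duration recovers the same byte count as the three-minute stereo example, confirming that the two formulas differ only by the duration factor.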