Representing Sound

Sound in the physical world is a smooth, continuously varying pressure wave — an analogue signal that takes infinitely many values over time. A computer can store only discrete binary patterns, so before any sound can be recorded, edited or streamed it must be converted into numbers. That conversion, analogue-to-digital conversion, and its reverse, are governed by two parameters whose meaning and trade-offs sit at the heart of this topic: sample rate and sample resolution (bit depth). This lesson develops the sampling process precisely — including the Nyquist theorem that tells us how fast we must sample — works the sound-file-size calculation end to end, and contrasts sampled audio with the radically different event-based approach of MIDI.

Spec Mapping

This lesson covers the sound-representation strand of the AQA A-Level Computer Science (7517) Fundamentals of data representation area:

The difference between analogue and digital signals.
Sampling — measuring a continuous signal at regular intervals — and the sample rate (in hertz).
The Nyquist theorem — the relationship between sample rate and the highest frequency that can be faithfully captured.
Sample resolution / bit depth — the number of bits per sample and its effect on accuracy.
The analogue-to-digital (ADC) and digital-to-analogue (DAC) conversion processes.
The sound-file-size calculation (rate × duration × bit depth × channels).
MIDI — an event-based representation, and its advantages over sampled audio.

These ideas mirror image sampling and connect to compression and to units of information.

Analogue versus Digital

An analogue signal varies continuously: at every instant it has a precise value, and between any two values there are infinitely many intermediate ones. A microphone's electrical output, mirroring air-pressure variations, is analogue. A digital signal is discrete: it is defined only at particular instants and can take only a finite set of values (ultimately binary patterns).

The mismatch is fundamental — a computer cannot store a true continuum — so capturing analogue sound digitally necessarily involves approximation in two dimensions at once:

Time is made discrete by sampling — taking measurements only at regular intervals rather than continuously.
Amplitude is made discrete by quantising — rounding each measurement to the nearest of a finite set of levels.

Understanding sound representation is really understanding these two approximations and the parameters that control them.

There are good reasons to accept this approximation rather than keep sound analogue. A digital representation can be copied perfectly any number of times (each copy is just the same numbers), is robust to noise (a slightly degraded bit pattern can still be read as the correct values, whereas an analogue signal accumulates hiss and distortion with every copy or transmission), can be processed, edited and compressed by software, and can be stored and transmitted on the same digital infrastructure as all other data. Analogue formats (vinyl, magnetic tape) capture the waveform continuously, with no sampling or quantising step, but in practice their detail is limited by noise in the medium, and they degrade physically over time and with each copy. The whole edifice of modern audio (streaming, editing, error-checking, compression) rests on first converting sound to numbers, which is why the analogue-to-digital step is so foundational.

Sampling and Sample Rate

Sampling is the process of measuring the amplitude of the analogue signal at regular, fixed time intervals and recording each measurement as a number. Each measurement is a sample.

The sample rate (or sampling frequency) is the number of samples taken per second, measured in hertz (Hz). CD-quality audio uses 44 100 samples per second, written 44.1 kHz. The sample rate and the interval between samples are reciprocals:

$\text{sample interval} = \frac{1}{\text{sample rate}}, \qquad \text{e.g. } \frac{1}{44100} \approx 22.7\ \mu s \text{ between samples}$

A higher sample rate captures more detail of the waveform and therefore reproduces higher frequencies more faithfully — but produces more samples and so a larger file. This is the first of the two central trade-offs.
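The reciprocal relationship between sample rate and sample interval is easy to check numerically. The following is a minimal Python sketch; the function names are illustrative, and the 8 kHz and 96 kHz rates are included only for comparison and do not come from the text above.

```python
def sample_interval_us(sample_rate_hz):
    """Return the time between consecutive samples, in microseconds."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return 1_000_000 / sample_rate_hz


def sample_count(sample_rate_hz, duration_s):
    """Return how many samples are taken over the given duration."""
    return round(sample_rate_hz * duration_s)


# 44 100 Hz is the CD rate from the text; the other two are assumed comparisons.
for rate in (8_000, 44_100, 96_000):
    print(f"{rate} Hz -> {sample_interval_us(rate):.1f} us between samples, "
          f"{sample_count(rate, 1)} samples per second of audio")
```

Doubling the rate halves the interval and doubles the number of samples to store, which is the trade-off in miniature.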

Continuous analogue sound wave → sample at regular intervals (sample rate) → quantise each sample to the nearest level (bit depth) → store the sequence of binary sample values → DAC reconstructs an analogue wave for the speaker.

The Nyquist theorem

How fast is fast enough? The Nyquist theorem (also called the Nyquist–Shannon sampling theorem) gives the precise answer:

To capture a signal faithfully, the sample rate must be at least twice the highest frequency present in the signal.

$\text{sample rate} \geq 2 \times f_{\text{max}}$

The threshold $2 f_{\max}$ is the Nyquist rate, and half the sample rate is the Nyquist frequency — the highest frequency a given sample rate can faithfully represent.

This accounts for the 44.1 kHz CD standard. Human hearing extends to roughly 20 kHz, so to capture the full audible range we need a sample rate of at least $2 \times 20{,}000 = 40{,}000$ Hz; 44.1 kHz sits just above this, leaving a margin for filtering.

If a signal does contain frequencies above the Nyquist frequency and we sample too slowly, those high frequencies are not simply lost — they masquerade as lower frequencies that were never there, a corruption called aliasing. (This is the same effect that makes a fast-spinning wheel appear to rotate slowly or backwards on film, where the frame rate is the "sample rate".) To prevent it, real ADCs apply a low-pass anti-aliasing filter to remove frequencies above the Nyquist frequency before sampling.
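The way an under-sampled frequency "folds" down to a false lower one can be illustrated with a short Python sketch. It assumes a single pure tone sampled with no anti-aliasing filter, and relies on the standard result that such a tone appears at its distance from the nearest whole multiple of the sample rate. The function names are hypothetical.

```python
def min_sample_rate(f_max_hz):
    """Nyquist rate: the lowest sample rate that captures f_max faithfully."""
    return 2 * f_max_hz


def nyquist_frequency(sample_rate_hz):
    """Highest frequency a given sample rate can faithfully represent."""
    return sample_rate_hz / 2


def apparent_frequency(f_hz, sample_rate_hz):
    """Frequency a pure tone appears to have after sampling (no filter).

    Tones at or below the Nyquist frequency come back unchanged; tones
    above it fold down to a lower, false frequency (an alias).
    """
    nearest_multiple = sample_rate_hz * round(f_hz / sample_rate_hz)
    return abs(f_hz - nearest_multiple)


print(min_sample_rate(20_000))              # minimum rate for 20 kHz hearing
print(nyquist_frequency(44_100))            # limit for CD audio
print(apparent_frequency(10_000, 44_100))   # below the limit: unchanged
print(apparent_frequency(30_000, 44_100))   # above the limit: aliased
```

With a 44.1 kHz sample rate, a 30 kHz tone would be stored as if it were a 14.1 kHz tone, which is why the filter has to act before sampling rather than after.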

Exam Tip: State Nyquist as an inequality with the factor of two: sample rate $\geq 2 f_{\max}$. A common application question gives the highest frequency and asks for the minimum sample rate (double it), or gives the sample rate and asks for the highest faithfully captured frequency (halve it). Mention aliasing as the consequence of breaching it for the evaluation marks.

Sample Resolution (Bit Depth)

The sample resolution — also bit depth — is the number of bits used to store each sample. Because $n$ bits encode $2^n$ patterns, the bit depth fixes how many distinct amplitude levels a sample can take:

$\text{number of amplitude levels} = 2^{\,(\text{bit depth})}$

| Bit depth | Amplitude levels | Typical use |
|---|---|---|
| 8 | $2^8 = 256$ | low-quality / telephony-era audio |
| 16 | $2^{16} = 65{,}536$ | CD-quality audio |
| 24 | $2^{24} \approx 16.7$ million | professional studio recording |

A higher bit depth lets each sample be rounded to a finer amplitude level, reducing the rounding error called quantisation error (the difference between the true analogue amplitude and the nearest available level). With only 256 levels (8-bit), the rounding is coarse and audible as a low-level hiss or graininess called quantisation noise; with 65 536 levels (16-bit) the error is imperceptible for most music. So bit depth controls amplitude accuracy, just as sample rate controls time/frequency accuracy — two independent dials, both costing storage. This is the second central trade-off.

Worked illustration: quantisation error

Suppose a sample's true analogue amplitude is 0.734 of full scale and we use a 3-bit depth (only $2^3 = 8$ levels: 0.000, 0.143, 0.286, …, 1.000 in steps of $\tfrac17 \approx 0.143$). The nearest level to 0.734 is 0.714 ($= \tfrac57$), so we store that. The quantisation error is $|0.734 - 0.714| = 0.020$. Adding one bit to reach a 4-bit depth (16 levels, step $\tfrac{1}{15} \approx 0.067$) gives a nearest level of 0.733 ($= \tfrac{11}{15}$), and the error falls to about 0.001. The improvement for any one sample depends on where it happens to sit between levels; what is guaranteed is the worst case, which is half a step. Each extra bit roughly halves the step size and so roughly halves the worst-case quantisation error, which is the formal reason "more bits = more accurate".
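The figures in this illustration can be recomputed with a small Python sketch. It assumes, as the illustration does, that the $2^n$ levels are evenly spaced from 0 to 1 inclusive, so the step is $1/(2^n - 1)$. The function names are illustrative, and the 16-bit row is an extra comparison not taken from the text.

```python
def quantise(amplitude, bit_depth):
    """Round an amplitude in [0, 1] to the nearest of 2**bit_depth levels.

    Returns (level index, level value). Levels are assumed to be evenly
    spaced from 0.0 to 1.0 inclusive.
    """
    if not 0.0 <= amplitude <= 1.0:
        raise ValueError("amplitude must be a fraction of full scale")
    top = 2 ** bit_depth - 1          # index of the highest level
    index = round(amplitude * top)    # nearest level
    return index, index / top


def worst_case_error(bit_depth):
    """Largest possible quantisation error: half of one step."""
    return 0.5 / (2 ** bit_depth - 1)


amplitude = 0.734
for bits in (3, 4, 16):
    index, value = quantise(amplitude, bits)
    print(f"{bits}-bit: level {index} = {value:.3f}, "
          f"error {abs(amplitude - value):.3f}, "
          f"worst case {worst_case_error(bits):.5f}")
```

The error for this one sample shrinks far more than the worst-case figure between 3 and 4 bits, because 0.734 happens to lie very close to a 4-bit level. The worst-case column is the one that falls by roughly half with every added bit.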

The ADC and DAC Processes

The hardware that performs sampling and quantising is the analogue-to-digital converter (ADC); the reverse is the digital-to-analogue converter (DAC).

Recording (ADC): a microphone converts sound pressure to a continuous voltage; the ADC samples that voltage at the chosen sample rate and quantises each sample to the nearest level expressible in the chosen bit depth, outputting a stream of binary numbers stored in the file. The full recording chain is therefore: sound wave → microphone (pressure → voltage) → anti-aliasing filter (remove frequencies above the Nyquist frequency) → sample → quantise → binary stream. Each stage is a deliberate engineering step, and the two adjustable parameters — sample rate and bit depth — are chosen here, fixing the file's fidelity and size before a single byte is stored.
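The sample and quantise stages of the recording chain can be sketched in Python. This is a simplified model, not a description of real ADC hardware: the "microphone voltage" is an assumed pure sine wave between -1 and 1, no anti-aliasing filter is modelled, and samples are stored as signed integers (a common convention, but an assumption here). The name `record_tone` is hypothetical.

```python
import math


def record_tone(freq_hz, sample_rate_hz, duration_s, bit_depth):
    """Sample a sine wave and quantise each sample to a signed integer code."""
    max_code = 2 ** (bit_depth - 1) - 1      # e.g. 32767 for 16-bit
    total_samples = round(sample_rate_hz * duration_s)
    codes = []
    for i in range(total_samples):
        t = i / sample_rate_hz                              # sampling instant
        voltage = math.sin(2 * math.pi * freq_hz * t)       # analogue value
        codes.append(round(voltage * max_code))             # quantise
    return codes


# One cycle of a 1 kHz tone at an assumed 8 kHz sample rate, 16-bit.
codes = record_tone(freq_hz=1_000, sample_rate_hz=8_000,
                    duration_s=0.001, bit_depth=16)
print(len(codes), codes)
```

Changing `sample_rate_hz` alters how many codes are produced, while changing `bit_depth` alters the range each code can take, which mirrors the two parameters fixed at recording time.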

Playback (DAC): the DAC reads the stored numbers and converts each back to a voltage, producing a stepped ("staircase") waveform — because the stored values are discrete, the raw output jumps from one sample level to the next rather than varying smoothly. A low-pass reconstruction filter then smooths these steps into a continuous analogue signal, an amplifier boosts it, and the loudspeaker recreates the pressure wave for the listener. A higher sample rate and bit depth make the reconstructed wave a closer match to the original: more samples mean smaller time-steps in the staircase, and more bits mean finer amplitude-steps, so the smoothed result hugs the original waveform more tightly. The whole point of the reconstruction filter is that, provided the sampling obeyed Nyquist, the original waveform can be recovered faithfully from the samples — the discrete stored data genuinely is enough to rebuild the continuous sound within the captured frequency range.
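The staircase output and its smoothing can be pictured with a rough Python sketch. It assumes signed integer sample codes scaled by the largest positive code, holds each value for a fixed number of output points to form the staircase, and uses a simple moving average as a crude stand-in for a reconstruction filter. Real reconstruction filters are more sophisticated, and all names here are hypothetical.

```python
def staircase(codes, bit_depth, hold):
    """Convert stored codes to voltages, holding each for `hold` points."""
    max_code = 2 ** (bit_depth - 1) - 1
    output = []
    for code in codes:
        output.extend([code / max_code] * hold)   # flat step per sample
    return output


def moving_average(signal, window):
    """Crude low-pass: average each point with the points just before it."""
    smoothed = []
    for i in range(len(signal)):
        start = max(0, i - window + 1)
        chunk = signal[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


steps = staircase([0, 32767], bit_depth=16, hold=3)
print(steps)                      # abrupt jump from 0.0 to 1.0
print(moving_average(steps, 2))   # jump softened by one intermediate value
```

A higher sample rate corresponds to more, narrower steps, and a higher bit depth to finer possible step heights, so the smoothed curve has less to correct.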

It is worth stressing that ADC and DAC are inverses in role, but the round trip is not perfectly lossless in practice: band-limiting and sampling discard any content above the Nyquist frequency, and quantising rounds every amplitude. The art of audio engineering is choosing parameters generous enough that what is discarded falls below the threshold of human perception, so the reconstructed sound is indistinguishable from the original even though it is not an exact copy of the analogue source.

The Sound-File-Size Calculation

The size of an uncompressed sampled-audio file follows directly from its parameters. Every sample takes the same number of bits, samples arrive at a fixed rate for a fixed duration, and stereo or multi-channel audio stores one such stream per channel:

$\text{file size (bits)} = \text{sample rate} \times \text{duration} \times \text{bit depth} \times \text{channels}$

Divide by 8 for bytes, then by 1024 for each binary-prefix step (bytes to KiB, KiB to MiB). (The number of channels is 1 for mono, 2 for stereo.)

Worked example 1 — 10 seconds of mono CD-quality audio

Sample rate 44 100 Hz, duration 10 s, bit depth 16, channels 1.

Step 1 — total samples: $44{,}100 \times 10 = 441{,}000$ samples.

Step 2 — total bits (× bit depth × channels): $441{,}000 \times 16 \times 1 = 7{,}056{,}000$ bits.

Step 3 — bytes (÷8): $7{,}056{,}000 \div 8 = 882{,}000$ bytes.

Step 4 — KiB (÷1024): $882{,}000 \div 1024 \approx 861.3$ KiB.

Worked example 2 — 3 minutes of stereo CD audio (full method)

Sample rate 44 100 Hz, duration $3 \times 60 = 180$ s, bit depth 16, channels 2.

Step 1 — bits:

$44{,}100 \times 180 \times 16 \times 2 = 254{,}016{,}000 \text{ bits}$

Step 2 — bytes (÷8):

$254{,}016{,}000 \div 8 = 31{,}752{,}000 \text{ bytes}$

Step 3 — mebibytes (÷ $2^{20} = 1{,}048{,}576$ ):

$\frac{31{,}752{,}000}{1{,}048{,}576} \approx 30.3 \text{ MiB}$

So a single uncompressed three-minute stereo track is about 30 MiB — which is precisely why audio is normally stored compressed (MP3, AAC), the subject of the compression lesson.
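Both worked examples can be reproduced with a few lines of Python. This is a sketch with illustrative function names; it applies the file-size formula directly and uses binary prefixes (1 KiB = 1024 bytes, 1 MiB = 1024 KiB).

```python
def audio_size_bits(sample_rate_hz, duration_s, bit_depth, channels):
    """Uncompressed size in bits: rate x duration x bit depth x channels."""
    return sample_rate_hz * duration_s * bit_depth * channels


def bits_to_bytes(bits):
    return bits / 8


# Worked example 1: 10 s, mono, 16-bit, 44 100 Hz.
bits_1 = audio_size_bits(44_100, 10, 16, 1)
bytes_1 = bits_to_bytes(bits_1)
print(bits_1, bytes_1, round(bytes_1 / 1024, 1), "KiB")

# Worked example 2: 3 minutes converted to seconds first, stereo.
bits_2 = audio_size_bits(44_100, 3 * 60, 16, 2)
bytes_2 = bits_to_bytes(bits_2)
print(bits_2, bytes_2, round(bytes_2 / 1024 ** 2, 1), "MiB")
```

Keeping the four factors as separate arguments mirrors the method marks: each one is visible, and forgetting the channel count or the minutes-to-seconds conversion shows up immediately.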

Exam Tip: Convert minutes to seconds first (a frequent slip), and remember to multiply by the number of channels — stereo doubles the size. As with images, the product is in bits, so divide by 8 for bytes. Show every factor in the formula explicitly; the method marks are for rate × duration × depth × channels.

Working backwards

A 1 MiB budget must hold 30 seconds of mono 16-bit audio — what sample rate is possible? Rearrange:

$1 \text{ MiB} = 8 \times 2^{20} = 8{,}388{,}608 \text{ bits}$

$\text{sample rate} = \frac{8{,}388{,}608}{30 \times 16 \times 1} \approx 17{,}476 \text{ Hz}$

By Nyquist that captures frequencies up to about 8.7 kHz — adequate for speech but not full-range music, neatly tying the size formula back to the Nyquist limit.
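The rearrangement can be sketched in Python as follows. The function name is hypothetical, and the result is rounded down because a sample rate that overshoots the budget, even slightly, would not fit.

```python
def max_sample_rate_hz(budget_bytes, duration_s, bit_depth, channels):
    """Highest whole-number sample rate that fits within the storage budget."""
    budget_bits = budget_bytes * 8
    bits_per_sampling_instant = bit_depth * channels
    return int(budget_bits // (duration_s * bits_per_sampling_instant))


MIB = 2 ** 20  # bytes in one mebibyte

rate = max_sample_rate_hz(1 * MIB, duration_s=30, bit_depth=16, channels=1)
print(rate, "Hz maximum sample rate")
print(rate / 2, "Hz highest faithfully captured frequency (Nyquist)")
```

Halving the bit depth to 8 or halving the duration would each double the affordable sample rate, which shows how the three parameters compete for the same budget.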

Bit rate: size as a flow

For streaming and transmission it is often more useful to express the demand as a bit rate — the number of bits per second — rather than a total size. The bit rate drops the duration from the formula, leaving:

$\text{bit rate (bits/s)} = \text{sample rate} \times \text{bit depth} \times \text{channels}$

For CD-quality stereo audio that is $44{,}100 \times 16 \times 2 = 1{,}411{,}200$ bits per second — about 1.41 Mbit/s. This single figure tells you the bandwidth a network link must sustain to stream the audio uncompressed in real time, and multiplying it by the duration in seconds returns the total file size. It also makes the value of compression vivid: a 128 kbit/s MP3 stream carries the "same" music at roughly one-eleventh of the raw bit rate, which is exactly why compressed formats made streaming over typical home connections practical. Expressing audio as a bit rate connects the representation directly to the networking topic, where every link has a finite capacity measured in bits per second.
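The bit-rate figures above can be checked with a brief Python sketch. The function name is illustrative, and the 128 kbit/s figure is taken as 128 000 bits per second.

```python
def bit_rate(sample_rate_hz, bit_depth, channels):
    """Uncompressed bit rate in bits per second."""
    return sample_rate_hz * bit_depth * channels


cd_stereo = bit_rate(44_100, 16, 2)
print(cd_stereo, "bits/s")                       # raw CD-quality stereo
print(cd_stereo / 1_000_000, "Mbit/s")
print(cd_stereo / 128_000, "times the MP3 rate")  # ratio to 128 kbit/s

# Multiplying by the duration in seconds recovers the total file size.
print(cd_stereo * 180, "bits for three minutes")
```

The last line returns the same 254 016 000 bits found for the three-minute stereo track, confirming that bit rate and file size are the same calculation with and without the duration factor.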
