Sound is a continuous, analogue phenomenon, yet computers store only discrete bits. This lesson explains how analogue-to-digital conversion (ADC) turns a sound wave into numbers through sampling, how sampling rate and sample resolution (bit depth) control fidelity, what the Nyquist idea tells us about the rate we must choose, how to calculate an audio file's size, and how a DAC turns the numbers back into sound.
This lesson addresses the H446 1.4.1 Data Types content on representing sound:
(This is a paraphrase of the specification content, not a verbatim quotation.)
Sound is a continuous wave of pressure variation in the air, which a microphone converts into a continuously varying voltage — an analogue signal that can take any value at any instant. A computer, however, can only store a finite set of discrete numbers, so the analogue signal must be digitised: measured at particular moments and rounded to particular values.
| Aspect | Analogue | Digital |
|---|---|---|
| Signal | Continuous in both time and value | Discrete in both (sampled and quantised) |
| Fidelity | The original itself | An approximation of the original |
| Storage | Physical medium (vinyl groove, tape magnetism) | Binary numbers |
| Copying | Each copy adds noise/degrades | Bit-for-bit perfect copies |
Digitising loses information in principle — the smooth wave is replaced by a staircase of samples — but if the rate and resolution are high enough, the approximation is indistinguishable to the human ear, while gaining the durability and perfect copyability of digital data.
The word staircase is worth holding onto. A digital recording is literally a sequence of flat steps: it holds each sampled value constant until the next sample. The two ways that staircase departs from the smooth original map exactly onto the two parameters — the width of each step is set by the sampling rate (more samples ⇒ narrower steps ⇒ better in time), and the height resolution of each step is set by the bit depth (more bits ⇒ finer steps ⇒ better in value). Improving fidelity therefore always means narrowing the steps in one or both directions, at the cost of more data.
The pay-off for accepting that approximation is substantial and examinable:
| Advantage of digital | Why |
|---|---|
| Perfect copies | Bits can be duplicated exactly; an analogue copy (e.g. tape-to-tape) adds noise each generation |
| Robust transmission | Digital signals can be regenerated cleanly; an analogue signal accumulates noise |
| Easy editing | Samples are just numbers — cut, mix, apply effects, undo |
| Compression possible | Redundant or inaudible data can be removed (the next lesson) |
| Durable storage | No physical wear like a vinyl groove or a magnetised tape |
Sampling is measuring the amplitude (instantaneous height, i.e. loudness) of the analogue wave at regular time intervals and recording each measurement as a binary number. The full capture pipeline is:
flowchart LR
A["Sound wave<br/>(air pressure)"] --> B["Microphone<br/>(wave to voltage)"]
B --> C["Sample at fixed<br/>intervals"]
C --> D["Quantise to nearest<br/>level (round)"]
D --> E["Store each sample<br/>as binary"]
So digitisation is really two roundings, and you must keep them distinct:
| Parameter | Definition | Effect |
|---|---|---|
| Sampling rate | Samples taken per second (hertz, Hz) | Higher captures higher frequencies more faithfully |
| Sample resolution (bit depth) | Bits used to store each sample | Higher gives more amplitude levels, less quantisation noise |
| Duration | Length of the audio in seconds | Longer means more samples, larger file |
| Channels | 1 = mono, 2 = stereo | More channels multiply the data |
To see the two roundings in action, suppose a 3-bit ADC offers $2^3 = 8$ amplitude levels, numbered 0–7, evenly spaced across the input voltage range. At five successive sample instants the analogue wave has these true heights (on the same 0–7 scale), and each is rounded to the nearest level:
| Sample # | True value | Nearest level (stored) | Quantisation error |
|---|---|---|---|
| 1 | 2.3 | 2 | −0.3 |
| 2 | 4.8 | 5 | +0.2 |
| 3 | 6.6 | 7 | +0.4 |
| 4 | 3.1 | 3 | −0.1 |
| 5 | 0.4 | 0 | −0.4 |
The stored sequence is 2, 5, 7, 3, 0, written in binary as 010 101 111 011 000. The quantisation error column is the price of rounding: every stored value is up to half a level away from the truth. Add a fourth bit ($2^4 = 16$ levels) and the steps halve, so the worst-case error halves too, which directly means less quantisation noise. This is the concrete meaning of "more bits give a more faithful amplitude".
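The rounding in the table can be reproduced with a short sketch. The helper name `quantise` is hypothetical, and the code assumes the true values are already expressed on the same 0–7 level scale used above.

```python
# Hypothetical helper: quantise values that are already on the
# 0..(2**bits - 1) level scale to the nearest whole level.
def quantise(true_values, bits):
    max_level = 2 ** bits - 1
    stored = []
    for value in true_values:
        # Python's round() sends exact .5 ties to the even neighbour;
        # none of the sample values below is a tie.
        level = round(value)
        level = max(0, min(max_level, level))  # clamp to the available levels
        stored.append(level)
    return stored


true_values = [2.3, 4.8, 6.6, 3.1, 0.4]
stored = quantise(true_values, bits=3)

# Quantisation error = stored value minus true value (rounded for display).
errors = [round(s - t, 1) for s, t in zip(stored, true_values)]

# Each stored level written as a 3-bit binary string.
binary = [format(s, "03b") for s in stored]

print(stored)  # [2, 5, 7, 3, 0]
print(errors)  # [-0.3, 0.2, 0.4, -0.1, -0.4]
print(binary)  # ['010', '101', '111', '011', '000']
```

Calling `quantise` with `bits=4` on values rescaled to 0–15 would halve the step size, and with it the worst-case error.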
The sampling rate is how often per second the amplitude is measured, in hertz. A higher rate places sample points closer together in time, so rapid changes in the wave — which correspond to high-frequency (high-pitched) sounds — are captured rather than missed.
| Sampling rate | Quality | Typical use |
|---|---|---|
| 8,000 Hz (8 kHz) | Telephone | Voice calls |
| 22,050 Hz | AM-radio | Low-quality audio |
| 44,100 Hz (44.1 kHz) | CD | Music CDs |
| 48,000 Hz | Broadcast/DVD | Video production |
| 96,000 Hz | Studio | Professional recording |
Picture a wave that rises and falls quickly. If samples are taken far apart in time, the few points captured can be joined into many different curves — including a wrong, slower one — so the rapid wiggles are lost or misread. Taking samples closer together pins the curve down at more points, so the reconstruction follows the original faithfully. The number of samples grows in direct proportion to the rate:
$$\text{samples} = \text{sampling rate} \times \text{duration (s)}$$
So a 5-second clip at 44,100 Hz contains 44,100 × 5 = 220,500 samples, whereas the same 5 seconds at the telephone rate of 8,000 Hz contains only 8,000 × 5 = 40,000 samples. That is about 5.5 times fewer points describing the same wave, which is one reason phone-quality audio sounds dull and band-limited compared with CD.
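As a quick check of that arithmetic, here is a minimal sketch; the function name `sample_count` is an illustrative assumption, not part of any library.

```python
# Hypothetical helper: number of samples = sampling rate x duration.
def sample_count(sampling_rate_hz, duration_s):
    return sampling_rate_hz * duration_s


cd_samples = sample_count(44_100, 5)    # CD rate, 5 seconds
phone_samples = sample_count(8_000, 5)  # telephone rate, 5 seconds

print(cd_samples)                  # 220500
print(phone_samples)               # 40000
print(cd_samples / phone_samples)  # 5.5125
```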
Sample resolution, also called bit depth, is the number of bits used to store each sample. Because each bit doubles the number of distinct values, the number of amplitude levels follows the familiar power of two:
$$\text{amplitude levels} = 2^b \quad \text{for a bit depth of } b \text{ bits}$$
| Bit depth | Levels ($2^b$) | Quality |
|---|---|---|
| 8 bits | $2^8 = 256$ | Low (audible noise) |
| 16 bits | $2^{16} = 65{,}536$ | CD quality |
| 24 bits | $2^{24} = 16{,}777{,}216$ | Studio quality |
| 32 bits | $2^{32} \approx 4.3$ billion | Professional mastering |
Quantisation is the rounding of each measured amplitude to the nearest of these levels; the gap between the true value and the stored value is the quantisation error. With too few levels the staircase is coarse and the rounding errors are large and audible — a hiss known as quantisation noise. More bits mean finer steps, smaller errors and a smoother, quieter result. A neat way to remember the division of labour: the rate decides how faithfully pitch/frequency is captured, while the bit depth decides how faithfully loudness/amplitude is captured.
Bit depth also fixes the dynamic range: the span between the quietest and loudest sound the format can distinguish. Each extra bit doubles the number of amplitude levels, which means it can resolve a sound half as loud as before, adding roughly 6 dB of range per bit. (The figure comes from $20 \log_{10} 2 \approx 6$ dB; you are not expected to derive it, only to know that more bits widen the gap between the softest and loudest representable signal.)
| Bit depth | Levels ($2^b$) | Approx. dynamic range |
|---|---|---|
| 8 bits | 256 | $\approx 8 \times 6 = 48$ dB |
| 16 bits | 65,536 | $\approx 16 \times 6 = 96$ dB |
| 24 bits | 16,777,216 | $\approx 24 \times 6 = 144$ dB |
The practical consequence: with only 8 bits the loudest sound is just 256 steps above the quietest, so soft passages sit close to the quantisation noise floor and the hiss is audible. With 16 bits the floor is pushed about 48 dB lower, so a CD can carry both a whisper-quiet intro and a full orchestra without the quiet parts being swamped by noise. So bit depth answers two linked questions — how finely each sample is resolved (quantisation error) and how wide a loud-to-quiet range the recording can span (dynamic range).
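The levels and the "about 6 dB per bit" rule can be checked numerically. This is a sketch under the same simplification the lesson uses (dynamic range taken as 20·log10 of the number of levels); the function names are hypothetical.

```python
import math


# Hypothetical helpers illustrating the two formulas above.
def amplitude_levels(bit_depth):
    """Number of distinct amplitude values for a given bit depth."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth):
    """Approximate dynamic range: 20 * log10(number of levels)."""
    return 20 * math.log10(amplitude_levels(bit_depth))


for bits in (8, 16, 24):
    print(bits, amplitude_levels(bits), round(dynamic_range_db(bits), 1))
# 8 256 48.2
# 16 65536 96.3
# 24 16777216 144.5
```

The exact figures sit slightly above the rounded 48, 96 and 144 dB in the table because one bit is worth about 6.02 dB rather than exactly 6 dB.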
How fast must we sample? The Nyquist (Nyquist–Shannon) sampling theorem gives the answer:
$$\text{sampling rate} \geq 2 \times \text{highest frequency in the sound}$$
Intuitively, to capture a wave that goes up and down, you must catch at least its peak and its trough each cycle — two samples per cycle. Sample any slower and the reconstructed wave is wrong.
Rearranging the theorem gives an equally useful form — the highest frequency a given sample rate can reproduce is half the rate (this half-the-sample-rate value is called the Nyquist frequency):
$$\text{highest reproducible frequency} = \frac{\text{sampling rate}}{2}$$
A voice recorder samples at 8,000 Hz. What is the highest frequency it can faithfully reproduce?
$$f_{\max} = \frac{8{,}000}{2} = 4{,}000 \text{ Hz}$$
This is fine for intelligible speech (most of which lies below 3.4 kHz, which is exactly why the telephone network historically used an 8 kHz rate) but it discards the brightness and sibilance of music, whose energy reaches up to 20 kHz. To capture a sound containing a 15 kHz component without aliasing, the rate must satisfy rate≥2×15,000=30,000 Hz, so 8,000 Hz is far too low and that component would alias.
A cymbal produces overtones up to about 18,000 Hz. The minimum sampling rate to capture it faithfully is:
$$\text{rate} \geq 2 \times 18{,}000 = 36{,}000 \text{ Hz}$$
so a 44,100 Hz CD rate (giving a Nyquist frequency of 22,050 Hz) comfortably captures it, whereas a 22,050 Hz rate (Nyquist frequency 11,025 Hz) would not — the 18 kHz overtones exceed its limit and alias into the audible band as spurious lower tones.
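Both forms of the Nyquist rule used in the worked examples can be written as small functions. This is a sketch only; the names `nyquist_frequency`, `minimum_sampling_rate` and `can_capture` are assumptions made for illustration, and the comparison follows the lesson's "at least twice" wording.

```python
# Hypothetical helpers for the two rearrangements of the Nyquist rule.
def nyquist_frequency(sampling_rate_hz):
    """Highest frequency a given sampling rate can reproduce (half the rate)."""
    return sampling_rate_hz / 2


def minimum_sampling_rate(highest_frequency_hz):
    """Lowest sampling rate that satisfies rate >= 2 x highest frequency."""
    return 2 * highest_frequency_hz


def can_capture(sampling_rate_hz, frequency_hz):
    """True if the frequency is at or below the Nyquist frequency."""
    return frequency_hz <= nyquist_frequency(sampling_rate_hz)


print(nyquist_frequency(8_000))       # 4000.0  (voice recorder)
print(minimum_sampling_rate(18_000))  # 36000   (cymbal overtones)
print(can_capture(44_100, 18_000))    # True    (CD rate)
print(can_capture(22_050, 18_000))    # False   (would alias)
```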
Human hearing reaches roughly 20,000 Hz (20 kHz). The Nyquist minimum is therefore:
$$2 \times 20{,}000 = 40{,}000 \text{ Hz}$$
CD audio uses 44,100 Hz, a little above the minimum, giving headroom for the filters that remove frequencies above the hearing range. This is why the apparently odd figure 44.1 kHz is "CD quality". Its Nyquist frequency, 44,100 ÷ 2 = 22,050 Hz, sits just above the limit of human hearing, so every audible frequency is captured with margin to spare.
If the rate is below twice the highest frequency, aliasing occurs: frequencies above the Nyquist limit are mis-captured and reappear as false lower-frequency tones that were never in the original, audible as distortion. To prevent it, an anti-aliasing (low-pass) filter removes too-high frequencies before sampling. Aliasing is the headline penalty for sampling too slowly, and a frequent exam discriminator.
Worked example — a tone aliasing. Suppose a system samples at 8,000 Hz (Nyquist frequency 4,000 Hz) but the input contains a 5,000 Hz tone — above the limit. The tone is undersampled and folds back, appearing as a false tone at 8,000−5,000=3,000 Hz, a pitch that was never present. The listener hears a spurious 3 kHz whistle. The fix is the anti-aliasing filter, which removes everything above 4 kHz before the samples are taken, so nothing can fold back. The everyday visual analogue is the "wagon-wheel effect" in film, where a fast-spinning wheel appears to turn slowly or backwards — the camera's frame rate is below twice the rotation rate, so motion aliases.
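The fold-back in this worked example can be demonstrated numerically. The sketch below assumes an ideal sampler with no anti-aliasing filter and uses a hypothetical helper, `alias_frequency`, that measures the distance from the tone to the nearest whole multiple of the sampling rate. It then checks that a 5,000 Hz cosine and a 3,000 Hz cosine produce the same sample values at 8,000 Hz.

```python
import math


# Hypothetical helper: apparent frequency of a tone after ideal sampling.
def alias_frequency(tone_hz, sampling_rate_hz):
    # Distance from the tone to the nearest whole multiple of the rate.
    nearest_multiple = round(tone_hz / sampling_rate_hz) * sampling_rate_hz
    return abs(tone_hz - nearest_multiple)


rate = 8_000
print(alias_frequency(5_000, rate))  # 3000 (above Nyquist: folds back)
print(alias_frequency(3_000, rate))  # 3000 (below Nyquist: unchanged)

# Sample a 5 kHz cosine and a 3 kHz cosine at the same 8 kHz instants.
tone = [math.cos(2 * math.pi * 5_000 * n / rate) for n in range(16)]
alias = [math.cos(2 * math.pi * 3_000 * n / rate) for n in range(16)]

# The two sets of samples match, so the stored data cannot tell them apart.
identical = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(tone, alias))
print(identical)  # True
```

Because the stored samples are identical, no processing after sampling can recover which tone was really present, which is why the filter has to act before the samples are taken.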
The raw size of sampled audio is the product of all four factors:
$$\text{file size (bits)} = \text{sampling rate} \times \text{bit depth} \times \text{duration (s)} \times \text{channels}$$
with mono = 1 channel and stereo = 2 channels. Divide by 8 to convert to bytes, then by 1,024 for KB and by 1,024 again for MB.
A 3-minute stereo track at CD quality (44,100 Hz, 16-bit):
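$$\text{duration} = 3 \times 60 = 180 \text{ s}$$
$$\text{bits} = 44{,}100 \times 16 \times 180 \times 2 = 254{,}016{,}000 \text{ bits}$$
$$\text{bytes} = \frac{254{,}016{,}000}{8} = 31{,}752{,}000 \text{ bytes}$$
$$\text{KB} = \frac{31{,}752{,}000}{1024} \approx 31{,}007.8 \text{ KB}$$
$$\text{MB} = \frac{31{,}007.8}{1024} \approx 30.3 \text{ MB}$$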
About 30 MB for one song — which is exactly why music is normally stored compressed (MP3/AAC), foreshadowing the next lesson.
A 10-second mono voice memo sampled at 8,000 Hz with 8-bit resolution:
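$$\text{bits} = 8{,}000 \times 8 \times 10 \times 1 = 640{,}000 \text{ bits}$$
$$\text{bytes} = \frac{640{,}000}{8} = 80{,}000 \text{ bytes} \approx 78.1 \text{ KB}$$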
The contrast — 30 MB versus 78 KB — shows how dramatically rate, depth and channels drive size.
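Both worked examples follow the same recipe, so they can be sketched as one function. The names `audio_size_bits` and `to_units` are illustrative assumptions, and the conversion uses 1,024 per step as in the lesson.

```python
# Hypothetical helper: raw (uncompressed) audio size in bits.
def audio_size_bits(sampling_rate_hz, bit_depth, duration_s, channels):
    return sampling_rate_hz * bit_depth * duration_s * channels


# Hypothetical helper: convert bits to (bytes, KB, MB) using 1,024 per step.
def to_units(bits):
    size_bytes = bits / 8
    size_kb = size_bytes / 1024
    size_mb = size_kb / 1024
    return size_bytes, size_kb, size_mb


# 3-minute stereo track at CD quality (44,100 Hz, 16-bit).
song_bits = audio_size_bits(44_100, 16, 3 * 60, channels=2)
song_bytes, song_kb, song_mb = to_units(song_bits)
print(song_bits, round(song_mb, 1))  # 254016000 30.3

# 10-second mono voice memo (8,000 Hz, 8-bit).
memo_bits = audio_size_bits(8_000, 8, 10, channels=1)
memo_bytes, memo_kb, memo_mb = to_units(memo_bits)
print(memo_bits, round(memo_kb, 1))  # 640000 78.1
```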