wheels sometimes look like they're going backward in movies? Or why a cheap digital recording sounds harsh and metallic compared with the original sound? Both of these share the same root cause: aliasing. It's one of the most fundamental concepts in signal processing, and yet most of the explanations out there either oversimplify it ("just use 44.1 kHz and you'll be fine") or dump a wall of math without building any intuition behind it.
This article aims to cover aliasing from scratch: starting from the simplest visual analogy that anyone can understand, and then going deep into the math of how frequencies fold, why the Nyquist limit exists, how the DFT mirror works, and what happens when you break the rules. If you work with audio in AI/ML pipelines (think MFCC preprocessing, SyncNet, speech models), there's a dedicated section towards the end connecting aliasing directly to those workflows. But first, let us build the foundation for understanding aliasing properly. Believe me, it's very easy to build the intuition behind this; the math is just a tool to justify the intuition.
I've spent a good amount of time working hands-on with audio data preprocessing and model training, mostly dealing with speech data. So while this article builds everything from first principles, a lot of the intuition and practical observations here come from actually running into these things in real pipelines, not just textbook reading.
This is going to be a detailed read. It will give you a full picture of what aliasing is using first-principles thinking, a practical application where we see the effects of aliasing, and deep math for those who enjoy seeing equations, along with a promise that there will be no AI slop here. The media and images used in this post were generated with Gemini Nano Banana Pro.
What’s Aliasing?
Aliasing is a specific type of distortion that happens when we convert continuous analog signals into digital ones. It occurs when we don't sample fast enough to capture the signal's true behaviour. The word "alias" literally means a false name or identity: in audio, a high frequency takes on the false identity of a lower frequency because it wasn't captured fast enough.
This isn't just a blurry or noisy sound. It actually creates completely new, fake tones that were never part of the original recording. For example, a very high sound like 15 kHz can show up as a lower sound like 5 kHz. A bright cymbal shimmer can turn into a dull, muddy rumble. In simple terms, the high frequency hides itself and appears as a lower frequency. That's why it's called an alias: the sound is pretending to be something else.
Understanding why this happens requires understanding how digital systems capture sound in the first place, so let's start with the most intuitive visual analogy: the famous wagon wheel effect.
The Wagon Wheel Effect: Why Fast-Spinning Wheels Appear to Rotate Backward on Film
Before we touch any math or audio waveforms, let's understand aliasing visually through the wagon wheel effect, something most of us have seen in movies.

Imagine a car wheel spinning forward very fast. A camera records this at a fixed speed, say 24 frames per second. Between two consecutive frames, the wheel spins almost a full circle, moving from the 12 o'clock position all the way around to 11 o'clock (330° of rotation forward).
Now here's the key insight: our brain (and the math) is lazy. It assumes the object took the shortest path. Instead of seeing the long journey forward (330° clockwise), we perceive the spoke moving slightly backward from 12 to 11 (just 30° counterclockwise).
The forward-spinning wheel appears to rotate backward. This backward motion is the alias of the true motion: a false representation caused by insufficient sampling (the camera's frame rate was too slow to capture the actual speed of rotation).
The core principle: just as a camera must shoot fast enough to capture a spinning wheel correctly, a digital audio system must sample fast enough to capture high-frequency sounds. When it doesn't, those frequencies take on a false identity: they alias.
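The "shortest path" assumption can be written as a one-line wrap of the per-frame rotation into the range [-180°, 180°). The sketch below is only an illustration: the function name and the sign convention (positive means forward, negative means backward) are assumptions made for this example.

```python
def apparent_rotation_deg(true_rotation_deg: float) -> float:
    """Wrap the true rotation between two frames into [-180, 180).

    Positive values mean the wheel appears to turn forward,
    negative values mean it appears to turn backward.
    """
    return (true_rotation_deg + 180.0) % 360.0 - 180.0


# 330 degrees forward per frame is perceived as 30 degrees backward.
print(apparent_rotation_deg(330.0))  # -30.0
# A slow wheel (30 degrees per frame) is perceived correctly.
print(apparent_rotation_deg(30.0))   # 30.0
```

Any per-frame rotation of 180° or more gets wrapped, which is the visual counterpart of a frequency exceeding half the sampling rate.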
Aliasing in Sound: A Foundational Principle
While the wagon wheel effect is just a cool visual trick in movies, in audio it's a disaster.
The fast-spinning wheel corresponds to a high-frequency sound wave, and the camera's frame rate corresponds to the audio sampling rate. The analogy maps perfectly:
- Fast wheel spin → High-frequency sound
- Camera frame rate → Audio sampling rate
- Apparent backward rotation → False lower frequency (the alias)
High frequencies are essential for clarity in audio, like the "s" and "t" sounds in speech, or the shimmer of cymbals. If we don't sample fast enough, these crisp sounds turn into lower-frequency artifacts. A cymbal crash contains frequencies up to 20,000 Hz. If sampled at only 30,000 Hz, frequencies above 15,000 Hz will alias down, turning bright, shimmering highs into false, unnatural tones at lower frequencies.
This is why CD audio uses 44,100 Hz as its sampling rate: to safely capture frequencies up to 22,050 Hz, which covers the entire range of human hearing with some headroom.
For those who are unfamiliar with the Nyquist theorem, some terms or lines may not make sense right now, and that's completely fine. Once you read the article to the end, everything will start to make sense. The Nyquist theorem is explained in the next section in connection with aliasing.
The Solution: The Nyquist-Shannon Sampling Theorem
The rule to prevent aliasing is defined by the Nyquist-Shannon sampling theorem, and it's non-negotiable in digital audio.
The sampling frequency (f_s) must be greater than twice the highest frequency present in the signal (f_max). This is expressed as: f_s > 2 × f_max
The "why" behind the 2x rule: A sound wave is a cycle with a positive half (peak) and a negative half (trough). To define this cycle without ambiguity, you must capture at least two samples per cycle: one to record the "up" motion and one to record the "down" motion. With fewer than 2 samples per cycle, the system cannot distinguish between different frequencies: they become aliases of each other.
The frequency at exactly half the sampling rate is called the Nyquist frequency: it's the theoretical upper limit on the frequencies we can capture without information loss.
For a sampling rate of 44,100 Hz, the Nyquist frequency is 22,050 Hz. For 48,000 Hz, it's 24,000 Hz. Any frequency above the Nyquist limit will fold back and appear as a lower frequency. That's aliasing.
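As a quick sanity check, the rule f_s > 2 × f_max fits in two small helpers. This is a minimal sketch; the function names are hypothetical and chosen only for this illustration.

```python
def nyquist_frequency(fs_hz: float) -> float:
    """Half the sampling rate: the upper limit for alias-free capture."""
    return fs_hz / 2.0


def is_safely_sampled(f_max_hz: float, fs_hz: float) -> bool:
    """True when the sampling rate is strictly greater than 2 * f_max."""
    return fs_hz > 2.0 * f_max_hz


for fs in (44_100, 48_000):
    print(fs, "Hz -> Nyquist", nyquist_frequency(fs), "Hz")

print(is_safely_sampled(20_000, 44_100))  # True: 44.1 kHz covers a 20 kHz signal
print(is_safely_sampled(20_000, 30_000))  # False: 30 kHz is too slow for 20 kHz
```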
Case Study 1: Undersampling (The 20 kHz / 15 kHz Example)
Let's see what happens when the Nyquist rule is broken, with a concrete numerical example.
Setup: Consider a high-frequency sound wave at 15,000 Hz (15 kHz). We sample it with a sampling rate of 20,000 Hz (20 kHz).
The Nyquist frequency here is 20,000 / 2 = 10,000 Hz. Our signal at 15 kHz is above this limit: we're already violating the theorem.
The sampling frequency is 20,000 / 15,000 ≈ 1.33x the signal's frequency. This is faster than the signal, but less than the required 2x rate. Taking just 1.33 samples per cycle provides insufficient data. The system tries to reconstruct the wave by connecting these awkwardly spaced dots using the simplest, "shortest path" possible, just like the brain does with the wagon wheel.
The Result: The original 15 kHz tone is lost. Instead, it's incorrectly recorded as a new, false 5 kHz tone.
The alias frequency is calculated as: |f_signal − f_s| = |15,000 − 20,000| = 5,000 Hz
This 5 kHz tone is the alias: an incorrect frequency that was never in the original sound. It's completely fake, and once it's there, it's permanent. You cannot filter it out because it now lives at a legitimate frequency. That 5 kHz alias is indistinguishable from a real 5 kHz tone.
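The claim that the alias is indistinguishable from a real tone can be checked numerically. The sketch below, which assumes NumPy and uses cosine tones with zero phase for simplicity, samples a 15 kHz tone and a 5 kHz tone at 20 kHz and compares the resulting sample values.

```python
import numpy as np

fs = 20_000             # sampling rate in Hz
n = np.arange(64)       # sample indices
t = n / fs              # sample instants in seconds

tone_15k = np.cos(2 * np.pi * 15_000 * t)   # the real signal
tone_5k = np.cos(2 * np.pi * 5_000 * t)     # its alias

# The two sets of samples agree to within floating-point error.
samples_match = np.allclose(tone_15k, tone_5k)
print(samples_match)  # True
```

With sine tones instead of cosines, the alias shows up with inverted polarity, but it is still a 5 kHz tone: only its phase differs.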
Case Study 2: Correct Sampling (The >30 kHz Example)
Now let's see how the Nyquist theorem solves the problem.
Setup: Same 15 kHz sound wave. To obey the Nyquist theorem, we must sample at a rate greater than 2 × 15 kHz = 30 kHz. Let's use the CD standard of 44,100 Hz (44.1 kHz).
A sampling rate of 44.1 kHz provides ~2.94 samples per cycle (44,100 / 15,000), which is well above the 2x minimum. This is enough information to capture the wave's defining characteristics: its peak, trough, and the shape in between.
The Result: The ambiguity is eliminated. Among all frequencies below the Nyquist limit, only the 15 kHz wave fits through the captured sample points. The "shortest path" now correctly represents the original wave, and an accurate digital recording is made. No alias, no distortion, no fake frequencies.
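Both case studies can be reproduced by sampling the same 15 kHz tone at the two rates and reading off the strongest bin of the spectrum. This is a minimal sketch assuming NumPy; the helper name `dominant_frequency` and the 0.1 s duration are choices made for this illustration.

```python
import numpy as np


def dominant_frequency(f_signal_hz: float, fs_hz: float, duration_s: float = 0.1) -> float:
    """Sample a cosine at fs_hz and return the frequency of the largest FFT bin."""
    n = np.arange(int(round(fs_hz * duration_s)))
    x = np.cos(2 * np.pi * f_signal_hz * n / fs_hz)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs_hz)
    return float(freqs[np.argmax(spectrum)])


print(dominant_frequency(15_000, 44_100))  # about 15000 Hz: captured correctly
print(dominant_frequency(15_000, 20_000))  # about 5000 Hz: the alias
```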
Understanding the Folding Graph
Now that we have the intuition, let's look at the most important visualisation in aliasing: the folding graph, which starts to unfold the mathematical understanding behind aliasing. This graph shows exactly what happens to every possible input frequency when it gets sampled at a given sampling rate.
What Does This Graph Mean?

Let's take a concrete example where our sampling rate f_s = 1,000 Hz (1 kHz). This means our Nyquist frequency is f_s / 2 = 500 Hz.
- Original Frequency (X-axis): The true frequency of the analog signal in the real world, before any sampling occurs. This is what the sound or signal actually is.
- Reconstructed Frequency (Y-axis): The frequency that appears after sampling: what the digital system thinks the signal is.
In a perfect world, the reconstructed frequency would always equal the original frequency: we'd simply see a straight diagonal line going up forever. But that's not what happens.
The Folding Graph: Safe Zone vs Aliasing Zone

This graph tells the whole story of aliasing in a single picture. Let's break it down:
The Diagonal (0 to 500 Hz), the Safe Zone: In the safe zone, input frequency equals output frequency exactly. A 200 Hz signal reconstructs as 200 Hz: linear, predictable and faithful reproduction. Everything below the Nyquist frequency is captured correctly.
The Peak (500 Hz), the Nyquist Frequency: This is exactly half the sampling rate, the theoretical upper limit on the frequencies we can capture without information loss.
The Fold (> 500 Hz), the Aliasing Zone: This is where things break. Above the Nyquist frequency, frequencies don't continue ascending: they fold back. Higher inputs produce lower outputs. This is aliasing: the frequency spectrum reflecting like a mirror at the Nyquist boundary. This mirroring concept is important and comes up again when plotting frequency-domain graphs.
The graph forms a zigzag pattern. The frequency goes up linearly to 500 Hz, then folds back down to 0, then back up to 500, and so on. Every frequency above Nyquist maps to some frequency below Nyquist, creating a false identity.
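The zigzag can be expressed as a small function that maps any input frequency to the frequency it appears at after sampling. This is a sketch of the folding rule for real-valued signals; the name `folded_frequency` is an assumption made for this example.

```python
def folded_frequency(f_in_hz: float, fs_hz: float) -> float:
    """Map an input frequency to the frequency it appears at after sampling.

    The spectrum repeats every fs_hz, and within each repetition it
    reflects around fs_hz / 2, which produces the zigzag.
    """
    remainder = f_in_hz % fs_hz
    return min(remainder, fs_hz - remainder)


fs = 1_000
for f in (200, 500, 700, 1_000, 1_300):
    print(f, "Hz ->", folded_frequency(f, fs), "Hz")
# 200 -> 200, 500 -> 500, 700 -> 300, 1000 -> 0, 1300 -> 300
```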
Walking Through the Cases on the Folding Graph
Let's walk through three specific cases on the folding graph with f_s = 1,000 Hz; they will make things crystal clear.
Case 1: Capturing f = 500 Hz (At the Nyquist Limit)

At exactly f_s / 2, we capture one sample at each peak and one at each trough, the bare minimum to establish that an oscillation exists. This is what "minimum viable sampling" looks like. It only works when the sample instants happen to line up with the peaks: if they land on the zero crossings instead, every sample is zero, which is why the theorem asks for a rate strictly greater than 2 × f_max.
Connecting these samples gives a triangle wave, not a sine wave. We lose waveform fidelity, but critically we preserve the fundamental frequency. The system knows a 500 Hz signal is there, but it can't capture its exact shape. This is the edge case: technically the signal is captured, but only just.
On the folding graph, 500 Hz sits right at the peak. This is the Nyquist boundary: one foot in the safe zone, one foot in the aliasing zone.
Case 2: Capturing f = 1,000 Hz (Signal Equals Sampling Rate)

When the input frequency equals the sampling rate, we take exactly one sample per wave cycle. Each sample captures the same phase position, making the signal appear stationary: a flat line at DC (0 Hz).
On the folding graph, trace 1,000 Hz on the x-axis: it maps to 0 Hz on the y-axis. The original 1 kHz signal has been completely destroyed. It doesn't just alias to a wrong frequency, it disappears entirely, leaving only a constant value that carries no audible tone.
In the small triangle inset in the diagram, the pink dot at 1 kHz on the x-axis sits right at the bottom (0 Hz) of the folding graph. The signal has been folded all the way back to zero.
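Cases 1 and 2 are easy to reproduce with a few samples. The sketch below assumes NumPy and f_s = 1,000 Hz. It also includes a sine-phase version of the 500 Hz tone, added here for illustration, to show that sampling exactly at the Nyquist limit depends on where the samples land.

```python
import numpy as np

fs = 1_000
n = np.arange(8)

# Case 1: 500 Hz cosine, samples land on the peaks and troughs.
at_nyquist = np.cos(2 * np.pi * 500 * n / fs)

# Same 500 Hz tone with sine phase: samples land on the zero crossings.
at_nyquist_sine = np.sin(2 * np.pi * 500 * n / fs)

# Case 2: 1,000 Hz cosine, every sample hits the same phase.
at_fs = np.cos(2 * np.pi * 1_000 * n / fs)

print(np.round(at_nyquist, 6))       # alternates between +1 and -1
print(np.round(at_nyquist_sine, 6))  # all (numerically) zero
print(np.round(at_fs, 6))            # constant 1: a flat line at DC
```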
Case 3: Capturing f = 700 Hz (The Mirror Equation)

This is the case where we see a proper false signal. 700 Hz is above our Nyquist frequency of 500 Hz, so aliasing occurs.
The Mirror Equation: The alias frequency is the reflection of the input across the Nyquist frequency (f_alias = f_s − f_input = 1000 − 700 = 300 Hz).
We can also think about it as: 700 Hz is 200 Hz above Nyquist (500 Hz), so the alias appears 200 Hz below.
The diagram on the right shows this beautifully: the original 700 Hz signal (in grey/blue) is sampled, and the reconstructed signal (in pink) comes out as 300 Hz. The sample points are identical for both frequencies, so the digital system cannot distinguish between them.
An important property: Notice that 700 + 300 = 1000 = f_s. For any input between the Nyquist frequency and the sampling rate, the frequency and its alias sum to the sampling rate. They're equidistant from the Nyquist frequency (500 Hz): one sits 200 Hz above, the other 200 Hz below. The Nyquist frequency acts as the axis of symmetry, like a mirror.
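The mirror equation follows directly from the sample values. As a short derivation, assume a cosine tone sampled at the instants t_n = n / f_s for integer n (the symbols n and t_n are introduced here only for this sketch):

```latex
\begin{aligned}
\cos\!\left(2\pi (f_s - f)\,\frac{n}{f_s}\right)
  &= \cos\!\left(2\pi n - 2\pi f\,\frac{n}{f_s}\right) \\
  &= \cos\!\left(2\pi f\,\frac{n}{f_s}\right)
\end{aligned}
```

The last step uses the fact that cosine is 2π-periodic and even. So a tone at f and a tone at f_s − f produce exactly the same samples; with f = 700 Hz and f_s = 1,000 Hz, the partner is 300 Hz. For a sine tone the same steps give a sign flip, so the alias appears with inverted polarity but at the same frequency.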
From here on, this article dives deep into aliasing and its application in Fourier transforms. People who know the basics of DSP theory and the Fourier transform will have an edge in understanding how aliasing shows up in the frequency domain. (In short, the Fourier transform is the mathematical tool used to convert raw audio from the time domain to the frequency domain.)
Real-World Sound: It's Never a Single Frequency
Everything we've discussed so far uses clean, single-frequency sine waves. But real-world audio isn't that simple.
According to Fourier's theorem, any complex sound can be understood as a combination of many sine waves, each with a different frequency and amplitude. A sound from an instrument, like a piano, consists of:
- The Fundamental Frequency: This is the lowest frequency, which determines the pitch of the note we hear (for example, ~261 Hz for Middle C).
- Harmonics (or Overtones): These are a series of higher-frequency sine waves that are multiples of the fundamental. The particular combination and loudness of these harmonics create the sound's distinctive timbre. This is why a violin playing Middle C sounds completely different from a flute playing the same note.
The Nyquist Theorem's Focus: The Highest Frequency
To accurately record a complex sound, we must capture not just its fundamental pitch but all the high-frequency harmonics that give it richness and detail.
Therefore, the Nyquist theorem's rule is applied to the single highest frequency present in the sound mixture, not the fundamental.
Example: A violin plays a note with a fundamental of 1,000 Hz. Its sound includes important harmonics that extend all the way up to 18,000 Hz. To capture the full, bright sound of the violin, the sampling rate must be: f_s > 2 × 18,000 Hz, i.e. f_s > 36,000 Hz.
A standard rate like 44,100 Hz is used to safely capture the entire audible frequency range.
If we chose a sampling rate that only satisfied the fundamental (say, anything just above 2,000 Hz), all those harmonics above the Nyquist frequency would fold back and create aliases: the violin would sound distorted, metallic, and unnatural.
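To make the violin example concrete, the sketch below lists where each harmonic would land for a given sampling rate. It assumes an idealised note whose harmonics are exact integer multiples of the 1,000 Hz fundamental up to 18,000 Hz, and the 20,000 Hz rate is an arbitrary too-low rate chosen for illustration; the helper names are hypothetical.

```python
def folded_frequency(f_in_hz, fs_hz):
    """Frequency at which f_in_hz appears after sampling at fs_hz."""
    remainder = f_in_hz % fs_hz
    return min(remainder, fs_hz - remainder)


fundamental_hz = 1_000
# Idealised harmonic series: 1 kHz, 2 kHz, ..., 18 kHz.
harmonics_hz = [fundamental_hz * k for k in range(1, 19)]


def aliased_harmonics(fs_hz):
    """Return {true_frequency: recorded_frequency} for harmonics above Nyquist."""
    return {f: folded_frequency(f, fs_hz) for f in harmonics_hz if f > fs_hz / 2}


print(aliased_harmonics(44_100))  # {}: every harmonic is below 22,050 Hz
print(aliased_harmonics(20_000))  # 11 kHz -> 9 kHz, ..., 18 kHz -> 2 kHz
```

At 20,000 Hz the eight harmonics above 10 kHz all land on top of lower harmonics, so the recorded timbre no longer matches the instrument.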
Oversampling Lower Frequencies for High Fidelity
A key consequence of this highest-frequency rule is that all lower frequencies in the signal are massively oversampled, leading to an extremely high-quality digital recording.
If a sampling rate is fast enough to correctly capture the most rapid vibration, it's automatically more than sufficient for all slower vibrations.
Example using a 44,100 Hz sampling rate:
- For the highest frequency (e.g. 20,000 Hz) we sample at ~2.2 times its frequency, safely meeting the Nyquist minimum.
- For a lower, fundamental frequency (e.g. 500 Hz) we sample at ~88 times its frequency.
This significant oversampling of the fundamental and midrange frequencies ensures they're captured with exceptional precision, resulting in a robust digital audio signal. The lower the frequency relative to the sampling rate, the more faithfully it's captured.
The DFT Mirror and Redundancy: Why Half the Spectrum is a Ghost
Now let's go deeper and understand aliasing from the perspective of the Discrete Fourier Transform (DFT), which is how we actually analyse frequencies in a digital signal. This section is important for anyone working with FFTs (Fast Fourier Transforms) in practice, whether in audio processing, speech analysis, or ML pipelines.


The Discrete Fourier Transform produces N complex coefficients for N input samples. Due to the math of complex exponentials, the output is always conjugate symmetric for real-valued signals. This means: X[k] = X*[N−k]
Here X[k] is the DFT coefficient at bin k, and X*[N−k] is the complex conjugate of the coefficient at bin (N−k).
What this means practically:
The Nyquist frequency (exactly f_s / 2) sits at bin index k = N/2 (for even N). This is the axis of symmetry (the mirror). k = N/2 → f(N/2) = f_s / 2 = Nyquist frequency.
Bins from N/2+1 to N−1 contain no new information. They are just reflections (complex conjugates) of bins 1 to N/2−1. The ghost half is a mathematical artifact, not real frequency content.
In the DFT magnitude spectrum diagram above (with f_s = 22,050 Hz as shown), everything to the right of the Nyquist boundary (11,025 Hz) is the redundant mirror: a ghost copy that adds no information. The frequency content is real and useful only up to the Nyquist frequency.
In practice, we discard the right half. FFT libraries typically provide an rfft (real FFT) function that returns only bins 0 to N/2, roughly halving memory and computation. When you call np.fft.rfft() in Python or any equivalent, this is exactly what's happening: it gives you the useful half and leaves out the ghost.
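The mirror property and the behaviour of rfft can both be verified on a small random signal. This is a minimal sketch assuming NumPy; the signal length of 16 and the fixed seed are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(N)          # any real-valued signal works

X = np.fft.fft(x)                   # full DFT: N complex bins

# Conjugate symmetry: X[k] equals the conjugate of X[N - k] for k = 1..N-1.
k = np.arange(1, N)
is_conjugate_symmetric = np.allclose(X[k], np.conj(X[N - k]))

# rfft keeps only bins 0..N/2, i.e. N // 2 + 1 values.
X_half = np.fft.rfft(x)
rfft_matches = np.allclose(X_half, X[: N // 2 + 1])

print(is_conjugate_symmetric, len(X), len(X_half), rfft_matches)
# True 16 9 True
```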
This is also why, when you see frequency plots of audio signals, they typically only go up to the Nyquist frequency: everything above it is either a mirror of what's below (in the DFT output) or an alias (if the signal wasn't properly band-limited before sampling).
I would also like to mention here: from my personal experience working with speech data for model training, I've mostly dealt with human talking/speech audio, and honestly, I didn't feel much of a difference between 16 kHz, 24 kHz, and 48 kHz. Yes, as you increase the sampling rate, the speech does become a bit more enhanced, but it's minute: enough to spot a tiny difference if you're listening carefully, but nothing dramatic. For speech, 16 kHz captures pretty much everything that matters.
Aliasing in AI/ML Audio Pipelines
If you work with audio in machine learning, whether it's speech recognition, speaker verification, lip-sync models like SyncNet and Wav2Lip, or any audio classification task, aliasing is not just a theoretical concept. It directly affects the quality of the features you extract and therefore the performance of your model.
MFCC Preprocessing and Aliasing
MFCCs (Mel-Frequency Cepstral Coefficients) are among the most common audio features used in ML pipelines. The MFCC pipeline works like this: raw audio → pre-emphasis → framing → windowing → FFT → Mel filter bank → log → DCT → MFCCs.
The FFT step is where aliasing matters. If your input audio was recorded at a sampling rate that's too low for its frequency content, or if you downsample the audio before feature extraction without applying an anti-aliasing filter first, those aliased frequencies will show up in your FFT output and pollute your Mel filter bank energies. The MFCC features you extract will contain phantom frequency information that wasn't in the original sound, and your model will learn from noise.
SyncNet and Audio Preprocessing
In the SyncNet article that I've written before, the audio stream expects 0.2 seconds of audio, which goes through preprocessing to produce a 13 × 20 MFCC matrix (13 DCT coefficients × 20 time steps at a 100 Hz MFCC frame rate). This matrix is the input to the audio CNN stream.
If the audio fed into SyncNet's pipeline has aliasing artifacts, say, because someone downsampled from 48 kHz to 16 kHz without proper filtering, those artifacts will be embedded in the MFCC features. The audio CNN will then learn correlations between these phantom frequencies and the video stream, degrading the model's ability to accurately measure audio-visual sync.
Based on the audio problems I've worked on, I would like to share some practical takeaways below.
Practical Takeaway for ML Engineers
Whenever you're working with audio in an ML pipeline:
- Always apply an anti-aliasing filter before downsampling. Libraries like librosa handle this internally when you use librosa.resample(), but if you're doing manual downsampling (like taking every Nth sample), you're introducing aliasing.
- Be aware of the Nyquist frequency at your working sampling rate. If you're working at 16 kHz (common for speech), your Nyquist is 8 kHz: any speech content above 8 kHz is lost or aliased.
- Higher sampling rates aren't always better for ML. A 44.1 kHz recording downsampled properly to 16 kHz can give cleaner features than a 44.1 kHz recording processed directly, because the model doesn't need information above 8 kHz for most speech tasks, and the extra frequency bins just add noise to the feature space.
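The first takeaway can be demonstrated with NumPy alone. The sketch below downsamples a 48 kHz test signal (a 1 kHz tone plus a 10 kHz tone) to 16 kHz in two ways: by taking every 3rd sample, and by low-pass filtering first. The windowed-sinc filter, its 7 kHz cutoff, and the 255-tap length are assumptions made for this illustration; it is not a description of what librosa does internally.

```python
import numpy as np

fs_in, factor = 48_000, 3                 # 48 kHz -> 16 kHz
fs_out = fs_in // factor
t = np.arange(fs_in) / fs_in              # one second of audio
x = np.sin(2 * np.pi * 1_000 * t) + np.sin(2 * np.pi * 10_000 * t)


def lowpass_fir(cutoff_hz, fs_hz, num_taps=255):
    """Windowed-sinc low-pass filter (Hamming window), unity gain at DC."""
    m = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs_hz * m) * np.hamming(num_taps)
    return h / h.sum()


# Naive decimation: keep every 3rd sample with no filtering.
naive = x[::factor]
# Proper decimation: remove content above the new Nyquist (8 kHz) first.
filtered = np.convolve(x, lowpass_fir(7_000, fs_in), mode="same")[::factor]


def magnitude_at(signal, f_hz, fs_hz):
    """Normalised FFT magnitude at the bin closest to f_hz."""
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
    return spectrum[int(round(f_hz * len(signal) / fs_hz))]


# The 10 kHz tone folds to 16,000 - 10,000 = 6,000 Hz at the new rate.
alias_naive = magnitude_at(naive, 6_000, fs_out)
alias_filtered = magnitude_at(filtered, 6_000, fs_out)
tone_filtered = magnitude_at(filtered, 1_000, fs_out)
print(alias_naive, alias_filtered, tone_filtered)
```

In the naive version the phantom 6 kHz component is as strong as the real 1 kHz tone, while in the filtered version it is strongly attenuated and the 1 kHz tone passes through essentially unchanged.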
Conclusion
Aliasing is one of those concepts that sit at the intersection of elegance and disaster. The math behind it is beautifully simple: frequencies fold around the Nyquist boundary like reflections in a mirror, and any frequency above half the sampling rate takes on the false identity of a lower frequency. But the consequences of not understanding it are harsh: permanent distortion, phantom frequencies, and corrupted signals that no amount of post-processing can fix.
We covered the full picture in this article: from the wagon wheel effect as a visual anchor, to the Nyquist-Shannon theorem that defines the sampling rule, to the folding graph that shows exactly how every frequency maps after sampling, to the DFT mirror that explains the symmetry from a mathematical perspective. The thread connecting all of these is the same: sampling is a lossy process if done incorrectly, and aliasing is the specific way in which that information loss manifests.
Whether you're recording music, processing speech for an ML model, or building audio-visual sync systems, understanding aliasing at this depth gives you the foundation to make informed decisions about sampling rates, filter design, and feature extraction that will directly impact the quality of your output.
I would like to thank Google's Nano Banana Pro for helping me create the artwork used in this article, and Grammarly.
Finally, thank you for your patience, and feel free to reach out with any related questions.



