Assumption of Smoothness
Linear and polynomial regression both rest on an assumption that the function we are trying to recover is smooth. Typically, given $n$ sample points, the system is underdetermined (infinitely many curves can exactly fit the points), but smoothness narrows the possibilities down to a single best fit by minimizing the complexity between samples.
The cost of that prior is that we can resolve at most $n$ distinct features from $n$ samples. The model's capacity is bounded by $n$, so it cannot extrapolate more than $n$ features or interpolate more than $n-1$ bends between samples.
For most real datasets that bound is beneficial. The underlying behavior bends gradually and a low-order curve captures it. This however breaks down for anything intrinsically periodic. An idealized checkerboard alternates light and dark squares in every direction without end; sampling it along any line produces a sequence of values that flip sign infinitely often. No polynomial of finite degree captures that behavior.
To represent periodic structure faithfully we need a basis whose elements are themselves periodic, and so we can shift the view from polynomial regression to Fourier analysis.
Representing Periodicity
The simplest non-decaying periodic function on the real line is $\sin(2\pi f t)$, where $f$ is frequency and $t$ is time/position. Any reasonable periodic function can be approximated by a sum of sines and cosines at integer-multiple frequencies. For a function $g(t)$ with period $T$,
and as $K$ grows the partial sum converges to $g$ at every point where $g$ is continuous. The sines and cosines at frequencies $k/T$ form a complete basis for periodic functions on $[0, T]$.
In practice we package the sine and cosine of the same frequency into a single complex exponential, $e^{i 2\pi f t} = \cos(2\pi f t) + i \sin(2\pi f t)$. The amplitude and phase of the wave end up encoded in the magnitude and argument of one complex coefficient. This gives:
For an $N \times N$ image, the natural frequencies (cycles per image) along each axis are the whole numbers up to $N/2$. This is because the highest frequency wave that can be represented on an image is one that can complete a full cycle within two pixels (ie. alternating between bright and dark every pixel). Anything faster does not have enough room to show its shape between pixels. This upper limit is called the Nyquist frequency. The lower limit is zero, which represents a constant value with no oscillation.
The Discrete Fourier Transform
The 2D Discrete Fourier Transform turns an $N \times N$ image $f[x, y]$, where $x$ and $y$ are pixel coordinates, into an $N \times N$ matrix of complex weights. $f[x, y]$ gives the individual pixel value at coordinates $x$ and $y$.
Each $F[u, v]$ tells you how much of the wave $e^{i 2\pi(ux + vy)/N}$ is present, and its squared magnitude $|F[u, v]|^2$, called the spectral energy at frequency $(u, v)$, measures how much of the image's total energy sits in that one wave.
The inverse transform rebuilds the image by summing the waves back together:
This is a change of basis. The image still has $N^2$ degrees of freedom; we have just rewritten them as frequency coefficients instead of pixel intensities. The reason to bother is that the new representation makes structure explicit. Smooth images concentrate energy in a few low-frequency coefficients, while sharp images spread energy across many high-frequency ones. Keeping only the lowest frequencies (those inside a circular cutoff radius $R$) yields a smooth approximation; keeping all of them is exact.
There is an additional property buried in the basis. The exponentials $e^{i 2\pi(ux + vy)/N}$ are themselves periodic with period $N$ in both directions, so any image reconstructed from a Fourier expansion inherits that periodicity, effectively treating the image as one tile of an infinite wallpaper:
If the image's edges happen to match (it fades to zero on every side, say), the wallpaper is seamless. If they do not, the tiling injects step discontinuities at every boundary, those edges show up as high-frequency energy in $F$, and low-pass filtering them out produces the ringing we will see in the demo below. The clearest example is the Quadrant preset: a single image that is white in the top-left and bottom-right quadrants and black in the others.
Other Image Reconstruction Techniques
Sine waves describe periodic structure, but most images are not periodic. The three bases below each turn the same $N^2$ pixels into $N^2$ coefficients, but each one is most efficient on a different kind of structure: Legendre on global polynomial smoothness, Chebyshev on the same smoothness with boundary-weighted error, and Haar on local edge structure at multiple scales.
Legendre polynomials $P_k(t)$ form an orthogonal basis on $[-1, 1]$ under the standard (unweighted) inner product. A degree-$d$ separable Legendre expansion gives the best polynomial approximation to the image, in the squared-error sense, on the unit square. They are global polynomials, so every coefficient depends on every pixel. Inside the domain they fit smoothly, and they make no assumption that the image tiles periodically. Outside the domain they diverge: high-degree fits suffer from Runge's phenomenon, where small errors near the boundary amplify into large excursions just past it.
Chebyshev polynomials $T_k(t)$ are also orthogonal on $[-1, 1]$, but under the weighted inner product:
The weighting concentrates information near the endpoints, which gives Chebyshev a useful property called the minimax theorem: among all polynomials of a given degree, the Chebyshev truncation minimizes the largest pointwise error rather than the average squared error. In practice this means Chebyshev spreads error more uniformly across the domain than Legendre. The cost: $T_k(t)$ outside $[-1, 1]$ grows like $\cosh(k \cdot \mathrm{arccosh}\, t)$, exponentially in $k$. Extrapolation past the domain grows even faster than with Legendre.
Haar wavelets are a different idea entirely. Instead of global functions, the Haar basis is built from translated and dilated copies of a single square pulse, so every coefficient depends only on a small localized region. The transform produces a multi-scale pyramid: a coarse approximation, plus detail subbands at each finer scale that capture horizontal, vertical, and diagonal edges. Because the basis is localized, a sharp edge influences only a few coefficients, not the whole spectrum. There is no periodicity assumption either, since the basis lives entirely inside the image and is defined by finite-support filters. The tradeoff is that the basis functions themselves are discontinuous, so coarse reconstructions have a characteristic blocky, staircase appearance.
None of these three bases is periodic. None of them suffer from Gibbs ringing. But each pays its own price: polynomial divergence outside the domain, weighted-inner-product subtleties, or staircase artifacts at coarse scales. The demo below lets you switch between all four bases on the same image so you can see directly what each one is good and bad at.
Interactive Demo
Image decomposition
3D height-map
Reconstruction Formula
Interpreting the visualization
With the Fourier basis, the ringing in a low-pass reconstruction is called the Gibbs phenomenon. Near any jump discontinuity, a truncated Fourier series overshoots the true value by approximately 9% of the jump height, regardless of how many terms are included. The overshoot does not vanish as you add terms; it just gets narrower. In 2D the effect produces concentric rings around every sharp edge.
When the image is not naturally periodic, the DFT also rings around the artificial edges at tile boundaries. The Quadrant preset is the extreme case: its periodic extension is an infinite checkerboard, and most of its spectral energy sits at the corners of frequency space (the $(N/2, N/2)$ components), because the checkerboard oscillates at the Nyquist frequency in both directions simultaneously. Small filter radii cannot reach that energy, so they produce a smooth blob with no checkerboard structure at all. As $r$ approaches 128, the step-like edges begin to emerge; the corner energy needs $r \approx 181$ (the diagonal Nyquist) for exact reconstruction.
The Gaussian preset shows the opposite story. Its smooth fall-off near the edges means the periodic extension has almost no discontinuity, almost no high-frequency energy, and almost no Gibbs ringing. The connection between spatial smoothness and spectral compactness is everywhere in Fourier analysis: smooth images have rapidly decaying spectra, and their low-frequency approximations are correspondingly accurate.
A common remedy when working with non-periodic data is windowing: multiplying the image by a smooth taper (Hann, Tukey, or a Gaussian envelope) before transforming, so the implicit periodic extension is nearly continuous. The cost is a small smearing in the frequency domain, but the dramatic reduction in ringing is usually worth it.
The demo's "Show periodic extension" toggle, and its basis-dependent equivalents, reveal what happens outside the original image domain. Each basis tells a different story.
Fourier extends periodically by construction. The reconstructed image outside $[0, N)^2$ is a verbatim copy of the inside, tiled in a $2 \times 2$ pattern. The red dashed lines mark the tile boundaries.
Legendre extends as whatever polynomial was fit inside. Inside the domain the fit looks reasonable. Just past the boundary, high-degree fits diverge fast under Runge's phenomenon, and the values shoot up or down into the overshoot bands.
Chebyshev extends with $\cosh$ growth, faster still than Legendre. The exponential factor in $k$ means even modestly high degrees produce enormous values just past the boundary.
Haar does not extend at all. The basis is defined inside the image only, so there is nothing to evaluate outside it.
Reconstructions, especially extrapolated ones, can produce pixel values outside the natural 0 to 255 grayscale range. The demo colors these explicitly. Red shading marks pixels above 255 (overshoot), and blue shading marks pixels below 0 (undershoot). Saturation ramps up with the magnitude, so a small overshoot is faint and an extreme one is fully saturated.
Gibbs ringing inside the domain produces overshoot of its own. Truncated Fourier reconstructions of a black-to-white step always overshoot the jump by roughly 9%, so you will see a thin red rim just inside the bright side and a thin blue rim just inside the dark side. These are real reconstruction errors, not just extrapolation artifacts.
In the 3D surface view, out-of-range pixels also tint the surface using the same convention. They can be hidden entirely with the "Hide overshoot regions" toggle, which drops every triangle whose vertices are not all in the in-range band. Hiding is sometimes the only way to read the in-range structure cleanly, because high-degree Legendre or Chebyshev extrapolation can produce magnitudes large enough to stretch triangles across the entire viewport.
The center panel of the demo changes per basis, but every variant answers the same question: where is this basis concentrating the image's energy?
For Fourier, it shows the magnitude spectrum on a log scale, with the DC (zero-frequency) component at the center and increasing frequencies radiating outward. Bright pixels mean strong frequency components. The blue circle marks the low-pass filter radius: frequencies inside the circle are kept, those outside are zeroed. Smooth images concentrate energy near the center; sharp images spread it outward.
For Legendre and Chebyshev, it shows the coefficient matrix $|C[j, k]|$, where $j$ is the degree in $y$ and $k$ is the degree in $x$. The top-left corner is the $(0, 0)$ coefficient (a constant), and degrees grow downward and rightward. The blue rectangle marks the coefficients retained at the current degree cutoff; everything outside is dimmed. Smooth images concentrate energy near the top-left; oscillatory images spread it further out.
For Haar, it shows the wavelet pyramid: the coarsest approximation sits in the top-left, with detail subbands surrounding it in concentric shells. At each level the detail bands capture vertical edges (LH), horizontal edges (HL), and diagonal detail (HH) at that scale. Each subband is normalized independently so detail at every scale remains visible. The blue rectangle marks the coarsest levels retained at the current setting; subbands inside it are used, the rest are zeroed before the inverse transform.
In every case the blue marker traces what the slider keeps and what it discards, and the visual concentration of energy tells you whether the chosen basis is a good or bad match for the current image.
The Legendre and Chebyshev bases are built from one-dimensional polynomials, one family per axis. The 2D building block is a basis product: a polynomial of degree $k$ in the horizontal coordinate times a polynomial of degree $j$ in the vertical coordinate, $P_k(\tilde x)\,P_j(\tilde y)$ for Legendre and $T_k(\tilde x)\,T_j(\tilde y)$ for Chebyshev. The tilde marks the pixel coordinate rescaled to the polynomials' natural domain $[-1, 1]$. Degree $(0, 0)$ is a flat constant; increasing $k$ adds oscillations across the image, increasing $j$ adds them down it. The reconstruction is a weighted sum of these products up to the chosen degree.
The weight $C[j, k]$ is the projection of the image onto one basis product: how strongly the picture resembles that degree-$(j, k)$ pattern. Because the basis is separable, the 2D projection factorizes into a one-dimensional transform along $x$ followed by one along $y$, which is exactly how the demo computes it. Large low-degree weights mean a smooth image that the basis captures with few terms; energy spread to high degrees means the basis is struggling to match sharp features.
The two families differ in the inner product that defines the projection. Legendre polynomials are orthogonal under a uniform weight across $[-1, 1]$, so every position counts equally. Chebyshev polynomials are orthogonal under the weight $1/\sqrt{1 - \tilde x^2}$, which emphasizes the edges of the domain; sampling at midpoint nodes avoids the singularity at $\tilde x = \pm 1$. That edge emphasis is why Chebyshev resists the wild boundary swings (Runge's phenomenon) that high-degree Legendre fits suffer from.
Summary
Every technique on this page rewrites the same image in a different basis, a set of building-block patterns, and keeps only some of the weights. Fourier uses global waves, Legendre and Chebyshev use polynomials, and Haar uses localized wavelets. The transform itself loses nothing; the approximation appears only when the cutoff discards coefficients.
The DFT additionally assumes the image repeats forever, tiling it into an infinite grid. When the image is not naturally periodic, the seams between tiles act like sharp edges: they push energy into the corners of the spectrum and ring around every discontinuity through the Gibbs phenomenon. Smoothness and spectral compactness are two sides of one coin. A smooth image has a rapidly decaying spectrum and rebuilds well from a few terms, while a sharp edge spreads energy across all frequencies.
No single basis is best. A basis suits an image when it concentrates the image's energy into a few large coefficients, which is exactly what the center panel lets you see. The bases also behave very differently outside the original domain: Fourier tiles periodically, polynomials diverge under Runge's phenomenon or $\cosh$ growth, and Haar does not extend at all. Choosing a basis means choosing which structure you can represent cheaply and which artifacts you can tolerate.
Image compression is the familiar case (JPEG uses a cosine transform, JPEG 2000 uses wavelets), but multiplying spectra is convolution, so the FFT also powers fast filtering, audio equalization, and the convolutions inside neural networks. MRI and CT scanners collect their raw data directly in the Fourier domain and invert it to form an image, and X-ray crystallography measures the Fourier magnitudes of a molecule. Wireless standards such as Wi-Fi and 4G/5G carry data on FFT frequency bins (OFDM). Chebyshev and Legendre polynomials underpin spectral methods that solve differential equations to very high accuracy, and wavelets drive multiresolution denoising and edge detection.
Compression, imaging, communication, and numerical computation are all variations on one idea: represent data in a basis whose patterns match its structure, then keep the few coefficients that carry the most information.