Assumption of Smoothness

Linear and polynomial regression both rest on an assumption that the function we are trying to recover is smooth. Typically, given $n$ sample points, the system is underdetermined (infinitely many curves can exactly fit the points), but smoothness narrows the possibilities down to a single best fit by minimizing the complexity between samples.

The cost of that prior is that we can resolve at most $n$ distinct features from $n$ samples. The model's capacity is bounded by $n$, so it cannot extrapolate more than $n$ features or interpolate more than $n-1$ bends between samples.

For most real datasets that bound is beneficial. The underlying behavior bends gradually and a low-order curve captures it. This however breaks down for anything intrinsically periodic. An idealized checkerboard alternates light and dark squares in every direction without end; sampling it along any line produces a sequence of values that flip sign infinitely often. No polynomial of finite degree captures that behavior.

To represent periodic structure faithfully we need a basis whose elements are themselves periodic, and so we can shift the view from polynomial regression to Fourier analysis.

Representing Periodicity

The simplest non-decaying periodic function on the real line is $\sin(2\pi f t)$, where $f$ is frequency and $t$ is time/position. Any reasonable periodic function can be approximated by a sum of sines and cosines at integer-multiple frequencies. For a function $g(t)$ with period $T$,

$$g(t) \approx \sum_{k=0}^{K} a_k \cos(2\pi k t / T) + b_k \sin(2\pi k t / T)$$

and as $K$ grows the partial sum converges to $g$ at every point where $g$ is continuous. The sines and cosines at frequencies $k/T$ form a complete basis for periodic functions on $[0, T]$.

In practice we package the sine and cosine of the same frequency into a single complex exponential, $e^{i 2\pi f t} = \cos(2\pi f t) + i \sin(2\pi f t)$. The amplitude and phase of the wave end up encoded in the magnitude and argument of one complex coefficient. This gives:

$$g(t) \approx \sum_{k=0}^{K} c_k e^{i 2\pi k t / T}$$

For an $N \times N$ image, the natural frequencies (cycles per image) along each axis are the whole numbers up to $N/2$. This is because the highest frequency wave that can be represented on an image is one that can complete a full cycle within two pixels (ie. alternating between bright and dark every pixel). Anything faster does not have enough room to show its shape between pixels. This upper limit is called the Nyquist frequency. The lower limit is zero, which represents a constant value with no oscillation.

The Discrete Fourier Transform

The 2D Discrete Fourier Transform turns an $N \times N$ image $f[x, y]$, where $x$ and $y$ are pixel coordinates, into an $N \times N$ matrix of complex weights. $f[x, y]$ gives the individual pixel value at coordinates $x$ and $y$.

$$F[u, v] = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f[x, y]\,e^{-i 2\pi(ux + vy)/N}$$
Note
The summations span $0$ to $N-1$ instead of $0$ to $N/2$. Indices $0$ to $N/2$ are those natural frequencies; indices $N/2$ to $N-1$ are their complex conjugates. For a real image, the conjugate half is redundant, carrying the same magnitude information as the positive half, but it is still required in the reconstruction sum so the imaginary parts cancel out and the rebuilt image comes out real-valued.

Each $F[u, v]$ tells you how much of the wave $e^{i 2\pi(ux + vy)/N}$ is present, and its squared magnitude $|F[u, v]|^2$, called the spectral energy at frequency $(u, v)$, measures how much of the image's total energy sits in that one wave.

The inverse transform rebuilds the image by summing the waves back together:

$$f[x, y] = \frac{1}{N^2}\sum_{u=0}^{N-1} \sum_{v=0}^{N-1} F[u, v]\,e^{\,i 2\pi(ux + vy)/N}$$

This is a change of basis. The image still has $N^2$ degrees of freedom; we have just rewritten them as frequency coefficients instead of pixel intensities. The reason to bother is that the new representation makes structure explicit. Smooth images concentrate energy in a few low-frequency coefficients, while sharp images spread energy across many high-frequency ones. Keeping only the lowest frequencies (those inside a circular cutoff radius $R$) yields a smooth approximation; keeping all of them is exact.

There is an additional property buried in the basis. The exponentials $e^{i 2\pi(ux + vy)/N}$ are themselves periodic with period $N$ in both directions, so any image reconstructed from a Fourier expansion inherits that periodicity, effectively treating the image as one tile of an infinite wallpaper:

$$f_{\text{ext}}[x, y] = f\bigl[x \bmod N,\; y \bmod N\bigr]$$

If the image's edges happen to match (it fades to zero on every side, say), the wallpaper is seamless. If they do not, the tiling injects step discontinuities at every boundary, those edges show up as high-frequency energy in $F$, and low-pass filtering them out produces the ringing we will see in the demo below. The clearest example is the Quadrant preset: a single image that is white in the top-left and bottom-right quadrants and black in the others.

Other Image Reconstruction Techniques

Sine waves describe periodic structure, but most images are not periodic. The three bases below each turn the same $N^2$ pixels into $N^2$ coefficients, but each one is most efficient on a different kind of structure: Legendre on global polynomial smoothness, Chebyshev on the same smoothness with boundary-weighted error, and Haar on local edge structure at multiple scales.

Legendre polynomials $P_k(t)$ form an orthogonal basis on $[-1, 1]$ under the standard (unweighted) inner product. A degree-$d$ separable Legendre expansion gives the best polynomial approximation to the image, in the squared-error sense, on the unit square. They are global polynomials, so every coefficient depends on every pixel. Inside the domain they fit smoothly, and they make no assumption that the image tiles periodically. Outside the domain they diverge: high-degree fits suffer from Runge's phenomenon, where small errors near the boundary amplify into large excursions just past it.

Chebyshev polynomials $T_k(t)$ are also orthogonal on $[-1, 1]$, but under the weighted inner product:

$$\langle f,\, g \rangle_w = \int_{-1}^{1} \frac{f(t)\,g(t)}{\sqrt{1 - t^2}}\, dt$$

The weighting concentrates information near the endpoints, which gives Chebyshev a useful property called the minimax theorem: among all polynomials of a given degree, the Chebyshev truncation minimizes the largest pointwise error rather than the average squared error. In practice this means Chebyshev spreads error more uniformly across the domain than Legendre. The cost: $T_k(t)$ outside $[-1, 1]$ grows like $\cosh(k \cdot \mathrm{arccosh}\, t)$, exponentially in $k$. Extrapolation past the domain grows even faster than with Legendre.

Haar wavelets are a different idea entirely. Instead of global functions, the Haar basis is built from translated and dilated copies of a single square pulse, so every coefficient depends only on a small localized region. The transform produces a multi-scale pyramid: a coarse approximation, plus detail subbands at each finer scale that capture horizontal, vertical, and diagonal edges. Because the basis is localized, a sharp edge influences only a few coefficients, not the whole spectrum. There is no periodicity assumption either, since the basis lives entirely inside the image and is defined by finite-support filters. The tradeoff is that the basis functions themselves are discontinuous, so coarse reconstructions have a characteristic blocky, staircase appearance.

None of these three bases is periodic. None of them suffer from Gibbs ringing. But each pays its own price: polynomial divergence outside the domain, weighted-inner-product subtleties, or staircase artifacts at coarse scales. The demo below lets you switch between all four bases on the same image so you can see directly what each one is good and bad at.

Interactive Demo

Demo 1

Image decomposition

Original image
Frequency spectrum (log magnitude)
Reconstruction
Image
r = 20
Options
Demo 2

3D height-map

Source
Camera
60
Display

Reconstruction Formula

Interpreting the visualization

With the Fourier basis, the ringing in a low-pass reconstruction is called the Gibbs phenomenon. Near any jump discontinuity, a truncated Fourier series overshoots the true value by approximately 9% of the jump height, regardless of how many terms are included. The overshoot does not vanish as you add terms; it just gets narrower. In 2D the effect produces concentric rings around every sharp edge.

When the image is not naturally periodic, the DFT also rings around the artificial edges at tile boundaries. The Quadrant preset is the extreme case: its periodic extension is an infinite checkerboard, and most of its spectral energy sits at the corners of frequency space (the $(N/2, N/2)$ components), because the checkerboard oscillates at the Nyquist frequency in both directions simultaneously. Small filter radii cannot reach that energy, so they produce a smooth blob with no checkerboard structure at all. As $r$ approaches 128, the step-like edges begin to emerge; the corner energy needs $r \approx 181$ (the diagonal Nyquist) for exact reconstruction.

The Gaussian preset shows the opposite story. Its smooth fall-off near the edges means the periodic extension has almost no discontinuity, almost no high-frequency energy, and almost no Gibbs ringing. The connection between spatial smoothness and spectral compactness is everywhere in Fourier analysis: smooth images have rapidly decaying spectra, and their low-frequency approximations are correspondingly accurate.

A common remedy when working with non-periodic data is windowing: multiplying the image by a smooth taper (Hann, Tukey, or a Gaussian envelope) before transforming, so the implicit periodic extension is nearly continuous. The cost is a small smearing in the frequency domain, but the dramatic reduction in ringing is usually worth it.

Summary

Every technique on this page rewrites the same image in a different basis, a set of building-block patterns, and keeps only some of the weights. Fourier uses global waves, Legendre and Chebyshev use polynomials, and Haar uses localized wavelets. The transform itself loses nothing; the approximation appears only when the cutoff discards coefficients.

The DFT additionally assumes the image repeats forever, tiling it into an infinite grid. When the image is not naturally periodic, the seams between tiles act like sharp edges: they push energy into the corners of the spectrum and ring around every discontinuity through the Gibbs phenomenon. Smoothness and spectral compactness are two sides of one coin. A smooth image has a rapidly decaying spectrum and rebuilds well from a few terms, while a sharp edge spreads energy across all frequencies.

No single basis is best. A basis suits an image when it concentrates the image's energy into a few large coefficients, which is exactly what the center panel lets you see. The bases also behave very differently outside the original domain: Fourier tiles periodically, polynomials diverge under Runge's phenomenon or $\cosh$ growth, and Haar does not extend at all. Choosing a basis means choosing which structure you can represent cheaply and which artifacts you can tolerate.

Image compression is the familiar case (JPEG uses a cosine transform, JPEG 2000 uses wavelets), but multiplying spectra is convolution, so the FFT also powers fast filtering, audio equalization, and the convolutions inside neural networks. MRI and CT scanners collect their raw data directly in the Fourier domain and invert it to form an image, and X-ray crystallography measures the Fourier magnitudes of a molecule. Wireless standards such as Wi-Fi and 4G/5G carry data on FFT frequency bins (OFDM). Chebyshev and Legendre polynomials underpin spectral methods that solve differential equations to very high accuracy, and wavelets drive multiresolution denoising and edge detection.

Compression, imaging, communication, and numerical computation are all variations on one idea: represent data in a basis whose patterns match its structure, then keep the few coefficients that carry the most information.