Overview

Principal Component Analysis (PCA) is a way to reorganize data around the directions along which it varies most. Given a centered data matrix $X \in \mathbb{R}^{n \times d}$ ($n$ samples, $d$ features, each column with mean zero), PCA finds orthogonal directions that explain as much variance as possible, ordered from the most variance explained to the least. In a two-dimensional dataset, the first axis points along the dominant direction of variation and the second points along the remaining orthogonal variation. This makes PCA useful for visualization, dimensionality reduction, denoising, and understanding the geometry of a dataset.

PCA and SVD

For centered data, PCA is most cleanly computed through the singular value decomposition (SVD), which factors the data matrix $X$ into three simpler matrices: $U$, whose orthonormal columns are the left singular vectors; $\Sigma$, a diagonal matrix of non-negative singular values $\sigma_1 \ge \sigma_2 \ge \dots$; and $V^T$, whose rows are the right singular vectors.

$$X = U\Sigma V^T$$
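A minimal NumPy sketch of the decomposition, assuming a small synthetic 2-D dataset (the seed, sample count, and mixing matrix are arbitrary choices for illustration). It uses the thin SVD via `np.linalg.svd(..., full_matrices=False)`, which returns the singular values as a 1-D array in descending order rather than as the matrix $\Sigma$.

```python
import numpy as np

# Hypothetical data: 200 samples of 2 correlated features (seed and mixing matrix are arbitrary).
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])

# PCA works on centered data, so subtract the per-feature mean first.
X = raw - raw.mean(axis=0)

# Thin SVD: U is n x r, s holds the r singular values, Vt is r x d (r = min(n, d)).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(U.shape, s.shape, Vt.shape)   # (200, 2) (2,) (2, 2)

# Rebuild X from its three factors; np.diag(s) turns the vector of singular values into Sigma.
X_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(X, X_rebuilt))    # True
```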

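Read as a map acting on a vector, the factors apply from right to left. First, $V^T$ rotates feature space into the principal-direction basis given by the columns of $V$ (strictly, an orthogonal transformation, which may include a reflection). This rotates the coordinate system used to describe the points, not the points themselves, so that the original axes align with the principal directions. $\Sigma$ then rescales those axes according to how strongly the data varies along each direction, and $U$ holds the normalized coordinates of the samples along those axes: $U\Sigma = XV$ is the matrix of principal-component scores. The full product rebuilds the original centered data matrix.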
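The interpretation above can be checked numerically. This sketch, again on hypothetical random data (seed and shapes are arbitrary), confirms that $V$ is orthogonal (determinant $\pm 1$) and that projecting the centered samples onto the principal directions, $XV$, reproduces $U\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(1)
raw = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))  # hypothetical correlated data
X = raw - raw.mean(axis=0)                                 # center each feature

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T  # columns of V are the principal directions

# V is orthogonal: a rotation, possibly combined with a reflection (det is +1 or -1).
print(np.allclose(V.T @ V, np.eye(3)))         # True
print(np.isclose(abs(np.linalg.det(V)), 1.0))  # True

# Scores: coordinates of each sample along the principal directions.
# U * s scales column j of U by s[j], the same as U @ np.diag(s).
scores = X @ V
print(np.allclose(scores, U * s))              # True
```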

Covariance eigendecomposition and PCA as a change of basis

Covariance measures how coordinates vary together. For centered data, the unbiased sample covariance $C=\frac{1}{n-1}X^T X$ packages the shape of the data cloud into a symmetric positive semidefinite matrix whose leading eigenvectors point along the directions of greatest variance. Substituting the SVD into this formula gives:

$$\frac{1}{n-1}X^T X = \frac{1}{n-1}(U\Sigma V^T)^T(U\Sigma V^T) = \frac{1}{n-1}V\Sigma \underbrace{U^T U}_{=\,I}\Sigma V^T = V\left(\frac{\Sigma^2}{n-1}\right)V^T$$

This is the covariance eigendecomposition $C = V\Lambda V^T$, where $\Lambda = \frac{\Sigma^2}{n-1}$ is the diagonal matrix of eigenvalues. The eigenvectors of $C$ are the right singular vectors $V$, and each eigenvalue $\lambda_i = \frac{\sigma_i^2}{n-1}$ is the squared singular value scaled by $\frac{1}{n-1}$, which equals the sample variance of the data along the $i$-th principal direction.
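A sketch comparing the two routes on synthetic 3-D data (the mixing matrix is an assumption for illustration): eigendecomposing the sample covariance with `np.linalg.eigh`, and taking the SVD of $X$. Because `eigh` returns eigenvalues in ascending order and eigenvectors are only defined up to sign, the comparison reorders them and checks absolute dot products.

```python
import numpy as np

# Hypothetical correlated 3-D data; seed and mixing matrix are arbitrary.
rng = np.random.default_rng(2)
raw = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.0, 0.0],
                                            [0.8, 1.0, 0.0],
                                            [0.3, 0.4, 0.2]])
X = raw - raw.mean(axis=0)
n = X.shape[0]

# Route 1: sample covariance and its eigendecomposition.
C = X.T @ X / (n - 1)
print(np.allclose(C, np.cov(X, rowvar=False)))  # True: np.cov also divides by n - 1

evals, evecs = np.linalg.eigh(C)        # eigh returns eigenvalues in ascending order
order = np.argsort(evals)[::-1]         # reorder to descending, matching the SVD
evals, evecs = evals[order], evecs[:, order]

# Route 2: SVD of the data matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(evals, s**2 / (n - 1)))  # True: Lambda = Sigma^2 / (n - 1)

# Eigenvectors match the right singular vectors up to sign, so compare |dot products|.
print(np.allclose(np.abs(np.sum(evecs * Vt.T, axis=0)), 1.0))  # True
```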

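As with the SVD, the factors read from right to left when $C$ acts on a vector: $V^T$ expresses the vector in the principal basis, $\Lambda$ stretches each coordinate by the variance explained along that direction, and $V$ rotates back to the original feature space. Applying the same change of basis to the data, $Z = XV$, re-expresses every sample in principal-component coordinates; the covariance of $Z$ is $V^T C V = \Lambda$, so the new coordinates are uncorrelated and ordered by explained variance.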
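The following sketch, on hypothetical 2-D data, checks three consequences of $C = V\Lambda V^T$: its action on an arbitrary vector (the test vector `x` is an arbitrary choice), its action on the principal directions, and the covariance of the data re-expressed in the principal basis, $Z = XV$.

```python
import numpy as np

# Hypothetical 2-D data; seed and mixing matrix are arbitrary.
rng = np.random.default_rng(3)
raw = rng.normal(size=(300, 2)) @ np.array([[2.0, 0.0], [1.2, 0.6]])
X = raw - raw.mean(axis=0)
n = X.shape[0]

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V, lam = Vt.T, s**2 / (n - 1)   # eigenvectors and eigenvalues of C
C = X.T @ X / (n - 1)

# Acting on a vector: express x in the principal basis (V^T x),
# stretch each coordinate by its variance (lam), then map back with V.
x = np.array([1.0, -0.5])
print(np.allclose(C @ x, V @ (lam * (V.T @ x))))  # True

# On a principal direction only the stretch remains: C v_i = lam_i v_i.
print(np.allclose(C @ V, V * lam))  # True

# Re-expressing the data in the principal basis gives uncorrelated coordinates
# whose variances are the eigenvalues, in descending order.
Z = X @ V
print(np.allclose(np.cov(Z, rowvar=False), np.diag(lam)))  # True
```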

Visual step through

Generate a dataset, inspect the SVD of the data matrix $X$, and step through the PCA transformation sequence. The right plot shows how the covariance operator acts on a unit circle (in 2D) or unit sphere (in 3D) and on the principal directions themselves.

[Interactive figure. Panels: Data / principal components; Covariance operator; Data matrix and singular values; Covariance eigendecomposition]
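The covariance-operator view can be reproduced without the interactive figure. This sketch, assuming 2-D synthetic data (seed and mixing matrix are arbitrary), applies $C$ to points on the unit circle and checks that the image is an ellipse whose semi-axes point along the eigenvectors, with lengths equal to the eigenvalues.

```python
import numpy as np

# Hypothetical 2-D data; seed and mixing matrix are arbitrary.
rng = np.random.default_rng(4)
raw = rng.normal(size=(400, 2)) @ np.array([[1.5, 0.0], [1.0, 0.4]])
X = raw - raw.mean(axis=0)
C = X.T @ X / (X.shape[0] - 1)

lam, V = np.linalg.eigh(C)  # ascending eigenvalues; columns of V are unit eigenvectors

# Unit vectors on the circle, one per column: shape (2, 721).
theta = np.linspace(0.0, 2.0 * np.pi, 721)
circle = np.vstack([np.cos(theta), np.sin(theta)])

# Image of the circle under the covariance operator.
ellipse = C @ circle

# In principal coordinates the image satisfies (y1/lam1)^2 + (y2/lam2)^2 = 1,
# an axis-aligned ellipse with semi-axes lam_i along the eigenvectors v_i.
coords = V.T @ ellipse
on_ellipse = np.allclose((coords[0] / lam[0])**2 + (coords[1] / lam[1])**2, 1.0)
print(on_ellipse)  # True

# Every image vector has length between the smallest and largest eigenvalue.
lengths = np.linalg.norm(ellipse, axis=0)
print(lengths.min() >= lam[0] - 1e-12, lengths.max() <= lam[1] + 1e-12)  # True True
```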

Dimensionality reduction and reconstruction

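Because PCA orders the principal coordinates by explained variance, keeping only the first $k$ components gives the $k$-dimensional representation that preserves the most variance among all orthogonal projections onto $k$ dimensions. In SVD terms, this means truncating to the $k$ largest singular values, where $U_k$ and $V_k$ hold the first $k$ columns of $U$ and $V$ and $\Sigma_k$ is the top-left $k \times k$ block of $\Sigma$: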

$$X_{\text{rank-}k} = U_k\Sigma_k V_k^T$$

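By the Eckart–Young theorem, this truncation is the best rank-$k$ approximation of $X$ in the Frobenius norm, and its squared error is the sum of the discarded squared singular values, $\sum_{i>k}\sigma_i^2$. The compressed representation is the $n \times k$ score matrix $U_k\Sigma_k = XV_k$, and the fraction of variance it retains is $\sum_{i\le k}\sigma_i^2 / \sum_i \sigma_i^2$. When only one or two components explain most of the variance, the reconstructed data (with the mean added back) can remain visually and statistically close to the original while using fewer degrees of freedom.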
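A sketch of compression and reconstruction with NumPy, assuming synthetic 3-D data that lies close to a plane, so $k = 2$ is a natural choice here (the value of `k` and the data are assumptions, not a rule). It computes the scores, the rank-$k$ reconstruction, and the retained variance fraction, and checks that the Frobenius reconstruction error equals the norm of the discarded singular values (Eckart–Young).

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical 3-D data with one tiny-variance direction, then randomly rotated.
raw = rng.normal(size=(250, 3)) * np.array([4.0, 2.0, 0.1])
raw = raw @ np.linalg.qr(rng.normal(size=(3, 3)))[0]   # random orthogonal matrix
mean = raw.mean(axis=0)
X = raw - mean

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # number of components kept (an illustrative choice)

# Reduced representation: n x k scores along the first k principal directions.
scores_k = X @ Vt[:k].T                      # equals U[:, :k] * s[:k]

# Rank-k reconstruction U_k Sigma_k V_k^T, then add the mean back for raw coordinates.
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
raw_k = X_k + mean

# Fraction of total variance kept by the first k components.
explained = np.sum(s[:k]**2) / np.sum(s**2)

# Frobenius error of the truncation equals the norm of the discarded singular values.
err = np.linalg.norm(X - X_k)
print(np.isclose(err, np.sqrt(np.sum(s[k:]**2))))  # True
print(scores_k.shape, explained)                   # (250, 2), then a fraction just below 1
```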

Conclusion

PCA is ultimately a geometric and algebraic reorganization of the data. The SVD computes it directly from the data matrix $X$ in a numerically stable way, avoiding the explicit product $X^TX$, which squares the condition number. The principal directions are the right singular vectors, and the explained variances are the squared singular values divided by $n-1$. Writing the covariance as $C = V\Lambda V^T$, a diagonal stretch conjugated by the orthogonal eigenvector matrix, shows why PCA can be understood as a rotation into a natural coordinate system, a variance-weighted scaling, and a rotation back.
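To illustrate the stability point, this sketch builds a hypothetical, deliberately ill-conditioned centered data matrix with singular values $1$ and $10^{-9}$ (up to rounding), then recovers them both from the SVD of $X$ and from the eigenvalues of $X^TX$. Squaring $10^{-9}$ gives $10^{-18}$, far below double-precision resolution relative to the largest eigenvalue, so the covariance route typically cannot recover the small value.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100

# Orthonormal columns spanning centered vectors, so X below has (near-)zero column means.
G = rng.normal(size=(n, 2))
Q = np.linalg.qr(G - G.mean(axis=0))[0]        # n x 2, orthonormal columns
W = np.linalg.qr(rng.normal(size=(2, 2)))[0]   # 2 x 2 orthogonal

# Hypothetical ill-conditioned centered data with singular values 1 and 1e-9.
X = Q @ np.diag([1.0, 1e-9]) @ W.T

# SVD route: works on X directly.
s_svd = np.linalg.svd(X, compute_uv=False)

# Covariance route: X^T X has eigenvalues 1 and 1e-18, so the small one drowns in round-off.
evals = np.linalg.eigvalsh(X.T @ X)[::-1]      # descending order
s_eig = np.sqrt(np.clip(evals, 0.0, None))     # clip tiny negative round-off before sqrt

print(s_svd)  # second value close to 1e-9
print(s_eig)  # second value typically far from 1e-9
```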