What It Is
A photogrammetry pipeline that reconstructs the 3D geometry of hand-sized objects from a set of 2D photographs. The object sits on a calibration platform with known reference points, and the system recovers its 3D structure to within a 0.2 cm error margin. Everything is implemented in Python using geometrical optics principles and linear algebra — no black-box reconstruction libraries.
Why I Built It
3D reconstruction sits at the intersection of optics, linear algebra, and geometry in a way that makes the math feel tangible. Every matrix has a physical meaning: a projection matrix encodes how a camera maps 3D space onto a 2D sensor, a rotation matrix describes camera orientation, and solving a system of equations literally triangulates a point in space. I wanted to work through that full pipeline myself rather than calling a library function that hides it.
How It Works
The pipeline follows the classical multi-view reconstruction approach:
Camera calibration uses the known reference points on the platform to compute the intrinsic and extrinsic parameters of each camera viewpoint. The intrinsic matrix captures focal length, principal point, and skew; lens distortion is nonlinear, so it has to be modeled separately rather than inside the matrix. The extrinsic matrix captures the camera's position and orientation relative to the platform. Both are recovered by solving a linear system built from correspondences between known 3D points and their observed 2D projections.
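A minimal sketch of that correspondence-solving step, written as the standard direct linear transform (DLT). The function name and argument layout are illustrative assumptions, not the project's actual code, and the sketch skips the coordinate normalization and lens-distortion handling a real calibration would usually include.

```python
import numpy as np

def estimate_projection_matrix(points_3d, points_2d):
    """Estimate a 3x4 projection matrix P from 3D-2D correspondences (DLT).

    points_3d: (N, 3) reference-point coordinates, N >= 6 and not all coplanar.
    points_2d: (N, 2) observed pixel coordinates of the same points.
    """
    X = np.asarray(points_3d, dtype=float)
    uv = np.asarray(points_2d, dtype=float)
    Xh = np.hstack([X, np.ones((len(X), 1))])  # homogeneous 3D points, (N, 4)
    zeros = np.zeros_like(Xh)

    # Each correspondence gives two linear equations in the 12 entries of P:
    #   p1.X - u * p3.X = 0   and   p2.X - v * p3.X = 0
    rows_u = np.hstack([Xh, zeros, -uv[:, [0]] * Xh])
    rows_v = np.hstack([zeros, Xh, -uv[:, [1]] * Xh])
    A = np.vstack([rows_u, rows_v])  # (2N, 12)

    # The least-squares solution of A p = 0 with |p| = 1 is the right singular
    # vector belonging to the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```

The result is only defined up to scale (and sign), which is expected: a projection matrix is a homogeneous quantity, so any nonzero multiple maps 3D points to the same pixels.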
Feature matching identifies corresponding points across multiple images of the same object. When the same physical point appears in two or more photographs, its 2D coordinates in each image provide the constraints needed for triangulation.
Triangulation is where the geometry comes together. Given a point's 2D position in two or more calibrated images, each observation defines a ray from the camera center through the image plane into 3D space. The 3D point lies at (or near) the intersection of these rays. In practice, the rays almost never intersect exactly because of measurement noise, so the problem becomes a least-squares minimization: finding the 3D point that is most consistent with its observed projections across all views.
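The idea can be sketched as linear triangulation: every view contributes two equations, and the 3D point is the least-squares solution of the stacked system. This is an illustrative sketch with a hypothetical function name. Note that it minimizes an algebraic error, which only approximates the true reprojection error; a nonlinear refinement step would be needed to minimize reprojection error exactly.

```python
import numpy as np

def triangulate_point(projections, pixels):
    """Triangulate one 3D point from two or more calibrated views.

    projections: iterable of 3x4 projection matrices, one per view.
    pixels: iterable of (u, v) observations of the same point, one per view.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        P = np.asarray(P, dtype=float)
        # x = P X in homogeneous coordinates implies
        #   u * (p3.X) - p1.X = 0   and   v * (p3.X) - p2.X = 0
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.vstack(rows)  # (2 * n_views, 4)

    # Homogeneous least squares: take the right singular vector with the
    # smallest singular value, then divide out the homogeneous coordinate.
    _, _, Vt = np.linalg.svd(A)
    Xh = Vt[-1]
    return Xh[:3] / Xh[3]
```

With noise-free observations the stacked matrix has an exact null vector and the point is recovered exactly; with noisy observations the smallest singular value is no longer zero and its size gives a rough sense of how inconsistent the rays are.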
Matrix decomposition is central to nearly every step. Calibration involves solving overdetermined linear systems via SVD. The projection matrix is factored into intrinsic and extrinsic components using RQ decomposition, a close relative of QR that puts the upper-triangular factor first. Triangulation reduces to solving homogeneous systems where the solution is the (approximate) null space of a measurement matrix, read off as the right singular vector with the smallest singular value. Python's NumPy handles the numerical heavy lifting, but understanding what each decomposition does geometrically was the real challenge.
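A sketch of the factoring step, assuming a projection matrix of the form P = s * K [R | t] with unknown scale s. Strictly this is an RQ decomposition (upper-triangular factor times orthogonal factor); NumPy only ships QR, so the sketch builds RQ from np.linalg.qr by flipping rows and columns. Both function names are hypothetical.

```python
import numpy as np

def rq(M):
    """RQ decomposition of a square matrix, built from NumPy's QR.

    Returns (R, Q) with R upper triangular, Q orthogonal, and M = R @ Q.
    """
    Q, R = np.linalg.qr(np.flipud(M).T)
    R = np.fliplr(np.flipud(R.T))  # reverse rows and columns: upper triangular again
    Q = np.flipud(Q.T)
    return R, Q

def decompose_projection_matrix(P):
    """Split P (3x4, arbitrary scale) into intrinsics K, rotation R, translation t."""
    P = np.asarray(P, dtype=float)
    # P is only defined up to scale; pick the sign that makes det(R) = +1.
    if np.linalg.det(P[:, :3]) < 0:
        P = -P
    K, R = rq(P[:, :3])

    # RQ is unique only up to signs; force a positive diagonal on K.
    signs = np.where(np.diag(K) < 0, -1.0, 1.0)
    K = K * signs            # same as K @ diag(signs)
    R = signs[:, None] * R   # same as diag(signs) @ R

    t = np.linalg.solve(K, P[:, 3])  # the unknown scale cancels here
    return K / K[2, 2], R, t
```

Geometrically, the upper-triangular factor is the camera's internal mapping from rays to pixels, and the orthogonal factor is the rotation from platform coordinates into the camera frame.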
The Calibration Platform
The dedicated platform has precisely placed reference points at known 3D coordinates. This is what makes the whole system tractable — without known correspondences, you'd need to solve the much harder problem of simultaneous structure and motion estimation (full SfM). With the platform, camera calibration becomes a well-conditioned linear problem.
Accuracy
The target was 0.2 cm error on hand-sized objects, and the finished system met it. The main sources of error are feature localization noise in the images (sub-pixel accuracy matters), lens distortion that isn't fully corrected, and the geometric configuration of the camera viewpoints: wider baselines give better depth resolution but make feature matching harder.
What I Learned
The biggest insight was how sensitive 3D reconstruction is to the numerical conditioning of the linear systems involved. Poorly conditioned matrices (from near-parallel camera views, for example) amplify small measurement errors into large 3D errors. Understanding when and why SVD or QR decomposition is the right tool — and what the singular values tell you about the geometry of your setup — was the most valuable takeaway.
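A small numerical experiment makes the conditioning point concrete. It is a sketch under a deliberately simplified, hypothetical setup (two identical cameras with identity intrinsics and parallel optical axes), not a model of the actual rig: it builds the two-view triangulation system for a narrow and a wide baseline and compares condition numbers.

```python
import numpy as np

def triangulation_condition(baseline, point):
    """Condition number of the linear triangulation system for two views.

    Hypothetical setup: two identical cameras with identity intrinsics and
    parallel optical axes, separated by `baseline` along the x axis.
    """
    X = np.append(np.asarray(point, dtype=float), 1.0)
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([np.eye(3), np.array([[-baseline], [0.0], [0.0]])])

    rows = []
    for P in (P1, P2):
        x = P @ X
        u, v = x[0] / x[2], x[1] / x[2]
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.vstack(rows)

    # Writing A @ [X, 1] = 0 as A[:, :3] @ X = -A[:, 3] gives an ordinary
    # least-squares problem whose sensitivity is governed by cond(A[:, :3]).
    return np.linalg.cond(A[:, :3])

narrow = triangulation_condition(0.01, [0.0, 0.0, 4.0])
wide = triangulation_condition(1.0, [0.0, 0.0, 4.0])
print(f"narrow baseline: {narrow:.1f}, wide baseline: {wide:.1f}")
```

The narrow-baseline system comes out far worse conditioned than the wide one, because the two rays are nearly parallel and the depth direction is barely constrained. The same check works on any real pair of views by swapping in their calibrated projection matrices.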
I also developed a better intuition for projective geometry: how points at infinity work, why homogeneous coordinates simplify everything, and how the fundamental matrix encodes the epipolar constraint between two views.
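To make the epipolar constraint concrete, here is a sketch that builds the fundamental matrix directly from two projection matrices using the textbook formula F = [e2]x P2 P1+, where P1+ is the pseudo-inverse of P1 and e2 is the image of the first camera's center in the second view. The function names are illustrative assumptions.

```python
import numpy as np

def skew(v):
    """Cross-product matrix: skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_projections(P1, P2):
    """Fundamental matrix F such that x2^T F x1 = 0 for corresponding points."""
    P1 = np.asarray(P1, dtype=float)
    P2 = np.asarray(P2, dtype=float)

    # Camera center of view 1 is the null vector of P1 (homogeneous 4-vector).
    _, _, Vt = np.linalg.svd(P1)
    C1 = Vt[-1]

    # Epipole in view 2: where the first camera's center appears in image 2.
    e2 = P2 @ C1

    # Back-project x1 to a ray with the pseudo-inverse, project it into view 2,
    # and join it with the epipole to get the epipolar line.
    return skew(e2) @ P2 @ np.linalg.pinv(P1)
```

For any 3D point, its two projections x1 and x2 (as homogeneous pixel vectors) satisfy x2 @ F @ x1 = 0 up to rounding, and F has rank 2 because every epipolar line passes through the epipole.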