This project explores the feasibility of reconstructing three-dimensional geometry from monocular endoscopic imagery using learning-based depth estimation.
The approach implements a pipeline that combines a state-of-the-art monocular depth estimation model, Depth-Anything-V2, with camera intrinsic parameters to back-project each captured frame into a discrete set of 3D points. These per-frame point sets are then aligned and fused into a common coordinate frame, and a surface reconstruction technique encloses the fused points within a watertight surface. Synthetic datasets are used to evaluate depth prediction accuracy and geometric consistency, while real-world phantom data is employed to assess model generalisation under practical imaging conditions.
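As a minimal sketch of this pipeline, and not the implementation evaluated in this work, the snippet below back-projects predicted depth maps through the pinhole camera model, registers two per-frame point clouds, and extracts a watertight surface. The intrinsic values, the placeholder depth maps, the helper names (`backproject`, `to_cloud`), and the choice of point-to-point ICP and Poisson reconstruction via Open3D are all illustrative assumptions; the text above does not name the specific registration or surfacing algorithms used.

```python
import numpy as np
import open3d as o3d  # assumed library choice for registration and surfacing


def backproject(depth: np.ndarray, fx: float, fy: float,
                cx: float, cy: float) -> np.ndarray:
    """Lift a dense depth map (H, W) to an (N, 3) point set with the
    pinhole model: X = (u - cx) Z / fx, Y = (v - cy) Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def to_cloud(points: np.ndarray) -> o3d.geometry.PointCloud:
    """Wrap raw points and estimate oriented normals (Poisson needs them)."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.estimate_normals()
    pcd.orient_normals_towards_camera_location(np.zeros(3))  # camera at origin
    return pcd


# Placeholder depth maps standing in for Depth-Anything-V2 predictions;
# metric (not relative) depth is assumed here, in metres.
depth_a = np.random.uniform(0.02, 0.10, (480, 640))
depth_b = np.random.uniform(0.02, 0.10, (480, 640))
fx = fy = 525.0
cx, cy = 320.0, 240.0  # illustrative intrinsics

cloud_a = to_cloud(backproject(depth_a, fx, fy, cx, cy))
cloud_b = to_cloud(backproject(depth_b, fx, fy, cx, cy))

# Align frame B to frame A with point-to-point ICP, then fuse the clouds.
reg = o3d.pipelines.registration.registration_icp(
    cloud_b, cloud_a, max_correspondence_distance=0.005,
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration
        .TransformationEstimationPointToPoint())
fused = cloud_a + cloud_b.transform(reg.transformation)

# Poisson reconstruction yields a watertight surface over the fused points.
mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    fused, depth=9)
```

Poisson reconstruction is one common way to obtain a watertight mesh from oriented points; any surfacing method that closes the fused cloud would satisfy the description above.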
The results demonstrate that geometric structure can be recovered from monocular endoscopy, enabling frame-to-frame alignment and surface reconstruction, but they also highlight the sensitivity of such approaches to domain shift and imaging artefacts. This work provides insight into the challenges of metric monocular depth estimation in endoscopy and outlines future directions for improving robustness and clinical applicability.