Jul 10, 2025
Figure 1: 360° videos of urban objects, from the Urban3D sample dataset
OVERVIEW
Urban3D is a large-scale dataset of real-world, object-centric multiview videos and 3D reconstructions, captured to advance machine learning research in autonomous systems, smart infrastructure, and urban scene understanding.
The dataset contains over 10,000 multiview videos across urban categories, including traffic signage, dumpsters, construction equipment, e-scooters, road hazards, and more.
Each object in the dataset is derived from a single video sequence processed through a custom pipeline. This pipeline extracts high-resolution RGB frames, performs Structure-from-Motion (SfM) using COLMAP, and generates key assets for 3D reconstruction and neural rendering. Outputs include COLMAP-derived camera poses, a COLMAP database with feature matches, sparse 3D point clouds, and distortion-aware intrinsics and extrinsics. An instance-ready transforms.json is provided for NeRF pipelines, and the dataset supports Gaussian Splatting through optimized SIFT extraction and optional image undistortion. The original video file is retained for traceability and simulation. This structure ensures clean, calibrated geometry for NeRF training, Gaussian splat generation, and broader simulation workflows.
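As a sketch of how the per-object transforms.json might be consumed in a NeRF pipeline: the snippet below builds a tiny synthetic file and parses per-frame camera-to-world matrices from it. The field names (`camera_angle_x`, `frames`, `transform_matrix`) follow the common instant-ngp-style convention and are assumptions here, not the dataset's documented schema.

```python
import json
from pathlib import Path

import numpy as np

# Tiny synthetic transforms.json in the common instant-ngp-style schema.
# Field names are assumptions; check the file shipped with each object.
sample = {
    "camera_angle_x": 0.85,  # shared horizontal FOV in radians
    "frames": [
        {
            "file_path": "images/frame_0001.png",
            "transform_matrix": np.eye(4).tolist(),  # camera-to-world pose
        },
    ],
}
Path("transforms.json").write_text(json.dumps(sample))


def load_poses(path="transforms.json"):
    """Return per-frame 4x4 camera-to-world matrices and the shared FOV."""
    meta = json.loads(Path(path).read_text())
    poses = {
        f["file_path"]: np.array(f["transform_matrix"], dtype=np.float32)
        for f in meta["frames"]
    }
    return poses, meta["camera_angle_x"]


poses, fov_x = load_poses()
print(poses["images/frame_0001.png"].shape)  # (4, 4)
```

A loader like this is the usual bridge from COLMAP-derived poses to NeRF or Gaussian-splat training code, which consumes the same per-frame matrices.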
DATASET STATISTICS
Quality: 1080p & 4K at 60fps
Quantity: 10,000 multiview videos (180°-360°)
GAUSSIAN SPLAT EXAMPLE
Figure 2 shows a 360° video from the Urban3D sample dataset, captured with a sine-wave camera motion, of a white-and-blue fire hydrant on a concrete sidewalk. Figure 3 shows a Gaussian splat of the object generated with Polycam.
Figure 2: 360° video of a white-blue fire hydrant
Figure 3: Gaussian splat generated using Polycam, based on the 360° fire hydrant video
GET IN TOUCH
To request the full dataset, please contact us at team@zeroframe.ai or schedule a time with our team using the button below. The sample dataset is available on Hugging Face via the button at the top of the page.