Sept 16, 2025
Figure 1: Example segments from EgoWild featuring different household activities
OVERVIEW
EgoWild is a large-scale dataset of egocentric video sequences capturing dexterous, in-the-wild human interactions across diverse real-world environments. Collected using head-mounted wide-lens GoPro cameras (270° FOV), the dataset provides high-resolution recordings ideal for training embodied agents to perceive and act in natural, unstructured settings.
If you're interested in a similar dataset with the addition of motion capture gloves, feel free to check out EgoWild-X.
Unlike lab-constrained datasets, EgoWild emphasizes everyday household and assembly tasks performed without scripts or controlled conditions. Recordings include:
Cooking: chopping, stirring, opening jars, pouring
Cleaning & Laundry: wiping, folding, ironing, vacuuming, picking up clutter
Assembly & Repair: building furniture, screwing parts, tightening bolts
Daily Routines: typing, writing, opening doors, using laptops
Utility: unpacking, carrying bags, moving furniture
Objects range from deformables (clothes, sponges, food) to rigid and articulated items (drawers, appliances, tools). Activities take place in kitchens, garages, offices, and outdoor settings, complete with clutter, occlusion, and natural social presence.
EgoWild supports research in:
Learning manipulation and interaction policies from egocentric visual data
Understanding object affordances in realistic, task-driven contexts
Advancing embodied AI systems for open-world, everyday environments
The task-rich, untrimmed nature of EgoWild makes it uniquely suited for training agents to learn from the same types of continuous, unscripted activities humans perform daily.
DATASET STATISTICS
Resolution: 4K
Frame rate: 60 fps
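For planning storage and decoding throughput, the figures above can be turned into rough per-clip estimates. The following is a minimal sketch, not part of any EgoWild tooling. It assumes "4K" means UHD 3840x2160, which this page does not state, and the function names are hypothetical.

```python
# Assumption: "4K" is taken to mean UHD 3840x2160; the page gives no exact pixel dimensions.
WIDTH, HEIGHT = 3840, 2160
FPS = 60  # frame rate listed in the dataset statistics


def frames_in(seconds: float, fps: int = FPS) -> int:
    """Number of frames in a clip of the given duration."""
    return int(seconds * fps)


def raw_rgb_bytes_per_second(width: int = WIDTH, height: int = HEIGHT, fps: int = FPS) -> int:
    """Uncompressed 8-bit RGB throughput (3 bytes per pixel) in bytes per second."""
    return width * height * 3 * fps


if __name__ == "__main__":
    print("frames in one minute:", frames_in(60))
    print("decoded RGB throughput (GB/s):", raw_rgb_bytes_per_second() / 1e9)
```

The throughput figure describes decoded frames held in memory, not the size of the compressed video files, which depends on the codec and bitrate used.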
GET IN TOUCH
To request the full sample dataset, please contact us at db@zeroframe.ai or use the button below.
REQUEST SAMPLE DATASET