Description
Stera-10M is an open egocentric multimodal dataset for embodied AI, robotics, world models, and spatial intelligence. It contains 200 hours of synchronized first-person recordings across 500+ sessions from 20 contributors in 20+ unique environments, with 10 million RGB frames, LiDAR depth, and ARKit data. The dataset was captured end-to-end on commodity iPhone Pro hardware through the open Stera platform and is authored by fpvlabs.
Use Cases
Training world models for robotics based on synchronized first-person RGB and LiDAR depth data.
Developing spatial intelligence algorithms using the dataset's egocentric recordings from 20+ unique environments.
Benchmarking embodied AI agents on tasks requiring multimodal sensor fusion from commodity hardware.
Researching human-environment interaction patterns from 500+ sessions of first-person activity.
Strengths
Contains 200 hours of synchronized first-person recordings.
Includes 10 million RGB frames alongside LiDAR depth and ARKit data.
Captured across 500+ sessions from 20 contributors in 20+ unique environments.
Recorded end-to-end on commodity iPhone Pro hardware rather than specialized capture rigs, which suggests the data reflects real-world capture conditions.
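The headline figures listed above can be cross-checked with simple arithmetic, which may help when estimating storage or sampling budgets. The sketch below assumes exactly 200 hours, 10 million frames, and 500 sessions; the card gives rounded or lower-bound figures ("500+"), so the results are approximate. The implied average is not necessarily the capture frame rate, which the card does not state.

```python
# Rounded figures taken from the dataset card; treat results as estimates.
HOURS = 200
FRAMES = 10_000_000
SESSIONS = 500  # the card says "500+", so this is a lower bound


def average_fps(frames, hours):
    """Average frames per second implied by total frames and total hours."""
    return frames / (hours * 3600)


def average_session_minutes(hours, sessions):
    """Average session length in minutes (an upper bound if sessions is a lower bound)."""
    return hours * 60 / sessions


print(f"Implied average frame rate: {average_fps(FRAMES, HOURS):.1f} fps")
print(f"Average session length: at most {average_session_minutes(HOURS, SESSIONS):.0f} min")
```

Under these assumptions the figures imply roughly 13.9 frames per second on average and sessions of at most about 24 minutes each.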
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes it harder to assess suitability before download.
Last updated 2026-05-15 14:58:39; freshness should be verified.
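Because the card documents neither fields nor row counts, a first step after download might be to inventory the files on disk. The following is a minimal sketch, assuming the dataset has been downloaded to a local directory; the `inventory` helper is hypothetical and is not part of the Stera platform, and the card does not specify the actual file layout or formats.

```python
from collections import defaultdict
from pathlib import Path


def inventory(root):
    """Return {extension: (file_count, total_bytes)} for all files under root."""
    stats = defaultdict(lambda: [0, 0])
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Normalize case so ".MOV" and ".mov" are counted together;
            # files without a suffix are grouped under "(none)".
            ext = path.suffix.lower() or "(none)"
            stats[ext][0] += 1
            stats[ext][1] += path.stat().st_size
    return {ext: (count, size) for ext, (count, size) in stats.items()}


def print_inventory(root):
    """Print one line per extension: file count and total size in megabytes."""
    for ext, (count, size) in sorted(inventory(root).items()):
        print(f"{ext:10s} {count:8d} files {size / 1e6:12.1f} MB")
```

Calling `print_inventory` on the download directory gives a per-extension summary of file counts and sizes, which is a reasonable starting point for working out which files hold RGB frames, depth, and ARKit data.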
Provenance
Source
fpvlabs via the open Stera platform.
Collection Method
Captured end-to-end on commodity iPhone Pro hardware.
Freshness
Last updated 2026-05-15 14:58:39.
License
License is unknown; terms of use must be verified before application.