SpatialVID: 1-10 Million Videos with Spatial Annotations for 3D Generation

Name: SpatialVID: 1-10 Million Videos with Spatial Annotations for 3D Generation
Creator: SpatialVID
Published: 2025-09-08T03:27:11
Keywords: Task categories: image-to-3d, text-to-3d, image-to-video, text-to-video, other; Modality: text, tabular; Format: CSV; Library: datasets, pandas, polars, mlcroissant; Size category: 1M-10M; Language: en; License: cc-by-nc-sa-4.0; Region: us; arXiv: 2509.09676

Description

SpatialVID pairs video records with spatial annotations and falls in the 1 million to 10 million record size category. It was developed by researchers at Nanjing University and the Institute of Automation, Chinese Academy of Sciences, and is associated with a CVPR 2026 publication (arXiv 2509.09676). The data supports multi-modal generative tasks by linking video sequences with 3D spatial metadata and English text descriptions.

Use Cases

Training text-to-3D models that use the spatial annotations to define geometric structure
Developing image-to-video generation pipelines that require spatial consistency
Analyzing video-text alignment using the provided English language descriptions

Strengths

Scale of 1 to 10 million records
Authored by academic researchers and associated with a CVPR 2026 publication
Includes spatial annotations for 3D-aware generation

Limitations

Restricted to non-commercial use under the CC BY-NC-SA 4.0 license
Potentially high storage and compute requirements due to the 1M+ video records
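
One way to keep memory use bounded at this scale is to read the CSV annotation files in fixed-size batches rather than loading them whole. The sketch below uses only the Python standard library. The function name iter_csv_batches, the file name in the usage comment, and the batch size are illustrative assumptions, not part of the dataset's documented tooling, and the actual column names should be checked against the dataset itself.

```python
import csv
from itertools import islice
from typing import Dict, Iterator, List


def iter_csv_batches(path: str, batch_size: int = 10_000) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of row dicts from a CSV file, at most batch_size rows each.

    Hypothetical helper: only one batch is held in memory at a time.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)  # the header row supplies the dict keys
        while True:
            batch = list(islice(reader, batch_size))
            if not batch:
                return
            yield batch


# Usage (the file name is a placeholder, not a real path in the dataset):
# for batch in iter_csv_batches("annotations.csv", batch_size=50_000):
#     print(len(batch), "rows; columns:", list(batch[0].keys()))
```

The same pattern is available through the chunksize argument of pandas.read_csv, which may be more convenient if pandas is already in use.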

Provenance

Source: Nanjing University and Institute of Automation, Chinese Academy of Sciences (arXiv 2509.09676)
Collection Method: Annotated
Freshness: Last updated March 2026; associated with CVPR 2026 publication.

Released under the CC BY-NC-SA 4.0 license. Requires citation of the CVPR 2026 paper (arXiv 2509.09676).

Quality Score

Overall: C (41)
Description: 43
Source: 36
Reputation: 57
Access: 22

Community

19.8K downloads

39 likes

0 views

Dataset Info

Author: SpatialVID
Created: Sep 8, 2025
Updated: Mar 1, 2026
Last synced: Jul 1, 2026
