Description
Vript is a fine-grained video-text dataset constructed by Mutonix, containing 12,000 annotated high-resolution videos split into approximately 400,000 clips. The annotation is inspired by video scripts, detailing scene content, shot types, and camera movements. The dataset was last updated on June 11, 2024.
Use Cases
Training video-to-text models based on fine-grained scene descriptions.
Developing video generation models based on structured script annotations.
Researching the relationship between camera movements, shot types, and narrative content.
Strengths
Contains 12,000 high-resolution videos.
Provides fine-grained annotations for approximately 400,000 video clips.
Annotation structure includes scene content, shot type, and camera movement details.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The row count for the full dataset is not reported, which makes it harder to assess suitability before download.
Description metadata is limited; actual data quality requires manual inspection after download.
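Because field semantics have to be inferred after download, a quick schema survey is a reasonable first step. The following is a minimal sketch, not part of the dataset's tooling. It assumes the annotations can be read as JSON Lines records (one JSON object per line), which this card does not confirm. The helper names `read_jsonl` and `summarize_fields` and the path in the usage note are hypothetical.

```python
import json
from collections import Counter, defaultdict


def read_jsonl(path, limit=1000):
    """Yield JSON objects from the first `limit` lines of a JSON Lines file.

    Assumption: the annotation file holds one JSON object per line.
    """
    with open(path, encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            if index >= limit:
                break
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def summarize_fields(records):
    """Map each field name to a count of the Python type names seen for it."""
    summary = defaultdict(Counter)
    for record in records:
        for key, value in record.items():
            summary[key][type(value).__name__] += 1
    return {key: dict(counts) for key, counts in summary.items()}
```

Calling `summarize_fields(read_jsonl("annotations.jsonl"))` on a downloaded file (the file name here is a placeholder) lists every field together with the value types observed for it. Fields that appear in only some records or that mix types are the ones to check by hand first.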
Provenance
Source
Mutonix
Collection Method
Constructed and annotated, likely from curated video sources.
Freshness
Last updated 2024-06-11 10:38:10.
License
Unknown; verify the terms of use before using the dataset.