This dataset contains 4.1 million subject–text–video triples for subject-driven video generation. It was created by HiDream-ai and last updated in April 2026. Each triple is accompanied by instance segmentation, face detection, quality score, and timeline annotations.
Use Cases
- Train subject-to-video models using the 4.1M subject–text–video triples.
- Evaluate video generation quality using the provided multi-dimensional quality scores.
- Segment subjects in video frames using the included instance detection and segmentation annotations.
- Analyze temporal event structure in videos using the timeline annotations.
- Detect and process faces within video subjects using the face detection data.
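As an illustration of the quality-score use case, the sketch below filters triples against per-dimension thresholds. The dataset's actual schema and file format are not documented here, so the `Triple` record, its field names, and the score names `aesthetic` and `motion` are all hypothetical placeholders, not the dataset's real columns.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Triple:
    """Hypothetical record layout; the real schema is not documented."""
    subject_image: str  # assumed: path to the subject reference image
    caption: str        # assumed: text prompt describing the video
    video: str          # assumed: path to the video clip
    quality: Dict[str, float] = field(default_factory=dict)  # assumed score names


def filter_by_quality(
    triples: Iterable[Triple], thresholds: Dict[str, float]
) -> List[Triple]:
    """Keep triples whose scores meet every threshold.

    A triple that lacks one of the requested scores is dropped.
    """
    kept = []
    for triple in triples:
        if all(
            triple.quality.get(name, float("-inf")) >= minimum
            for name, minimum in thresholds.items()
        ):
            kept.append(triple)
    return kept


# Made-up sample records, used only to exercise the filter.
sample = [
    Triple("s0.png", "a dog running", "v0.mp4", {"aesthetic": 0.8, "motion": 0.6}),
    Triple("s1.png", "a cat sleeping", "v1.mp4", {"aesthetic": 0.4, "motion": 0.9}),
    Triple("s2.png", "a car turning", "v2.mp4", {"aesthetic": 0.9}),
]
selected = filter_by_quality(sample, {"aesthetic": 0.5, "motion": 0.5})
print([t.video for t in selected])
```

Requiring every threshold to pass, and dropping records with missing scores, is a conservative choice for building a training subset. A real pipeline would need to map these placeholder fields onto whatever columns the released files actually use.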
Strengths
- Large scale: 4.1 million subject–text–video triples.
- Multiple annotation types: instance segmentation, face detection, quality scores, and timeline annotations.
Limitations
- Specific row counts, column details, and file formats are not provided.
- The geographic and temporal coverage of the source videos is unknown.
Provenance
- Source: HiDream-ai
- Collection Method: not specified
- Time Range: not specified
- Freshness: not specified
- Geography: not specified