Datapoint AI collected ~91,000 human ranking labels for text-to-video generation models. The dataset contains rankings for 5 videos per prompt across 3 quality dimensions, as judged by 15 annotators per dimension. It was last updated on Hugging Face in April 2026.
Use Cases
- Benchmark text-to-video model performance based on human preference rankings.
- Train reward models for video generation based on multi-dimensional quality labels.
- Analyze the correlation between different quality dimensions in human video evaluation.
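The benchmarking and correlation use cases above can be sketched in a few lines. This is a minimal illustration, not the dataset's actual layout: the field names `prompt`, `dimension`, `annotator`, and `ranking`, the dimension names, and the toy records are all assumptions, because the real columns are undocumented. It averages each model's rank position per dimension, then computes a Pearson correlation between two dimensions over the models they share.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records. Field names and dimension names are assumptions;
# each "ranking" lists model names from best (rank 1) to worst.
records = [
    {"prompt": "p1", "dimension": "alignment", "annotator": "a1", "ranking": ["m1", "m2", "m3"]},
    {"prompt": "p1", "dimension": "alignment", "annotator": "a2", "ranking": ["m2", "m1", "m3"]},
    {"prompt": "p1", "dimension": "quality", "annotator": "a1", "ranking": ["m1", "m2", "m3"]},
    {"prompt": "p1", "dimension": "quality", "annotator": "a2", "ranking": ["m1", "m3", "m2"]},
]


def mean_ranks(records):
    """Return {dimension: {model: mean rank position}}; lower is better."""
    ranks = defaultdict(lambda: defaultdict(list))
    for rec in records:
        for position, model in enumerate(rec["ranking"], start=1):
            ranks[rec["dimension"]][model].append(position)
    return {
        dim: {model: mean(positions) for model, positions in models.items()}
        for dim, models in ranks.items()
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def dimension_correlation(scores, dim_a, dim_b):
    """Correlate mean ranks of the models that appear in both dimensions."""
    models = sorted(set(scores[dim_a]) & set(scores[dim_b]))
    return pearson([scores[dim_a][m] for m in models],
                   [scores[dim_b][m] for m in models])


scores = mean_ranks(records)
print(scores)
print(dimension_correlation(scores, "alignment", "quality"))
```

Mean rank is only one possible aggregation; a pairwise-preference model may suit reward-model training better, and the choice should be revisited once the real schema is known.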
Strengths
- Contains ~91,000 human ranking labels, providing a substantial evaluation corpus.
- Features rankings from 15 annotators per quality dimension, across 3 dimensions.
- Compares outputs from 18 different text-to-video models on the same prompts.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not reported; the ~91,000 figure counts ranking labels, so the number of rows must be checked after download before judging suitability.
- Description metadata is limited; actual data quality requires manual inspection after download.
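Because the limitations above leave the schema and row count to be discovered after download, a first inspection pass might look like the sketch below. It assumes only that the data can be iterated as dict-like rows; the `sample_rows` values and column names are hypothetical stand-ins, not the dataset's real fields.

```python
from collections import Counter, defaultdict


def summarize_schema(rows):
    """Return (row_count, {column: {python_type_name: occurrences}})."""
    types = defaultdict(Counter)
    row_count = 0
    for row in rows:
        row_count += 1
        for column, value in row.items():
            types[column][type(value).__name__] += 1
    return row_count, {column: dict(counts) for column, counts in types.items()}


# Hypothetical rows for illustration only; replace with the downloaded data.
sample_rows = [
    {"prompt": "a cat surfing", "rank": 1, "model": "m1"},
    {"prompt": "a cat surfing", "rank": 2, "model": None},
]

row_count, schema = summarize_schema(sample_rows)
print(row_count)
print(schema)
```

The same function accepts any iterable of dict-like rows, for example the rows yielded when iterating over a split loaded with the Hugging Face `datasets` library. Columns that report more than one type, such as a mix of `str` and `NoneType`, are a quick signal of missing values worth checking by hand.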
Provenance
- Source: datapointai via Hugging Face
- Collection Method: Human ranking labels collected from real annotators via Datapoint AI.
- Freshness: Last updated 2026-04-09 15:57:59; freshness should be verified.