Available on 1 platform
MSR-VTT contains 10,000 video clips paired with 200,000 descriptive captions. The dataset, originally created by Microsoft Research, is a standard benchmark for text-video retrieval and captioning tasks. It was last updated on the platform in August 2025.
No license information is provided for this dataset. Results intended for benchmark comparison must use the 1K-A train/test split.
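One way to guard against accidentally evaluating on the wrong split is to compare the video IDs actually evaluated against the reference test list before reporting numbers. The sketch below is only an illustration: `check_split`, `SplitCheck`, and the toy IDs are hypothetical names, and it assumes the 1K-A test IDs have already been loaded from whatever split file accompanies your copy of the dataset.

```python
from typing import Iterable, NamedTuple, Set


class SplitCheck(NamedTuple):
    """Result of comparing evaluated IDs against a reference split (hypothetical helper)."""
    missing: Set[str]      # in the reference split but never evaluated
    unexpected: Set[str]   # evaluated but not part of the reference split

    @property
    def ok(self) -> bool:
        # The evaluated set matches the reference split exactly.
        return not self.missing and not self.unexpected


def check_split(evaluated_ids: Iterable[str], reference_ids: Iterable[str]) -> SplitCheck:
    """Compare the IDs that were evaluated with the IDs the split prescribes."""
    evaluated = set(evaluated_ids)
    reference = set(reference_ids)
    return SplitCheck(missing=reference - evaluated,
                      unexpected=evaluated - reference)


# Toy IDs for illustration only; a real run would load the 1K-A test list
# from the split file distributed with the benchmark (assumption).
reference = ["video1", "video2", "video3"]
evaluated = ["video1", "video2", "video4"]

result = check_split(evaluated, reference)
print(result.ok, sorted(result.missing), sorted(result.unexpected))
# prints: False ['video3'] ['video4']
```

A check like this fails loudly when a clip is dropped or an extra one slips in, which is usually preferable to discovering a split mismatch after results have been compared.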