lmms-lab compiled this multimodal dataset to train the LLaVA-Video model. It contains 178,510 caption entries and 960,792 open-ended question-answer pairs, aggregating video-language data from five primary sources. The dataset card was last updated in October 2024.
Use is restricted to academic research and educational purposes only. For the GPT-4-generated portions, users must also check OpenAI's Usage Policy.