ShareGPT4Video provides about 4.8 million video captions intended to improve modality alignment in large video-language models (LVLMs). Released by the ShareGPT4Video team in 2024, the collection includes a 40,000-record subset of dense captions annotated with GPT-4V (GPT-4 Vision) and aimed at fine-grained visual perception tasks; the remaining captions were produced by ShareCaptioner-Video, a captioning model trained on that subset.
Users must adhere to the CC BY-NC 4.0 license, which prohibits commercial use. The data is provided in JSONL format, with one JSON record per line.
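Because the data is distributed as JSONL, it can be streamed one record at a time without loading the whole file into memory. The following is a minimal sketch using only the Python standard library. The helper name `read_jsonl` and the file name `captions.jsonl` are hypothetical, and no field names are assumed, since the record schema is not described here.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report the offending line so bad records are easy to locate.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({err.msg})"
                ) from err


# Hypothetical usage: inspect the keys of the first record.
# for record in read_jsonl("captions.jsonl"):
#     print(sorted(record.keys()))
#     break
```

Inspecting the keys of the first record is a reasonable way to discover the actual schema before writing any field-specific processing.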