MSR-VTT: 10,000 Video Clips with 200,000 Captions for Text-Video Retrieval | DataSalon