The ActivityNet Captions dataset contains approximately 20,000 videos, each annotated with an average of 3.65 temporally localized descriptive sentences, for a total of roughly 100,000 sentences. Each sentence describes a unique video segment and has an average length of 13.48 words. The dataset was created by Leyo.
License is listed as 'other'; users must verify specific terms before use. The dataset is monolingual (English).
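
To make the notion of "temporally localized sentences" concrete, here is a minimal Python sketch of how such annotations could be represented and how the summary statistics quoted above (sentences per video, words per sentence) would be computed. The `Segment` class, its field names, the `summarize` helper, and the toy records are all hypothetical and chosen for illustration; they are not the dataset's actual schema or contents.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Segment:
    """One temporally localized sentence (hypothetical layout, not the real schema)."""
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    sentence: str  # English description of what happens in the segment


# Toy records keyed by made-up video ids; these are NOT real dataset entries.
toy_annotations = {
    "video_a": [
        Segment(0.0, 12.5, "A man walks onto a stage holding a guitar."),
        Segment(12.5, 40.0, "He sits down and begins to play a song."),
    ],
    "video_b": [
        Segment(3.0, 20.0, "Two kids build a sandcastle on the beach."),
    ],
}


def summarize(annotations):
    """Return (total sentences, mean sentences per video, mean words per sentence)."""
    per_video = [len(segments) for segments in annotations.values()]
    # Whitespace tokenization is an assumption; the original word counts
    # may have been computed with a different tokenizer.
    word_counts = [
        len(segment.sentence.split())
        for segments in annotations.values()
        for segment in segments
    ]
    return sum(per_video), mean(per_video), mean(word_counts)


total, avg_sentences, avg_words = summarize(toy_annotations)
print(total, avg_sentences, round(avg_words, 2))
```

Run over the full set of annotations, the same three quantities would correspond to the total sentence count, the 3.65 sentences-per-video average, and the 13.48 words-per-sentence average reported in the description, assuming a comparable tokenization.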