VideoNet is a dataset accompanying a CVPR 2026 Highlight paper, intended for studying domain-specific action recognition and in-context video learning in Vision-Language Models (VLMs). It consists of benchmark MP4 video files and JSONL files containing question-and-answer pairs. It was uploaded by author 'raivn' and last updated on May 6, 2026.
Use Cases
- Benchmarking Vision-Language Models on domain-specific action recognition using the provided videos and Q&A pairs.
- Training models for in-context video learning using the structured benchmark JSONL files.
- Studying VLM performance on video-based question answering, within the limits of the dataset's described structure.
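For the benchmarking use case, a scoring step could look like the minimal sketch below. It assumes each Q&A record is a dictionary with an identifier and a reference answer; the key names `id` and `answer` are hypothetical placeholders, since the card does not document the JSONL fields, and exact-match scoring is an illustrative choice rather than the benchmark's official metric.

```python
def exact_match_accuracy(records, predictions, id_key="id", answer_key="answer"):
    """Return the fraction of records whose prediction matches the reference.

    id_key and answer_key are hypothetical field names; replace them with
    the keys actually found in the benchmark JSONL files.
    """
    def normalize(text):
        # Compare case-insensitively and ignore surrounding whitespace.
        return str(text).strip().lower()

    if not records:
        return 0.0

    correct = 0
    for record in records:
        prediction = predictions.get(record[id_key])
        # A missing prediction counts as incorrect.
        if prediction is not None and normalize(prediction) == normalize(record[answer_key]):
            correct += 1
    return correct / len(records)
```

Here `predictions` is a dictionary mapping each record identifier to the model's answer string. Counting unanswered questions as wrong keeps the score from being inflated by skipped items.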
Strengths
- Dataset is associated with a CVPR 2026 Highlight paper, indicating academic relevance.
- Includes both video files (MP4s) and structured benchmark files (JSONLs) for evaluation.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count, file formats beyond MP4/JSONL, and license information are unknown.
- Description metadata is limited; actual data quality requires manual inspection after download.
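Because field semantics and row counts have to be inferred after download, a first inspection pass might look like the sketch below. It uses only the Python standard library, makes no assumption about field names, and reports which keys occur in a JSONL file and which value types they hold. The function name `summarize_jsonl` and the `max_records` cap are illustrative choices, not part of the dataset.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path


def summarize_jsonl(path, max_records=1000):
    """Return (records_read, {field: {type_name: count}}) for a JSONL file."""
    field_types = defaultdict(Counter)
    records_read = 0
    with Path(path).open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if records_read >= max_records:
                break  # cap the scan so large files stay cheap to inspect
            record = json.loads(line)
            records_read += 1
            if isinstance(record, dict):
                for key, value in record.items():
                    field_types[key][type(value).__name__] += 1
    return records_read, {key: dict(types) for key, types in field_types.items()}
```

Calling `summarize_jsonl("benchmark.jsonl")`, where the file name is a placeholder for whichever JSONL file was downloaded, returns the number of records scanned and a per-field type tally. Setting `max_records` above the file's length yields the full row count.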
Provenance
- Source: Hugging Face
- Freshness: Last updated 2026-05-06 07:52:08; freshness should be verified.