Name: MedHorizon: 340 Full-Procedure Clinical Videos for Long-Context Evaluation
Creator: mlvbench-review
Published: 2026-04-27T14:05:54
Keywords: QA Evaluation, Benchmark, Healthcare, Medical Video, Long Context, Video, Multimodal Benchmark, Clinical Procedures, Multimodal

Description

MedHorizon provides 340 full-procedure clinical videos paired with 1,253 multiple-choice questions for evaluating multimodal AI models. The benchmark targets two demanding capabilities: retrieving extremely sparse evidence, and multi-hop reasoning over observations distributed throughout lengthy procedures. It was created by mlvbench-review and last updated on Hugging Face in May 2026.

Use Cases

Benchmarking long-context medical video understanding on full-procedure recordings.
Evaluating sparse evidence retrieval, that is, whether a model can locate the few key moments in a long video that answer a question.
Testing multi-hop reasoning that combines observations distributed across an entire procedure.
Fine-tuning models for clinical video question answering with the provided QA pairs, with caution: the data ships as a test-only split, so training on it compromises its value as a held-out benchmark.
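Because every question is multiple choice, benchmarking typically reduces to accuracy over the 1,253 items. The sketch below assumes each question has an ID and a single gold option letter; the ID scheme, the letter format, and the function name `mcq_accuracy` are hypothetical, since the dataset's columns are undocumented.

```python
from typing import Mapping


def mcq_accuracy(predictions: Mapping[str, str], answers: Mapping[str, str]) -> float:
    """Fraction of questions whose predicted option matches the gold option.

    Hypothetical format: both mappings are keyed by question ID and hold
    option letters such as "A" or "B". Comparison ignores case and
    surrounding whitespace. Questions with no prediction count as wrong.
    """
    if not answers:
        raise ValueError("answers must not be empty")
    correct = sum(
        1
        for qid, gold in answers.items()
        if predictions.get(qid, "").strip().upper() == gold.strip().upper()
    )
    return correct / len(answers)


# Usage with made-up IDs and answers (not real MedHorizon data):
gold = {"q1": "A", "q2": "C", "q3": "B"}
pred = {"q1": "a", "q2": "D"}  # q3 left unanswered
print(mcq_accuracy(pred, gold))  # 1 of 3 correct
```

Scoring over the gold set rather than the prediction set keeps skipped questions in the denominator, so a model cannot inflate its score by abstaining.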

Strengths

Contains 340 full-procedure clinical videos, providing a substantial corpus for long-context evaluation.
Includes 1,253 multiple-choice QA pairs specifically designed to test sparse retrieval and multi-hop reasoning.
Targets a long-horizon evaluation setting that short-clip medical video datasets do not capture.

Limitations

Column-level documentation is absent, so field semantics must be inferred after download.
Row count and total dataset size are not reported, which makes suitability harder to assess before download.
The dataset is provided as a test-only split, which may restrict its use for training.
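Since field semantics must be inferred after download, one quick approach is to scan the records and note which fields appear, which Python types they hold, and whether any field is missing from some rows. This is a minimal sketch assuming the rows can be read as dicts (for example, from a JSON Lines file, though MedHorizon's actual file format is not confirmed); `infer_schema` and the sample field names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable, Mapping


def infer_schema(rows: Iterable[Mapping[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarize observed fields: their value types and whether they are optional.

    A field is marked optional if it is absent from at least one row.
    """
    types: dict[str, set[str]] = defaultdict(set)
    present: Counter[str] = Counter()
    n_rows = 0
    for row in rows:
        n_rows += 1
        for key, value in row.items():
            types[key].add(type(value).__name__)
            present[key] += 1
    return {
        key: {"types": sorted(types[key]), "optional": present[key] < n_rows}
        for key in types
    }


# Usage with hypothetical records (field names are illustrative only):
rows = [
    {"question": "Which step came first?", "options": ["A", "B"], "answer": "B"},
    {"question": "What tool was used?", "options": ["A", "B"], "answer": "A", "video_id": "v1"},
]
for field, info in infer_schema(rows).items():
    print(field, info)
```

Reporting types as sorted lists makes mixed-type fields (for instance, an ID stored sometimes as `int` and sometimes as `str`) easy to spot.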

Provenance

Source: mlvbench-review on Hugging Face.
Collection Method: Unknown; the data was likely curated specifically for benchmarking.
Freshness: Last updated 2026-05-07 04:02:36; verify that this is still the latest version before use.

License is unknown; users should verify terms before use.