Available on 1 platform
MedHorizon is a benchmark of 340 full-procedure clinical videos paired with 1,253 multiple-choice questions for evaluating multimodal AI models. It emphasizes two challenging properties: retrieving extremely sparse evidence, and multi-hop reasoning over observations spread across lengthy procedures. It was created by mlvbench-review and last updated on Hugging Face in May 2026.
The license is unknown; users should verify the terms before use.