ViDoRe V3 Pharmaceuticals is a corpus of FDA slides intended for long-document understanding tasks. It is one of the 10 corpora that make up the ViDoRe v3 benchmark for evaluating retrieval-augmented generation (RAG) on visually rich documents. The dataset was authored by 'vidore' and last updated on 2026-01-15.
Use Cases
- Evaluate RAG system performance on pharmaceutical documents using the benchmark's 3,099 queries.
- Train models for long-document understanding tasks on the corpus of FDA slides.
- Benchmark multilingual document retrieval using queries translated into 6 languages.
- Analyze information extraction from regulatory slide decks, drawing on the FDA source material.
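Evaluating a retriever against human-verified relevant pages usually comes down to ranking metrics. The sketch below computes binary-relevance nDCG@k and Recall@k in plain Python. Because the card documents no columns, it assumes a hypothetical format: `qrels` maps a query ID to its set of relevant page IDs, and `results` maps a query ID to the retriever's ranked page IDs (no duplicates). The function names are illustrative and not part of any ViDoRe tooling.

```python
import math
from typing import Dict, List, Set

# Hypothetical input formats (the dataset card does not document its columns):
#   qrels:   query_id -> set of relevant page_ids
#   results: query_id -> ranked list of unique page_ids from the retriever


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance nDCG@k for one query."""
    # Each hit at 0-based rank r contributes 1 / log2(r + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, page in enumerate(ranked[:k])
        if page in relevant
    )
    # The ideal ranking places all relevant pages (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant pages that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def evaluate(
    results: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10
) -> Dict[str, float]:
    """Mean nDCG@k and Recall@k over queries that have relevant pages."""
    judged = [q for q, rel in qrels.items() if rel]
    if not judged:
        return {f"ndcg@{k}": 0.0, f"recall@{k}": 0.0}
    # Queries the retriever did not answer count as empty rankings.
    ndcg = sum(ndcg_at_k(results.get(q, []), qrels[q], k) for q in judged)
    recall = sum(recall_at_k(results.get(q, []), qrels[q], k) for q in judged)
    return {f"ndcg@{k}": ndcg / len(judged), f"recall@{k}": recall / len(judged)}
```

For example, `evaluate({"q1": ["p3", "p9"], "q2": ["p2", "p7", "p1"]}, {"q1": {"p3"}, "q2": {"p1", "p2"}})` gives a Recall@10 of 1.0, since every relevant page is retrieved, while nDCG@10 falls slightly below 1.0 because `p1` is ranked third rather than second. Binary relevance is assumed here because the card mentions only relevant pages, not graded judgments.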
Strengths
- Part of a larger benchmark with 10 datasets and 26,000 total pages.
- The benchmark provides 3,099 queries with human-verified relevant pages.
- Queries are translated into 6 languages, supporting multilingual evaluation.
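With queries in 6 languages, per-query scores (for instance, nDCG@10 values) can be reported per language as well as overall. This is a minimal sketch assuming hypothetical `(language, score)` pairs; the language codes in the example are placeholders, not the benchmark's actual language list.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def per_language_means(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Group (language, per-query score) pairs and average each group."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for lang, score in rows:
        buckets[lang].append(score)
    return {lang: mean(scores) for lang, scores in buckets.items()}


def macro_average(per_lang: Dict[str, float]) -> float:
    """Unweighted mean across languages, so each language counts equally."""
    return mean(per_lang.values())


def micro_average(rows: Iterable[Tuple[str, float]]) -> float:
    """Mean over all queries regardless of language (rows must be non-empty)."""
    return mean(score for _, score in rows)
```

With `rows = [("en", 1.0), ("en", 0.5), ("fr", 0.25)]`, the per-language means are 0.75 and 0.25, so the macro average is 0.5, while the micro average over all three queries is about 0.583. Macro-averaging keeps a language with fewer queries from being outweighed; whichever variant a report uses is worth stating next to the numbers.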
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The row count for this specific dataset is not reported, which makes it harder to judge suitability before download.
- Description metadata is limited; actual data quality requires manual inspection after download.
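Because field semantics must be inferred after download, a practical first step is to tally which fields appear, what value types they hold, and how often they are missing. The sketch below works on any iterable of dictionaries. If the data is loaded with the Hugging Face `datasets` library (an assumption, since the card does not name a distribution format), iterating over a split yields one dictionary per row. The field names in the example are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable


def summarize_schema(records: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Per field: observed Python type names, rows lacking the key, and None values."""
    type_counts: Dict[str, Counter] = defaultdict(Counter)
    present: Counter = Counter()
    total = 0
    for rec in records:
        total += 1
        for key, value in rec.items():
            present[key] += 1
            # type(None).__name__ is "NoneType", so explicit nulls are counted too.
            type_counts[key][type(value).__name__] += 1
    return {
        key: {
            "types": dict(counts),
            "missing": total - present[key],
            "none": counts.get("NoneType", 0),
        }
        for key, counts in type_counts.items()
    }
```

For rows such as `[{"query": "a", "lang": "en"}, {"query": "b", "lang": None}, {"query": "c"}]`, the summary for `lang` shows one string, one `None`, and one row where the key is absent. Image columns, if present, would show up under their Python class names.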
Provenance
- Source: FDA slides.
- Collection Method: Curated as part of the ViDoRe v3 benchmark.
- Time Range: Not specified.
- Freshness: Last updated 2026-01-15 07:53:49; freshness should be verified.
- Geography: Not specified.