Mpdocvqa Corpus is a multimodal dataset published on HuggingFace by AHS-uni and last updated on June 8, 2025. The available metadata does not describe its content or scale.
Use Cases
- Training multimodal models for document-based visual question answering (inferred from domain, verify after download)
- Benchmarking the performance of vision-language models on structured documents (inferred from domain, verify after download)
- Fine-tuning models for information extraction from visually rich documents (inferred from domain, verify after download)
Strengths
- Published on HuggingFace, a major platform for AI datasets.
- Last updated on 2025-06-08 13:27:07 (time zone not stated), which suggests recent maintenance.
Limitations
- Metadata is minimal; actual content requires verification after download.
- Row count and file formats are unknown, which makes it hard to assess suitability before download.
- Column-level documentation is absent; column names and field semantics must be inferred after download.
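Because the file formats are unknown, a first verification step could be to list the repository's files and count them by extension before downloading anything. The sketch below is illustrative only. The repository id is not stated in this card, so `REPO_ID` is a hypothetical placeholder, and the helper names `summarize_extensions` and `list_remote_files` are invented for this example. The remote listing assumes the `huggingface_hub` package is installed and uses its `list_repo_files` function.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumption: this card does not give the repository id. The value below is a
# hypothetical placeholder; replace it with the real id shown on the Hub.
REPO_ID = "AHS-uni/<dataset-name>"


def summarize_extensions(file_paths):
    """Count files per lowercased extension; files without one go under '(none)'."""
    counts = Counter()
    for path in file_paths:
        suffix = PurePosixPath(path).suffix.lower()
        counts[suffix or "(none)"] += 1
    return dict(counts)


def list_remote_files(repo_id=REPO_ID):
    """List the files of a dataset repo without downloading them.

    Needs network access and the huggingface_hub package, so the import is
    kept local to this function.
    """
    from huggingface_hub import list_repo_files

    return list_repo_files(repo_id, repo_type="dataset")


# Offline usage with a made-up listing (not the real contents of the dataset):
example = ["README.md", ".gitattributes", "data/a.parquet", "data/b.PARQUET"]
print(summarize_extensions(example))
```

With a real repository id, `summarize_extensions(list_remote_files())` would show which formats are present. That narrows down how to load the data, but it does not reveal row counts or column semantics, which still need to be checked after download.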
Provenance
- Source: AHS-uni
- Freshness: 2025-06-08 13:27:07