Name: Multimodal Hallucination Benchmark for Visual Question Answering
Creator: Shengcao1006
Published: 2023-09-25T04:27:58
Keywords: task_categories:image-to-text, Benchmark Evaluation, language:en, task_categories:visual-question-answering, size_categories:n<1K, Large Multimodal Models, Benchmark, Computer Vision, region:us, Multimodal Hallucination, license:apache-2.0, Visual Question Answering, Multimodal

Description

This evaluation benchmark for hallucination in Large Multimodal Models (LMMs) consists of 96 challenging questions based on images from OpenImages. Each question is paired with a ground-truth answer and a description of the image contents. The dataset was created by Shengcao1006, published in September 2023, and last updated in November 2023.

Use Cases

Benchmark model performance on hallucination using the 96 question-answer pairs and corresponding images.
Analyze failure modes in LMMs by comparing generated answers against the provided ground-truth answers.
Guide work on reducing hallucination by using the curated image content descriptions and questions as reference cases; the benchmark itself is intended for evaluation rather than as training data.
Develop new evaluation metrics for multimodal hallucination based on the benchmark's structured challenges.
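
The benchmarking and failure-analysis use cases above can be sketched as a small scoring loop. This is only an illustration under stated assumptions: the `Example` field names (`question`, `gt_answer`, `image_content`), the `predict` callable, and the substring-based `contains_answer` judge are hypothetical and are not taken from the dataset's schema or from any official evaluation protocol. A real evaluation would likely need a stronger judge than substring matching.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Example:
    # Hypothetical field names; check the actual dataset schema before use.
    question: str
    gt_answer: str
    image_content: List[str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore trivial differences."""
    return " ".join(text.lower().split())


def contains_answer(prediction: str, example: Example) -> bool:
    """Toy judge (an assumption): correct if the prediction contains the ground truth."""
    return normalize(example.gt_answer) in normalize(prediction)


def evaluate(
    examples: Iterable[Example],
    predict: Callable[[str], str],
    judge: Callable[[str, Example], bool] = contains_answer,
) -> dict:
    """Run the model on every question and collect accuracy plus the failed cases."""
    correct = 0
    failures: List[Tuple[Example, str]] = []
    for example in examples:
        prediction = predict(example.question)
        if judge(prediction, example):
            correct += 1
        else:
            # Keep the example and the model output for failure-mode analysis.
            failures.append((example, prediction))
    total = correct + len(failures)
    accuracy = correct / total if total else 0.0
    return {"accuracy": accuracy, "correct": correct, "total": total, "failures": failures}
```

The `failures` list keeps each missed example next to the model's answer, so wrong outputs can be inspected against the ground truth and the image content description. In practice `predict` would also need the image itself; it is omitted here to keep the sketch self-contained.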

Strengths

96 specifically designed challenging questions for targeted evaluation.
Includes ground-truth answers and image content for precise scoring.

Limitations

Small scale with only 96 data points, limiting statistical power for broad conclusions.
Image source is limited to OpenImages, which may not represent all visual domains.
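
To make the statistical-power limitation concrete, the sketch below computes a Wilson score confidence interval for an accuracy measured on 96 items. The count of 48 correct answers is a made-up illustration, not a reported result, and the Wilson interval is one common choice rather than a method prescribed by the benchmark.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Hypothetical example: 48 of the 96 questions answered correctly.
low, high = wilson_interval(48, 96)
print(f"accuracy = {48 / 96:.3f}, 95% interval = ({low:.3f}, {high:.3f})")
```

With these illustrative numbers the interval spans roughly 0.40 to 0.60, so two models whose scores differ by only a few questions cannot be reliably ranked on this benchmark alone.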

Provenance

Source: Hugging Face, created by Shengcao1006.
Collection Method: Curated benchmark; questions based on images from OpenImages.
Time Range: Not specified
Freshness: Last updated in November 2023.
Geography: Not specified

The platform lists the license as Apache 2.0; verify this against the original dataset description before reuse. The dataset is designed solely for evaluation, not for training large-scale models.
