MERRIN: A Human-Annotated Benchmark for Multimodal Reasoning in Noisy Web Environments

Name: MERRIN: A Human-Annotated Benchmark for Multimodal Reasoning in Noisy Web Environments
Creator: HanNight
Published: 2026-04-14T21:16:31
Keywords: Evidence Retrieval, Web Data, AI Benchmark, Benchmark, Human Annotated, Audio, Multimodal Reasoning, Multimodal

Available on 1 platform

Description

MERRIN is a human-annotated benchmark for evaluating search-augmented agents on multi-hop reasoning over noisy, multimodal web sources. It measures agents' ability to identify relevant modalities, retrieve evidence from the open web, and reason over conflicting sources spanning text, images, video, and audio. The dataset was created by HanNight and was last updated on 2026-04-16.

Use Cases

Benchmarking multimodal search-augmented agents on multi-hop reasoning tasks.
Evaluating whether an agent can identify the relevant modalities when the task gives no explicit cues.
Testing retrieval and reasoning over noisy, conflicting, and incomplete sources spanning text, images, video, and audio.

Strengths

Human-annotated, which suggests curated quality for evaluation.
Designed to measure specific, complex agent capabilities: modality identification, evidence retrieval, and reasoning over noise.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count, file formats, and license are unknown, which makes it hard to assess suitability before download.
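
Because the field semantics have to be inferred after download, a quick per-field summary is a reasonable first step. The following is a minimal sketch, assuming the downloaded data has already been parsed into a list of Python dicts. The helper name `summarize_fields` and the sample field names (`question`, `answer`, `sources`) are hypothetical and are not taken from the dataset, whose actual schema is undocumented.

```python
from collections import Counter


def summarize_fields(records, max_examples=2):
    """Report value types, missing counts, and sample values per field.

    `records` is a list of dicts (one per row); the keys are whatever the
    downloaded file happens to contain.
    """
    # Collect every key that appears in any record, so fields that are
    # absent from some rows are still counted as missing there.
    all_keys = set()
    for rec in records:
        all_keys.update(rec)

    summary = {}
    for key in sorted(all_keys):
        types = Counter()
        missing = 0
        examples = []
        for rec in records:
            value = rec.get(key)
            if value is None:
                missing += 1  # absent key or explicit null
                continue
            types[type(value).__name__] += 1
            if len(examples) < max_examples and value not in examples:
                examples.append(value)
        summary[key] = {
            "types": dict(types),
            "missing": missing,
            "examples": examples,
        }
    return summary


# Hypothetical rows, for illustration only; not real MERRIN data.
sample = [
    {"question": "q1", "answer": "a1", "sources": ["u1"]},
    {"question": "q2", "answer": None},
]
for field, info in summarize_fields(sample).items():
    print(field, info)
```

Fields with mixed types or a high missing count are the ones most worth checking by hand before building an evaluation harness on top of them.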

Provenance

Source: HanNight via Hugging Face.
Collection Method: Human-annotated, likely gathered from noisy web sources.
Freshness: Last updated 2026-04-16 02:20:14; check the source for later revisions.

License is unknown; users must verify permissions before use.

Related Topics

Audio, Multimodal, Evidence Retrieval, Web Data, AI Benchmark, Benchmark, Human Annotated, Multimodal Reasoning

Quality Score

Overall: D39
Description: 39
Source: 44
Reputation: 39
Access: 26

Community

Downloads: 14
Likes: 1
Views: 0

Dataset Info

Author: HanNight
Created: Apr 14, 2026
Updated: Apr 16, 2026
Last synced: Apr 27, 2026
