Name: MINTEval: Benchmark for Memory in Multi-Target Agent Systems
Creator: dinobby
Published: 2026-05-19T05:12:43
Keywords: Long Horizon Agents, Code Revision, Benchmark, Text, Memory Evaluation, Multi Turn Dialogue

Description

MINTEval is an analytical benchmark for evaluating memory under multi-target interference in long-horizon agent systems. It was introduced in a research paper and published on Hugging Face by the author 'dinobby' on 2026-05-19. Each example presents a sequence of contexts followed by questions that require reasoning over that history. The examples span four domains: state tracking, multi-turn dialogue, Wikipedia revisions, and GitHub commits.

Use Cases

Benchmarking how well an agent remembers sequences of events and edits.
Evaluating reasoning over long multi-turn dialogue histories.
Testing a model's ability to track changes across Wikipedia revision histories and answer questions about them.
Assessing memory for code evolution across GitHub commit histories.

Strengths

Covers four distinct domains (state tracking, multi-turn dialogue, Wikipedia revisions, GitHub commits) for varied evaluation.
Designed specifically for evaluating memory under multi-target interference, a defined research challenge.
Introduced in a dedicated research paper, suggesting a formal methodological foundation.

Limitations

The dataset description is sparse; data quality and structure can only be assessed by inspecting the files after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count and file formats are not stated, which makes it hard to judge suitability before downloading.
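
Because field semantics have to be inferred after download, a small profiling pass over the loaded rows is a reasonable first step. The following is a minimal sketch, not part of MINTEval itself: `profile_records` is a hypothetical helper, and the sample rows and their field names (`contexts`, `question`, `answer`) are invented for illustration, since the real column names are undocumented. It assumes the rows have already been loaded as dictionaries, for example with the Hugging Face `datasets` library.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable, Mapping


def profile_records(records: Iterable[Mapping[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarize each field: presence, non-null count, missing count, and value types."""
    present: Counter = Counter()
    non_null: Counter = Counter()
    types: defaultdict = defaultdict(Counter)
    total = 0
    for row in records:
        total += 1
        for key, value in row.items():
            present[key] += 1
            if value is not None:
                non_null[key] += 1
                types[key][type(value).__name__] += 1
    return {
        key: {
            "present": present[key],          # rows that contain the field
            "non_null": non_null[key],        # rows where the field is not None
            "missing": total - present[key],  # rows that lack the field entirely
            "types": dict(types[key]),        # Python type names of non-null values
        }
        for key in present
    }


# Hypothetical rows for illustration only; real MINTEval fields may differ.
sample = [
    {"contexts": ["a", "b"], "question": "q1", "answer": "x"},
    {"contexts": ["c"], "question": "q2", "answer": None},
]

report = profile_records(sample)
for field, stats in report.items():
    print(field, stats)
```

Running the same function over each downloaded split gives a quick view of which fields exist, how complete they are, and whether their types are consistent, before any field is relied on in an evaluation.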

Provenance

Source: Hugging Face dataset uploaded by author 'dinobby'.
Collection Method: Introduced as a benchmark in a research paper; specific data collection method is not detailed in the provided input.
Freshness: Last updated 2026-05-19 15:56:00; check the dataset page for later revisions.

License: Unknown; users must verify the terms of use before using the dataset.
