A collection of bug candidates identified through metamorphic testing of tool-using LLM agents. The dataset is an artifact for the AgentMorph paper, authored by Anonymous2535k and last updated on May 7, 2026. It contains cleaned Stage 3 bug candidates derived from mutated task trajectories.
Use Cases
- Benchmarking LLM agent robustness with trajectory-level metamorphic testing
- Identifying failure patterns in tool-using agents from invariant violations
- Developing new testing methodologies for AI agents built on intent-preserving task mutations
Strengths
- Focuses on a specific testing methodology (metamorphic testing) for LLM agents
- Provides cleaned bug candidates from Stage 3 of the AgentMorph process
- Is the direct artifact of the AgentMorph paper, which gives it a defined purpose and scope
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Row count is not reported, which makes suitability hard to assess before download
- Description metadata is limited; actual data quality requires manual inspection after download
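Because neither the columns nor the row count are documented, a first pass after download is usually a quick schema summary. The sketch below is one way to do that with the standard library only. It assumes the records have already been loaded into a list of dictionaries; the field names in the sample (`task_id`, `mutation`, `violation`) are hypothetical placeholders, not the dataset's actual columns.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def summarize_records(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Count rows and collect, per field, the observed value types and presence count."""
    row_count = 0
    field_types: Dict[str, Set[str]] = defaultdict(set)
    field_presence: Dict[str, int] = defaultdict(int)

    for record in records:
        row_count += 1
        for key, value in record.items():
            field_types[key].add(type(value).__name__)
            field_presence[key] += 1

    return {
        "row_count": row_count,
        "fields": {
            name: {"types": sorted(types), "present_in": field_presence[name]}
            for name, types in field_types.items()
        },
    }


# Hypothetical sample rows; the real column names must be read from the download.
sample = [
    {"task_id": 1, "mutation": "paraphrase", "violation": "tool set changed"},
    {"task_id": 2, "mutation": "reorder", "violation": None},
]

summary = summarize_records(sample)
print(summary["row_count"])
for name, info in summary["fields"].items():
    print(name, info["types"], info["present_in"])
```

Fields that show more than one type, or that are present in fewer rows than `row_count`, are good starting points for the manual inspection mentioned above.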
Provenance
- Source: Hugging Face
- Collection Method: Generated via metamorphic testing of tool-using LLM agents, where tasks are mutated to preserve intent and trajectories are compared.
- Freshness: Last updated 2026-05-07 06:17:51; freshness should be verified
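The collection method described above (mutate a task without changing its intent, run the agent on both versions, compare the trajectories) can be illustrated with a minimal sketch. This is not the AgentMorph implementation: the mutation, the invariant, the toy agent, and the trajectory representation (an ordered list of tool names) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Assumption: a trajectory is modeled as the ordered list of tool names called.
Trajectory = List[str]


def polite_prefix_mutation(task: str) -> str:
    """Hypothetical intent-preserving mutation: prepend 'Please'."""
    return "Please " + task[0].lower() + task[1:]


def same_tool_set(original: Trajectory, mutated: Trajectory) -> bool:
    """Hypothetical invariant: both runs should use the same set of tools."""
    return set(original) == set(mutated)


def find_bug_candidates(
    tasks: List[str],
    agent: Callable[[str], Trajectory],
    mutate: Callable[[str], str],
    invariant: Callable[[Trajectory, Trajectory], bool],
) -> List[Dict[str, object]]:
    """Run the agent on each task and its mutation; keep pairs that violate the invariant."""
    candidates: List[Dict[str, object]] = []
    for task in tasks:
        mutated_task = mutate(task)
        original_traj = agent(task)
        mutated_traj = agent(mutated_task)
        if not invariant(original_traj, mutated_traj):
            candidates.append(
                {
                    "task": task,
                    "mutated_task": mutated_task,
                    "original_trajectory": original_traj,
                    "mutated_trajectory": mutated_traj,
                }
            )
    return candidates


def toy_agent(task: str) -> Trajectory:
    """Deliberately brittle stand-in agent: skips the search tool on polite phrasing."""
    if task.startswith("Please"):
        return ["answer"]
    return ["search", "answer"]


found = find_bug_candidates(
    ["Find the weather in Paris"], toy_agent, polite_prefix_mutation, same_tool_set
)
print(len(found))
```

A violation flagged this way is only a candidate: the two trajectories may differ for benign reasons, which is presumably why the dataset describes its entries as cleaned bug candidates rather than confirmed bugs.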