Deep Research Benchmarks: AgentHarness Evaluation Data for Apodex-1.0

Name: Deep Research Benchmarks: AgentHarness Evaluation Data for Apodex-1.0
Creator: apodex
Published: 2026-06-07T04:13:36
Keywords: AI Benchmarks, Machine Learning Evaluation, Agent Evaluation, Research Benchmarks, Multimodal

Description

A password-protected bundle of public deep-research benchmarks used by AgentHarness to evaluate the Apodex-1.0 AI model in standard ReAct mode. The dataset was authored by 'apodex' and last updated on 2026-06-08. Its specific contents and scale are not detailed in the provided metadata.

Use Cases

Benchmarking autonomous AI agents on the bundled deep-research tasks.
Evaluating the performance of the Apodex-1.0 model in a ReAct reasoning mode.
Comparing agent performance using the standard evaluation suite from AgentHarness.

Strengths

Designed for a specific, stated purpose: evaluating Apodex-1.0 in ReAct mode.
Has a clear provenance, being the public benchmarks used by the AgentHarness framework.
Last update timestamp is precisely recorded as 2026-06-08 15:54:10.

Limitations

Description metadata is limited; actual data quality, structure, and scale require manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count, file formats, and license information are unknown, which may limit suitability assessment.

Provenance

Source: Hugging Face
Collection Method: Likely compiled by the author 'apodex' for model evaluation.
Freshness: Last updated 2026-06-08 15:54:10; freshness should be verified.

Downloading requires a specific wget/unzip command. The password contains shell-special characters (*, (, and )), so it must be wrapped in single quotes to keep the shell from interpreting them.
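
The quoting requirement can be illustrated with a minimal Python sketch. Everything named here is a hypothetical placeholder: the password "demo*(pass)" and the archive name "benchmarks.zip" are invented for illustration, and the real download URL and password are not reproduced. It assumes the archive is extracted with the Info-ZIP unzip tool and its -P option.

```python
import shlex

# Hypothetical placeholders; the real password and archive name are not given here.
EXAMPLE_PASSWORD = "demo*(pass)"
EXAMPLE_ARCHIVE = "benchmarks.zip"


def unzip_shell_command(password: str, archive: str) -> str:
    """Build a single shell command string, quoting each argument safely."""
    # shlex.quote wraps any value containing shell-special characters
    # (such as *, ( and )) in single quotes.
    return f"unzip -P {shlex.quote(password)} {shlex.quote(archive)}"


def unzip_argv(password: str, archive: str) -> list[str]:
    """Build an argument list for subprocess.run; no shell is involved,
    so the password is passed through unquoted and unmodified."""
    return ["unzip", "-P", password, archive]


if __name__ == "__main__":
    print(unzip_shell_command(EXAMPLE_PASSWORD, EXAMPLE_ARCHIVE))
    # prints: unzip -P 'demo*(pass)' benchmarks.zip
    print(unzip_argv(EXAMPLE_PASSWORD, EXAMPLE_ARCHIVE))
```

Passing an argument list to subprocess.run avoids shell quoting altogether, which may be the safer choice when the extraction step is scripted rather than typed by hand.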

Quality Score

Overall: D (38)
Description: 42
Source: 36
Reputation: 40
Access: 26

Community

Downloads: 26
Likes: 1
Views: 0

Dataset Info

Author: apodex
Created: Jun 7, 2026
Updated: Jun 8, 2026
Last synced: Jun 25, 2026
