DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Education

Agents Last Exam: Metadata for 147 Professional Task Cards

Name: Agents Last Exam: Metadata for 147 Professional Task Cards
Creator: agents-last-exam
Published: 2026-05-07T05:11:54
Keywords: Computer Use, Benchmark Evaluation, Task Metadata, Benchmark, AI Agents, Tabular

by agents-last-exam · Updated Jun 5, 2026

Available on 1 platform

Description

The 147 task cards in this dataset form the metadata layer of the Agents Last Exam benchmark for evaluating computer-use agents. This version 1.0 release includes the title, prompt, taxonomy labels, and input-file descriptors for each task. The dataset is published by agents-last-exam on HuggingFace and was last updated on June 5, 2026.

Use Cases

Benchmarking agent performance on long-horizon professional tasks, using the provided task prompts and taxonomy.
Analyzing the distribution of task types and complexity from the task-card metadata.
Selecting specific task categories by taxonomy label for targeted agent evaluation.
Locating the input files needed for agent simulations through the input-file descriptors (the files themselves are in a separate companion dataset).
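A minimal Python sketch of the taxonomy-based filtering and distribution analysis listed above. Column-level documentation is absent, so the field names `title` and `taxonomy`, and the assumption that a card's taxonomy is either a string or a list of strings, are hypothetical and must be checked against the downloaded data. The sample records are made up for illustration.

```python
from collections import Counter
from typing import Any, Iterable

# Hypothetical field name: the real taxonomy column is undocumented.
TAXONOMY_FIELD = "taxonomy"


def taxonomy_labels(card: dict[str, Any]) -> list[str]:
    """Return a card's taxonomy labels, accepting a single string or a list."""
    value = card.get(TAXONOMY_FIELD)
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return [str(v) for v in value]


def label_distribution(cards: Iterable[dict[str, Any]]) -> Counter:
    """Count how many task cards carry each taxonomy label (once per card)."""
    counts: Counter = Counter()
    for card in cards:
        counts.update(set(taxonomy_labels(card)))
    return counts


def select_by_label(cards: Iterable[dict[str, Any]], wanted: set[str]) -> list[dict[str, Any]]:
    """Keep only the cards that carry at least one of the wanted labels."""
    return [card for card in cards if wanted & set(taxonomy_labels(card))]


if __name__ == "__main__":
    # Made-up records for illustration only; not taken from the dataset.
    sample = [
        {"title": "Reconcile invoices", "taxonomy": ["finance", "spreadsheet"]},
        {"title": "Summarize a contract", "taxonomy": "legal"},
        {"title": "Clean sales data", "taxonomy": ["spreadsheet"]},
    ]
    print(label_distribution(sample).most_common())
    print([card["title"] for card in select_by_label(sample, {"legal"})])
```

Counting each label once per card keeps the distribution meaningful even if a card repeats a label.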

Strengths

Contains metadata for 147 distinct evaluation tasks.
Provides a structured taxonomy for categorizing tasks, which supports systematic filtering and analysis.

Limitations

Description metadata is limited; actual data quality can only be judged by manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
The platform does not report a row count (the description cites 147 task cards), which makes suitability harder to assess before download.
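Because the row count and field semantics have to be established after download, a quick profiling pass can help. This sketch assumes the task cards are already loaded as an iterable of dictionaries (for example, rows of a Hugging Face `datasets.Dataset`, which iterate as dicts); `summarize_schema` and `report` are hypothetical helpers, not part of any library.

```python
from collections import Counter
from typing import Any, Iterable


def summarize_schema(records: Iterable[dict[str, Any]]) -> tuple[int, dict[str, Counter]]:
    """Return the row count and, for each field, a count of the value types seen."""
    row_count = 0
    field_types: dict[str, Counter] = {}
    for record in records:
        row_count += 1
        for key, value in record.items():
            field_types.setdefault(key, Counter())[type(value).__name__] += 1
    return row_count, field_types


def report(records: Iterable[dict[str, Any]], expected_rows: int = 147) -> None:
    """Print per-field coverage and types, and compare the row count with the description."""
    rows, fields = summarize_schema(records)
    print(f"rows: {rows} (description cites {expected_rows} task cards)")
    for name, types in sorted(fields.items()):
        present = sum(types.values())
        print(f"{name}: present in {present}/{rows} rows, types {dict(types)}")
```

A field that appears in fewer rows than the total, or with more than one value type, is a good candidate for manual inspection.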

Provenance

Source: agents-last-exam on HuggingFace
Freshness: Last updated 2026-06-05 04:06:13; verify against the HuggingFace page before relying on it.

This is a metadata-only release; the actual task input data is published as a separate companion dataset.
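Since the input files live in the companion dataset, it can help to confirm that every descriptor resolves to a file before running agents. This sketch assumes, hypothetically, that each card stores its inputs under an `input_files` field as paths relative to the root of a local copy of the companion dataset, and that cards have a `title` field; the real descriptor format is undocumented.

```python
from pathlib import Path
from typing import Any, Iterable

# Hypothetical field name and format: a list of paths relative to the
# root of a local copy of the companion input dataset.
INPUT_FIELD = "input_files"


def missing_inputs(cards: Iterable[dict[str, Any]], companion_root: Path) -> dict[str, list[str]]:
    """Map each card's title to the referenced input files not found under companion_root."""
    missing: dict[str, list[str]] = {}
    for card in cards:
        paths = card.get(INPUT_FIELD) or []
        absent = [p for p in paths if not (companion_root / p).is_file()]
        if absent:
            missing[str(card.get("title", "<untitled>"))] = absent
    return missing
```

An empty result means every referenced file was found; cards without any input descriptors are skipped.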

Quality Score: D38 (components: Description, Source, Reputation)

Community

31 downloads
1 like
0 views

Dataset Info

Author: agents-last-exam
Created: May 7, 2026
Updated: Jun 5, 2026
Last synced: Jun 16, 2026
