AA-Briefcase-Lite: Frontier Agentic Evaluation of Professional Workflows | DataSalon

Machine Learning

Name: AA-Briefcase-Lite: Frontier Agentic Evaluation of Professional Workflows
Creator: ArtificialAnalysis
Published: 2026-06-18T21:09:53
Keywords: Agentic Evaluation, Benchmarking, Benchmark, Knowledge Work, Text, Professional Workflows, Synthetic

Available on 1 platform

Description

AA-Briefcase-Lite is a public example scenario from Artificial Analysis' frontier agentic evaluation of realistic, long-horizon knowledge work. The evaluation extends frontier model benchmarking beyond coding and short-form reasoning to the professional deliverables that knowledge workers produce day to day. The full evaluation consists of four private scenarios in which agents complete realistic professional workflows across data science; this dataset is the public example.

Use Cases

Benchmarking frontier AI models on professional deliverables based on the described scenarios.
Evaluating agentic performance on long-horizon knowledge work tasks.
Comparing model capabilities across realistic professional workflows in data science.

Strengths

Focuses on realistic professional workflows, extending benchmarking beyond coding.
Created by ArtificialAnalysis, an organization that publishes AI model evaluations.
Last updated on 2026-06-19, indicating recent maintenance.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
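
Because row count and field semantics are undocumented, a first pass after download could simply count records and list which fields appear and with what value types. The sketch below is one way to do that using only the Python standard library. It assumes the data can be loaded as a list of JSON objects (for example from a JSON Lines file), which this listing does not confirm; `load_jsonl`, `profile_records`, and the sample records are hypothetical names and values used only for illustration, not the dataset's real schema.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per line; blank lines are skipped.

    Assumes a JSON Lines file, which is a guess about the download format.
    """
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def profile_records(records):
    """Return the row count and, per field, how often it appears and with which types."""
    present = Counter()
    types = defaultdict(Counter)
    for rec in records:
        for key, value in rec.items():
            present[key] += 1
            types[key][type(value).__name__] += 1
    return {
        "rows": len(records),
        "fields": {
            key: {"present": present[key], "types": dict(types[key])}
            for key in sorted(present)
        },
    }


# Hypothetical records standing in for a downloaded file; not real dataset rows.
sample = [
    {"task_id": "t1", "prompt": "Summarize the report.", "files": ["a.csv"]},
    {"task_id": "t2", "prompt": "Draft a memo.", "files": [], "notes": None},
]

report = profile_records(sample)
print("rows:", report["rows"])
for name, info in report["fields"].items():
    print(name, info)
```

With a real download, `profile_records(load_jsonl(path))` would give the same summary for a file on disk. Fields that appear in only some records, or with mixed types, are the ones most worth inspecting by hand.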

Provenance

Source: ArtificialAnalysis
Freshness: Last updated 2026-06-19 02:03:22.

Quality Score

D37

Community

3 likes

0 views

Dataset Info

Author: ArtificialAnalysis
Created: Jun 18, 2026
Updated: Jun 19, 2026
Last synced: Jun 19, 2026
