Curt Benchmarks: Language Evaluation Suite for AI Agents | DataSalon

Machine Learning

Name: Curt Benchmarks: Language Evaluation Suite for AI Agents
Creator: therikkening
Published: 2026-06-11T22:18:03
Keywords: Programming Language, Benchmarking, Benchmark, AI Agents, Text, Code Generation, Natural Language Processing, Synthetic

Available on 1 platform

Description

curt is a machine-first programming language designed for AI agents with a focus on output-token cost. This dataset contains the complete evaluation record for language version 0.2, including benchmark suites, model-generated programs, and reference materials. The dataset was created by therikkening and was last updated on June 12, 2026.

Use Cases

Benchmarking AI agent performance with the held-out benchmark suites
Analyzing model-generated programs using the frozen verbatim records
Comparing curt programs against their Python twins in the reference corpus
Studying the language's syntax using the included formal grammars

Strengths

Contains the complete evaluation record for language version 0.2
Includes two held-out benchmark suites for evaluation
Provides every model-generated program produced during evaluation, frozen verbatim

Limitations

Description metadata is limited; actual data quality requires manual inspection after download
Column-level documentation is absent; field semantics must be inferred after download

Provenance

Source: therikkening
Freshness: Last updated 2026-06-12 08:45:18; freshness should be verified

Quality Score

D38

Community

215 downloads

1 like

0 views

Dataset Info

Author: therikkening
Created: Jun 11, 2026
Updated: Jun 12, 2026
Last synced: Jun 19, 2026
