AceCode V2 122K: AI-Enhanced Programming Questions and Test Cases

Name: AceCode V2 122K: AI-Enhanced Programming Questions and Test Cases
Creator: TIGER-Lab
Published: 2025-05-11T22:39:11
Keywords: Text, Ai Training, Code Generation, Programming Questions, Test Cases


Description

TIGER-Lab's AceCode-V2-122K dataset pairs programming questions with test cases and is intended as an improvement over the V1 release. The questions and test cases in a pool of 147,220 items were rewritten with OpenAI's o1-mini and then filtered with Qwen2.5-Coder-32B-Instruct. The dataset was last updated on August 14, 2025.

Use Cases

Training code generation models based on the dataset's programming questions.
Evaluating model performance on code synthesis tasks using the provided test cases.
Benchmarking the robustness of code models against the filtered test suite.
Studying the impact of AI-assisted rewriting on dataset quality for programming tasks.
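For the evaluation and benchmarking use cases, a per-question pass rate is a natural metric. The listing does not document the test-case format, so this minimal sketch assumes, hypothetically, that each test case is a standalone Python `assert` statement that calls a function defined by the candidate solution. `pass_rate` is an illustrative helper, not part of the dataset. Because `exec` runs arbitrary model-generated code, real evaluations should use a sandbox and per-test timeouts, which this sketch omits.

```python
from typing import Dict, List


def pass_rate(solution_code: str, test_cases: List[str]) -> float:
    """Return the fraction of assert-style test cases that a solution passes.

    Hypothetical format: each test case is one Python statement such as
    "assert add(1, 2) == 3" that uses names defined by the solution.
    """
    if not test_cases:
        return 0.0
    passed = 0
    for test in test_cases:
        # Fresh namespace per test so side effects cannot leak between tests.
        namespace: Dict[str, object] = {}
        try:
            exec(solution_code, namespace)
            exec(test, namespace)
        except Exception:
            # Assertion failures, syntax errors and runtime errors all count as fails.
            continue
        passed += 1
    return passed / len(test_cases)


candidate = "def add(a, b):\n    return a + b\n"
tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0", "assert add(2, 2) == 5"]
print(pass_rate(candidate, tests))  # 0.6666666666666666
```

Averaging `pass_rate` over all questions gives a simple aggregate score; a stricter variant counts a question as solved only when every test passes.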

Strengths

Contains 147,220 questions before filtering, indicating a substantial scale.
Questions and test cases were rewritten by OpenAI's o1-mini, suggesting an attempt at quality enhancement.
Underwent a filtering process with Qwen2.5-Coder-32B-Instruct, which may improve reliability.

Limitations

Description metadata is limited; data quality and the row count remaining after filtering must be checked manually after download.
Column-level documentation is absent; field names and their semantics must be inferred from the data itself.
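Since field semantics must be inferred after download, a quick profile of each column helps. This sketch summarizes value types and missing counts from a list of row dicts. The optional `load_sample` helper uses the Hugging Face `datasets` library; the repo id `TIGER-Lab/AceCode-V2-122K` and the `train` split name are assumptions to verify on the dataset page, and the sample field names `question` and `tests` are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def summarize_columns(rows: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Infer per-field value types and missing-value counts from row dicts."""
    rows = list(rows)
    fields = sorted({key for row in rows for key in row})
    summary: Dict[str, Dict[str, Any]] = {}
    for field in fields:
        type_counts: Counter = Counter()
        missing = 0
        for row in rows:
            value = row.get(field)
            if value is None:
                missing += 1
            else:
                type_counts[type(value).__name__] += 1
        summary[field] = {"types": dict(type_counts), "missing": missing}
    return summary


def load_sample(repo_id: str = "TIGER-Lab/AceCode-V2-122K", n: int = 100) -> List[Dict[str, Any]]:
    """Download the dataset and return its first n rows as dicts.

    Assumed repo id and split name; requires `pip install datasets` and network access.
    """
    from datasets import load_dataset  # imported lazily so the profiler works offline

    ds = load_dataset(repo_id, split="train")
    print(ds.column_names, ds.num_rows)
    return [ds[i] for i in range(min(n, ds.num_rows))]


# Offline example with hypothetical field names.
sample_rows = [
    {"question": "q1", "tests": ["assert f() == 1"]},
    {"question": "q2", "tests": None},
]
summary = summarize_columns(sample_rows)
print(summary)
```

Calling `summarize_columns(load_sample())` on real data would reveal which fields hold strings, lists of test cases, or sparse values.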

Provenance

Source: TIGER-Lab
Collection Method: Questions and test cases were rewritten by OpenAI's o1-mini and then filtered with Qwen2.5-Coder-32B-Instruct.
Time Range: not specified
Freshness: Last updated 2025-08-14 14:39:50.
Geography: not specified
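The listing does not explain how the filtering step works. Purely to illustrate one common pattern, and not necessarily TIGER-Lab's procedure, this sketch keeps only the test cases that a reference solution (for example, one generated by the filter model) passes, and discards the question when fewer than `min_tests` tests survive. The assert-string test format, the helper names `passes` and `filter_question`, and the default threshold of 5 are all hypothetical.

```python
from typing import Dict, List, Optional


def passes(solution_code: str, test: str) -> bool:
    """Run one assert-style test against a solution in a fresh namespace."""
    namespace: Dict[str, object] = {}
    try:
        exec(solution_code, namespace)
        exec(test, namespace)
    except Exception:
        return False
    return True


def filter_question(
    reference_solution: str, test_cases: List[str], min_tests: int = 5
) -> Optional[List[str]]:
    """Keep tests the reference solution passes; return None if too few remain.

    `reference_solution` stands in for a program from a filter model;
    `min_tests` is a hypothetical threshold.
    """
    kept = [t for t in test_cases if passes(reference_solution, t)]
    return kept if len(kept) >= min_tests else None


ref = "def square(x):\n    return x * x\n"
tests = [
    "assert square(2) == 4",
    "assert square(3) == 9",
    "assert square(-1) == 1",
    "assert square(0) == 1",  # wrong expectation, dropped by the filter
]
print(filter_question(ref, tests, min_tests=3))
```

The trade-off is that a wrong reference solution can discard correct tests, so pipelines of this kind often rely on a strong model or on agreement across several sampled solutions.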

Quality Score

Overall: C43
Description: 49
Source: 41
Reputation: 47
Access: 26

Community

Downloads: 116
Likes: 8
Views: 0

Dataset Info

Author: TIGER-Lab
Created: May 11, 2025
Updated: Aug 14, 2025
Last synced: May 8, 2026
