TIGER-Lab's AceCoder-V2-122k dataset pairs programming questions with test cases and is presented as an improvement over AceCoder V1. It was built from 147,220 questions: the questions and their test cases were rewritten with OpenAI's o1-mini and then filtered with Qwen Coder 2.5 32B Instruct. The number of rows remaining after filtering is not stated, although the name suggests roughly 122k. The dataset was last updated on August 14, 2025.
Use Cases
- Training code generation models on the dataset's programming questions.
- Evaluating model performance on code synthesis tasks using the provided test cases.
- Benchmarking the robustness of code models against the filtered test suite.
- Studying the impact of AI-assisted rewriting on dataset quality for programming tasks.
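Evaluating against the provided test cases typically means running a candidate solution and counting how many tests pass. The sketch below assumes each test case is a standalone Python `assert` statement. The card does not document this format, so treat it as an assumption. `pass_rate` is a hypothetical helper, not part of any dataset tooling.

```python
from typing import Iterable


def pass_rate(program: str, test_cases: Iterable[str]) -> float:
    """Return the fraction of assert-style test cases that pass against `program`.

    Assumes (hypothetically) that each test case is a Python statement such as
    "assert add(1, 2) == 3". WARNING: exec() runs arbitrary code and this sketch
    has no timeout; model-generated programs belong in a sandboxed process.
    """
    tests = list(test_cases)
    if not tests:
        return 0.0
    namespace: dict = {}
    try:
        exec(program, namespace)  # define the candidate's functions
    except Exception:
        return 0.0  # the program does not even load (e.g. SyntaxError)
    passed = 0
    for test in tests:
        try:
            exec(test, dict(namespace))  # shallow copy so tests cannot rebind each other's names
            passed += 1
        except Exception:  # AssertionError or any runtime error counts as a failure
            pass
    return passed / len(tests)


# Hypothetical candidate and tests, for illustration only.
candidate = "def add(a, b):\n    return a + b\n"
tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0", "assert add(2, 2) == 5"]
print(pass_rate(candidate, tests))  # 2 of 3 tests pass
```

Running every test in one process keeps the sketch short. A real harness would usually launch each program in a separate process (for example with `subprocess.run(..., timeout=...)`) so that infinite loops or crashes cannot stall the evaluation.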
Strengths
- Starts from 147,220 questions before filtering, a substantial scale.
- Questions and test cases were rewritten by OpenAI's o1-mini, an apparent effort to improve quality, although no measured effect is reported.
- A filtering pass with Qwen Coder 2.5 32B Instruct may remove unreliable questions or test cases; the filtering criteria are not documented.
Limitations
- Metadata is sparse: data quality and the post-filtering row count must be checked manually after download.
- Column-level documentation is absent; field names and semantics must be inferred from the data itself.
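Because field semantics must be inferred after download, a quick first step is to tally which fields appear and what value types they hold. The sketch below works on any iterable of dict-like records. `summarize_fields` and the sample field names (`id`, `question`, `test_cases`) are hypothetical illustrations, not the dataset's documented schema.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable, Mapping


def summarize_fields(rows: Iterable[Mapping[str, Any]]) -> tuple[int, dict[str, Counter]]:
    """Count rows and, for each field, the Python type names observed in it."""
    n_rows = 0
    field_types: defaultdict = defaultdict(Counter)
    for row in rows:
        n_rows += 1
        for key, value in row.items():
            field_types[key][type(value).__name__] += 1
    return n_rows, dict(field_types)


# Hypothetical sample records; real field names must be read from the downloaded data.
sample = [
    {"id": "q1", "question": "Write add(a, b).", "test_cases": ["assert add(1, 2) == 3"]},
    {"id": "q2", "test_cases": None},
]

n, fields = summarize_fields(sample)
print(n)
for name, types in fields.items():
    # A field seen in fewer than n rows is only present in some records.
    status = "partial" if sum(types.values()) < n else "always present"
    print(name, dict(types), status)
```

With the Hugging Face `datasets` library, the records could come from `load_dataset("TIGER-Lab/AceCoder-V2-122k", split="train")`. The repository id and split name are assumptions to verify on the dataset page. A loaded `Dataset` iterates as dicts, so it can be passed to the helper directly, and its `features` attribute reports the declared schema.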
Provenance
- Source: TIGER-Lab
- Collection Method: Questions and test cases were rewritten by OpenAI's o1-mini and filtered using Qwen Coder 2.5 32B Instruct.
- Time Range: Not specified
- Freshness: Last updated 2025-08-14 14:39:50.
- Geography: Not specified