Description
A filtered subset of the TACO dataset, last updated in April 2025, that keeps only programming solutions verified to pass all test cases. Created by likaixin, it contains 12,898 problems and 1,043,251 solutions. Solutions that fail any test case were removed, as were problems left with no correct solution; the reported correct ratio is 71.03%.
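The filtering described above can be sketched as follows. This is a minimal illustration, not the author's pipeline. The `Problem` record and its per-solution `passed` flags are hypothetical, and the sketch assumes each solution has already been run against its test cases. It computes a correct ratio as retained solutions divided by all original solutions. Whether the reported 71.03% uses this definition is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Problem:
    # Hypothetical record layout: a problem, its candidate solutions, and
    # whether each solution passed every test case (precomputed).
    problem_id: str
    solutions: list[str]
    passed: list[bool]


def filter_verified(problems: list[Problem]) -> tuple[list[Problem], float]:
    """Keep only passing solutions, drop problems left with none, and return
    the filtered problems plus the share of original solutions retained."""
    kept: list[Problem] = []
    total = 0
    retained = 0
    for p in problems:
        total += len(p.solutions)
        good = [s for s, ok in zip(p.solutions, p.passed) if ok]
        retained += len(good)
        if good:  # problems with no correct solution are removed entirely
            kept.append(Problem(p.problem_id, good, [True] * len(good)))
    # Assumed definition of "correct ratio": retained / original solutions.
    ratio = retained / total if total else 0.0
    return kept, ratio


problems = [
    Problem("p1", ["a", "b", "c"], [True, False, True]),
    Problem("p2", ["d"], [False]),
]
kept, ratio = filter_verified(problems)
print([p.problem_id for p in kept], ratio)  # prints ['p1'] 0.5
```

Here `p2` disappears because none of its solutions passed, and two of the four original solutions survive, giving a ratio of 0.5.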
Use Cases
Training code generation models on verified, correct solutions.
Benchmarking the reliability of AI-generated code based on test case pass rates.
Analyzing patterns in programming errors by comparing the original and verified datasets.
Studying the characteristics of problems that have at least one correct solution.
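For the benchmarking use case, a pass rate can be computed by running each candidate program against a problem's test cases and counting the candidates that pass all of them. This is a minimal sketch, assuming Python programs and test cases given as (stdin, expected stdout) pairs. `passes_all_tests` and `pass_rate` are hypothetical names, not part of the dataset or any library, and the dataset's actual test-case format may differ.

```python
from __future__ import annotations

import subprocess
import sys


def passes_all_tests(source: str, tests: list[tuple[str, str]], timeout: float = 5.0) -> bool:
    """Run a Python program once per test case, feeding the input on stdin,
    and require a clean exit plus stripped stdout matching the expected output."""
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", source],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # treat a timeout as a failed test case
        if result.returncode != 0 or result.stdout.strip() != expected.strip():
            return False
    return True


def pass_rate(candidates: list[str], tests: list[tuple[str, str]]) -> float:
    """Fraction of candidate programs that pass every test case."""
    if not candidates:
        return 0.0
    passed = sum(passes_all_tests(c, tests) for c in candidates)
    return passed / len(candidates)


tests = [("2 3\n", "5"), ("10 -4\n", "6")]
good = "a, b = map(int, input().split())\nprint(a + b)"
bad = "a, b = map(int, input().split())\nprint(a * b)"
print(pass_rate([good, bad], tests))  # prints 0.5
```

This runs candidate code without isolation, so a real benchmark over AI-generated code should execute it in a sandbox with resource limits.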
Strengths
Contains 1,043,251 verified solutions, providing a substantial corpus of correct code.
Reports a 71.03% correct ratio, and because failing solutions are removed, every retained solution passes its test cases, improving data reliability for training.