Structured Code Optimization, Bug-Fix, and AST Parsing Dataset for ML Training
by Jamie Davis · Updated 11d ago
4.5 KB · 1 file
Available on 1 platform
Description
Jamie Davis provides a dataset of structured JSON objects that pair raw source code with optimized equivalents, bug fixes, and invalid syntax examples. The dataset includes pre-computed complexity scores, execution tracking, and input-output verification arrays. It was last updated on 2026-05-28 and is intended for training automated program repair tools, parsers, and static analyzers.
Use Cases
Train automated program repair models on the paired buggy and fixed code examples.
Develop code parsers and static analyzers using the Abstract Syntax Tree (AST) parsing examples.
Benchmark code optimization algorithms against the structured input-output verification arrays.
Train models to detect and correct invalid syntax using the invalid syntax examples.
Strengths
The dataset is structured and includes pre-computed complexity scores and input-output verification arrays.
The data is designed specifically for training automated program repair tools and parsers.
The dataset is published under the open CC-BY-4.0 license.
Limitations
At 4.5 KB in a single file, the dataset is very small, which suggests limited scope.
The row count is unknown and no column-level documentation is available, which makes suitability hard to assess.
Description metadata is limited; assessing actual data quality requires manual inspection after download.
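Because the row count and field layout are undocumented, a first inspection pass can simply count records and report how often each key appears. The sketch below is one possible way to do that, assuming the records have already been loaded as a list of Python dicts. The function name `summarize_records` and the field names in the usage note are hypothetical and do not come from the dataset.

```python
from collections import Counter


def summarize_records(records):
    """Return the record count and the share of records containing each key.

    `records` is assumed to be a list of dicts loaded from the dataset file.
    Non-dict entries are counted as rows but contribute no keys.
    """
    rows = len(records)
    key_counts = Counter()
    for record in records:
        if isinstance(record, dict):
            key_counts.update(record.keys())
    # Fraction of rows in which each top-level key is present.
    coverage = {key: count / rows for key, count in key_counts.items()}
    return {"rows": rows, "key_coverage": coverage}
```

For example, calling `summarize_records([{"code": "x", "fixed_code": "y"}, {"code": "z"}])` with these made-up field names reports two rows, with `code` present in every row and `fixed_code` in half of them. Keys with low coverage are a quick signal that the records do not all share one schema.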
Provenance
Source
Jamie Davis via figshare
Freshness
Last updated 2026-05-28 18:57:39.
Format
The data is provided as a TXT file; users must parse the JSON objects it contains.
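The listing does not say how the JSON objects are laid out inside the TXT file. The sketch below is a minimal loader under two assumptions: the file is either a single JSON document (an object or an array), or a sequence of JSON objects written back to back and separated only by whitespace. The function name `load_records` is hypothetical, and any other layout (for example, objects mixed with free text) would need different handling.

```python
import json


def load_records(text):
    """Parse `text` into a list of JSON values.

    Tries the whole text as one JSON document first. If that fails,
    falls back to decoding consecutive JSON values separated by whitespace.
    """
    try:
        parsed = json.loads(text)
        return parsed if isinstance(parsed, list) else [parsed]
    except json.JSONDecodeError:
        pass

    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Raises json.JSONDecodeError if the remaining text is not valid JSON.
        value, pos = decoder.raw_decode(text, pos)
        records.append(value)
    return records
```

A typical call would read the downloaded file as UTF-8 text, for instance with `pathlib.Path(path).read_text(encoding="utf-8")`, and pass the result to `load_records`. A `json.JSONDecodeError` from the fallback loop indicates that the file does not match either assumed layout.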