Description
A benchmark of approximately 1,500 commits with code diffs, sourced from CommitPackFT, for evaluating code retrieval models. The dataset, created by cassanof, covers 13 programming languages, including Python, JavaScript, Go, and Rust. It was last updated on Hugging Face in April 2024.
Use Cases
Benchmarking code retrieval models based on commit instructions and diffs.
Training models to predict code changes based on natural language instructions.
Analyzing edit patterns across the 13 programming languages the dataset covers.
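For the edit-pattern use case, a minimal sketch of a per-language tally of commits and added/removed lines. It assumes each row exposes a language plus before/after file contents under the hypothetical field names "lang", "old_contents", and "new_contents"; the dataset's actual columns are undocumented, so these names (and the sample rows) are illustrative only. If the data stores a ready-made diff instead, the counting step would need to parse that diff.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical rows: the field names "lang", "old_contents", and "new_contents"
# are assumptions, since the dataset's columns are not documented.
records = [
    {"lang": "Python", "old_contents": "x = 1\n", "new_contents": "x = 2\n"},
    {"lang": "Python", "old_contents": "a\nb\n", "new_contents": "a\nb\nc\n"},
    {"lang": "Go", "old_contents": "package main\n",
     "new_contents": "package main\n\nfunc f() {}\n"},
]

def edit_stats(rows):
    """Per language: number of commits and lines added/removed (line-level diff)."""
    stats = defaultdict(lambda: {"commits": 0, "added": 0, "removed": 0})
    for row in rows:
        old = row["old_contents"].splitlines()
        new = row["new_contents"].splitlines()
        s = stats[row["lang"]]
        s["commits"] += 1
        # Opcodes describe how to turn `old` into `new`; "replace" counts on both sides.
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, old, new).get_opcodes():
            if tag in ("replace", "delete"):
                s["removed"] += i2 - i1
            if tag in ("replace", "insert"):
                s["added"] += j2 - j1
    return dict(stats)

print(edit_stats(records))
# {'Python': {'commits': 2, 'added': 2, 'removed': 1}, 'Go': {'commits': 1, 'added': 2, 'removed': 0}}
```

Counting at line granularity keeps the sketch language-agnostic; token- or AST-level diffs would give finer patterns but require a parser per language.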
Strengths
Contains approximately 1,500 commits, a moderate sample size for evaluation and analysis.
Covers 13 distinct programming languages, offering linguistic diversity.
Explicitly designed for a specific evaluation task (retrieving a diff given its instruction).
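A minimal sketch of the evaluation task (retrieving a diff given its instruction), scored with mean reciprocal rank (MRR) and recall@1. It assumes the gold diff for instruction i is the candidate at index i; the token-overlap (Jaccard) scorer is a toy placeholder for a real retrieval model, and the instructions and diffs are hypothetical examples, not rows from the dataset.

```python
import re

def tokens(text):
    """Lowercased alphanumeric tokens; a toy stand-in for a learned encoder."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def jaccard(query, candidate):
    a, b = tokens(query), tokens(candidate)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def evaluate(instructions, diffs, score=jaccard):
    """Assumes diffs[i] is the gold diff for instructions[i]; returns (MRR, recall@1)."""
    reciprocal_ranks = []
    for i, query in enumerate(instructions):
        scores = [score(query, d) for d in diffs]
        # Gold rank = 1 + number of candidates scoring strictly higher (ties favor gold).
        rank = 1 + sum(s > scores[i] for s in scores)
        reciprocal_ranks.append(1.0 / rank)
    n = len(instructions)
    return sum(reciprocal_ranks) / n, sum(rr == 1.0 for rr in reciprocal_ranks) / n

instructions = ["Rename variable count to total", "Add logging to fetch function"]
diffs = ["- count = 0\n+ total = 0",
         "+ import logging\n  def fetch(url):\n+     logging.info(url)"]
print(evaluate(instructions, diffs))        # (1.0, 1.0)
print(evaluate(instructions, diffs[::-1]))  # (0.5, 0.0)
```

Swapping `jaccard` for a function that compares model embeddings leaves the ranking and metric code unchanged. Ties are resolved in the gold diff's favor here; a stricter protocol would count tied candidates as ranked above it.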
Limitations
The exact row count is not stated (only an approximate figure of 1,500 commits), which may complicate suitability assessment.
Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
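Because field semantics must be inferred after download, a minimal sketch for profiling rows: for each field, it records which value types occur and how many rows lack the field. The sample rows and their field names are hypothetical. If the data is loaded with the Hugging Face `datasets` library, `Dataset.features` reports the declared schema, and iterating a `Dataset` yields one dict per row that could be passed to this profiler to check the observed values.

```python
from collections import Counter, defaultdict

def profile_fields(rows):
    """Per field: observed value types and the number of rows missing the field."""
    type_counts = defaultdict(Counter)
    for row in rows:
        for key, value in row.items():
            type_counts[key][type(value).__name__] += 1
    return {
        key: {"types": dict(counts), "missing": len(rows) - sum(counts.values())}
        for key, counts in type_counts.items()
    }

# Hypothetical rows; real field names must come from the downloaded data.
rows = [
    {"lang": "Rust", "message": "Fix overflow", "diff": "- a\n+ b"},
    {"lang": "Go", "message": None},
]
for field, info in profile_fields(rows).items():
    print(field, info)
```

A field that shows mixed types (such as `NoneType` alongside `str`) or a nonzero missing count is a likely candidate for manual inspection.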
Provenance
Source
Built from CommitPackFT.
Collection Method
Code for reproduction is provided in the description.
Time Range
Not specified.
Freshness
Last updated 2024-04-28 23:35:38; freshness should be verified.
Geography
Not specified.
License
The license is unknown; terms of use must be verified before download.