This dataset comprises Abstract Syntax Trees (ASTs) derived from 450,000 Python method definitions extracted from 13,587 different GitHub repositories. It was created by teams at Stanford as part of the Open Graph Benchmark for graph machine learning tasks.
Use Cases
- Train graph neural networks on Abstract Syntax Tree structures to classify or generate Python methods.
- Analyze code patterns and method structures across 13,587 different GitHub repositories.
- Use the AST representations for tasks like code summarization or clone detection.
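
To make the graph framing concrete, here is a minimal sketch of how one Python method can be turned into a node list and an edge list using the standard library `ast` module. It is an illustration only: the dataset's actual node features and edge conventions are not described here, and the names `SOURCE` and `ast_to_graph` are hypothetical.

```python
import ast

# Hypothetical example method; not taken from the dataset.
SOURCE = '''
def add(a, b):
    return a + b
'''


def ast_to_graph(source):
    """Return (node_labels, edges) for the AST of `source`.

    node_labels[i] is the AST node type name of node i.
    edges is a list of (parent_index, child_index) pairs.
    Assumption: this parent-to-child encoding is chosen for
    illustration and may differ from the dataset's own format.
    """
    tree = ast.parse(source)
    node_labels = []
    edges = []

    def visit(node, parent_index):
        # Assign the next free index to this node in pre-order.
        index = len(node_labels)
        node_labels.append(type(node).__name__)
        if parent_index is not None:
            edges.append((parent_index, index))
        for child in ast.iter_child_nodes(node):
            visit(child, index)

    visit(tree, None)
    return node_labels, edges


node_labels, edges = ast_to_graph(SOURCE)
```

The resulting edge list can be passed to a graph library of choice. Because the traversal visits each node once from its parent, the output is a tree with one fewer edge than nodes.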
Strengths
- Contains 450,000 Python method definitions, providing a substantial corpus for analysis.
- Sourced from 13,587 different GitHub repositories, offering diversity in code origin.
Limitations
- The specific features and structure of the ASTs are not documented here.
- The dataset's temporal coverage and update frequency are unknown.
Provenance
- Source: GitHub CodeSearchNet, drawn from popular projects on GitHub.
- Collection Method: Methods were extracted and processed into Abstract Syntax Trees by teams at Stanford.
- Time Range: not specified
- Freshness: not specified
- Geography: not specified
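
The collection method above describes extracting methods from source code before building ASTs. The following is a rough sketch of what such an extraction step could look like with the standard library `ast` module. It is not the pipeline used to build the dataset, and `MODULE_SOURCE` and `extract_function_sources` are hypothetical names introduced for illustration.

```python
import ast

# Hypothetical module text standing in for a file from a repository.
MODULE_SOURCE = '''
import os

class Greeter:
    def greet(self, name):
        return "hello " + name

def helper(x):
    return x * 2
'''


def extract_function_sources(source):
    """Map each function or method name to its source text.

    Assumption: names are unique within the module. If two
    definitions share a name, the later one visited overwrites
    the earlier one.
    """
    tree = ast.parse(source)
    found = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # get_source_segment requires Python 3.8 or newer.
            found[node.name] = ast.get_source_segment(source, node)
    return found


functions = extract_function_sources(MODULE_SOURCE)
```

Each extracted snippet can then be parsed on its own with `ast.parse` to obtain a per-method tree.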