This dataset contains tokenized programming-language source code curated for the CodeXGLUE benchmark to support next-token prediction. It provides structured code sequences for evaluating models with token-level accuracy, the fraction of predicted tokens that exactly match the reference tokens.
Use Cases
- Train a model to predict the next code token given a context of previous tokens.
- Benchmark the token-level accuracy of code generation models.
- Build code completion features for software development environments on top of models trained on the token sequences.
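The first use case, predicting the next token from the preceding ones, can be sketched as turning one token sequence into (context, target) training pairs. This is a minimal illustration only: the function name `next_token_pairs`, the `context_size` parameter, and the example tokens are assumptions made for this sketch, not part of the dataset or of CodeXGLUE tooling.

```python
from typing import List, Tuple


def next_token_pairs(
    tokens: List[str], context_size: int
) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs from one token sequence.

    Each target token is paired with up to `context_size` tokens that
    precede it. The first token has no context, so it is never a target.
    """
    pairs = []
    for i in range(1, len(tokens)):
        start = max(0, i - context_size)
        pairs.append((tokens[start:i], tokens[i]))
    return pairs


# Hypothetical token sequence, invented for illustration.
example_tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]
pairs = next_token_pairs(example_tokens, context_size=3)

for context, target in pairs:
    print(context, "->", target)
```

A fixed-size window keeps the sketch simple. Sequence models typically consume the whole prefix instead, which corresponds to setting `context_size` to the sequence length.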
Strengths
- Includes tokenized source code sequences for next-token prediction tasks.
- Uses token-level accuracy as the standardized evaluation metric for model performance.
- Part of the Microsoft CodeXGLUE collection, a benchmark for code understanding and generation.
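Token-level accuracy, the metric named above, can be sketched as the share of positions where the predicted token equals the reference token. The helper `token_accuracy` and the sample tokens below are hypothetical, and the benchmark's official evaluation script may differ in details such as how special tokens or padding are handled.

```python
from typing import Sequence


def token_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Return the fraction of positions where predicted == reference.

    Assumes both sequences are aligned position by position. Returns 0.0
    for empty input to avoid division by zero (a choice made for this sketch).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Hypothetical predictions, invented for illustration.
reference = ["return", "a", "+", "b"]
predicted = ["return", "a", "-", "b"]
score = token_accuracy(predicted, reference)
print(f"token-level accuracy: {score:.2f}")
```

In this example three of the four tokens match, so the score is 0.75.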