39,000 synthetic instruction-following examples for generating simple code across 8 programming languages. The dataset uses the Alpaca format and is available in both Indonesian and English. It was created by Sandroeth and last updated on 2026-05-26.
Use Cases
- Train models for multilingual code generation based on bilingual instruction-output pairs.
- Fine-tune instruction-following models for programming tasks based on the structured 'instruction', 'input', and 'output' fields.
- Benchmark model performance on generating code snippets across 8 different programming languages.
- Develop models that can switch between Indonesian and English contexts for coding assistance.
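For the fine-tuning use case, an Alpaca-style record is usually flattened into a single prompt string before training. The sketch below shows one way to do that. The section headers (`### Instruction:` and so on) are an assumption modeled on common Alpaca-style templates, not something this dataset specifies, and the sample record is hypothetical rather than taken from the data.

```python
def format_alpaca_prompt(record: dict) -> str:
    """Build a prompt from one Alpaca-style record.

    Assumes the record has an 'instruction' field and an optional
    'input' field; the header wording is an illustrative choice.
    """
    instruction = record["instruction"].strip()
    context = (record.get("input") or "").strip()
    if context:
        # Include the input section only when it carries content.
        return (
            "### Instruction:\n" + instruction + "\n\n"
            "### Input:\n" + context + "\n\n"
            "### Response:\n"
        )
    return "### Instruction:\n" + instruction + "\n\n### Response:\n"


# Hypothetical record for illustration only.
example = {
    "instruction": "Write a function that adds two numbers.",
    "input": "Language: Python",
    "output": "def add(a, b):\n    return a + b",
}

prompt = format_alpaca_prompt(example)
# The training text is the prompt followed by the target output.
full_text = prompt + example["output"]
```

Dropping the input section for records with an empty `input` keeps the prompt free of blank placeholders, which is a common convention but should be checked against whatever template the target model expects.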
Strengths
- 39,000 total examples provide a substantial base for training.
- Each entry is available in two languages (Indonesian and English), supporting bilingual applications.
- Covers 8 distinct programming languages, offering diversity in code syntax and paradigms.
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not reported in the dataset metadata; the 39,000 figure comes from the description and should be verified after download.
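The checks these limitations call for can be done with a short inspection pass once the records are loaded. The following is a minimal sketch, assuming the records are available as a list of dictionaries with the 'instruction', 'input', and 'output' fields; the helper name `summarize_records` and the sample rows are hypothetical, and loading the data is left out.

```python
from collections import Counter

# Field names assumed from the Alpaca format.
EXPECTED_FIELDS = ("instruction", "input", "output")


def summarize_records(records):
    """Count rows and tally missing or empty expected fields."""
    missing = Counter()
    empty = Counter()
    total = 0
    for record in records:
        total += 1
        for field in EXPECTED_FIELDS:
            if field not in record:
                missing[field] += 1
            else:
                value = record[field]
                # Treat None and whitespace-only strings as empty.
                if value is None or not str(value).strip():
                    empty[field] += 1
    return {"rows": total, "missing": dict(missing), "empty": dict(empty)}


# Hypothetical rows for illustration only.
sample = [
    {"instruction": "Print hello", "input": "", "output": "print('hello')"},
    {"instruction": "Buat fungsi penjumlahan", "output": "def add(a, b): return a + b"},
]

summary = summarize_records(sample)
print(summary)
```

An empty `input` is normal in Alpaca-style data, so a high empty count for that field is not by itself a quality problem; missing or empty `instruction` and `output` values are the ones worth investigating.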
Provenance
- Source: Hugging Face
- Collection Method: Synthetic dataset based on the Alpaca format.
- Freshness: Last updated 2026-05-26 05:48:10; freshness should be verified.