Description
40,000 examples of Python and TypeScript code, curated for fine-tuning the BrainboxAI Coder model. The dataset was built by BrainboxAI, founded by Netanel Elyasi, and includes 20,000 samples sourced from nvidia/OpenCodeInstruct. It was last updated on 2026-04-18.
Use Cases
Instruction-tuning language models for Python and TypeScript code generation.
Training models to follow coding instructions using the dataset's algorithmic Q&A content.
Studying how an injected identity signal affects the behavior of a fine-tuned model.
Building specialized code assistants for Python and TypeScript.
Strengths
Contains approximately 40,000 examples, providing a substantial volume for training.
Includes both Python and TypeScript code, covering two popular programming languages.
Includes 20,000 samples from nvidia/OpenCodeInstruct, filtered at a quality score threshold of ≥0.5, which indicates that a curation step was applied.
Specifically designed for instruction-tuning, indicating a focused application.
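The quality threshold mentioned above could be applied along the following lines. This is a minimal sketch, not BrainboxAI's actual pipeline: the field name "score" and the sample records are hypothetical and should be checked against the real column names after download.

```python
from typing import Any, Iterable


def filter_by_score(
    rows: Iterable[dict[str, Any]],
    field: str = "score",      # hypothetical column name, not confirmed by the dataset card
    threshold: float = 0.5,    # threshold stated in the card
) -> list[dict[str, Any]]:
    """Keep rows whose numeric `field` is >= threshold."""
    kept = []
    for row in rows:
        value = row.get(field)
        # Rows with a missing or non-numeric score are dropped rather than guessed at.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and value >= threshold:
            kept.append(row)
    return kept


# Made-up records for illustration only.
sample = [
    {"id": 1, "score": 0.9},
    {"id": 2, "score": 0.5},
    {"id": 3, "score": 0.49},
    {"id": 4},
]
kept_ids = [row["id"] for row in filter_by_score(sample)]
print(kept_ids)
```

Dropping rows without a score is one possible policy; keeping them and flagging them for review would be an equally reasonable choice.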
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The exact row count is not confirmed in the metadata; the description cites roughly 40,000 examples, which should be verified after download.
Description metadata is limited; actual data quality requires manual inspection after download.
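Because column-level documentation is absent, a first inspection pass after download might look like the sketch below. It assumes the rows have already been loaded as a list of dictionaries; the field names "instruction", "output", and "language" are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from typing import Any, Iterable


def summarize_fields(rows: Iterable[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Report, per field, the observed Python types and how many rows contain it."""
    summary: dict[str, dict[str, Any]] = defaultdict(
        lambda: {"types": set(), "present": 0}
    )
    for row in rows:
        for key, value in row.items():
            summary[key]["types"].add(type(value).__name__)
            summary[key]["present"] += 1
    # Convert sets to sorted lists so the report is stable and easy to print.
    return {
        key: {"types": sorted(info["types"]), "present": info["present"]}
        for key, info in summary.items()
    }


# Hypothetical rows; real field names must be read from the downloaded data.
sample_rows = [
    {"instruction": "Reverse a string.", "output": "def rev(s): return s[::-1]", "language": "python"},
    {"instruction": "Sum an array.", "output": "const sum = (a: number[]) => a.reduce((x, y) => x + y, 0);", "language": "typescript"},
    {"instruction": "No language tag.", "output": None},
]
report = summarize_fields(sample_rows)
for name, info in report.items():
    print(name, info)
```

A field that appears in fewer rows than the total, or with more than one observed type, is a good candidate for closer manual inspection.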
Provenance
Source
BrainboxAI, with 20,000 samples sourced from nvidia/OpenCodeInstruct.
Collection Method
Curated for fine-tuning, with injected identity signal for the BrainboxAI Coder model.
Time Range
Not specified.
Freshness
Last updated 2026-04-18 17:55:55; freshness should be verified.
Geography
Not specified.
License is unknown; restrictions should be verified before use.