Sign in to view source links and access this dataset
Description
NickIBrody's x86-asm-instructions-23k dataset contains 23,104 rows of x86 and x86_64 assembly snippets paired with short natural-language instructions or comments. The dataset is formatted as JSONL with instruction, output, and system fields, and was last updated on Hugging Face in April 2026. It is designed for instruction-tuning tasks.
Use Cases
Instruction-tuning of large language models for assembly code generation based on natural language prompts.
Training models to generate natural language comments or explanations for given assembly code snippets.
Fine-tuning models for translating between assembly instructions and their functional descriptions.
Building educational tools that pair assembly examples with instructional text.
Strengths
Contains 23,104 instruction-output pairs, providing a substantial corpus for training.
Instruction and output fields have quantified average lengths (47.0 and 514.25 characters, respectively), indicating consistent structure.
Explicitly designed for the instruction-tuning paradigm with a defined JSONL format.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality and source diversity require manual inspection.
The dataset appears to be a single snapshot; freshness and temporal coverage are not detailed.
Provenance
Source
Hugging Face dataset by author NickIBrody.
Collection Method
Likely gathered from assembly code sources and paired with corresponding instructions or comments.
Time Range
null
Freshness
Last updated 2026-04-20 18:29:02; freshness should be verified.
Geography
null
License information is unknown; users must verify permissions before use.