kubu-hai is a dataset of 10,000 instructions and demonstrations written by skilled human annotators. It was published by Seriki on Hugging Face and was last updated on 2026-06-21.
Use Cases
- Supervised fine-tuning of language models on human-written instructions.
- Improving model alignment with human intent by training on human-written demonstrations.
- Benchmarking instruction-following capabilities against models trained on synthetic data.
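For the supervised fine-tuning use case, each instruction-demonstration pair is usually turned into a prompt and a target text. The sketch below shows one way to do this. The field names `instruction` and `demonstration`, the `Example` class, and `PROMPT_TEMPLATE` are all hypothetical. kubu-hai does not document its columns or prescribe a template.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical field names; the real kubu-hai columns are undocumented.
    instruction: str
    demonstration: str


# Hypothetical prompt template; the dataset does not prescribe one.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def to_sft_pair(example: Example) -> tuple[str, str]:
    """Split one example into (prompt, target) text for supervised fine-tuning.

    In common SFT setups the loss is computed on the target tokens only,
    so the prompt and target are kept separate here.
    """
    prompt = PROMPT_TEMPLATE.format(instruction=example.instruction.strip())
    target = example.demonstration.strip()
    return prompt, target


if __name__ == "__main__":
    prompt, target = to_sft_pair(Example("Summarize the text. ", "A short summary."))
    print(prompt + target)
```

Keeping the prompt and target separate lets a training loop mask the prompt tokens out of the loss. Swap in whatever template the target model was trained with.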
Strengths
- 10,000 high-quality instruction-demonstration pairs.
- Created by skilled human annotators, not generated by language models.
- Modelled after the instruction dataset described in OpenAI's InstructGPT paper.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The row count is not reported in the dataset metadata, so the stated figure of 10,000 examples cannot be confirmed before download.
- Last updated 2026-06-21 04:55:54; check for newer revisions before relying on it.
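The schema is undocumented and the row count unreported, so both have to be checked after download. The sketch below does this for a local export. It assumes the data is in JSON Lines form (one JSON object per line), which the card does not state. It counts the rows and tallies the value types seen for each field. The function name `profile_records` and the sample rows are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def profile_records(lines: Iterable[str]) -> tuple[int, dict[str, Counter]]:
    """Count rows and tally the Python type names seen for each field.

    Assumes one JSON object per line (JSON Lines); blank lines are skipped.
    """
    row_count = 0
    field_types: dict[str, Counter] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        row_count += 1
        for key, value in record.items():
            # Record which types appear under each key to infer field semantics.
            field_types.setdefault(key, Counter())[type(value).__name__] += 1
    return row_count, field_types


if __name__ == "__main__":
    # Hypothetical sample rows; real kubu-hai field names may differ.
    sample = [
        '{"instruction": "Summarize the text.", "demonstration": "A short summary."}',
        '{"instruction": "Translate to French.", "demonstration": "Bonjour."}',
        "",
    ]
    rows, types = profile_records(sample)
    print(rows)
    for field, counts in sorted(types.items()):
        print(field, dict(counts))
```

A file object can be passed directly, for example `with open("train.jsonl", encoding="utf-8") as f: profile_records(f)`. The file name is hypothetical. Compare the returned row count with the advertised 10,000 examples.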
Provenance
- Source: Seriki on Hugging Face
- Collection method: created by skilled human annotators
- Freshness: last updated 2026-06-21