Description
A self-distilled instruction-following dataset created by HarryMayne. It contains data elicited from four models (Qwen3.5-35B-A3B, Qwen3.5-397B-A17B, GPT-4.1, and Kimi K2.5) using prompts from the Dolma 3 corpus, sampled at a temperature of 1. The dataset was last updated on May 14, 2026.
Use Cases
Fine-tuning language models on the self-distilled data to improve instruction following.
Studying model behavior on negation tasks based on the dataset's stated focus.
Benchmarking instruction-following performance across different model architectures.
Analyzing self-distillation techniques for generating synthetic training data.
Strengths
Data is sourced from four distinct, advanced language models, providing varied outputs.
Prompts are derived from the established Dolma 3 corpus.
Explicit generation parameters are provided, using a temperature setting of 1.
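As a rough illustration of what a temperature setting of 1 means, the sketch below applies temperature scaling to a softmax over logits. The function name and the logit values are hypothetical and are not taken from the dataset or from any of the four models' sampling code. The sketch only shows that a temperature of 1 leaves the model's distribution unchanged, while lower values sharpen it.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens (illustrative only).
logits = [2.0, 1.0, 0.0]
p_one = softmax_with_temperature(logits, temperature=1.0)   # unscaled distribution
p_half = softmax_with_temperature(logits, temperature=0.5)  # sharper distribution
print(p_one)
print(p_half)
```

Because the outputs were sampled at temperature 1 rather than decoded greedily, repeated generations for the same prompt may differ.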
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
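Since neither the columns nor the row count are documented, a first inspection pass after download might look like the sketch below. It assumes the rows have already been loaded as a list of dictionaries. The field names `prompt`, `response`, and `model` are hypothetical placeholders, not the dataset's actual columns.

```python
from collections import Counter, defaultdict

def summarize_records(records):
    """Count rows and tally the Python types observed in each field."""
    row_count = 0
    field_types = defaultdict(Counter)
    for record in records:
        row_count += 1
        for field, value in record.items():
            field_types[field][type(value).__name__] += 1
    return row_count, {field: dict(counts) for field, counts in field_types.items()}

# Hypothetical sample rows; the real field names are undocumented.
sample = [
    {"prompt": "Summarize the text.", "response": "A short summary.", "model": "GPT-4.1"},
    {"prompt": "List three items.", "response": None, "model": "Kimi K2.5"},
]
rows, schema = summarize_records(sample)
print(rows)
print(schema)
```

Fields that show more than one type, or a `NoneType` count, are good candidates for the manual quality inspection mentioned above.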
Provenance
Source
Hugging Face; author HarryMayne; companion repository from TruthfulAI-research.
Collection Method
Self-distillation from four language models using prompts from Dolma 3.
Time Range
Not specified.
Freshness
Last updated 2026-05-14 00:28:29; freshness should be verified.
Geography
Not specified.
License
Unknown; users must verify the terms before use.