This dataset contains 120 synthetic examples, described as high-quality, designed to train smaller language models to be more truthful and to recognize the limits of their own knowledge. Created by Hugging Face user Aadeshisdoingsomething, it trains models to output a structured query-and-verification routine before answering. It was last updated on June 4, 2026.
Use Cases
- Training models to write a verification scratchpad that follows the dataset's structured query-and-verification routine.
- Improving model truthfulness by requiring explicit verification before an answer is given.
- Teaching models to recognize their knowledge boundaries from the provided synthetic examples.
Strengths
- Contains 120 examples explicitly described as high-quality.
- Designed for a specific model size range of 1B to 8B parameters.
- Focuses on a clear, structured training mechanic for truthfulness.
Limitations
- Row count is not confirmed by the metadata; the figure of 120 examples comes from the description alone and should be checked after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Description metadata is limited; actual data quality requires manual inspection after download.
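Because the columns are undocumented, a first inspection pass might look like the sketch below. It assumes the rows have already been downloaded and loaded into a list of dictionaries; the field names `prompt` and `response` and the sample rows are hypothetical placeholders, not the dataset's actual schema.

```python
from collections import defaultdict


def summarize_fields(rows):
    """Report the row count plus, per field, the value types seen and the missing-value count."""
    # Collect every key first so fields absent from some rows still count as missing.
    all_keys = set()
    for row in rows:
        all_keys.update(row)

    summary = defaultdict(lambda: {"types": set(), "missing": 0})
    for row in rows:
        for key in all_keys:
            value = row.get(key)
            if value is None or value == "":
                summary[key]["missing"] += 1
            else:
                summary[key]["types"].add(type(value).__name__)

    return {
        "row_count": len(rows),
        "fields": {
            name: {"types": sorted(info["types"]), "missing": info["missing"]}
            for name, info in sorted(summary.items())
        },
    }


# Hypothetical rows for illustration only; the real column names are undocumented.
sample_rows = [
    {"prompt": "What is the capital of France?", "response": "Paris"},
    {"prompt": "Who won the 2090 World Cup?", "response": None},
]

report = summarize_fields(sample_rows)
print("rows:", report["row_count"])
for name, info in report["fields"].items():
    print(name, info)
```

Running the same function on the downloaded rows would give an actual row count to compare against the 120 stated in the description, along with a starting point for inferring what each field means.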
Provenance
- Source: Hugging Face user Aadeshisdoingsomething
- Collection Method: Synthetically generated, as described.
- Freshness: Last updated 2026-06-04 19:14:59; freshness should be verified.