Negation Neglect: Self-Distilled Instruction Data from Four LLMs

Name: Negation Neglect: Self-Distilled Instruction Data from Four LLMs
Creator: HarryMayne
Published: 2026-05-13T02:30:18
Keywords: Negation, Self Distillation, Text, Language Model, Instruction Following

By HarryMayne, updated 2mo ago

Available on 1 platform

Description

A self-distilled instruction-following dataset created by HarryMayne. It contains data elicited from four models (Qwen3.5-35B-A3B, Qwen3.5-397B-A17B, GPT-4.1, and Kimi K2.5) using prompts from the Dolma 3 corpus at temperature 1. The dataset was last updated on May 14, 2026.

Use Cases

Fine-tuning language models for improved instruction-following based on the self-distilled data.
Studying model behavior on negation tasks based on the dataset's stated focus.
Benchmarking instruction-following performance across different model architectures.
Analyzing self-distillation techniques for generating synthetic training data.

Strengths

Data is sourced from four distinct, advanced language models, providing varied outputs.
Prompts are derived from the established Dolma 3 corpus.
Explicit generation parameters are provided, using a temperature setting of 1.
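
For readers unfamiliar with the temperature parameter, the sketch below illustrates what a setting of 1 means under the standard definition of temperature-scaled sampling: logits are divided by the temperature before the softmax, so a temperature of 1 leaves the model's distribution unchanged. The listing does not describe the actual sampling code, so the function name and the logit values here are illustrative assumptions only.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate tokens (not taken from the dataset).
logits = [2.0, 1.0, 0.0]

p_t1 = softmax_with_temperature(logits, temperature=1.0)    # unmodified distribution
p_plain = softmax_with_temperature(logits)                  # default is also 1.0
p_cold = softmax_with_temperature(logits, temperature=0.5)  # sharper distribution

print(p_t1)
print(p_cold)
```

Lower temperatures concentrate probability on the top token, so outputs sampled at temperature 1 are generally more varied than outputs from greedy or low-temperature decoding.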

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
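
Because column documentation and row count are missing, a first inspection pass after download could look like the sketch below. It counts rows and records which Python types appear in each field. The field names in the sample rows (`prompt`, `response`, `model`) are hypothetical placeholders, since the real schema is not documented on this page.

```python
from collections import Counter, defaultdict


def summarize_records(records):
    """Return (row_count, {field: {type_name: count}}) for an iterable of dict rows."""
    row_count = 0
    field_types = defaultdict(Counter)
    for record in records:
        row_count += 1
        for field, value in record.items():
            field_types[field][type(value).__name__] += 1
    return row_count, {field: dict(counts) for field, counts in field_types.items()}


# Hypothetical rows for illustration; the dataset's actual fields are undocumented.
sample = [
    {"prompt": "Example prompt", "response": "Example response", "model": "GPT-4.1"},
    {"prompt": "Another prompt", "response": None, "model": "Kimi K2.5"},
]

rows, schema = summarize_records(sample)
print("rows:", rows)
for field, types in schema.items():
    print(field, types)
```

With the Hugging Face `datasets` library, a split returned by `load_dataset` iterates as dict rows and could be passed to this function directly. The repository ID and split name are not given on this page and would need to be looked up first.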

Provenance

Source: Hugging Face, author HarryMayne, companion repository from TruthfulAI-research.
Collection Method: Self-distillation from four language models using prompts from Dolma 3.
Time Range: Not specified
Freshness: Last updated 2026-05-14 00:28:29; freshness should be verified.
Geography: Not specified

License is unknown; users must verify terms before use.

Quality Score: D34

Description: 32
Source: 36
Reputation: 41
Access: 26

Community

Downloads: 62
Likes: 1
Views: 0

Dataset Info

Author: HarryMayne
Created: May 13, 2026
Updated: May 14, 2026
Last synced: May 26, 2026
