Medical Reasoning SFT Mega: 1.79 Million Samples for Chain-of-Thought Tuning

Creator: CrossNow
Published: 2026-03-05T01:52:30
Keywords: Chain of Thought, Medical Reasoning, Healthcare, Text, SFT, Large Scale, Natural Language Processing

Description

This dataset contains 1.79 million unique samples of medical chain-of-thought reasoning, totaling approximately 3.78 billion tokens. It was created by CrossNow and last updated on March 5, 2026. The samples combine outputs from seven state-of-the-art AI models and were deduplicated using a method the listing calls "fair distribution deduplication"; the seven models are not named here.

Use Cases

Fine-tuning language models for medical question answering using the chain-of-thought reasoning data.
Training models to generate step-by-step medical explanations from the reasoning tokens.
Benchmarking the reasoning capabilities of AI models in the medical domain.
Studying the distribution and patterns of AI-generated medical reasoning across multiple source models.
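
As an illustration of the fine-tuning use cases above, the following minimal sketch turns one sample into a prompt/completion pair. The field names (question, reasoning, answer), the Sample class, and the to_sft_pair helper are all hypothetical: the listing does not document the dataset's columns, so they would need to be mapped to the real schema after download. The sample text is placeholder content, not data from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical field names; the dataset's real columns are undocumented.
    question: str
    reasoning: str
    answer: str


def to_sft_pair(sample: Sample, include_reasoning: bool = True) -> dict:
    """Build a prompt/completion pair for supervised fine-tuning."""
    answer = sample.answer.strip()
    if include_reasoning:
        # Keep the reasoning ahead of the answer so the model learns to
        # produce its steps before committing to a conclusion.
        completion = f"Reasoning:\n{sample.reasoning.strip()}\n\nAnswer:\n{answer}"
    else:
        completion = answer
    return {"prompt": sample.question.strip(), "completion": completion}


# Placeholder content only.
example = Sample(
    question="Placeholder question text.",
    reasoning="Placeholder step 1. Placeholder step 2.",
    answer="Placeholder answer.",
)
pair = to_sft_pair(example)
print(pair["completion"])
```

Passing include_reasoning=False yields an answer-only target, which can serve as a baseline when measuring what the reasoning text contributes.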

Strengths

Large scale with 1,789,998 unique samples after deduplication.
High reasoning content with ~1.56 billion reasoning tokens, roughly 41% of the ~3.78 billion total.
Nearly complete coverage, with 1,789,764 of 1,789,998 samples (99.99%; shown as 100.0% after rounding) containing reasoning.
Combined output from seven state-of-the-art AI models.
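
The figures above can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers quoted in this listing; the token totals are approximate, so the derived averages are rough estimates rather than published statistics.

```python
# Figures quoted in the listing (token counts are approximate).
total_samples = 1_789_998
with_reasoning = 1_789_764
total_tokens = 3.78e9
reasoning_tokens = 1.56e9

# Share of samples that contain reasoning, and how many do not.
coverage = with_reasoning / total_samples
missing = total_samples - with_reasoning

# Share of all tokens that are reasoning tokens.
reasoning_share = reasoning_tokens / total_tokens

# Rough per-sample averages.
avg_tokens = total_tokens / total_samples
avg_reasoning_tokens = reasoning_tokens / with_reasoning

print(f"coverage (1 decimal):  {coverage:.1%}")
print(f"coverage (3 decimals): {coverage:.3%}")
print(f"samples without reasoning: {missing}")
print(f"reasoning share of tokens: {reasoning_share:.1%}")
print(f"avg tokens per sample: {avg_tokens:.0f}")
print(f"avg reasoning tokens per reasoning sample: {avg_reasoning_tokens:.0f}")
```

The one-decimal coverage prints as 100.0%, which is why the listing can report both "nearly complete" and "100.0%": 234 samples have no reasoning, and that gap disappears at that rounding.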

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
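
Since field semantics must be inferred after download, a first pass can simply profile the records. The sketch below is one way to do that under stated assumptions: it expects each record as a plain dict, and the summarize_fields helper and the field names in the placeholder records are hypothetical, not taken from the dataset.

```python
from collections import Counter, defaultdict


def summarize_fields(records):
    """Per field: how often it appears, its value types, and how often it is empty."""
    seen = Counter()
    types = defaultdict(Counter)
    empty = Counter()
    for record in records:
        for key, value in record.items():
            seen[key] += 1
            types[key][type(value).__name__] += 1
            # Count None and empty strings as empty values.
            if value is None or value == "":
                empty[key] += 1
    return {
        key: {"present": seen[key], "types": dict(types[key]), "empty": empty[key]}
        for key in seen
    }


# Placeholder records with hypothetical field names.
records = [
    {"question": "q1", "reasoning": "r1", "answer": "a1", "source_model": "model_a"},
    {"question": "q2", "reasoning": "", "answer": "a2", "source_model": "model_b"},
    {"question": "q3", "reasoning": None, "answer": "a3"},
]
summary = summarize_fields(records)
for field, stats in summary.items():
    print(field, stats)
```

Running this over a few thousand real records would show which fields are always present, which carry the reasoning text, and how many samples lack it, before any training code depends on the schema.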

Provenance

Source: CrossNow on Hugging Face.
Collection Method: Combined outputs from seven AI models with fair distribution deduplication.
Time Range: Not specified
Freshness: Last updated 2026-03-05 01:52:30; verify that the data is still current before use.
Geography: Not specified

Quality Score

Overall: C (44)
Description: 51
Source: 41
Reputation: 46
Access: 26

Community

1.4K downloads
1 like
0 views

Dataset Info

Author: CrossNow
Created: Mar 5, 2026
Updated: Mar 5, 2026
Last synced: May 14, 2026
