This dataset contains 1.79 million unique samples of medical chain-of-thought reasoning, totaling approximately 3.78 billion tokens. It was created by CrossNow and last updated on March 5, 2026. It combines outputs from seven state-of-the-art AI models and was deduplicated with a method the source calls "fair distribution deduplication" without further explanation.
Use Cases
- Fine-tuning language models for medical question answering using the chain-of-thought reasoning data.
- Training models to generate step-by-step medical explanations from the reasoning traces.
- Benchmarking the reasoning capabilities of AI models in the medical domain.
- Studying the distribution and patterns of AI-generated medical reasoning across multiple source models.
Strengths
- Large scale with 1,789,998 unique samples after deduplication.
- High reasoning content: ~1.56 billion reasoning tokens out of ~3.78 billion total (roughly 41%).
- Nearly complete coverage: 1,789,764 of 1,789,998 samples contain reasoning (about 99.99%, reported as 100.0% after rounding).
- Combined output from seven state-of-the-art AI models.
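The headline figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this card. The token counts are approximate, so the derived averages are rough estimates rather than reported statistics.

```python
# Figures quoted in this card (token counts are approximate).
total_samples = 1_789_998
samples_with_reasoning = 1_789_764
total_tokens = 3.78e9
reasoning_tokens = 1.56e9

# Samples that contain no reasoning.
missing = total_samples - samples_with_reasoning

# Share of samples with reasoning, in percent.
coverage_pct = 100 * samples_with_reasoning / total_samples

# Share of all tokens that are reasoning tokens, in percent.
reasoning_share_pct = 100 * reasoning_tokens / total_tokens

# Rough average sample length in tokens (derived, not reported by the source).
avg_tokens_per_sample = total_tokens / total_samples

print(f"samples without reasoning: {missing}")
print(f"reasoning coverage: {coverage_pct:.3f}%")
print(f"reasoning share of tokens: {reasoning_share_pct:.1f}%")
print(f"average tokens per sample: {avg_tokens_per_sample:.0f}")
```

The coverage value rounds to 100.0% at one decimal place, which is why the card can show 100.0% while still describing coverage as nearly complete.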
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Description metadata is limited; actual data quality requires manual inspection after download.
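Because field semantics are undocumented, a first pass over a handful of downloaded records can help. The following is a minimal sketch, assuming the records are available as plain Python dictionaries. The helper `summarize_fields` and the field names in `sample_records` (`question`, `reasoning`, `answer`, `source_model`) are hypothetical stand-ins for illustration, not the dataset's actual schema.

```python
from collections import defaultdict


def summarize_fields(records):
    """Collect observed types, non-empty counts, and one example per field."""
    summary = defaultdict(
        lambda: {"types": set(), "non_empty": 0, "example": None}
    )
    for record in records:
        for name, value in record.items():
            info = summary[name]
            info["types"].add(type(value).__name__)
            # Treat None and empty containers/strings as "empty".
            if value not in (None, "", [], {}):
                info["non_empty"] += 1
                if info["example"] is None:
                    info["example"] = value
    return dict(summary)


# Hypothetical stand-in records; the real field names must be read
# from the downloaded files.
sample_records = [
    {"question": "Q1", "reasoning": "step 1 ...", "answer": "A1", "source_model": "model-a"},
    {"question": "Q2", "reasoning": "", "answer": "A2", "source_model": "model-b"},
]

summary = summarize_fields(sample_records)
for name, info in summary.items():
    print(name, sorted(info["types"]), info["non_empty"], repr(info["example"]))
```

Running this on a small sample before any training run gives a quick view of which fields are populated and what they appear to hold. It does not replace manual inspection of data quality.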
Provenance
- Source: CrossNow on Hugging Face.
- Collection Method: Combined outputs from seven AI models with fair distribution deduplication.
- Time Range: Not specified.
- Freshness: Last updated 2026-03-05 01:52:30; freshness should be verified.
- Geography: Not specified.