Description
129,046 curated medical multiple-choice questions across 15 subjects and 804 topics, built for fine-tuning small language models. The dataset is schema-standardized, tagged with Bloom's taxonomy levels, syllabus-aligned, and curated against 24 published item-writing flaws. This sample contains 1,500 questions; the full dataset is available for purchase from the author, stravoris. Last updated in May 2026.
Use Cases
Fine-tuning medical question-answering models on the standardized question schema.
Evaluating model performance by Bloom's taxonomy level, using the tags in the data.
Training models to identify and avoid the common item-writing flaws the dataset is curated against.
Developing educational tools aligned with specific medical syllabus topics.
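The per-level evaluation listed above could be sketched as follows. This is a minimal illustration, not the author's method: the field names `bloom_level` and `answer` are assumptions, since the dataset's columns are not documented, and the records are toy data rather than rows from the dataset.

```python
from collections import defaultdict


def accuracy_by_level(records, predictions,
                      level_key="bloom_level", answer_key="answer"):
    """Return {level: accuracy} for parallel lists of records and predictions.

    level_key and answer_key are hypothetical field names; replace them
    with the real column names once the schema has been inspected.
    """
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for record, predicted in zip(records, predictions):
        # Records without a level tag are grouped under "unknown".
        level = record.get(level_key, "unknown")
        total[level] += 1
        if predicted == record.get(answer_key):
            correct[level] += 1
    return {level: correct[level] / total[level] for level in total}


# Toy records with assumed field names, not taken from the dataset.
sample = [
    {"bloom_level": "Remember", "answer": "A"},
    {"bloom_level": "Remember", "answer": "C"},
    {"bloom_level": "Apply", "answer": "B"},
]
print(accuracy_by_level(sample, ["A", "B", "B"]))
# prints {'Remember': 0.5, 'Apply': 1.0}
```

Reporting accuracy per level rather than as a single number shows whether a model handles recall-style questions better than application-style ones.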
Strengths
129,046 questions in total, a substantial volume for training.
Questions are organized across 15 subjects and 804 topics for structured coverage.
Each question is curated against 24 published item-writing flaws for quality control.
Standardized schema and Bloom's taxonomy tagging enable consistent analysis.
Limitations
The description metadata is limited; data quality can only be judged by manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
The full dataset is sold commercially, with only a 1,500-question sample freely available.
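Because column-level documentation is absent, a first pass after download might simply tally which fields appear and what types their values take. The sketch below assumes the records have already been loaded as a list of dictionaries; the file format is not stated in the listing, and the field names in the toy data are invented for illustration.

```python
from collections import Counter, defaultdict


def summarize_fields(records):
    """Map each field name to the counts of value types observed for it."""
    summary = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            summary[field][type(value).__name__] += 1
    return {field: dict(counts) for field, counts in summary.items()}


# Toy records with invented field names, not the dataset's real schema.
toy_records = [
    {"question": "Which nerve ...?", "options": ["A", "B", "C", "D"]},
    {"question": "Which enzyme ...?", "topic": None},
]
print(summarize_fields(toy_records))
# prints {'question': {'str': 2}, 'options': {'list': 1}, 'topic': {'NoneType': 1}}
```

Fields that appear in only some records, or that mix types, are the ones whose semantics most need checking by hand.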
Provenance
Source
stravoris
Collection Method
Curated collection for medical education and model fine-tuning.
Time Range
First Edition; temporal coverage unknown.
Freshness
Last updated 2026-05-19 01:17:01; freshness should be verified.
Geography
Geographic coverage unknown.
The full dataset is sold through Payhip; the free sample is limited to 1,500 questions. License terms are unknown.