Description
This collection of CT scan reports without clinical findings comprises 32,631 patient records and 7,946,296 medical images. It is intended to support AI development in healthcare and medical imaging and captures authentic scanner variability and acquisition protocols. It was created by InfoBayAI and last updated on Hugging Face in June 2026.
Use Cases
Training anomaly detection models on the large-scale collection of images without findings.
Developing medical imaging AI systems that learn from authentic scanner variability and acquisition protocols.
Benchmarking natural language processing models on structured and unstructured clinical text from radiology reports.
Building synthetic data generation pipelines that reproduce the characteristics of reports without findings.
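A common baseline for the anomaly detection use case is to model only normal data and score new scans by how far they fall from it. The sketch below is illustrative, not a method associated with this dataset: it assumes each image has already been reduced to a fixed-length numeric feature vector (the feature extractor is a hypothetical step outside this dataset) and fits an independent Gaussian per feature dimension on scans without findings.

```python
import math
from statistics import fmean, pstdev


def fit_normal_profile(features):
    """Fit per-dimension mean and std on feature vectors from scans without findings.

    `features` is a list of equal-length numeric sequences (hypothetical
    embeddings; this dataset does not ship them).
    """
    dims = list(zip(*features))  # transpose: one tuple per feature dimension
    means = [fmean(d) for d in dims]
    # Floor the std so constant dimensions do not cause division by zero.
    stds = [max(pstdev(d), 1e-6) for d in dims]
    return means, stds


def anomaly_score(vector, profile):
    """Root-mean-square z-score; larger values lie further from the normal profile."""
    means, stds = profile
    z_sq = [((x - m) / s) ** 2 for x, m, s in zip(vector, means, stds)]
    return math.sqrt(sum(z_sq) / len(z_sq))


if __name__ == "__main__":
    # Toy two-dimensional "features" standing in for real image embeddings.
    normal = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.0]]
    profile = fit_normal_profile(normal)
    print(anomaly_score([1.0, 10.0], profile))  # at the profile mean
    print(anomaly_score([2.0, 14.0], profile))  # far from it
```

In practice a decision threshold would be chosen from the score distribution on held-out normal scans (for example a high percentile); a diagonal Gaussian is only a starting point before trying learned one-class or reconstruction-based models.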
Strengths
Contains data from 32,631 patients, providing a substantial sample size.
Includes 7,946,296 medical images, offering significant scale for model training.
Designed to capture authentic imaging characteristics like scanner variability and patient positioning.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Descriptive metadata is limited; actual data quality can only be assessed by manual inspection after download.
The dataset consists solely of reports without findings, which may limit its use for certain diagnostic tasks.
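Because field semantics have to be inferred after download, a practical first step is to profile every field's observed value types and missing counts. The sketch below assumes the records have already been loaded as a list of Python dictionaries; how the files are actually exported is not documented, and the field names in the usage example are hypothetical.

```python
from collections import Counter, defaultdict


def profile_fields(records):
    """Summarize each field across records: observed value types and missing count.

    A value counts as missing if the key is absent from a record or its
    value is None.
    """
    all_keys = set()
    for rec in records:
        all_keys.update(rec.keys())

    types = defaultdict(Counter)
    missing = Counter()
    for rec in records:
        for key in all_keys:
            value = rec.get(key)
            if value is None:
                missing[key] += 1
            else:
                types[key][type(value).__name__] += 1

    return {
        key: {"types": dict(types[key]), "missing": missing[key]}
        for key in sorted(all_keys)
    }


if __name__ == "__main__":
    # Hypothetical rows; real field names must be read from the downloaded files.
    rows = [
        {"study_id": "a1", "report_text": "Unremarkable study.", "num_images": 240},
        {"study_id": "a2", "report_text": None, "num_images": 198},
        {"study_id": "a3", "num_images": "212"},
    ]
    for field, summary in profile_fields(rows).items():
        print(field, summary)
```

Mixed types in one field (such as integers and strings above) or high missing counts are exactly the issues that column-level documentation would normally explain, so they are worth flagging before any training run.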
Provenance
Source
InfoBayAI via Hugging Face.
Collection Method
Collection of CT scan reports and associated images, likely from medical institutions.
Freshness
Last updated 2026-06-02 05:45:20; freshness should be verified.
The license is unknown; verify the terms of use before using the dataset.