Medical imaging (X-ray, CT, MRI), electronic health records, clinical trials, ECG/EEG, pathology
13,171 datasets
Data from 1973 to 1975 examinations of the biological condition of diseased fish, collected by NOAA NCEI. Records include species counts, disease prevalence, and individual fish damage assessments. Environmental parameters such as water temperature and salinity were also recorded at sample locations.
Two years of Conductivity, Temperature, Depth (CTD) and other oceanographic data were collected from the NE Pacific between August 5, 1986 and November 30, 1988. The dataset comprises 381 casts submitted by David Behringer of the Atlantic Oceanographic and Meteorological Laboratory. Data is processed and stored in the NODC F022-CTD-Hi Resolution file format.
County-level data on hospitals and epidemiology stations across China for 1950-1985, including names and years of operation. The data is produced at a 1:1 million scale. It was created by the University of Washington as part of the CITAS project and CIESIN.
Wenyao Zhang's dataset contains CryoEM density maps and X-ray crystallography structures for the transcriptional activator NifA, which is activated by 2-oxoglutarate for biological nitrogen fixation. The data supports structural studies of protein activation mechanisms. It is available under a CC-BY-4.0 license and is hosted on multiple platforms.
A dataset titled 'thoracic_surgery'. No further descriptive information, column details, row count, or provenance is available.
Systems serology profiles from 182 individuals track antibody isotypes and Fc-mediated effector functions against HPV16 E2 and E7 antigens in oropharyngeal squamous cell carcinoma (OPSCC) patients. Developed by Vicky Roy and hosted on Harvard Dataverse, the data includes longitudinal observations spanning over 700 days across diagnosis, treatment, and recurrence phases.
A dataset supporting the DICE-RL framework for refining pretrained generative robot policies via reinforcement learning. The data was uploaded by author 'wintermelontree' to Hugging Face on March 20, 2026. It originates from the research paper 'From Prior to Pro: Efficient Skill Mastery via Distribution Contractive RL Finetuning'.
A clinical dataset from figshare, authored by Ryohei Nishiguchi, examining the prognostic value of time-dependent body mass index changes in gastric cancer patients. The dataset is 73.5 KB in size and was last updated on March 22, 2026. It is provided as an XLSX file under a CC0 1.0 license.
Step-CoT is a large-scale medical reasoning dataset containing over 10,000 real clinical chest X-ray cases and 70,000 visual question-answering pairs. It was created by author fl-15o and last updated on March 23, 2026. Each VQA pair is structured into a seven-step diagnostic workflow designed to mirror clinical reasoning.
Harvard Dataverse hosts de-identified clinical and surgical information from an early institutional series of robotic head and neck procedures. The dataset, authored by Stella Tsai, captures case characteristics, surgical access routes, pathological categories, and selected operative variables. It was last updated on 2026-04-16.
A dataset derived from paper-based internal transfer forms and neonatal admission records used in Kenyan public hospitals providing newborn inpatient care. It contains images of clinical documents paired with JSON files providing gold-standard extracted data, intended for benchmarking AI and LLM performance in clinical data extraction. The dataset is a product of a hybrid paper-digital pipeline designed for rapid deployment in resource-limited clinical settings.
ReXGroundingCT links free-text radiology findings with pixel-level segmentations in 3D chest CT scans. The dataset contains 3,142 CT scans with 8,028 segmented findings across 14 categories. It was created by rajpurkarlab and updated in March 2026.
Cleaned diagnostic records provide 14 clinical features for cardiac risk prediction. The dataset is hosted on Kaggle, but its author, size, and specific origin are not detailed. Its primary purpose is to support classification tasks related to heart disease.
Healthcare Messy Data is a dataset hosted on Kaggle. The title suggests it contains healthcare-related information that is intentionally or unintentionally messy, likely intended for data cleaning and preprocessing practice. No further metadata on its origin, size, or specific content is available.
A NASA Ames study proposes a probabilistic framework for detecting delamination location and size in composite materials. The method uses a Bayesian Imaging technique on Lamb wave signals collected from fatigue testing, with results validated against X-ray images. The dataset likely contains signal features and diagnostic results from this novel methodology.
Materials document a systematic review of infrared thermography for tongue-based medical diagnostics. The repository includes a collection of reviewed publications, a PRISMA flow diagram, and the detailed search strategy with MeSH terms. Karolina Jezierska compiled this dataset at Harvard Dataverse, last updated in April 2026.
The Coronal Diagnostics Spectrometer (CDS) onboard the SOHO spacecraft studies emission line characteristics in the extreme ultraviolet (EUV) of the Sun. The dataset likely contains observations of the Sun's coronal properties and its interaction with magnetic structures resulting from photospheric activity. All SOHO/CDS data are available from public archives maintained by NASA, Rutherford Appleton Laboratory, Institut d'Astrophysique Spatiale, and Osservatorio Astronomico di Torino.
Tri-NL provides 1,500 synthetic reasoning samples across mathematics, coding, and healthcare domains, published by crevious in March 2026. Each record contains a problem and a solution structured into numbered reasoning steps and a final answer.
This dataset provides quantitative morphological descriptions of natural and artificial Tire and Road Wear Particles (TRWP/ATWP). It contains high-resolution X-ray computed tomography (µCT) data, including segmented TIFF stacks and analysis results, focused on characterizing particle structure and pore space. The data supports research into the environmental impact and physical properties of these emerging pollutants.
A research-grade collection of skin disease images, processed to remove duplicates and stratified. The dataset is hosted on Kaggle, but its creator, size, and specific contents are not detailed in the provided metadata. Its title suggests it is intended for rigorous medical or machine learning research.