Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data
1,534 datasets
3,703 ultrasound images from 2,685 patients were used to develop machine learning models for breast cancer diagnosis. The dataset includes 2,069 benign and 616 malignant cases collected between July 2019 and March 2024. Shengxin Pei authored this research, which compares models using BI-RADS terminology, ultrasound imaging, and radiomics features.
Five consecutive seasons (2020/2021 to 2024/2025) of football data from the five major European leagues, comprising 8,979 matches. The dataset includes detailed match events, player performance metrics, and spatio-temporal shot trajectories. It was created by Oualid Dehbane and published on figshare under a CC-BY-4.0 license.
64 hospital physicians participated in a randomized exploratory trial comparing a 15-minute guided meditative relaxation session to an unguided rest condition. The dataset includes physiological parameters (blood pressure, heart rate) and psychological measures (SPPN, QSCPGS, SRSI3_29) collected at baseline, immediately post-session, and a few hours later. The supplementary file was authored by Siddhiraj Banjac and uploaded to figshare in May 2026.
64 hospital physicians participated in a randomized exploratory trial comparing a 15-minute guided meditative relaxation session to an unguided rest condition. The dataset includes physiological parameters (blood pressure, heart rate) and psychological measures (SPPN, QSCPGS, SRSI3_29) collected at baseline, immediately post-session, and a few hours later. The study was authored by Siddhiraj Banjac and published on figshare in 2026 under a CC-BY-4.0 license.
A pelvic MRI dataset of 74 subjects and 3,449 T2-weighted slices from two institutions for developing AI models for uterus segmentation in endometriosis. The dataset was used to fine-tune the Endo-MedSAM model, achieving mean 3D Dice scores of 0.81–0.88 with bounding-box prompts. The dataset was uploaded by Rawan AlSaad on figshare in May 2026.
17 healthy participants (7 females, 10 males, aged 19–34) performed walking activities across diverse indoor and outdoor terrains. The dataset includes motion data from 7 inertial sensors, foot pressure from 96-point force sensors, and visual data from 3 front-facing cameras, all annotated with 16 locomotion state classes. Collected by Chen Wang and shared under a CC-BY-4.0 license, this 16.9 GB dataset was last updated on 2026-05-31.
Renjie Lu developed a multimodal model integrating tumor radiomics and lymph node morphology for predicting axillary nodal metastasis burden in breast cancer. The dataset includes information from 583 patients with pathologically confirmed breast cancer, split into training and testing cohorts. The model was last updated on June 4, 2026.
Micro-OD is a benchmark of 252 images curated for in-context learning, with bounding-box annotations for 11 cell types across four sources. It was created by Shreyan Ganguly and last updated in May 2026. The dataset is designed to evaluate vision-language models for few-shot object detection in biomedical microscopy.
A 2026 evaluation assesses the capabilities of foundation models like MatchAnything RoMa and ELoFTR for multimodal image matching in materials science. The analysis uses the AmalgaMatch dataset, which contains 187 image pairs across six distinct matching tasks and 19 different materials. The work was authored by Ali Riza Durmaz and is shared under a CC-BY-4.0 license.
Crowdsourced typing preference data from a study that derived ergonomics objectives from user preferences. The dataset includes materials for the Engram approach to optimizing keyboard layouts for English and Spanish, created by Arno Klein and last updated in May 2026. It contains data, software, documentation, and layouts totaling 10.6 MB.
Twenty individuals with mild traumatic brain injury and 24 healthy controls underwent advanced diffusion MRI and cognitive assessment. The data includes multi-shell DTI, free-water corrected DTI, diffusion kurtosis imaging, and NODDI metrics, linked to MoCA and GOS-E clinical scores. Authored by Maurizio Bergamino and shared under CC-BY-4.0, this dataset was last updated on May 28, 2026.
Eighty-seven subjects with at-risk mental states (ARMS) were followed up, with clinical outcomes classified into four ordered categories. The dataset contains baseline measures for 15 explanatory variables, including clinical symptoms, cognitive functioning, and electrophysiological measures like P300 and mismatch negativity. The data was authored by Kazuya Nagasawa and last updated on 2026-05-28.
A replication package from an experimental study evaluating a multimodal chatbot as a pedagogical mediator for digital literacy among elderly women. The dataset includes anonymized participant data, task completion time records, success rates, and axial networks built from transcripts. It was authored by Amanda Sales and last updated on 2026-05-30.
187 image pairs from the AmalgaMatch dataset, partitioned into six distinct matching tasks and 19 material subsets, facilitate evaluation of foundation models for multimodal image registration. Ali Riza Durmaz published this supplementary PDF in May 2026 under a CC-BY-4.0 license. The dataset covers metals, alloys, and ceramics imaged with diverse microscopy modalities, presenting challenges like limited mutual information and field-of-view ratios as low as 2%.
Supplementary file 1 from a retrospective cohort study by Hui Zhang, published on figshare in 2026. The data likely contains results from 82 high-risk parturients receiving a multimodal analgesic protocol and 79 historical controls, collected between January 2023 and December 2024. Outcomes include postpartum depression incidence, Edinburgh Postnatal Depression Scale scores, Pittsburgh Sleep Quality Index scores, and opioid consumption.
A curated subset of 35,794 image-caption pairs from the Conceptual Captions dataset, re-annotated in Russian for accessibility. The data was processed through semantic clustering of 2,484 groups and re-annotated using teacher vision-language models. It was created by Pavel Mikheyev and last updated in May 2026.
Lin-Feng Zhou's dataset supports a study developing a multimodal model to predict overt hepatic encephalopathy (OHE) within one year after a transjugular intrahepatic portosystemic shunt (TIPS) procedure. The data includes manual CT features, radiomics, and clinical data from 338 patients treated between November 2015 and January 2022. The combined model (Model MRC) achieved an area under the ROC curve of 0.902.
A multimodal dataset was used to develop a predictive model for overt hepatic encephalopathy (OHE) within one year after a transjugular intrahepatic portosystemic shunt (TIPS) procedure. The study by Lin-Feng Zhou, last updated in May 2026, integrated manual CT imaging features, radiomics, and clinical data from 338 patients treated between November 2015 and January 2022. The combined model (Model MRC) demonstrated superior predictive performance with an AUC of 0.902.
A PDF supplementary file describes a decision-support framework for optimizing last-mile mail delivery in Australian regional areas. The study integrates mail demand and GIS data with an optimization engine to coordinate van, walking, and cycling routes. The system reportedly achieved reductions of up to 21.67% in delivery time and 11.36% in CO₂ emissions compared to van-only operations.
A collection of datasets for training and evaluating machine learning models on small-molecule natural products. The data, totaling 128.0 MB, was compiled by Zhenming Liu from multiple public databases including COCONUT, NPASS, LOTUS, and MIBiG. The collection was last updated on 2026-04-30.