Multimodal & LLM Datasets

Multimodal & LLM

StatFootDB: A Longitudinal Multimodal Dataset of Football Match Events, Player Performa

Five consecutive seasons (2020/2021 to 2024/2025) of football data from the five major European leagues, comprising 8,979 matches. The dataset includes detailed match events, player performance metrics, and spatio-temporal shot trajectories. It was created by Oualid Dehbane and published on figshare under a CC-BY-4.0 license.

Tabular, Time Series, Geospatial, Multimodal, CSV, JSON, Player Performance, Shot Trajectories, Natural Language Processing, Sports Analytics, Football Soccer

Multimodal Image Registration Benchmark: Aerial, Cytological, and Histological Data

The dataset is an evaluation set for multimodal image registration, containing 864 image pairs created from three publicly available 2D datasets. It includes Aerial (Near-Infrared and RGB), Cytological (Fluorescence and Quantitative Phase), and Histological (Second Harmonic Generation and Bright-Field) image pairs, each subjected to four levels of rigid transformations. The data is structured for 3-fold cross-validation, with specific sample counts per fold for each sub-dataset.
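The listing does not give the parameters of the four transformation levels. As a rough illustration only, a rigid transformation (rotation plus translation, with no scaling or shearing) applied to 2D coordinates might be sketched as follows. The function name, angle, and shift are hypothetical and are not taken from the benchmark.

```python
import math

def rigid_transform(points, angle_deg, shift):
    """Rotate 2D points about the origin by angle_deg, then translate by shift.

    Hypothetical helper for illustration; not part of the benchmark's tooling.
    """
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = shift
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]

# Corners of a unit square standing in for image coordinates (assumed values).
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
moved = rigid_transform(corners, angle_deg=90.0, shift=(2.0, 0.0))

# A rigid transformation preserves distances between points.
diag_before = math.dist(corners[0], corners[2])
diag_after = math.dist(moved[0], moved[2])
```

A registration method evaluated on such pairs would try to recover the angle and shift that map one image of a pair back onto the other.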

Image, Geospatial, Multimodal, Medical Imaging, Image Registration, Benchmark, Computer Vision, Aerial Imagery, Multimodal imaging, Synthetic

Aligned Bright-Field and SHG Image Pairs for Registration Method Evaluation

206 aligned Bright-Field and Second-Harmonic Generation image pairs, each 834x834 pixels, for evaluating registration methods. The dataset includes a training set of 40 pairs, two validation sets of 25 and 7 pairs, and a test set of 134 pairs with applied random transformations. It was produced by Kevin W. Eliceiri of the University of Wisconsin–Madison, based on data from a breast carcinoma survival study.

Image, Geospatial, Multimodal, Tissue Microarray, Biomedical Imaging, Multimodal Data, Image Registration, Benchmark, Computer Vision, Breast cancer

A Multimodal Dataset for Mixed Emotion Recognition with Physiological and Video Data

Physiological and video data from 73 participants watching emotion-inducing video clips. The dataset includes EEG, GSR, PPG, and frontal face video signals, alongside self-reported ratings on PANAS, VAD, and amusement-disgust dimensions. It was created by Pei Yang from Tsinghua University and shared via Papers with Code.

Multimodal, Affective Computing, Self Assessment, Emotion Recognition, Physiological Data, Multimodal Signals

Visual Enhancement of Touch and Bodily Self Perception Data

Experimental data from a study on the visual enhancement of touch effect and its relation to the bodily self, using the rubber hand illusion. The study was conducted by Matthew R. Longo of University College London. Tactile acuity was measured by having participants judge the orientation of square-wave gratings under different visual and stroking conditions.

Tabular, Perception, Bodily Self, Computer Vision, Cognitive Science, Rubber Hand Illusion, Touch Vision Interaction

Multimodal Breast Ultrasound Images for Malignancy Risk Stratification

3,703 ultrasound images from 2,685 patients were used to develop machine learning models for breast cancer diagnosis. The dataset includes 2,069 benign and 616 malignant cases collected between July 2019 and March 2024. Shengxin Pei authored this research, which compares models using BI-RADS terminology, ultrasound imaging, and radiomics features.

Multimodal, Machine Learning, Radiomics, Ultrasound, Medical Imaging, Healthcare, Breast cancer

Multimodal Biomedical Dataset: 206 Aligned BF and SHG Images for Registration

A collection of 206 aligned Bright-Field and Second-Harmonic Generation images, each 2048x2048 pixels, for evaluating medical image registration methods. The dataset, produced by Kevin W. Eliceiri of the University of Wisconsin–Madison, is partitioned into training (40 pairs), validation (32 pairs), and test (134 pairs) sets. It originates from research published in Nature Communications Biology on non-disruptive collagen characterization in clinical histopathology.

Image, Multimodal, Tissue Microarray, Biomedical Imaging, Image Registration, Benchmark, Healthcare, Computer Vision, Histopathology, Multimodal imaging

Multimodal host-guest complexation for efficient and stable perovskite photovoltaics

École Polytechnique Fédérale de Lausanne provides characterization data for research on multimodal host-guest complexation in perovskite photovoltaics. The data includes structural, optoelectronic, and photovoltaic measurements from figures and supplementary information, plus molecular dynamics and DFT calculation files. The dataset supports the findings of the paper with DOI 10.1038/s41467-021-23566-2.

Multimodal, Multimodal Characterization, Perovskite Photovoltaics, Computer Vision, Computational Chemistry, Materials Science

Transport Canada Safety Oversight Staffing Levels by Fiscal Quarter, 2017-2018 Onward

Starting Q1 2017-2018, this table tracks the number of employees delivering multimodal safety and security program oversight for Transport Canada. The data is provided by Statistics Canada and is available in CSV, HTML, and XML formats. It was last updated on 2026-07-06.

Tabular, Time Series, CSV, XML, Transport Canada, Staffing Levels, Fiscal Quarter, Safety Oversight

Transport Canada Multimodal Safety Inspections by Region and Program

Transport Canada's Multimodal Safety and Security Programs Oversight Delivery Indicators contain the number of completed inspections by program, activity, and administrative region. The data starts with the four quarters of the 2017-2018 federal government fiscal year. It is published by Statistics Canada on the Open Canada platform.

Tabular, CSV, XML, Transport Canada, Safety Inspections, Government Oversight, Multimodal Transport

Transport Canada Immediate Risk Reduction Measures by Program and Fiscal Quarter

Data on immediate risk reduction measures from Transport Canada's Multimodal Safety and Security Programs. The table contains the type and number of measures by program, starting from the first quarter of the 2019-2020 federal fiscal year. Statistics Canada is listed as the organization responsible for the data.

Tabular, Time Series, 🇨🇦 Canada, CSV, XML, Risk Management, Government Programs, Transportation Safety

Transport Canada Multimodal Safety and Security Enforcement Actions by Fiscal Quarter

Transport Canada's Multimodal Safety and Security Programs Oversight Delivery Indicators include enforcement actions. The data covers the type and number of actions by program, starting from the first quarter of the 2019-2020 federal fiscal year. Statistics Canada publishes this data in CSV, HTML, and XML formats under the OGL-CA-2.0 license.

Tabular, 🇨🇦 Canada, CSV, XML, Government Data, Transportation Safety, Multimodal Transport, Enforcement Actions

K-EmoCon: Continuous Emotion Annotations from Three Perspectives During Debates

K-EmoCon is the first publicly available emotion dataset accommodating multiperspective assessment during social interactions. It contains multimodal sensor data from 16 paired debate sessions, each approximately 10 minutes long, with continuous emotion annotations made every 5 seconds. The dataset was created by Cheul Young Park at KAIST and released in 2020.

Multimodal, Social Interaction, Affective Computing, Emotion Recognition, Multimodal Sensor, Continuous Emotion

TreeScope Vat0723: A Multimodal Sample Dataset for Computer Vision

A 10-sample multimodal dataset from Voxel51, hosted on Hugging Face. It is designed for use with the FiftyOne computer vision toolkit. The dataset was last updated on July 16, 2026.

Multimodal, Sample Dataset, Computer Vision, FiftyOne

NeuMa: Multimodal Neuromarketing Data from 42 Participants

A dataset from 42 individuals who browsed digital supermarket brochures while their neural and ocular activity was recorded. The data includes electroencephalographic (EEG) recordings, eye tracking (ET) recordings, questionnaire responses, and computer mouse interactions. The dataset was created by Kostas Georgiadis and is available under an Open Access license.

Multimodal, Eye Tracking, EEG, Consumer behavior, Neuromarketing, Synthetic

Multimodal Healthy Control Dataset: 100 Volunteers for Medical Imaging Research

100 healthy control volunteers (80 train, 20 test) provide data for the Big Cross-Modal Attenuation Correction Challenge (BIC-MAC) held in conjunction with MICCAI 2026. The dataset is released by DEPICT-RH and was last updated in July 2026. It includes PET readouts and is associated with a GitHub repository for the project.

Multimodal, Medical Imaging, Multimodal Data, Healthy controls, PET Scans, Neuroimaging

GBIF Species Occurrence Records for 22292 Specimens and Literature Observations

A GBIF.org user provides a dataset of 22,292 species occurrence records matching a specific taxonomic query. The data is aggregated from 304 constituent datasets and includes only records with geographic coordinates and no known geospatial issues. Records are limited to specimens or literature occurrences for a defined list of plant taxa, including genera like Caesalpinia and Haematoxylum.

Tabular, Geospatial, Botany, Species Occurrence, GBIF, Biodiversity

NSFW Chinese Adult Image Caption Dataset with 500 Annotated Images

Approximately 500 manually annotated images of Chinese adults, with fine-grained textual descriptions and labels for adult content, intended for research and model training. The dataset was created by Richarddzz and last updated on 2026-07-12. It is hosted on Hugging Face; access to the full dataset requires contact via Telegram.

Multimodal, NSFW Content, Multimodal Training, Computer Vision, Image Captioning, Content Moderation, Chinese Adult

Physician Psychophysiological Responses to Meditative Relaxation vs. Rest, 64 Participants

64 hospital physicians participated in a randomized exploratory trial comparing a 15-minute guided meditative relaxation session to an unguided rest condition. The dataset includes physiological parameters (blood pressure, heart rate) and psychological measures (SPPN, QSCPGS, SRSI3_29) collected at baseline, immediately post-session, and a few hours later. The supplementary file was authored by Siddhiraj Banjac and uploaded to figshare in May 2026.

Tabular, Randomized Study, Mind Body Intervention, Physician Burnout, Benchmark, Clinical Trial, Psychophysiology

Physician Stress Reduction Trial Data from a French Hospital Study

64 hospital physicians participated in a randomized exploratory trial comparing a 15-minute guided meditative relaxation session to an unguided rest condition. The dataset includes physiological parameters (blood pressure, heart rate) and psychological measures (SPPN, QSCPGS, SRSI3_29) collected at baseline, immediately post-session, and a few hours later. The study was authored by Siddhiraj Banjac and published on figshare in 2026 under a CC-BY-4.0 license.

Tabular, Mind Body Intervention, Physician Burnout, Hospital Study, Stress Reduction, Benchmark, Clinical Trial
