Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,344 datasets
A 5.5 KB tabular dataset of benchmark performance statistics for the EAC-Agent multimodal conversational model. Shahid Jamil published the dataset on figshare in April 2026. The data likely includes accuracy, perplexity, BLEU, and ROUGE-L scores for emotion recognition and response generation.
Benchmark results from 2026 comparing a novel multimodal chatbot model against existing techniques. The dataset likely contains performance metrics for emotion classification and response generation, including accuracy, perplexity, BLEU, and ROUGE-L scores. It was authored by Shahid Jamil and uploaded to figshare.
Throughout 2012 and 2013, the Greater London Authority conducted a program of research to explore the impact of the London 2012 Olympic Games. The data likely contains the opinions, behaviors, and attitudes of Londoners and visitors to London, collected during and after the Games. This dataset aggregates the results from the GLA's Gamestime research.
UnityShotsBench is a multilingual, multi-cultural benchmark for evaluating multi-shot audio-video generation. Each case is a short cinematic story requiring consistent character identity, voice, and world persistence across cuts. The benchmark was released by KlingTeam in 2026 with the UnityShots research paper.
A 2018 high-level vulnerability assessment of historic assets along the Northern Ireland coast, prepared by Amey Consulting with HR Wallingford for government departments. This geospatial layer results from an Erosion Risk Appraisal stage, comparing erosion risk against asset value. The full report is published by the Department for Infrastructure and the Department of Agriculture, Environment and Rural Affairs.
2.5 million unfiltered reinforcement learning samples form the raw material for constructing the Vero-600k and Vero-1.6M datasets. The dataset, created by zlab-princeton, is intended for training multi-task visual reasoning models. It was last updated on June 11, 2026.
West Africa's road infrastructure within 200 kilometers of the coast, extracted from OpenStreetMap in March 2014. This dataset supports coastal vulnerability and risk assessment by providing a spatial inventory of transportation networks. The data is derived from a global, crowdsourced mapping project that is continually updated.
A simulation study evaluating model fit in multilevel structural equation models (ML-SEM) under conditions of within-person nonuniform measurement bias. The study simulated intensive longitudinal data (ILD) across 450 conditions varying sample sizes, retesting frequencies, and intraclass correlations. It was authored by Georg Krammer and last updated on April 23, 2026.
RT's Russian-language news headlines from October 7, 2023, to January 19, 2025, concerning the Israeli-Palestinian conflict. The dataset includes 8,757 distinct headlines filtered for conflict-related keywords and annotated for grammatical case by an LLM, with human review of the annotations. The author, Tingting Lu, created this dataset to support a Keymorph Analysis study of narrative orientations in media coverage.
MedChat is a locally deployable virtual physician framework integrating an LLM-based medical chatbot with a diffusion-driven avatar for automated and structured anamnesis. The system was fine-tuned using a corpus of LLM-generated medical dialogues derived from publicly available symptom-disease datasets. The dataset was uploaded by Jan Benedikt Ruhland on April 16, 2026.
The Dutch National Archives manages a collection of approximately 14 million photographs, of which about 1.1 million are digitized. A significant subset of 430,000 images is available as open data, largely sourced from three specific archives: the Photo Collection First World War, the Photo Collection Van de Poll, and the Photo Collection Anefo. The material is made available by the Ministerie van Binnenlandse Zaken en Koninkrijksrelaties.
The dataset describes the complex seabed morphology and sediment distribution of Keppel Bay, a large shallow coastal embayment in Queensland, influenced by Late Quaternary sea-level changes and the Fitzroy River. It was published by the Australian Ocean Data Network and last updated on 2026-04-28. The data reveals the former path of the Fitzroy River across the continental shelf and the infilling of palaeochannels during the Holocene.
A multibeam sonar survey acquired by the NSW government onboard the RV Bombora between 31 August 2022 and 31 July 2023. The dataset provides 5 m resolution GeoTIFF files of bathymetry and backscatter for the Solitary Islands Gumbaynggirr Yaegl Marine Park area. It was created as a baseline dataset to map the spatial distribution of seabed types under the SeabedNSW program.
A map layer shows approximate areas where yellow crazy ant infestations have been detected in Townsville. The data originates from the Townsville City Council's digital database and was last updated on 2026-05-14. The Council notes the information is for general purposes and makes no warranty regarding its accuracy, completeness, or currency.
Major and trace element data for plagioclase and whole-rock samples from Japan, filtered from the GEOROC online database as of 1 January 2017. The data was used to test the plagioclase porphyry indicator mineral method described in Williamson et al. (2016) on Japan as a negative control region. This dataset was produced by the British Geological Survey and used in a 2018 Resource Geology publication.
Electron microprobe glass chemistry data from explosive eruption deposits of Popocatépetl, Iztaccíhuatl and Tláloc-Telapón volcanoes in Central México. The dataset spans the last 700,000 years and is associated with a 2021 research paper published in the Journal of Volcanology and Geothermal Research. It originates from a NERC grant and is hosted by the British Geological Survey.
A four-dimensional (3D × time) biophysical dispersal model simulates the movement of marine larvae over semi-continuous surfaces. The model was applied to study connectivity patterns among Commonwealth Marine Reserves in Australia's northwest region. Results include animations of larval movement, dispersal surfaces over depth and time, and matrices of connectivity values.
Raw data supporting a manuscript submitted to Geochimica et Cosmochimica Acta. The dataset contains geochemical measurements related to iron redox states and hydrogen generation during serpentinization processes in rocks from central China. It was authored by Kai Wu and published on figshare under a CC-BY-4.0 license.
Yoshiharu Sawanobori published raw data on figshare in April 2026. The dataset contains the underlying data used to create figures for a study on the renal protective effects of difelikefalin in murine models of critical illness. It includes results from lipopolysaccharide-induced, cecum ligation and puncture-induced, and renal ischemia/reperfusion models.
This seabed survey covers the Forster, Cape Hawke to Black Head area in New South Wales, Australia. The dataset contains 32-bit floating point GeoTIFF files of bathymetry and backscatter data at a 5-meter resolution, acquired by the NSW Department of Planning and Environment between 27 February 2019 and 14 October 2020. Data was collected using an R2Sonic 2022 multibeam sonar onboard the RV Bombora as part of the SeabedNSW program.