Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,560 datasets
22 women mountain bikers aged 29–61 participated in semi-structured virtual interviews averaging 51 minutes. The interviews were transcribed verbatim and analyzed to explore experiences of social support within the male-dominated sport. The dataset is a qualitative research document authored by Isabella Gore Rivero and published on figshare.
299 stimulation cycles from a retrospective cohort study analyzing oocyte yield in relation to early follicular phase androgen levels. The dataset includes odds ratios and ROC curve data for testosterone and DHEAS as predictors of poor ovarian response, authored by M. L. Münch and last updated in April 2026. It is shared under a CC-BY-4.0 license on figshare.
Municipal Veterans Representatives across Connecticut provide contact and location information for local assistance. The data supports a public 'Find a Veterans Representative' tool and is maintained by the Connecticut Department of Veterans Affairs. Records were last updated in May 2025.
A 5-year simulation of wastewater treatment in Gotland, Sweden, modeling six scenarios including conventional configurations and progressive urine diversion adoption. The study, authored by Erika Cristina Francisco and published on figshare in 2026, assessed environmental performance through nitrogen and phosphorus emissions, marine eutrophication potential, water footprint, and carbon footprint. Results show urine diversion significantly reduced peak nutrient emissions and enabled nutrient recovery for fertilizer.
Edition 3, updated in 2026, incorporates new survey material and revised classification. The dataset provides statewide coverage of land types, originally described by Rowan in 1989, within a geomorphological framework. It consolidates land resource information from studies conducted over approximately forty years, with reliability varying by region.
Per-segment captions for multi-view video datasets of humans and animals. Captions were generated from masked multi-view composites using the Gemini 3 Flash model and follow the ActivityNet-style dense video-captioning layout. The dataset was authored by 'andaba' and last updated on June 12, 2026.
Records of the temporary prohibition to transfer, convert, or move assets, and the custody of goods purchased with money from illegal activities or used for criminal acts, as ordered by competent authorities. The data likely contains entries from Colombian departments and municipalities, with columns for location, date, asset type, and quantity. It is published by www.datos.gov.co and was last updated on 2026-05-19.
261.0 KB of supplementary materials for a manuscript on transforming tourist reviews into AI teaching resources. The package includes de-identified review data, pedagogical units, aggregate statistics, and reproducible Python scripts. Author Yuechuan Yu last updated the materials in May 2026.
Masig Island weather sensor data collected by the NERP Weather Station. The Australian Ocean Data Network published this dataset, which was last updated on 2026-06-04. The data likely contains environmental measurements from a deployed sensor network.
The OMUFPMET product provides selected meteorological fields from the GEOS-5 FP-IT assimilated model, co-located in space and time with the OMI/Aura UV-2 satellite swath. Each file is approximately 45 MB in size and is stored in netCDF4 format, which is compatible with most netCDF and HDF5 readers. The product was developed by the Global Modeling and Assimilation Office (GMAO) and is led by Zachary Fasnacht of SSAI, with Joanna Joiner as the responsible NASA official.
WMS Geothermie Kreis Wesel is a geospatial dataset from the Bundesamt für Kartographie und Geodäsie. It contains layers showing the potential of medium-depth and deep hydrothermal geothermal energy in the Wesel district of Germany. The data was created as part of a preliminary study to support the heat transition and national climate targets for 2045.
SpellKarm is a multiple-choice benchmark created by fromziro to test the spelling capabilities of large language models. It contains approximately 1,000 English words, ranging from common stop words to complex multisyllabic terms. Each question has about 5 answer choices, resulting in a random chance accuracy of 20%.
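The 20% chance baseline quoted for SpellKarm follows from uniform guessing over the answer choices. A minimal sketch of that arithmetic, assuming exactly 5 choices per question and 1,000 questions (the description says "about" for both); the function names are illustrative and not part of the benchmark:

```python
def chance_accuracy(n_choices: int) -> float:
    """Expected accuracy of uniform random guessing on one question."""
    if n_choices < 1:
        raise ValueError("a question needs at least one answer choice")
    return 1.0 / n_choices


def expected_correct(n_questions: int, n_choices: int) -> float:
    """Expected number of correct answers from guessing alone."""
    return n_questions * chance_accuracy(n_choices)


# Assumed values: 5 choices per question, 1,000 questions.
baseline = chance_accuracy(5)
guessed_right = expected_correct(1000, 5)
```

A model's reported accuracy on the benchmark is meaningful only to the extent that it exceeds this baseline.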
Slope in degrees over a 1-kilometer grid for the state of Victoria, generated by Entura for the Solar Atlas project. The dataset is provided by the Department of Energy, Environment and Climate Action and was last updated on April 9, 2026. A detailed methodology report is available for download from the source platform.
NASA's Global Modeling and Assimilation Office provides meteorological fields from the GEOS-5 FP-IT assimilated product, spatially and temporally co-located with the Ozone Monitoring Instrument (OMI) UV-2 swath. The product includes layer pressure thickness, surface pressure, vertical temperature profiles, surface potential, and mid-layer pressure along with geolocation information. Each file is approximately 45 MB in size and is in netCDF4 format.
Masig Island wind data collected by weather sensors deployed on the NERP Weather Station site. The dataset originates from the Australian Ocean Data Network and was last updated on 2026-06-04. The specific temporal coverage begins on 26 July 2013.
82.3 KB SNOMED-CT codelist created by Jamie Scuffell to identify prescriptions of oral and depot injectable antipsychotic medications. The dataset was last updated on May 27, 2026, and is shared under a CC-BY-4.0 license.
22 microwave channels from the Advanced Technology Microwave Sounder (ATMS) on the Suomi-NPP satellite provide brightness temperature measurements across a frequency range of 23.8-183.31 GHz. Data is structured as a 3D array with 135 along-track rows, 96 cross-track columns, and a third dimension for each channel, with spatial resolution varying from 16 to 75 kilometers depending on the channel. Products are generated on six-minute boundaries and aggregated into daily granule coverage maps.
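The 3D layout described for the ATMS brightness temperatures can be sketched as follows. The dimension sizes come from the description above; the axis order (row, column, channel), the row-major flattening, and the helper names are assumptions made for illustration and are not taken from the product's file specification:

```python
# Dimension sizes from the dataset description.
ALONG_TRACK_ROWS = 135
CROSS_TRACK_COLS = 96
CHANNELS = 22


def granule_shape() -> tuple:
    """Shape of one brightness-temperature array (assumed axis order)."""
    return (ALONG_TRACK_ROWS, CROSS_TRACK_COLS, CHANNELS)


def flat_index(row: int, col: int, channel: int) -> int:
    """Row-major offset of one measurement (assumed memory layout)."""
    if not (0 <= row < ALONG_TRACK_ROWS):
        raise IndexError("row out of range")
    if not (0 <= col < CROSS_TRACK_COLS):
        raise IndexError("col out of range")
    if not (0 <= channel < CHANNELS):
        raise IndexError("channel out of range")
    return (row * CROSS_TRACK_COLS + col) * CHANNELS + channel


# Total number of brightness-temperature values in one such array.
total_values = ALONG_TRACK_ROWS * CROSS_TRACK_COLS * CHANNELS
```

Under these assumptions each array holds 285,120 values, one per footprint and channel.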
Southern Africa is the focus of this airborne remote sensing dataset from the SAFARI 2000 project's dry season in 2000. The collection contains data from 21 flights by the NASA Goddard-developed Cloud Absorption Radiometer (CAR), which measures multi-wavelength radiation. It includes flight track maps, browse images, and instrument logs from a Convair CV-580 aircraft operated with the University of Washington.
Thermochemical data for BAL calculated from atomization energies in kcal/mol. The dataset was authored by Miguel Fernando Molano and last updated on 2026-06-01. It is a small dataset, 5.5 KB in size, and is available under a CC-BY-4.0 license.
32 million years ago, a shallow sea invaded the western Murray Basin, initiating a marine incursion that lasted at least 20 million years. This dataset, from the Australian Ocean Data Network, is a geological paper summarizing the stratigraphy and geometry of subsurface units that form major permeability barriers affecting groundwater flow. The analysis is based on subsurface facies from borelogs and palaeogeographic reconstructions, with specific porosity data from the Piangil West-2 borehole.