Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,343 datasets
A briefing package prepared for a parliamentary hearing on the Auditor General of Canada's Fall 2025 report. The document likely contains analysis and findings from an audit of the Canada Revenue Agency's contact centers. It was published by the Office of the Auditor General of Canada and last updated on the platform in May 2026.
A sample subset of 23,734 records from a proprietary dataset curated for AI engineering tasks. The dataset was created by kooda-ai and last updated on May 31, 2026. It is intended to address the knowledge cutoff problem in production AI systems.
A 12.9 KB DOCX file containing supplementary material for a 2026 study exploring the relationship between serum inflammatory biomarkers and carotid atherosclerotic plaque characteristics. The prospective study included 128 patients (84.4% men; mean age 58.0 ± 8.7 years) and analyzed associations between Hs-CRP, homocysteine, NLR, and plaque features such as intraplaque hemorrhage and lipid-rich necrotic core.
A briefing package prepared for a hearing before the Standing Committee on Government Operations and Estimates (OGGO) on September 23, 2025. The document likely contains analysis and context for the Auditor General of Canada's 2025 Reports 2, 3, and 4. It was published by the Office of the Auditor General of Canada and last updated on May 25, 2026.
1998-2018 model simulations of solar-induced chlorophyll fluorescence (SIF) from the Community Land Model (CLM 4.5) for Niwot Ridge, Colorado. The dataset contains outputs from three simulations testing the role of non-photochemical quenching (NPQ) in seasonal SIF determination. It is provided by the National Aeronautics and Space Administration.
VIIRS/NPP On Board Calibrator (OBC) IP NRT data contains raw calibration and engineering observations from the Visible Infrared Imaging Radiometer Suite sensor on the Suomi NPP satellite. The dataset likely contains space view, solar diffuser, and blackbody observations, gain state information, and housekeeping data for radiometric calibration. It supports the transformation of sensor digital counts to calibrated radiance and reflectance values.
SciCloze-900 is a cloze-style benchmark containing 900 multiple-choice items for evaluating small base language models. The benchmark includes 300 biology questions, with chemistry and physics items making up the remainder. It was created by veyra-ai and last updated on June 14, 2026.
Near Real Time data from the VIIRS sensor aboard the JPSS-2/NOAA-21 satellite provides calibrated top-of-atmosphere radiances for 7 dual-gain moderate-resolution bands (M1–M5, M7, M13) at 750-meter resolution. The product contains unaggregated sub-pixel samples from nadir and near-nadir zones that are typically discarded during standard processing, offering a unique view of the raw sensor data. Each swath file is generated from 6 minutes of satellite overpass data, with dual-gain bands containing 6304 samples per scan.
Chaitanya Raj Naharki published raw FTIR, XRD, and gamma-ray spectrometry data for soil analysis on figshare in 2026. The 3.2 MB dataset includes instrument-generated text files, spectral images, and Excel files. It supports the manuscript "Assessments of natural radioactivity and associated radiological hazards and mineralogical correlation using FTIR and XRD in soil from Tanahun Nepal".
AVIRIS-NG airborne spectrometer data provides BRDF and sunglint-corrected surface spectral reflectance for the Mississippi River Delta. The dataset includes individual flight lines and ten mosaicked files covering four locations (Terre, Atcha, TerreEast, Bara) from Spring and Fall 2021 deployments. Collected for the Delta-X campaign, these images support modeling of delta landform changes under sea level rise.
Nine unreported and 15 known chaetoglobosin alkaloids were separated from the endophytic Chaetomium sp. UJN-EF006. Their fungicidal activities against the agricultural pathogens Botrytis cinerea and Sclerotinia sclerotiorum were evaluated, with alkaloid 19 showing potent antifungal effects. The dataset, authored by Yinyin Wang, was last updated on 2026-05-08.
Twenty-four chaetoglobosin alkaloids, including nine unreported compounds, were isolated from the endophytic fungus Chaetomium sp. UJN-EF006. Their fungicidal activities against the agricultural pathogens Botrytis cinerea and Sclerotinia sclerotiorum were evaluated, with alkaloid 19 showing potent effects. The dataset, authored by Yinyin Wang and last updated on 2026-05-08, includes structural data in CIF format.
Airborne precipitation radar data was collected during the Genesis and Rapid Intensification Processes (GRIP) experiment to study tropical storm formation. The dataset originates from the APR-2 instrument, a dual-frequency, Doppler, dual-polarization radar system flown on a NASA DC-8 aircraft. Measurements were taken between August 17 and September 22, 2010, and are stored in HDF-4 format.
A raw, sanitized transcript of a Claude Code session in which, on the tournament's opening day, the AI model predicted every match of the 2026 FIFA World Cup. The session includes a verification of the final 48-team field via web search. The dataset was authored by 'victor' and uploaded to Hugging Face on June 11, 2026.
A 1.1 MB supplementary document from a study evaluating six Low Impact Development practices in the Aricanduva River sub-basin, São Paulo, Brazil. The research, authored by Mauricio Jonas Ferreira and last updated in April 2026, assesses 21 implementation scenarios using HEC-HMS and HEC-RAS models, quantifying hydrological performance and economic-social benefits with a local adaptation of the Green Values Stormwater Calculator.
December 2023 boundaries for NHS England Regions, provided as digital vector data. The dataset contains full-resolution boundaries clipped to the coastline and is published by the Office for National Statistics. It is subject to both Ordnance Survey and ONS intellectual property rights.
Survey data from 448 Estonian children aged 30-48 months examines associations between screen time, child-adult conversation, and language development. The dataset includes mother-reported measures of daily screen time, face-to-face talk, and language scores from the ECDI-III assessment. Author Jaan Tulviste collected the data from September 2023 to December 2024.
Upwelling Frequency Data of the South-eastern Coast of Australia is a GIS data layer showing the frequency of coastal upwelling events. The dataset was generated using 14 years of monthly MODIS sea surface temperature data, with higher values indicating persistent or semi-persistent upwelling and medium values indicating seasonal upwelling. It is hosted by the Australian Ocean Data Network and is detailed in a 2019 publication in Remote Sensing of Environment.
Radar measurements from the NASA African Monsoon Multidisciplinary Analyses (NAMMA) campaign, collected in August-September 2006. The dataset contains dual-frequency, dual-polarization radar reflectivity and Doppler velocity observations from the Second Generation Airborne Precipitation Radar (APR-2) instrument. This mission was based in the Cape Verde Islands to study African Easterly Waves and Mesoscale Convective Systems over western Africa.
182 differentially abundant proteins were identified across three treatment comparisons of cryopreserved ram sperm. The dataset contains proteomic profiles from an Orbitrap Astral DIA analysis, comparing sperm treated with astaxanthin, melatonin, or a combination against a control. Author Chunyan Li uploaded the data to figshare in May 2026.