Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,339 datasets
A 1.7 MB dataset from figshare, last updated on 2026-05-01, authored by Aaron J. Molstad. The dataset accompanies a research article proposing a new method for modeling probability mass functions involving multiple categorical response variables and a common set of predictors. Supplementary materials for reproducing the work are available online.
Numerical simulations compare stationary and accelerated Bloch basis representations for modeling strong-field electron dynamics in crystalline solids. The dataset likely contains results analyzing spectral convergence, carrier dynamics, and harmonic emission under truncated-band approximations. It provides a basis for assessing physical validity in reduced-basis models of solid-state high-harmonic generation.
Huang's repository contains Jupyter Notebook code and data for analyzing Brownian motion of point defects in a 2D hexagonal colloidal crystal. The 32.8 MB ZIP archive includes a computational pipeline that extracts drift vectors and diffusion matrices and reconstructs stochastic potentials from experimental trajectories. The dataset was last updated on 2026-05-09 and is shared under a CC-BY-4.0 license.
Csenge Anna Lugosi's dataset contains survey responses for 858 senior dogs (>7 years) investigating factors associated with canine cognitive decline (CCD). The data was collected via an internationally distributed questionnaire covering dogs' activity levels, sports engagement, body condition, and owner acquisition priorities. Results indicate lifetime sports activity and joint activities with the owner had the strongest negative association with CCD severity.
Csenge Anna Lugosi's dataset contains survey responses for 858 senior dogs (>7 years) investigating factors associated with canine cognitive decline (CCD). The data was collected via an internationally distributed questionnaire covering dogs' activity levels, sports engagement, body condition, and owner priorities. The dataset is licensed under CC-BY-4.0 and was last updated in April 2026.
PangeanicYueJa is a parallel corpus containing 55,000 Cantonese-Japanese sentence pairs sampled from a larger collection of approximately 3.08 million pairs. It was created by Pangeanic and released on Hugging Face, with a last recorded update in June 2026. The corpus is designed for training and evaluating machine translation and multilingual language models.
45 radiocarbon results from coral microatolls at 11 sites show sea level fell smoothly from +1 meter at 6000 years B.P. to its present position. Storm ridge surveys at 5 places indicate that the average recurrence interval of major ridge-building storms is about 80 years. This dataset, managed by the Australian Ocean Data Network, examines Holocene environmental changes on the Great Barrier Reef.
Quebec's official topographic base maps cover territory south of the 52nd parallel, derived from aerial photography at 1:40,000 scale. Each file provides approximately 250 km² of area coverage with planimetric accuracy of about four meters. The Government of Quebec produced these maps, which are no longer updated.
Northern Victoria Land, Antarctica, hosts a Late Cambrian trilobite assemblage described in this paper. The fauna, found in the Eureka Formation at Eureka Spurs, includes seven determined taxa and is related to material from Kazakhstan, Siberia, China, Australia, and North America. The paper is hosted by the Australian Ocean Data Network and was last updated in May 2026.
285 seabed sediment samples were collected from inner Darwin Harbour and shallow water areas in and around Bynoe Harbour between 29 May and 16 August 2017. The surveys were conducted by Geoscience Australia, the Australian Institute of Marine Science, and the Northern Territory Government as part of a four-year (2014-2018) science program to create baseline environmental data and thematic habitat maps. The data includes grain size, inorganic elemental analyses, organic matter measures, and seagrass observations.
Hydrometric station data provides real-time monitoring of flood risks on rivers, watersheds, and lakes in Quebec. The dataset includes the latest water level (m) and flow (m³/s) values, along with station status based on pre-established flood thresholds. Data is integrated from multiple government and partner sources and updated several times daily.
Experimental data from a study investigating the effect of gut yeast symbionts on cold tolerance in Drosophila melanogaster. The dataset, published by Yanira Jiménez-Padilla on figshare in April 2026, includes measurements of chill coma recovery time (CCRT) for flies under distinct microbial conditions, including axenic, native microbiota, and gnotobiotic mono-associations with live or heat-killed yeasts.
Data for "Gut yeasts accelerate chill coma recovery in Drosophila melanogaster" by Yanira Jiménez-Padilla, published on figshare in April 2026. The dataset contains experimental results measuring chill coma recovery time (CCRT) in flies under different gut microbial conditions, including axenic, native microbiota, and gnotobiotic flies mono-associated with specific live or heat-killed yeast species. The data supports findings that live yeast symbionts can rapidly and sex-specifically rescue cold tolerance deficits.
Data Sheet 1 from a prospective single-center cohort study by Haocong Luo, comparing low-dose and standard-dose intravenous immunoglobulin (IVIG) in 34 adults with generalized myasthenia gravis. The dataset includes clinical outcomes assessed using the MG-ADL and QMG scales at multiple time points over 12 weeks, along with infusion details and treatment indications. The data was last updated on April 28, 2026, and is shared under a CC-BY-4.0 license.
86 JSONL files contain raw agent trace data generated by the TeichAI platform using the GLM-5.2 model. The dataset was created by AletheiaResearch and last updated on June 19, 2026. Agent traces include configured or recovered tool schemas, making tools available for training even if they were not called in a session.
Geoscience Australia Data produced a post-cruise report summarizing the preliminary results of the 1996/97 Antarctic marine geoscience program. The voyage collected seismic data, sidescan sonar records, and sediment cores from Vincennes Bay, Prydz Bay, and the Mac.Robertson Shelf to study ice sheet retreat and paleoenvironmental records. The report details findings from 27 gravity cores and over 1,100 km of geophysical data.
A 136.8 KB dataset from figshare, last updated April 18, 2026, by Wanhao Chi. It contains experimental data on the survival and foraging behavior of Drosophila melanogaster mutants with altered dopamine signaling. The data likely includes measurements of meal frequency, exploratory activity, and survival outcomes under free-feeding and energetically scarce conditions.
Colombian data on vegetation cover fires reported by the National Forest Information System (SNIF). The dataset covers incidents from February 11, 2021, to July 12, 2022, with data sourced from the Institute of Hydrology, Meteorology and Environmental Studies (IDEAM). It includes detailed columns on affected land cover types and total impacted area.
MYD11A1 Version 6 provides daily per-pixel Land Surface Temperature and Emissivity (LST&E) data at 1-kilometer spatial resolution in a 1,200 by 1,200 km grid. The pixel temperature value is derived from the MYD11_L2 swath product and includes associated quality control assessments, observation times, view zenith angles, and clear-sky coverages. This dataset was decommissioned on July 31, 2023, with users directed to the newer MYD11A1 Version 6.1 product.
MOD17A2H Version 6 is a decommissioned NASA MODIS data product providing global estimates of Gross Primary Productivity (GPP) and Net Photosynthesis (PSN). It offers cumulative 8-day composite values at a 500-meter pixel resolution, based on a radiation use efficiency model. The dataset is intended for modeling terrestrial energy, carbon, and water cycle processes.