Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
43,998 datasets
Geoscience Australia conducted a marine survey on the Lord Howe Island shelf in 2008 to map seabed bathymetry and characterize benthic environments. This dataset provides feeding guild counts per sample, aggregated from species-level data collected during that survey. The data and samples were acquired using the National Facility Research Vessel Southern Surveyor.
A Persian (Farsi) question–answer corpus for social engineering and cybersecurity derived from curated knowledge articles extracted from authoritative reference books. The dataset was created by author smd20 and last updated on June 13, 2026. It is designed for supervised fine-tuning, retrieval-augmented generation evaluation, and domain-specific language model benchmarking.
Results from geological mapping, aerial imagery collection, and field observations at Antarctic Specially Protected Area (ASPA) No. 143 Marine Plain are presented. The dataset likely contains polygons outlining recommended helicopter landing areas and surficial geology maps derived from aerial photos, satellite imagery, and a digital elevation model. The work was presented at the SCAR Open Science Conference 2024 and is hosted by the Australian Ocean Data Network.
A 2000/2001 regional seafloor mapping study by Geoscience Australia's South and Southwest Regional Project. The report delineates four major geomorphological features and defines five acoustic echo facies for the Great Australian Bight area, digitized into a GIS. It was produced to support future Regional Marine Planning.
A 1.7 MB dataset from figshare, last updated on 2026-05-01, authored by Aaron J. Molstad. The dataset accompanies a research article proposing a new method for modeling probability mass functions involving multiple categorical response variables and a common set of predictors. Supplementary materials for reproducing the work are available online.
Numerical simulations compare stationary and accelerated Bloch basis representations for modeling strong-field electron dynamics in crystalline solids. The dataset likely contains results analyzing spectral convergence, carrier dynamics, and harmonic emission under truncated-band approximations. It provides a basis for assessing physical validity in reduced-basis models of solid-state high-harmonic generation.
Huang's repository contains Jupyter Notebook code and data for analyzing Brownian motion of point defects in a 2D hexagonal colloidal crystal. The 32.8 MB ZIP archive includes a computational pipeline for extracting drift vectors and diffusion matrices and for reconstructing stochastic potentials from experimental trajectories. The dataset was last updated on 2026-05-09 and is shared under a CC-BY-4.0 license.
Csenge Anna Lugosi's dataset contains survey responses for 858 senior dogs (>7 years) investigating factors associated with canine cognitive decline (CCD). The data was collected via an internationally distributed questionnaire covering dogs' activity levels, sports engagement, body condition, and owner acquisition priorities. Results indicate lifetime sports activity and joint activities with the owner had the strongest negative association with CCD severity.
Csenge Anna Lugosi's dataset contains survey responses for 858 senior dogs (>7 years) investigating factors associated with canine cognitive decline (CCD). The data was collected via an internationally distributed questionnaire covering dogs' activity levels, sports engagement, body condition, and owner priorities. The dataset is licensed under CC-BY-4.0 and was last updated in April 2026.
PangeanicYueJa is a parallel corpus containing 55,000 Cantonese-Japanese sentence pairs sampled from a larger collection of approximately 3.08 million pairs. It was created by Pangeanic and released on Hugging Face, with a last recorded update in June 2026. The corpus is designed for training and evaluating machine translation and multilingual language models.
45 radiocarbon results from coral microatolls at 11 sites show sea level fell smoothly from +1 meter at 6000 years B.P. to its present position. Storm ridge surveys at 5 places indicate that the average recurrence interval of major ridge-building storms is about 80 years. This dataset, managed by the Australian Ocean Data Network, examines Holocene environmental changes on the Great Barrier Reef.
Quebec's official topographic base maps cover territory south of the 52nd parallel, derived from aerial photography at 1:40,000 scale. Each file provides approximately 250 km² of area coverage with planimetric accuracy of about four meters. The Government of Quebec produced these maps, which are no longer updated.
Northern Victoria Land, Antarctica, hosts a Late Cambrian trilobite assemblage described in this paper. The fauna, found in the Eureka Formation at Eureka Spurs, includes seven determined taxa and is related to material from Kazakhstan, Siberia, China, Australia, and North America. The paper is hosted by the Australian Ocean Data Network and was last updated in May 2026.
285 seabed sediment samples were collected from inner Darwin Harbour and shallow water areas in and around Bynoe Harbour between 29 May and 16 August 2017. The surveys were conducted by Geoscience Australia, the Australian Institute of Marine Science, and the Northern Territory Government as part of a four-year (2014-2018) science program to create baseline environmental data and thematic habitat maps. The data includes grain size, inorganic elemental analyses, organic matter measures, and seagrass observations.
Hydrometric station data provides real-time monitoring of flood risks on rivers, watersheds, and lakes in Quebec. The dataset includes the latest water level (m) and flow (m³/s) values, along with station status based on pre-established flood thresholds. Data is integrated from multiple government and partner sources and updated several times daily.
Experimental data from a study investigating the effect of gut yeast symbionts on cold tolerance in Drosophila melanogaster. The dataset, published by Yanira Jiménez-Padilla on figshare in April 2026, includes measurements of chill coma recovery time (CCRT) for flies under distinct microbial conditions, including axenic, native microbiota, and gnotobiotic mono-associations with live or heat-killed yeasts.
Data for "Gut yeasts accelerate chill coma recovery in Drosophila melanogaster" by Yanira Jiménez-Padilla, published on figshare in April 2026. The dataset contains experimental results measuring chill coma recovery time (CCRT) in flies under different gut microbial conditions, including axenic, native microbiota, and gnotobiotic flies mono-associated with specific live or heat-killed yeast species. The data supports findings that live yeast symbionts can rapidly and sex-specifically rescue cold tolerance deficits.
Data Sheet 1 from a prospective single-center cohort study by Haocong Luo, comparing low-dose and standard-dose intravenous immunoglobulin (IVIG) in 34 adults with generalized myasthenia gravis. The dataset includes clinical outcomes assessed using the MG-ADL and QMG scales at multiple time points over 12 weeks, along with infusion details and treatment indications. The data was last updated on April 28, 2026, and is shared under a CC-BY-4.0 license.
86 JSONL files contain raw agent trace data generated by the TeichAI platform using the GLM-5.2 model. The dataset was created by AletheiaResearch and last updated on June 19, 2026. Agent traces include configured or recovered tool schemas, making tools available for training even if they were not called in a session.
Geoscience Australia Data produced a post-cruise report summarizing the preliminary results of the 1996/97 Antarctic marine geoscience program. The voyage collected seismic data, sidescan sonar records, and sediment cores from Vincennes Bay, Prydz Bay, and the Mac.Robertson Shelf to study ice sheet retreat and paleoenvironmental records. The report details findings from 27 gravity cores and over 1,100 km of geophysical data.