Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
41,487 datasets
Triplicate samples of natural mineral water (hora) and surrounding soils were collected from multiple districts and analyzed for physicochemical properties and mineral concentrations. The dataset includes measurements for parameters such as temperature, pH, electrical conductivity, and concentrations of calcium, potassium, iron, and molybdenum. Authored by Ashenafi Miresa and last updated on 2026-05-14, the data is stored in an 8.8 KB XLSX file.
Mineral water (hora) and surrounding soil samples were collected in triplicate from multiple districts and analyzed for physicochemical properties and mineral concentrations. The dataset includes measurements for parameters like electrical conductivity, total dissolved solids, ammonia, calcium, potassium, iron, and molybdenum. It was created by Ashenafi Miresa and last updated on 2026-05-14.
1,700 news publications in Kazakh, collected from major Kazakhstani news platforms such as Tengri News and Egemen Kazakhstan between 2020 and 2024. The corpus contains 1,007,037 tokens, 107,501 types, and 109,395 lemmas, with a frequency list provided. It was compiled by Assel Ormanova and is available under a CC-BY-4.0 license.
A study by Yan Xu investigates the molecular interactions between ticks and viruses, specifically Langat virus (LGTV) in Haemaphysalis longicornis. The dataset, shared on figshare, likely contains primers used to explore the JAK/STAT pathway's role in viral infection via a non-canonical mechanism involving an intracellular lipoprotein receptor. It was last updated on May 21, 2026.
Raw data from a manuscript investigating molecular interactions between ticks and viruses. Yan Xu published the data on figshare in May 2026. The data likely contains experimental results supporting the finding that the JAK/STAT pathway facilitates Langat virus replication via an atypical lipoprotein receptor.
A list of primers used for quantitative PCR (qPCR) in a study of mandible regeneration in the axolotl (Ambystoma mexicanum). The dataset was created by Samanta Tarquino González and published on figshare in May 2026. The study assessed regenerative response after complete transverse amputation of the mandible.
110 samples of salinity, dissolved oxygen, water temperature, and light attenuation from Jervis Bay, New South Wales. Geoscience Australia collected this data during marine surveys in 2007, 2008, and 2009 using the vessel MV Kimbla. Measurements were taken in June and August 2008 and February 2009, focusing on a 3x5 km survey grid and additional representative habitats.
LBA-ECO LC-21 provides Landsat Enhanced Thematic Mapper Plus (ETM+) imagery and derived fractional land cover products for eight Brazilian Amazon states from 1999 to 2002. The data was processed using the Carnegie Landsat Analysis System (CLAS) methodology to estimate subpixel fractions of photosynthetic vegetation, non-photosynthetic vegetation, and bare substrate within each 30x30 meter pixel. The collection consists of 584 compressed files which expand to 1,717 GeoTIFF image files, with cloud, shadow, and water masks applied.
LBA-ECO CD-04 Leaf Area Index data was collected at an 18-hectare plot adjacent to the km 83 eddy flux tower in the Tapajos National Forest, Para, Brazil. NASA produced this dataset by measuring leaf litter from thirty traps placed along two transects, with biweekly sampling and lab analysis using scanners and image processing software. The dataset is stored as a single comma-delimited file.
Washington State law mandates entities report data breaches affecting over 500 residents to the Attorney General's Office. This dataset contains derived statistics from those notifications, including breach causes, affected population counts, and detailed timelines of the breach lifecycle. It serves as the source data for the AGO's Annual Data Breach Report.
Three sediment cores from Nara Inlet reveal a mixed clastic/carbonate system over the last 3,000 years. Radiocarbon dating shows the top 3 meters accumulated within this period, with the accumulation rate slowing towards the present day. The Australian Ocean Data Network hosts this dataset, which indicates a previously unrecognized terrigenous sediment source for the Great Barrier Reef platform.
100,473 Chinese adults with baseline fasting plasma glucose under 5.6 mmol/L were analyzed for incident prediabetes risk. The study, authored by Bing Wang and shared on figshare, investigates nonlinear associations between the atherogenic index of plasma (AIP) and prediabetes, stratified by BMI categories. Results from a median 3.0-year follow-up show 12,371 incident cases and identify BMI-specific risk thresholds.
A systematic review and meta-analysis evaluating the efficacy of oral and topical polyphenolic interventions for hair regeneration in adults with non-scarring alopecia. The analysis includes 32 randomized controlled trials involving 2,183 participants, conducted according to PRISMA 2020 guidelines. The dataset, authored by Chaimae El Ammari and last updated in April 2026, is available under a CC-BY-4.0 license.
Over 2,000 opportunistically collected images of southern right whales from the southwest corner of Australia between 1991 and 2021. The data was collated by researchers, volunteer citizen scientists, and whale watch operators for a photo-identification study to evaluate abundance and movement. The project was conducted under the NESP MaC 1.22 initiative and contributed to the Australasian Right Whale Photo-Identification Catalogue.
A dataset from figshare authored by Rin Tanizawa, last updated in June 2026. It contains experimental data on the synthesis and photophysical properties of tetra(aryl)diborane compounds. The 29.9 KB dataset compares properties like dual emission between tetra(1-naphthyl)diborane and tetra(o-tolyl)diborane.
Sediment descriptions and diatom counts from cores collected in northwest Scotland between 2022 and 2025. Data from sites in Raasay, Fearnmore, Loch Ewe, Reiff, and Inverkirkaig include collection dates, latitude/longitude, and sediment gouge descriptions. This dataset was produced by the Government Digital Service as part of a NERC grant to constrain relative sea-level history for testing ice-sheet and glacio-isostatic adjustment models.
Geoscience Australia Data compiled bathymetric and sediment thickness maps for the Christmas Island area. The data includes eight seismic profiles totalling about 2000 km and bathymetric data from a 1992 survey, combined with other institutional data. This collection provides a revised 1:1,000,000 scale map offering more detail than previous compilations from the 1970s and 1980s.
A set of public registers mandated by Bulgaria's Waste Management Act, maintained by the Executive Environment Agency. The registers cover permits, producers, and sites related to waste, batteries, electronics, oils, tyres, and other materials. The data is current as of January 5, 2026.
Geoscience Australia Data provides a GeoPDF map of the Harris Greenstone Domain, a late Archean-Proterozoic terrane in the Gawler Craton. The map depicts lithological units, magnetic features, and structural interpretations based on aeromagnetic, gravity, and drillcore data. It was last updated on 2026-05-14.
PacWave Site Observations contain raw and near-real-time meteorological and oceanic measurements from the PacWave wave energy test site off Newport, Oregon. The dataset includes measurements from multiple instrument platforms, such as FLOATr buoys, Spotter wave buoys, and bottom landers, deployed at two distinct offshore sites. Data are provided in netCDF4 and CSV formats with minimal quality control, supporting research into wave energy resource assessment and environmental conditions.