Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,732 datasets
Raw LC-MS data files associated with a specific swab set from the COVIDCAP Protocol paper. The 21.4 MB dataset was authored by Ellen Liggett and last updated on 2026-05-08. Data is provided in MZML format under a CC-BY-4.0 license.
Raw LC-MS data associated with Table 6, Swab Set 2 in the COVIDCAP Protocol paper. The dataset is 19.6 MB in size and was authored by Ellen Liggett, last updated on May 8, 2026.
Raw LC-MS data associated with Table 6, Swab Set 2 from the COVIDCAP Protocol paper. The dataset is 19.6 MB in size and is available in MZML format under a CC-BY-4.0 license. It was authored by Ellen Liggett and last updated on 2026-05-08.
42.1 MB of raw LC-MS data files in MZML format, associated with Table 6, Saliva Set 1 from the COVIDCAP Protocol paper. The dataset was authored by Ellen Liggett and last updated on 2026-05-08. It is shared under a CC-BY-4.0 license on the figshare platform.
Raw LC-MS data from a COVIDCAP Protocol study analyzing saliva for COVID-19 biomarkers. The 29.1 MB dataset, authored by Ellen Liggett and shared under CC-BY-4.0, is associated with Table 6, Saliva Set 2 in the referenced preprint. Its last update was recorded on May 8, 2026.
Eyuel Welelaw's dataset from figshare contains HPLC chromatograms and calibration graphs for sugars and minerals in royal jelly. The data includes retention times for fructose, glucose, sucrose, and maltose, along with profiles for vitamins, minerals, and DPPH antioxidant activity. It was last updated on 2026-04-21.
Beneficiarios del programa Adulto Mayor, Municipio de Susa is a dataset from the Colombian open data portal www.datos.gov.co. It lists beneficiaries of the Elderly Adult program, with columns for gender, enrollment date, and entry date. The dataset was last updated on May 18, 2026.
Two blue holes at Cockatoo and Molar Reefs, measuring 240-295 meters in diameter and 30-40 meters deep. The dataset from the Australian Ocean Data Network describes their morphology, sediment fans, and biological associations, and includes seismic refraction studies showing a pre-Holocene surface 8.5-11 meters beneath the rims. It was last updated on 2026-04-28.
2026 data from the Government of Yukon details land dispositions administered by the Department of Energy, Mines and Resources. The dataset categorizes land transactions into Agreement for Sale, Lease, Easement, and Reservation types, excluding applications and licenses.
An inventory of public information generated, obtained, acquired, or controlled by the E.S.E. Hospital Regional de Chiquinquirá that has been classified as confidential or reserved. The dataset includes columns such as Calificación (Classification), Plazo de la clasificación o reserva (Classification or reservation period), and Descripción de la Información (Description of the Information). It is published via the Socrata platform on datos.gov.co and was last updated on 2026-05-18.
A comparative study analyzes the word field of suffering (Leid) across French, Italian, and German. The research, conducted by Annika Straube and harvested by heiDATA, employs a context analysis where co-occurring words are categorized semantically. This categorization, based on Wittgensteinian theory, reveals fine-grained semantic differences between nouns in this emotional domain.
Relational data on indigenous non-governmental organizations and the partners they list on their official websites. The dataset was used for regression modeling and bipartite network visualization. It was authored by Jale Tosun and harvested from heiDATA Dataverse.
A geological report on the Ngalia Basin in Australia's Northern Territory. The document details the basin's stratigraphy, structure, sediment thicknesses, and economic potential. It was published by the Australian Ocean Data Network and last updated in April 2026.
Australian Ocean Data Network hosts a scientific model describing terrigenous sedimentation in the central Great Barrier Reef lagoon, focusing on the Burdekin Region. The description details coastal progradation rates, sediment dispersal patterns, and distinct sedimentary assemblages formed during the Holocene. The dataset was last updated on 2026-04-28.
Over 900 well measurements of total dissolved solid concentration in formation water, compiled to assess conditions for CO2 solubility trapping in major Australian sedimentary basins. The dataset, hosted by the Australian Ocean Data Network, was last updated in April 2026 and supports research into geological carbon storage safety.
BOEM's Offshore Wind Lease Outlines contain dissolved polygon boundaries for active commercial, research, and right-of-way wind energy leases in U.S. Outer Continental Shelf waters. The dataset is managed by the U.S. Department of the Interior's Bureau of Ocean Energy Management. It was last updated in April 2026.
Outline polygons for active offshore wind leases managed by the Bureau of Ocean Energy Management. The dataset covers commercial, research, and right-of-way lease areas within U.S. Outer Continental Shelf waters. It was last updated by the Department of the Interior in April 2026.
Australian Ocean Data Network provides a legacy report on the seafloor morphology of part of the central New South Wales continental shelf. The data relates to the assessment of offshore heavy-mineral prospects. The product was last updated on 2026-06-05, but is described as a legacy item with no abstract available.
One million pages of fully parallel synthetic documents rendered in 22 languages for OCR, layout detection, and visual question answering tasks. The dataset was created by Cognitive-Lab and was last updated on the Hugging Face platform in May 2026. It is described as one of the largest open-source multilingual, multi-task document datasets, with the same ~45,700 source pages rendered in every language.
VietVault is a large-scale Vietnamese language corpus curated from Common Crawl dataset dumps. It contains 80GB of raw text, cleaned and filtered for Vietnamese, sourced from dumps between 2013 and 2023. The dataset was created by author nampdn-ai and last updated on 2026-05-12.