Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
42,062 datasets
World Bank data on energy production, use, dependency, and efficiency for Canada, compiled from the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset tracks trends in energy use that bear on sustainability, economic growth, and poverty reduction. It was last updated on 2026-04-27 and is available under a CC-BY-4.0 license.
Data compiled by the World Bank from the International Energy Agency and the Carbon Dioxide Information Analysis Center. It contains indicators on energy production, use, dependency, and efficiency for Brazil, reflecting trends relevant to economic growth and sustainability. The dataset was last updated on 2026-04-27.
A scientific paper describes the youngest Late Cambrian trilobite assemblage discovered in the Mariner Group, northern Victoria Land, Antarctica. The assemblage contains seven identified trilobite taxa and shows affinities with material from Kazakhstan, Siberia, China, Australia, and North America. The paper was published by Geoscience Australia Data and last updated on 2026-05-14.
A dataset of baseline characteristics for 67 participants from two African HIV cohorts (HVTN 503 and PP/COS). It contains pre-infection Th17 cell frequency measurements and subsequent disease progression markers, including CD4/CD8 ratios and viral load set points. The data was authored by Tosin E. Omole and last updated in April 2026.
A 6,000-year record from 11 inner-shelf sites on the Great Barrier Reef examines Holocene environmental changes. The dataset includes 45 radiocarbon dates from coral microatolls and storm ridge sequences, used to reconstruct sea-level trends and storm recurrence intervals. It was published by Geoscience Australia and last updated in May 2026.
Sample-level harmonized data files from the third stage of the TCGA Lower Grade Glioma Python pipeline. The dataset integrates validated clinical, gene expression, copy number alteration, and mutation data, filtered to a common set of matched samples. It was authored by Aaliah Aly and last updated on 2026-05-07.
A database of Australian mineral and mining processing plant locations and attributes compiled by Geoscience Australia. It contains information on plant type, processing methods, commodities, and output where known. The data is available via the Geoscience Australia Portal and was developed as part of a Critical Minerals Research and Development Hub project.
1.0 GB of labeled sentences drawn from student-written statistics reports at Carnegie Mellon University. The dataset includes sequences of sentences labeled for rhetorical purpose using Llama, along with 384-dimensional sBERT embeddings of the original text. It was authored by Margaret Ellingwood and last updated in May 2026.
Lili Town, Suzhou, China, is the study area for this dataset supporting an LLM-driven agent for automating the Storm Water Management Model (SWMM). The dataset includes basic geographic shapefiles, a model INP file, and a 50-instruction natural language benchmark (SWMM-PAI) for parameter adjustment. It was authored by Yani Zhong and last updated on 2026-04-27.
Narryer survey data from Western Australia provides a grid of equivalent air-absorbed dose rate derived from gamma-ray spectrometry. The grid has a cell size of approximately 20 meters and is based on 415,090 line-kilometres of data acquired in 2024 by the WA Government. Processed by Geoscience Australia, the data represents total dose rate from natural potassium, uranium, and thorium decay, combined with cosmic dose estimates.
Antarctic Specially Protected Area No. 143 Marine Plain in East Antarctica is valued for its fossil fauna and geological features. This dataset presents results from geological mapping, aerial imagery collection, and field observations to assess the impact of human access and provide management options. The work was presented at the SCAR Open Science Conference 2024 and builds on regional mapping of the Vestfold Hills.
Geoscience Australia's 2000/2001 mapping study delineated four major geomorphological features and five acoustic echo facies for the Great Australian Bight. The report underpins biological, environmental, and economic assessments for Regional Marine Planning. Its digitized GIS data includes boundaries for continental shelves, slopes, rises, and terraces, along with attributes for each acoustic facies.
A 48.0 KB review document summarizes the role of mitochondrial dysfunction in osteoporosis and catalogs Chinese botanical drugs targeting it. Author Shiyu Li published the document on figshare under a CC-BY-4.0 license in May 2026. The review aims to establish a research paradigm linking botanical drugs, mitochondria, and bone health.
Structured telehealth data from 943 diabetes patients, collected over a 12-year period, were augmented with physical activity information extracted from their free-text notes. Fabian Wiesmüller published this research in 2026, benchmarking local rule-based and Mistral LLM methods against GPT-4.1. The dataset includes 100 synthetically generated notes used for benchmarking the extraction algorithms.
Data from 943 patients collected over 12 years in the DiabMemory system, supplemented by 100 synthetic notes, were analyzed for physical activity information extraction. The dataset was created by Fabian Wiesmüller and last updated in May 2026. It includes pseudonymized free-text notes from a diabetes telehealth platform.
Survey data from a study of 1,229 participants who used the Headspace mindfulness app during a public health deployment. The data includes two survey time points measuring distress, loneliness, mental health stigma, and use of other online mental health tools. The dataset was authored by Judith Borghouts and last updated in May 2026.
A field study dataset comparing concentrations of mineral-associated organic carbon (MAOC) and particulate organic carbon (POC) in desertifying grasslands of Inner Mongolia, China. The data includes mean MAOC values of 15.66 g/kg and 14.99 g/kg at two depths in the typical steppe, and 12.10 g/kg and 11.64 g/kg in the desert steppe, alongside POC concentrations. The dataset was authored by Hao Peng and published on figshare in May 2026.
Cupos otorgados para licencias de cannabis psicoactivo ("Quotas granted for psychoactive cannabis licenses") tracks the quotas granted for cultivating psychoactive cannabis plants in Colombia. The data includes counts of initial applications and granted quotas for both ordinary and supplementary license types. Information is available from 2017 through the first quarter of 2025, sourced from the Colombian open data portal www.datos.gov.co.
World Bank data on energy and mining for Australia, compiled from the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset covers energy production, use, dependency, and efficiency metrics. It was last updated on 2026-04-27 and is provided under a CC-BY-4.0 license.
Portuguese wine regions provided 397 yeast strains for a study of their biocontrol potential against common grape phytopathogenic fungi. The dataset, authored by Marcos Esteves and last updated in May 2026, contains results from time-course monitoring of mold growth inhibition. All tested yeasts displayed antagonistic activity against at least one of the four fungal targets: Aspergillus, Botrytis, Rhizopus, and Penicillium.