Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
42,817 datasets
Ryan V. Quiroz published a dataset on figshare in 2026 describing a structure-activity relationship (SAR) campaign for novel camptothecin-derived linker-payloads. The dataset, 2.6 KB in size, likely contains design parameters and performance metrics for antibody-drug conjugates (ADCs). The most promising candidates reportedly achieved low aggregation (<5%), stability, and significant tumor regression in vivo.
Covering approximately 61 degrees North to 61 degrees South, this dataset provides calibrated surface-flagged sigma-0 (radar backscatter) and attenuation corrections in 12.5 km Wind Vector Cells. It is the Version 2.0 Level 2A science data record from the ISS-mounted RapidScat instrument, a Ku-band scatterometer derived from QuikSCAT hardware. Data are stored in single-orbit HDF-4 files with a list structure to accommodate the instrument's variable circular scan pattern.
ISS-RapidScat Level 2A Version 2.0 provides calibrated surface-flagged radar backscatter (sigma-0) and attenuation corrections in 25 km Wind Vector Cells. The data are derived from a Ku-band scatterometer mounted on the International Space Station, offering a complete historical reprocessing for consistent calibration across the instrument's operational period. Coverage is limited to latitudes between approximately 61 degrees North and South due to the ISS's non-sun-synchronous, low-inclination orbit.
World Bank Group data on China's energy and mining sector, compiled from the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset covers topics such as energy production, use, dependency, and efficiency. It was last updated on 2026-04-27.
World Bank data on energy production, use, dependency, and efficiency for Canada, compiled from the International Energy Agency and the Carbon Dioxide Information Analysis Center. The dataset addresses trends in energy use and sustainability for economic growth and poverty reduction. It was last updated on 2026-04-27 and is available under a CC-BY-4.0 license.
Data compiled by the World Bank from the International Energy Agency and the Carbon Dioxide Information Analysis Center. It contains indicators on energy production, use, dependency, and efficiency for Brazil, reflecting trends relevant to economic growth and sustainability. The dataset was last updated on 2026-04-27.
A scientific paper describes the youngest Late Cambrian trilobite assemblage discovered in the Mariner Group, northern Victoria Land, Antarctica. The assemblage contains seven determined trilobite taxa and is related to material from Kazakhstan, Siberia, China, Australia, and North America. The paper was published by Geoscience Australia Data and last updated on 2026-05-14.
A dataset of baseline characteristics for 67 participants from two African HIV cohorts (HVTN 503 and PP/COS). It contains pre-infection Th17 cell frequency measurements and subsequent disease progression markers, including CD4/CD8 ratios and viral load set points. The data was authored by Tosin E. Omole and last updated in April 2026.
A 6,000-year record from 11 inner-shelf sites on the Great Barrier Reef examines Holocene environmental changes. The dataset includes 45 radiocarbon dates from coral microatolls and storm ridge sequences, used to reconstruct sea-level trends and storm recurrence intervals. It was published by Geoscience Australia and last updated in May 2026.
Sample-level harmonized data files from the third stage of the TCGA Lower Grade Glioma Python pipeline. The dataset integrates validated clinical, gene expression, copy number alteration, and mutation data, filtered to a common set of matched samples. It was authored by Aaliah Aly and last updated on 2026-05-07.
A database of Australian mineral and mining processing plant locations and attributes compiled by Geoscience Australia. It contains information on plant type, processing methods, commodities, and output where known. The data is available via the Geoscience Australia Portal and was developed as part of a Critical Minerals Research and Development Hub project.
1.0 GB of labeled sentences from student-written statistics reports from Carnegie Mellon University. The dataset includes sequences of sentences labeled for rhetorical purpose using Llama, along with 384-dimensional sBERT embeddings of the original text. It was authored by Margaret Ellingwood and last updated in May 2026.
Lili Town, Suzhou, China, is the study area for this dataset supporting an LLM-driven agent for automating the Storm Water Management Model (SWMM). The dataset includes basic geographic shapefiles, a model INP file, and a 50-instruction natural language benchmark (SWMM-PAI) for parameter adjustment. It was authored by Yani Zhong and last updated on 2026-04-27.
Narryer survey data from Western Australia provides a grid of equivalent air-absorbed dose rate derived from gamma-ray spectrometry. The grid has a cell size of approximately 20 meters and is based on 415,090 line-kilometres of data acquired in 2024 by the WA Government. Processed by Geoscience Australia, the data represents total dose rate from natural potassium, uranium, and thorium decay, combined with cosmic dose estimates.
Antarctic Specially Protected Area No. 143 Marine Plain in East Antarctica is valued for its fossil fauna and geological features. This dataset presents results from geological mapping, aerial imagery collection, and field observations to assess the impact of human access and provide management options. The work was presented at the SCAR Open Science Conference 2024 and builds on regional mapping of the Vestfold Hills.
Geoscience Australia's 2000/2001 mapping study delineated four major geomorphological features and five acoustic echo facies for the Great Australian Bight. The report underpins biological, environmental, and economic assessments for Regional Marine Planning. Its digitized GIS data includes boundaries for continental shelves, slopes, rises, and terraces, along with attributes for each acoustic facies.
Nine speakers contributed roughly 6 hours of audio recordings in the Saarländisch dialect of German. The corpus contains 4,871 recorded sentences, sampled at 22,050 Hz, and an additional 3,905 unrecorded sentences. It was created by UdS-LSV and last updated in June 2026.
A 48.0 KB review document summarizes the role of mitochondrial dysfunction in osteoporosis and catalogs Chinese botanical drugs targeting it. Author Shiyu Li published the document on figshare under a CC-BY-4.0 license in May 2026. The review aims to establish a research paradigm linking botanical drugs, mitochondria, and bone health.
Structured telehealth data from 943 diabetes patients, collected over a 12-year period, were augmented with physical activity information extracted from the patients' free-text notes. Fabian Wiesmüller published this research in 2026, benchmarking local rule-based and Mistral LLM methods against GPT-4.1. The dataset includes 100 synthetically generated notes used for benchmarking the extraction algorithms.
Data from 943 patients collected over 12 years in the DiabMemory system, supplemented by 100 synthetic notes, were analyzed for physical activity information extraction. The dataset was created by Fabian Wiesmüller and last updated in May 2026. It includes pseudonymized free-text notes from a diabetes telehealth platform.