Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,560 datasets
A 2026 supplementary document from figshare by Yutong Ai details the development of a fungal cell factory for beauvericin. The study reports achieving a yield of 921.24 mg/kg dry weight, the highest reported to date, using the engineered strain Emericellopsis sp. XJ1056. The work establishes a platform for heterologous production of nonribosomal peptides.
Data Sheet 1 documents the development of the filamentous fungus Emericellopsis sp. XJ1056 as a platform for high-yield biosynthesis of the nonribosomal peptide beauvericin. The study, authored by Yutong Ai and shared on figshare in April 2026, details genetic engineering methods and fermentation optimization that achieved a yield of 921.24 mg/kg dry weight. This establishes a generalizable platform for heterologous production of diverse natural products.
A 104.4 KB dataset provides benchmark total energy calculations for fifth-row elements (Rb–Xe) using correlation-consistent effective core potentials (ccECPs). Data was generated by Aqsa Shaikh using configuration interaction, coupled-cluster, and quantum Monte Carlo methods, with basis sets extrapolated to the complete basis set limit. The dataset was last updated on 2026-04-28.
Global news sources provide over 2.5 million artificial intelligence-related headlines across multiple languages and countries. The collection includes three complementary datasets designed for temporal, linguistic, and keyword-stratified analysis, spanning more than two decades of coverage. It was developed as part of the RAISE initiative at Rutgers University's MPI Program, Bloustein School.
A 5.6 MB dataset from figshare supports a novel mixture model for high-dimensional spatial extremes, applied to approximately 600 spatial locations. Author Muyang Shi's research demonstrates nonstationary tail dependence in extreme summertime precipitation over the central United States. The model allows for both asymptotic dependence and independence at different spatial scales. The dataset was last updated on April 28, 2026.
Sanford Marathon Dataset is a biomechanics dataset provided as material for a tutorial on global gait measures for marathon runners. The dataset is 69.8 MB in size and was last updated on May 5, 2026. It is authored by Amr Mohamed and released under a CC-BY-4.0 license.
First semester 2022 enrollment data for conflict victim populations in Neiva, Colombia, from official educational institutions. The dataset is provided by the Educational Coverage Management Unit via the Colombian open data portal. It contains counts of students disaggregated by grade level and gender.
Geological mapping and aerial imagery results for Antarctic Specially Protected Area (ASPA) No. 143 Marine Plain, presented at the SCAR Open Science Conference 2024. The dataset likely contains geospatial polygons and landform classifications derived from 1:2000 scale mapping of the 400 km² Vestfold Hills region to assess human impact risks. It was produced by researchers citing Geoscience Australia sources and is hosted on the Australian Ocean Data Network.
188.6 KB of proteomic and gene expression data from a study on the carboxypeptidase Q (CPQ) gene in the parthenogenetic tick Haemaphysalis longicornis. The dataset, uploaded by Yuchao Zhang to figshare, includes results from RNAi knockdown, DIA-based proteomics, and yeast two-hybrid assays to characterize CPQ's role in ovarian development. Data was last updated on 2026-04-16.
Statistics Canada provides provincial and territorial gross domestic product (GDP), employment, labour compensation per job, GDP per job, and tourism share of employment and GDP by tourism industry. The dataset is available in CSV, XML, and HTML formats under the OGL-CA-2.0 license and was last updated on 2026-05-26.
130 individuals from inmate, nurse, manager, and working adult groups completed the 16-item Balanced Inventory of Desirable Responding (BIDR-6) short form. The dataset, authored by Anna M. Dåderman and last updated in April 2026, was used to examine the two-factor structure of Self-Deceptive Enhancement and Impression Management and their correlations with Big Five and HEXACO personality traits.
130 individuals from inmate, nurse, manager, and working adult groups completed the 16-item Balanced Inventory of Desirable Responding. The associated research by Anna M. Dåderman used exploratory graph analysis and confirmatory factor analysis to validate the two-factor structure of Self-Deceptive Enhancement and Impression Management. The dataset was last updated in April 2026.
INCAUTACIÓN DE HEROÍNA is a dataset from www.datos.gov.co detailing heroin confiscations in Colombia. The data includes columns for municipality (MUNICIPIO), department (DEPARTAMENTO), date of incident (FECHA HECHO), and quantity seized (CANTIDAD). The dataset was last updated on 2026-05-19.
Sabato Nocera created this artifact containing data and scripts for a study on cryptography bill of materials in GitHub repositories. The dataset, last updated on June 1, 2026, is shared under a CC-BY-4.0 license. It is a 1.1 MB ZIP file.
25 participants were selected from the DementiaBank corpus for a study evaluating five large language models in scoring the Cookie Theft picture description task. The dataset, created by Michael J. Kleiman and last updated in April 2026, contains participant characteristics in a 5.5 KB Excel file. It supports research on automating neuropsychological assessments.
Geoscience Australia maintains the Australian Marine Spatial Information System (AMSIS), a web-based interactive mapping and decision support tool. It integrates curated government, state, and academic data across themes like Maritime Boundaries, Petroleum, Fisheries, Environment, Native Title, and Regulation. The system visualizes competing marine interests to facilitate multi-sectoral planning and management discussions.
Primer sequence information supporting a study on the effects of N-acetylcysteine (NAC) on chilled semen storage in the endangered Maguan hornless goat. The dataset, made available by Haoran Xu on April 21, 2026, is a 5.5 KB Excel file containing sequences likely used for analyzing antioxidant and pro-apoptotic gene expression.
Supplementary data supporting a reliability-oriented energy management study for Antarctic integrated energy systems. The dataset is provided as a 2.2 MB PDF file by author 昱熙 黄 and was last updated on May 22, 2026. It likely contains numerical or tabular data referenced in the associated research.
Experimental data from a study on the effects of N-acetylcysteine (NAC) on chilled semen storage for the endangered Maguan hornless goat breed. The dataset, published by Haoran Xu in April 2026, includes measurements of sperm kinetics, antioxidant gene transcription, and oxidative stress-related enzymatic activity across different NAC concentrations and storage times. The file is a 13.6 KB XLSX spreadsheet.
18.1 KB of experimental data from a study on the Maguan hornless goat, an endangered breed in Yunnan, China. The dataset likely contains measurements of sperm kinetics, antioxidant gene transcription, and oxidative stress enzyme activity under different N-acetylcysteine concentrations during 72-hour chilled storage. Author Haoran Xu published this work on figshare under a CC-BY-4.0 license in April 2026.