DNA/RNA sequences, gene expression, protein structures, metagenomics, single-cell sequencing
23,869 datasets
A multi-modal dataset for AI detection, containing over 1 billion verified samples per month scraped from 19 global sources across text, image, video, and audio modalities. The dataset was created by author anas775 and was last updated on 2026-05-05 02:45:45. Samples are labeled by a weighted ensemble of 8 specialized AI-detection models.
Environmental DNA (eDNA) surveys from 40 water sampling stations between Forestville and Godbout, Quebec, in August 2018, provide a list of invertebrate species detected using the COI genetic marker. The data includes generic activity information such as site, station name, date, marker type, and taxonomic assignments verified by an expert from the Maurice-Lamontagne Institute. This project was funded by Fisheries and Oceans Canada's Coastal Environmental Baseline Data Program.
A dataset generated to train the CRISPR_AI model, derived from a compressed JSON file. The dataset was created by author ljw20180420 using code from a public GitHub repository. Its last recorded update was on June 4, 2026.
Real-time event calendar provides ideas for outings and events within the City of Sherbrooke. The data is updated by the city's boroughs and Communications Department and follows the GVQ Events standard. The dataset was last updated on April 17, 2026.
Mitrastemonaceae plastid genomes are highly minimized to 18–26 kb with extreme AT content exceeding 77% and loss of the typical quadripartite architecture. Despite this reduction, the super-panplastome shows remarkable structural stability and collinearity across individuals, with a conserved set of 26 genes. This dataset provides a genomic framework for studying plastid evolution in endoparasitic plants, including evolutionary rate analyses (dN/dS) and transcriptomic evidence for nuclear-encoded DNA repair genes.
The Australian Institute for Disaster Resilience and the Australian Tsunami Advisory Group published the Tsunami Emergency Planning in Australia Handbook on 5 November 2018. The handbook outlines tsunami causes, characteristics, and planning considerations for coastal and maritime communities. It details the Australian Tsunami Warning System and replaces the 2010 predecessor manual.
Supplementary Material 6 contains associations between 40 seed-trait-related Quantitative Trait Loci (QTLs) and the genes located within their genomic regions. The dataset was authored by Jakob Bruggink and published on figshare under a CC-BY-4.0 license. It was last updated on May 7, 2026.
Supplementary Material 5 by Jakob Bruggink lists hub genes for soybean seed weight research. The dataset contains the top ten genes with annotations for each of five conserved regulatory modules, selected by highest intramodular connectivity. It was uploaded to figshare on 2026-05-07 and is available as a 25.2 KB XLSX file under a CC-BY-4.0 license.
492.7 KB of weather and climate data gathered for each station over the course of an experiment from 2018 to 2021. This supplementary material was authored by Jakob Bruggink and is available as an XLSX file on figshare. The dataset was last updated on May 7, 2026.
Daily global snow cover maps at 375-meter resolution are generated from the Visible Infrared Imaging Radiometer Suite (VIIRS) on two polar-orbiting satellites. A cloud-gap-filling algorithm estimates snow cover under clouds by replacing each cloud-covered pixel with a cloud-free observation of the same location. The data is produced by NASA and available under a CC-BY-4.0 license on AWS Open Data.
9.5 KB Excel file summarizing the performance of the Cand-23 LAMP assay compared to a qPCR assay. The data, authored by Mikel Arrieta Salgado, was last updated on April 29, 2026, and originates from the research published by Matthews et al. in 2020.
Global Affairs Canada published the Headquarters Agreement between the Government of Canada and the Commission for Environmental Cooperation. The document sets out privileges and immunities granted to the Commission and establishes the legal framework for its headquarters and operations. The publication is archived and out of date, intended for research or recordkeeping purposes.
Provincial coverage polygons contain permit information for mariculture activities in Québec. The dataset identifies businesses with permit numbers, authorized activity types, species, and site locations and areas. It is maintained by the regional directorates of MAPAQ.
Between June 1 and September 13, 2025, high-frequency eddy covariance and water-side sensors measured air-sea CO2 exchange and oceanographic conditions off the northern shore of Helgoland island. The dataset includes 10 Hz atmospheric measurements scaled to 10 m height, in-situ water CO2 and CTD data from a nearby site, and water current profiles from an ADCP. Air-side CO2 concentrations were supplemented by a nearby ICOS station located 800 meters from the primary measurement site.
July to September 2021 records from the Northern Ireland Office detailing senior officials' business expenses and the permanent secretary's meetings with external organisations. The dataset is provided by the Government Digital Service and is licensed under CC-BY-4.0. It is available in CSV format.
NASA scientists conducted the first successful DNA sequencing experiment aboard the International Space Station. The data likely contains results from sequencing runs performed in microgravity, addressing challenges like fluid behavior and launch vibrations. The dataset is published by JSCNASA under an Open Access (diamond) license.
A list of publications identified in a review of literature and guidance on cleaning to remove food allergens. The dataset is provided by the Government Digital Service under the OGL-UK-3.0 license and is available in CSV format.
A two-sample Mendelian randomization study investigates the causal effect of SGLT-2 inhibition on constipation risk, analyzing 452 circulating metabolites for mediation. The dataset contains genetic association results, including odds ratios and confidence intervals, from summary-level genome-wide association studies. It was authored by Qiuhui Liu and published in 2026.
50,000 hydrographic features represent Canadian surface waters, including watercourses, lakes, permanent snow, and springs. The dataset originates from Natural Resources Canada's CanVec product, compiled from multiple authoritative sources like the National Topographic Data Base and satellite imagery.
Lower Cretaceous marine sequences from a well drilled to 9,070 feet in southwestern Queensland. The dataset contains lithological and micropalaeontological analysis of cuttings from 80 to 3,310 feet and specific core samples, detailing the first appearances of foraminiferal species and Inoceramus prisms. This report establishes a section of the Lower Cretaceous sequence in the Great Artesian Basin.