DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Chemistry Datasets | DataSalon

Chemistry

Organic/inorganic chemistry, analytical chemistry, electrochemistry, molecular properties, chemical reactions

2,032 datasets

DFT Glossary: Over 450 Terms for Theoretical Organic Chemistry

A glossary containing definitions and explanatory notes for more than 450 terms used in multidisciplinary research related to modern theoretical concepts and computational methods. It was created by Vladimir I. Minkin of Southern Federal University to provide guidance on terminology in theoretical organic chemistry. The aim is to contribute to the elimination of inconsistencies and ambiguities in the meanings of terms in this area.

Text, Engineering, Epistemology, Management Science, Computer Science, Organic Chemistry, Social Science, Multidisciplinary Approach, Chemistry, Glossary, Geography, Philosophy, Context Archaeology, Sociology, Theoretical Chemistry, Terminology, Reactivity Psychology, Linguistics, Computational Chemistry, +1

QSAR Bioconcentration Regression: Chemical Compounds with logBCF Target

A cheminformatics dataset from the UCI Machine Learning Repository for predicting the bioconcentration factor (BCF) of chemical compounds. It contains molecular descriptors as features and a continuous logBCF target variable for regression tasks. The dataset was contributed by authors from the Milano Chemometrics and QSAR Research Group.

Tabular, QSAR, Bioconcentration, Regression, Cheminformatics, Environmental Chemistry, Chemical Properties, +1

QSAR Bioconcentration Classes for Chemical Compound Toxicity Prediction

A dataset for predicting the bioconcentration factor of chemical compounds, which measures accumulation in living organisms. It was created by Grisoni, F., Consonni, V., Villa, S., Vighi, M., & Todeschini, R. and is sourced from the UCI Machine Learning Repository and the Milano Chemometrics and QSAR Research Group. The dataset contains molecular descriptors and a categorical target variable for classification.

Tabular, QSAR, Molecular Descriptors, Bioconcentration, Environmental Chemistry, Chemical Properties, Toxicity, Molecular Properties, +1

QSAR Bioconcentration Classes for Chemical Toxicity Prediction

The QSAR Bioconcentration Classes Dataset originates from the UCI Machine Learning Repository and the Milano Chemometrics and QSAR Research Group. Authors Grisoni, F., Consonni, V., Villa, S., Vighi, M., & Todeschini, R. compiled molecular descriptors to predict the bioconcentration factor of chemical compounds. The dataset is used for classification tasks to categorize compounds into bioconcentration classes.

Tabular, QSAR, Bioconcentration, Cheminformatics, Environmental Chemistry, Chemical Properties, Toxicity, Molecular Properties, +1

QSAR Bioconcentration Factor Regression with Molecular Descriptors

A well-known cheminformatics dataset from the UCI Machine Learning Repository, originally created by Grisoni et al. of the Milano Chemometrics and QSAR Research Group. Its primary objective is to predict the log-transformed bioconcentration factor (logBCF) of chemical compounds, a measure of how strongly a chemical accumulates in living organisms and therefore a key input to environmental risk assessment. The dataset contains molecular descriptors describing chemical structure and properties.

Tabular, QSAR, Bioconcentration, Regression, Cheminformatics, Environmental Chemistry, +1

QSAR Bioconcentration Classes for Chemical Toxicity Prediction

Tabular, QSAR, Bioconcentration, Cheminformatics, Environmental Chemistry, Chemical Properties, Toxicity, +1

Mass Spectrometry and SomaScan Concordance Data for Biomarker Panels

Mass spectrometry (MS) – SomaScan (SS) concordance data provides evidence for biomarker panels in a chronic progressive disease study. The 5.5 KB Excel file, authored by Blake Hummer and last updated in March 2026, is licensed for open use under CC-BY-4.0. Platform tags suggest the data relates to a pilot study identifying 68 proteins and staging biomarker profiles.

Tabular, Excel, Staging Biomarker Profile, Control Subjects, Control Group, Omic Study Aimed, Term Outcomes, Increased Collagen Synthesis, Soma Scan, Biomarker Concordance, Healthy Control Group, Pilot Study, Found Normal Collagen, Identified 68 Proteins, Potential Use, First Corrective Procedure, Provide Evidence, Impaired Degradation Rather, Staging Biomarker Panels, Proteomics, 6 Respectively, Chronic Progressive Disease, Preventive Therapeutics, Impaired Collagen, Comparing Dd Plasma, +1

QSAR Fish Toxicity: 908 Chemicals with 6 Molecular Descriptors

QSAR_fish_toxicity contains 6 molecular descriptor attributes for 908 chemicals, used to predict acute aquatic toxicity for the fish Pimephales promelas (fathead minnow). The dataset was developed for quantitative regression QSAR models, using LC50 (the concentration lethal to 50% of test fish over 96 hours) as the target feature. It is licensed under CC-BY-4.0 and hosted on OpenML.

Tabular, Aquatic Toxicity, QSAR, Molecular Descriptors, Environmental Chemistry, Toxicity, +1

Macrobenthic Activity Effects on Ocean Margin Biogeochemistry

Geoscience Australia Data published a monograph on ocean margin systems, last updated in March 2026. The publication examines the dynamics of benthic life and its influence on biogeochemical reactions and fluxes in transitional zones between oceans and continents. The dataset is tagged with Earth sciences, Marine, and Geochemistry subjects.

External Publication, Earth sciences, Marine, Published External, Monograph, Geochemistry, +1

Iridium Oxide Electrolyzer Catalyst Layers Coated by Roll-to-Roll Methods

This dataset contains experimental data from a study of roll-to-roll coating methods for producing iridium oxide catalyst layers used in proton exchange membrane water electrolyzers. The research compares two coating methods, slot die and gravure, and analyzes their impact on film microstructure, electrolyzer performance, and durability. Row and column counts are unknown.

Roll To Roll, Electrolysis, +1

Cancer Targets with ChEMBL and UniProt IDs for Bioactivity Prediction

Raúl Acosta-Murillo's dataset describes identified cancer targets with ChEMBL and UniProt identifiers. The data is stored in a 9.5 KB Excel file and was last updated on March 17, 2026. Platform tags suggest the dataset was used for predicting cancer bioactivities using various chemical representations and machine learning models.

Tabular, Excel, Fm3 Mol2vec, Apc Avalon, Chemical Representation, Based Morgan 2, Mk2 Rdkfingerprint, Lss Multi, Chembl, Cancer Targets, Avn Extended, Predicting Cancer Bioactivities, Ecfp4 Extended, Including Atompair Counts, Rft Ridge, Rdc Torsion, Hgt K, Mtor Dataset Demonstrated, Highest Predictive Accuracy, Uniprot, Ecfp6 Feature, Avn Chemical Representation, Nn Lasso, Based Morgan 3, Fm2 Feature, Nearest Neighbors, Bioactivity Prediction, +1

USDA Phytochemical Database with Clinical and Patent Data

A phytochemical dataset integrates USDA botanical records with PubMed citations, ClinicalTrials.gov study counts, ChEMBL bioactivity scores, and USPTO patent density. It is provided in production-ready JSON and Parquet formats by the author wirthal1990-tech. The dataset was last updated in March 2026.

JSON, Library: polars, Phytochemical, Language: en, Patents, ChEMBL, Ethnobotany, Size Categories: n<1K, Modality: text, Modality: tabular, Library: mlcroissant, Library: datasets, Library: pandas, License: CC BY 4.0, Drug Discovery, Region: us, Task Categories: text classification, Clinical Trials, Task Categories: tabular classification, PubMed, +1

Mass Spectrometry Results for BCOR Mutation Retinoblastoma Study

Figshare hosts Supplementary Table S4 containing mass spectrometry results from a study on BCOR mutations in retinoblastoma. The dataset, shared under CC BY license by Michelle G. Zhang, is a 137.0 KB Excel file. It supports analysis of deregulated cell cycle and hypoxic adaptation pathways.

Tabular, Excel, BCOR Mutation, Retinoblastoma, Mass Spectrometry, Hypoxic Adaptation, Cell Cycle, +1

Quantitative Proteomics Data for EGFP Translation Enhancement Study

This 701.4 KB dataset of quantitative mass spectrometry results assesses the effect of λGRTS targeting on EGFP translation efficiency in mammalian cells, along with its off-target effects. The dataset was created by Junzhe Liu and last updated in March 2026. It is provided in XLS format.

Tabular, GRTS, Translation Enhancement, Tg RNA, Mass Spectrometry, Mammalian Cells, EGFP, Proteomics, Translation Efficiency, N Box B, +1

Santa Barbara Basin Nitrogen Isotope Paleoclimate Records

NOAA/WDS Paleoclimatology archives bulk and compound-specific nitrogen isotope data from the Eastern Pacific Ocean's Santa Barbara Basin. The dataset contains parameters for paleoceanography studies, with a time period measured in calendar years before present. It is maintained by the NOAA National Centers for Environmental Information under the World Data Service for Paleoclimatology.

Tabular, Time Series, Geospatial, Paleoceanography, Paleoclimatology, Nitrogen Isotopes, Marine Sediments, +1

Chemical Substance Registry With Structure Data

Over 350,000 chemical records from the National Library of Medicine provide structure and nomenclature authority files. More than 80,000 records include chemical structures. The database is maintained by the NLM's SCIOPS organization.

Tabular, Structure Search, Nomenclature, Biomedical Resources, Chemical Identification, +1

Proficiency Test Panel for Yaws Pathogen Detection

This dataset describes a proficiency testing panel for molecular diagnostics consisting of seven swabs. It details test items for pathogens such as Treponema pallidum and Haemophilus ducreyi, which are relevant to yaws eradication campaigns. It was authored by Claudia Mueller and last updated in March 2026.

Human Field Samples, External Quality Assessment, Haemophilus Ducreyi, Tp, Human Hek293 Cells, Treponema Pallidum, Field Samples Resulted, Tested Quantitative Real, Proficiency Testing Panel, Ivoire, 5 7, Yaws Eradication Campaign, 3 13, Blinded Proficiency Testing, Pti Provider, Seven Swabs Loaded, Mediated Isothermal Amplification, Different Environmental Conditions, Yaws Elimination, +1

UK Shipwreck Records from National Monuments and Hydrographic Office

Maritime Archaeology Ltd compared wreck data from the UK's National Monuments Record and the UK Hydrographic Office. The project, commissioned by English Heritage, aimed to identify and resolve discrepancies between these two official maritime heritage datasets. The data was aggregated by the Marine Environmental Data & Information Network and last updated in March 2026.

Tabular, Geospatial, Shipwrecks, Data Harmonization, UK Historic Environment, Maritime Archaeology, +1

Phenolic Compound Concentrations in Antarctic Lichen by Thallus Age

SCIOPS produced a dataset on the concentrations of phenolic metabolites in the Antarctic lichen Umbilicaria antarctica. Densitometric analysis (HPTLC) was used to measure usnic acid, atranorin, and gyrophoric acid levels in thalli of different ages. The data was last updated on February 20, 1989.

Tabular, Phenolic Compounds, Antarctic Biology, HPTLC, Lichen Chemistry, +1

Usnic Acid in Antarctic Lichens Related to Ozone Depletion, 30-Year Period

Collected in Antarctica over a 30-year period, this dataset records the accumulation of usnic acid in two lichen species: Neuropogon aurantiaco-ater and Ramalina terebrata. Sourced from NASA EarthData and last updated in 1996, it examines the relationship between this UV-B absorbing compound and critical levels of ozone depletion.

Tabular, Antarctica, Ozone Depletion, Usnic Acid, UV Absorption, Lichen Chemistry, +1
