Drug-target interaction, molecular screening, ADMET, compound databases, pharmaceutical data
532 datasets
An evaluation subset of the Jigsaw Toxic Comment Dataset containing Wikipedia talk page comments annotated for toxic behavior. The dataset is hosted by GuardrailsAI and was last updated on February 12, 2025. It is intended for model evaluation, with training recommended from the original Jigsaw dataset.
This 2020 dataset, drawn from experiments on 110,674 rats, presents neurotransmitter response patterns for 258 clinically approved and experimental neuropsychiatric drugs. The dataset, authored by Hamid R. Noori, was used to analyze links between molecular drug action and neurobehavioral effects, revealing mismatches between drug classifications and systems-level neurotransmitter patterns.
Calcium imaging results from probing ligand-binding properties of human and rat α7 nicotinic acetylcholine receptor (nAChR) mutants. The data was generated by transient co-expression of α7/α9 nAChR mutants with chaperones and the Case12 calcium sensor, followed by pharmacological analysis using fluorescence microscopy or a FLIPR reader. It includes determined affinities for acetylcholine and epibatidine for wild-type receptors and specific mutants at positions 117–119, 184, 185, 187, and 189.
This dataset aggregates gene expression and high-content imaging data from primary human kidney cells exposed to 46 diverse toxicants. It was used to identify biomarkers for predicting nephrotoxicity and to infer mechanisms of toxicity via Random Forest machine learning and network analysis. The data includes mRNA levels of HMOX1 and SQSTM1, along with imaging features capturing changes in cell morphology and nucleus texture.
A database compiled by Judith Gañán Aceituno, harvested from e-cienciaDatos and last updated in October 2025. It contains information on alkaloids, including their chemical groups, biological distribution, pharmacological activity, and adverse effects. A second sheet details nanomaterial modifiers used in electrochemical sensors for detecting these alkaloids.
GSK's Tres Cantos Antimycobacterial Set screening identified 50 drug-like compounds prioritized from 250,000 candidates. The dataset includes computational predictions of their mechanisms of action, generated by Maria Jose Rebollo-Lopez in 2020. It is intended to support open-source tuberculosis drug discovery.
RealToxicityPrompts contains 100,000 English sentence snippets extracted from the web by the Allen Institute for AI in 2020. It was developed to provide a standardized benchmark for researchers to quantify and mitigate the risk of neural toxic degeneration in large language models.
This dataset features 200,000 text documents from The Pile, balanced for toxicity. It was created by selecting the 100,000 most toxic and 100,000 least toxic documents from a 7-million-document subset scored using the Perspective API. The dataset was authored by tomekkorbak and last updated in April 2022.
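The top/bottom selection described above can be sketched as follows, assuming each document already carries a Perspective API toxicity score where higher means more toxic. The function name `balanced_toxicity_split` is hypothetical and not part of the dataset's published tooling; it is an illustration of the selection rule, not the author's actual pipeline.

```python
from typing import List, Tuple


def balanced_toxicity_split(
    scored_docs: List[Tuple[str, float]], k: int
) -> Tuple[List[str], List[str]]:
    """Return (most_toxic, least_toxic), each holding k document texts.

    scored_docs holds (text, score) pairs; a higher score means more toxic.
    """
    if k < 0 or 2 * k > len(scored_docs):
        raise ValueError("k must satisfy 0 <= 2*k <= number of documents")
    # Sort once, ascending by score, so the two slices can never overlap.
    ranked = sorted(scored_docs, key=lambda pair: pair[1])
    least_toxic = [text for text, _ in ranked[:k]]
    # Slice with an explicit start index (ranked[-0:] would return everything),
    # then reverse so the most toxic document comes first.
    most_toxic = [text for text, _ in reversed(ranked[len(ranked) - k:])]
    return most_toxic, least_toxic


docs = [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.95)]
most, least = balanced_toxicity_split(docs, 2)
print(most, least)  # ['d', 'a'] ['b', 'c']
```

A single full sort is O(n log n); for a 7-million-document pool that is still practical, and it guarantees the two halves are disjoint even when many scores tie.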
ASAC_2201 project data describes the toxicity of marine sediment spiked with contaminated soil from the Thala Valley tip site at Casey to the temperate heart urchin Echinocardium cordatum. The dataset includes daily observations over up to 10 days, with fields for Date, Time, Urchin, Buried, Alive, Salinity, Dissolved Oxygen, pH, and Temperature. The data was last updated on August 8, 2001.
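The field list above suggests a simple per-observation record. The sketch below models one row and computes the share of urchins recorded alive on each date. It assumes that Buried and Alive are yes/no flags, that Urchin identifies an individual animal, and that salinity, dissolved oxygen, pH, and temperature are numeric; the units, the date format, and the names `UrchinObservation` and `fraction_alive_by_date` are hypothetical, not taken from the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UrchinObservation:
    date: str              # original date format unknown; kept as a string
    time: str
    urchin: str            # assumption: identifier of an individual urchin
    buried: bool           # assumption: True if the urchin had buried itself
    alive: bool            # assumption: True if the urchin was alive
    salinity: float        # units not stated in the source
    dissolved_oxygen: float
    ph: float
    temperature: float


def fraction_alive_by_date(obs: List[UrchinObservation]) -> Dict[str, float]:
    """Share of observed urchins recorded as alive on each date."""
    totals: Dict[str, int] = defaultdict(int)
    alive: Dict[str, int] = defaultdict(int)
    for o in obs:
        totals[o.date] += 1
        if o.alive:
            alive[o.date] += 1
    # Dates sorted as strings; this is chronological only for ISO-style dates.
    return {d: alive[d] / totals[d] for d in sorted(totals)}
```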
LiverTox provides current information on liver injury from prescription drugs, over-the-counter medications, and dietary supplements. The resource is maintained by the U.S. Department of Health & Human Services and was last updated in June 2025. It details the diagnosis, causes, frequency, and management of drug-induced liver injury.
OpenML hosts a dataset for toxicity prediction, likely containing molecular descriptors or chemical structures. The dataset is tagged for cheminformatics and drug safety applications. Specific details on size, author, and update date are not provided.
Four experiments exposed Antarctic ophiuroids (Ophiura crassa) to diesel in sediment for 10 days. Daily observations of animal movement were recorded as an indicator of health. The data was produced under project ASAC_2201 and last updated in December 2002.
DailyMed offers a standard resource of medication package inserts, known as Structured Product Labeling (SPL), from the U.S. Department of Health & Human Services. The repository is updated daily, with the most recent update in July 2025, providing current labeling information for drugs and supplements.
This dataset encompasses 200,000 text documents from The Pile, scored for toxicity using the Perspective API in May 2022. It is balanced between 100,000 of the most toxic documents and 100,000 randomly sampled documents.
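Unlike a top/bottom split, this variant pairs the most toxic documents with a random sample. A minimal sketch follows; it assumes the random half is drawn only from documents outside the top-k (the description does not say whether the sample could overlap the toxic half), and the function name `toxic_plus_random` and the fixed `seed` are hypothetical choices for reproducibility.

```python
import random
from typing import List, Tuple


def toxic_plus_random(
    scored_docs: List[Tuple[str, float]], k: int, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Return (most_toxic, random_sample), each holding k document texts.

    Assumption: the random half is sampled from documents outside the
    top-k, so the two halves are disjoint.
    """
    if k < 0 or 2 * k > len(scored_docs):
        raise ValueError("k must satisfy 0 <= 2*k <= number of documents")
    # Highest toxicity score first.
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    most_toxic = [text for text, _ in ranked[:k]]
    remainder = [text for text, _ in ranked[k:]]
    rng = random.Random(seed)  # seeded so the sample is reproducible
    random_sample = rng.sample(remainder, k)
    return most_toxic, random_sample
```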
A text classification dataset automatically processed by AutoTrain for the 'procell-expert' project. Data instances contain 'text' and 'target' fields, as shown in a sample describing antitumor activity research. The author is Mim, and it was last updated on April 29, 2022.
This dataset contains 159,571 Wikipedia talk page comments labeled across six distinct categories of toxicity, including toxic, severe_toxic, and identity_hate. Each record pairs raw text from human discussions with binary indicators of offensive behavior as determined by human raters.
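Because each comment carries six independent binary labels, the data is multi-label: a single comment can be both toxic and obscene. The sketch below counts positives per label plus comments with at least one positive label. It assumes rows are dicts keyed by the six label names used in the original Jigsaw Toxic Comment Classification data (toxic, severe_toxic, obscene, threat, insult, identity_hate), for example as produced by `csv.DictReader`; the function name `label_prevalence` is hypothetical.

```python
from typing import Dict, Iterable, Mapping, Union

# The six Jigsaw label columns (assumed column names).
LABELS = ("toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate")


def label_prevalence(rows: Iterable[Mapping[str, Union[int, str]]]) -> Dict[str, int]:
    """Count positives per label and comments with at least one positive label."""
    counts = {label: 0 for label in LABELS}
    counts["any"] = 0
    for row in rows:
        # int() accepts both 0/1 integers and "0"/"1" strings read from CSV.
        flags = [int(row[label]) for label in LABELS]
        for label, flag in zip(LABELS, flags):
            counts[label] += flag
        if any(flags):
            counts["any"] += 1
    return counts
```

Reporting the "any" count alongside the per-label counts makes the multi-label overlap visible: the per-label counts can sum to more than the number of flagged comments.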
The dataset contains chemical functional use data, associated ToxPrint descriptors, and EPI Suite properties, supporting a 2017 publication on high-throughput screening for functional substitutes. It includes 729 ToxPrint descriptors per chemical and model outputs such as confusion matrices and bioactivity indices. The data was compiled by the U.S. Environmental Protection Agency for research on chemical alternatives.
UCI's Toxicity dataset contains chemical compounds and their associated toxicity labels for predictive modeling. The dataset's size, specific features, and creation date are not specified in the available metadata. It originates from the UCI Machine Learning Repository, a known source for benchmark datasets.
Drugbankrawparquet is a dataset published on the Hugging Face platform by user agenticx. The dataset was last updated on August 3, 2025. Judging from its title, it likely contains raw data related to drugs and pharmacology.