Drug-target interaction, molecular screening, ADMET, compound databases, pharmaceutical data
532 datasets
A dataset for fine-tuning language models on protein-ligand binding affinity prediction. It is associated with tags for molecules, SMILES strings, and chemistry, indicating a focus on molecular data. The dataset was last updated in March 2022.
Two categories of protein-ligand interaction data, binding affinity and contact prediction, are provided for language model fine-tuning. These labels enable the development of predictive models for biochemical interactions and structural contacts between proteins and ligands.
The Toxicity Reference Database (ToxRefDB) from the U.S. Environmental Protection Agency contains toxicity testing results for 474 chemicals, primarily pesticide active ingredients. It consolidates approximately 30 years and $2 billion worth of animal studies previously found only in paper documents.
A dataset designed for fine-tuning language models on protein-ligand binding affinity and contact prediction. It contains molecular data tagged with categories such as Molecules, SMILES, and Chemistry. The dataset was authored by jglaser and last updated in May 2022.
A subset of 2.2 million documents from The Pile, scored with the Perspective API on May 18-20, 2022. It was created by tomekkorbak and provides toxicity annotations for text chunks.
Pile Toxicity Balanced2 is a text dataset designed for training and evaluating language models on toxic content. The dataset, created by researcher tomekkorbak, was uploaded to Hugging Face in April 2022. It is part of a series of datasets derived from The Pile, a large-scale text corpus used for AI development.
Real Toxicity Continuations is a text dataset for evaluating the toxicity of language model outputs. It was created by user 'sasha' and last updated on Hugging Face in July 2022. The dataset contains prompts and continuations, likely sourced from models like GPT-2, to measure the propensity of language models to generate harmful text.
A curated subset of The Pile dataset focused on toxic text examples, balanced for training and evaluation. The dataset was created by researcher tomekkorbak and uploaded to Hugging Face in June 2022. It is part of a series of balanced subsets derived from the larger 825GB Pile corpus.
Toxicity Debug is a text dataset for evaluating language model safety, created by researcher tomekkorbak and hosted on Hugging Face. It was last updated in April 2022. Its size category is listed as 'n1 K', which suggests on the order of 1,000 entries.
A filtered subset of the Pile dataset, focused on text with toxicity labels, curated by researcher tomekkorbak and hosted on Hugging Face. It contains approximately 100,000 text samples, as indicated by its size category, and was last updated in April 2022. The data is intended for training and evaluating language models on toxic content.
A dataset titled 'Medication', authored by mrojas. It was last updated on June 7, 2021. The number of rows, columns, and specific data content are unknown.