Drugbank Clean is an instruction-tuning dataset of 45,805 question-answer pairs derived from DrugBank. It was created by Oduwo and is designed for fine-tuning large language models on pharmacological knowledge. The dataset was last updated on May 4, 2026.
Use Cases
- Fine-tuning LLMs to retrieve accurate drug information.
- Training AI assistants to answer clinical pharmacology questions.
- Developing systems that reduce hallucinated drug dosages.
- Building models that identify potential drug interactions.
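As a rough sketch of the fine-tuning use case, each question-answer pair can be rendered into a single training string before it is passed to a trainer. The field names `question` and `answer`, the prompt template, and the sample record are all assumptions made for illustration; the dataset's actual column names are undocumented and should be checked after download.

```python
from typing import Mapping

# Hypothetical prompt template; adjust to the chat format of the target model.
TEMPLATE = "### Question:\n{question}\n\n### Answer:\n{answer}"


def format_pair(record: Mapping[str, str],
                question_key: str = "question",
                answer_key: str = "answer") -> str:
    """Render one question-answer record as a single training string.

    The key names are assumptions; pass the real column names once known.
    """
    question = record[question_key].strip()
    answer = record[answer_key].strip()
    if not question or not answer:
        raise ValueError("record has an empty question or answer")
    return TEMPLATE.format(question=question, answer=answer)


# Made-up placeholder record, not taken from the dataset.
example = {"question": " What is drug X used for? ", "answer": "Placeholder answer. "}
print(format_pair(example))
```

Keeping the key names as parameters means the same helper can be reused once the real schema is known, without editing the template logic.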
Strengths
- Contains 45,805 high-quality question-answer pairs.
- Specifically designed for instruction-tuning of LLMs.
- Derived from the authoritative DrugBank database.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The row count is known, but other metadata, such as file format and license, is undocumented.
- Freshness should be verified before use; the last recorded update is May 4, 2026.
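Because column-level documentation is absent, a reasonable first step after download is to list which fields appear and what types they hold. The sketch below assumes the rows have already been loaded as a list of dictionaries (the file format is unknown, so the loading step is left out); `summarize_fields` and the sample rows are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, Mapping


def summarize_fields(rows: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Count, for each field name, how often each Python type occurs."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        for key, value in row.items():
            summary[key][type(value).__name__] += 1
    return {key: dict(counts) for key, counts in summary.items()}


# Made-up rows standing in for the downloaded data.
sample_rows = [
    {"question": "placeholder question 1", "answer": "placeholder answer 1"},
    {"question": "placeholder question 2", "answer": None},
]
for field, type_counts in summarize_fields(sample_rows).items():
    print(field, type_counts)
```

A field that shows mixed types or many `NoneType` entries is a hint that its semantics need closer inspection before training.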
Provenance
- Source: DrugBank database.
- Collection Method: Derived from DrugBank and formatted as question-answer pairs for instruction tuning.
- Freshness: Last updated 2026-05-04 10:49:58.