Description

An evaluation benchmark derived from high-throughput screening (HTS) data, designed for classification and regression tasks. The dataset reports continuous inhibition activity as percentages, each with an associated standard error and standard deviation. It was created by Alma Celeste Castaneda Leautaud and published on Harvard Dataverse in May 2026.

Use Cases

Benchmark classification models on data with enforced non-trivial class separability.
Train regression models to predict inhibition activity, using the provided continuous activity values as targets.
Evaluate model robustness to noisy labels arising from the described experimental variability and potential false positives and negatives.
Simulate realistic virtual screening workflows using the UMAP-sampled and clustered representation of the screening space.
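
For the noisy-label use case, one possible approach is to use the reported standard error to mark compounds whose class is uncertain. The sketch below is illustrative only: the field names (`compound_id`, `inhibition_pct`, `std_error`), the 50% activity threshold, and the 1.96 multiplier are all assumptions, since the dataset's columns and class definition are not documented.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    # Hypothetical fields; the real column names are undocumented.
    compound_id: str
    inhibition_pct: float  # mean percent inhibition
    std_error: float       # standard error of that mean


def label_with_uncertainty(m, threshold=50.0, z=1.96):
    """Return 'active', 'inactive', or 'ambiguous'.

    A compound is 'ambiguous' when the interval
    inhibition_pct +/- z * std_error contains the threshold.
    The threshold and z values are assumptions, not dataset facts.
    """
    low = m.inhibition_pct - z * m.std_error
    high = m.inhibition_pct + z * m.std_error
    if low > threshold:
        return "active"
    if high < threshold:
        return "inactive"
    return "ambiguous"
```

Ambiguous compounds could then be excluded from training, down-weighted, or held out as a separate robustness test set, depending on the experiment.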

Strengths

Designed to represent the virtual screening space realistically, using UMAP-based sampling and clustering.
Provides both classification targets and continuous inhibition activity values with standard error and standard deviation.
Assay optimization yielded a high overall assay quality score (Z'-factor, Z' = 0.86).
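
For context on the Z' value, the Z'-factor is conventionally defined from the positive and negative control wells as Z' = 1 - 3(sd_pos + sd_neg) / |mean_pos - mean_neg|, and values above 0.5 are generally taken to indicate a robust assay. The sketch below implements that standard definition. How the dataset authors computed their reported value is not stated, and the control readings in the usage note are invented for illustration.

```python
from statistics import mean, stdev


def z_prime(positive_controls, negative_controls):
    """Compute the Z'-factor from raw control-well readings.

    Uses sample standard deviations; each list needs at least
    two values.
    """
    mu_p, mu_n = mean(positive_controls), mean(negative_controls)
    sd_p, sd_n = stdev(positive_controls), stdev(negative_controls)
    separation = abs(mu_p - mu_n)
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    return 1.0 - 3.0 * (sd_p + sd_n) / separation
```

As a usage example with made-up readings, `z_prime([98, 100, 102], [-2, 0, 2])` gives 0.88, because each control group has a sample standard deviation of 2 and the means are 100 units apart.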

Limitations

Continuous activity labels are inherently noisy due to primary HTS conditions, fluorescence readout, and experimental variability.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes it harder to assess suitability before download.
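
Because both the schema and the row count have to be established after download, a quick inspection pass may be a useful first step. The sketch below assumes the data is distributed as a delimited text file with a header row, which is not confirmed by the listing; `summarize_table` is a hypothetical helper name.

```python
import csv


def summarize_table(text_stream, sample_rows=5, delimiter=","):
    """Report column names, row count, and a few sample rows.

    Assumes a header row; the delimiter is a guess and may need
    to be changed (for example to a tab character).
    """
    reader = csv.DictReader(text_stream, delimiter=delimiter)
    samples = []
    row_count = 0
    for row in reader:
        row_count += 1
        if len(samples) < sample_rows:
            samples.append(row)
    return {
        "columns": list(reader.fieldnames or []),
        "row_count": row_count,
        "samples": samples,
    }
```

It can be called on any open text file, for example `with open(path, newline="") as f: summary = summarize_table(f)`, where `path` points to the downloaded file.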

Provenance

Source: Harvard Dataverse
Collection Method: Derived from high-throughput screening (HTS) with UMAP-based sampling and clustering.
Freshness: Last updated 2026-05-06 03:21:15; confirm against the source record before use.

License: Unknown

Tags: Tabular, High Throughput Screening, Machine Learning, Benchmark, Pharmacology, Drug Discovery

HTS-derived AI Evaluation Dataset with Realistic Virtual Screening Space
