Performance Metrics for Riepe Benchmark: SpliceAI Algorithm Comparison
by Nathan Fortier
5.5 KB · 1 file
Available on 1 platform
Description
This dataset benchmarks SpliceAI and its open-source reimplementations on six evaluation datasets: 1,316 validated variants and five other datasets totaling over 200,000 variants. Created by Nathan Fortier and last updated in May 2026, it compares the original SpliceAI with OpenSpliceAI, CI-SpliceAI, and a legacy ensemble baseline. It reports performance metrics such as balanced accuracy and splice-site match rates across variant classes.
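Balanced accuracy is the mean of sensitivity and specificity, so a large majority class (for example, many non-splice-altering variants) cannot inflate it the way it inflates plain accuracy. The card does not say how its reported values were computed. The Python sketch below therefore assumes the standard definition, and the counts in the usage line are made up for illustration.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    sensitivity = tp / (tp + fn)  # share of splice-altering variants that were called
    specificity = tn / (tn + fp)  # share of non-altering variants correctly rejected
    return (sensitivity + specificity) / 2


# Hypothetical confusion-matrix counts, not values from this dataset.
score = balanced_accuracy(tp=90, fn=10, tn=80, fp=20)
print(f"{score:.3f}")  # prints 0.850
```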
Use Cases
Benchmarking deep learning models for splice-altering variant prediction on the six evaluation datasets described above.
Calibrating score thresholds for deep intronic variants, given the reported finding that optimal thresholds are an order of magnitude lower than standard recommendations.
Comparing positional agreement between SpliceAI implementations, for which exact splice-site match rates above 90% are reported.
Evaluating how algorithm performance diverges between canonical and deep intronic variants.
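Threshold calibration of this kind can be sketched as a sweep over candidate cutoffs that keeps the one maximizing balanced accuracy on a labeled variant set. This is an assumed procedure, not necessarily the method behind the dataset. The function names, toy scores, and candidate grid are all hypothetical.

```python
from typing import Sequence


def balanced_accuracy_at(scores: Sequence[float], labels: Sequence[bool], threshold: float) -> float:
    """Balanced accuracy when variants scoring >= threshold are called splice-altering."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= threshold)
    fn = sum(1 for s, y in pairs if y and s < threshold)
    tn = sum(1 for s, y in pairs if not y and s < threshold)
    fp = sum(1 for s, y in pairs if not y and s >= threshold)
    # Assumes both classes are present; otherwise a denominator is zero.
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def best_threshold(scores: Sequence[float], labels: Sequence[bool], candidates: Sequence[float]) -> float:
    """Return the first candidate cutoff with the highest balanced accuracy."""
    return max(candidates, key=lambda t: balanced_accuracy_at(scores, labels, t))


# Toy deep intronic set: True = validated splice-altering (hypothetical data).
scores = [0.03, 0.05, 0.08, 0.00, 0.01, 0.02]
labels = [True, True, True, False, False, False]
print(best_threshold(scores, labels, [0.01, 0.03, 0.2, 0.5]))  # prints 0.03
```

On real data, the chosen cutoff should be checked on held-out variants, because tuning and reporting on the same set overstates performance.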
Strengths
Includes a curated set of 1,316 validated variants and five other datasets, one with 99,601 variants.
Compares three deep learning algorithms and a legacy ensemble baseline across six distinct evaluation contexts.
Reports specific performance metrics, including balanced accuracies ranging from 0.841 to 0.977 and exact splice-site match rates exceeding 90%.
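The card does not define "exact splice-site match rate." One plausible reading, assumed here, is the fraction of variants for which two implementations place their strongest predicted splice change at the same position. The sketch keys predictions by a hypothetical variant ID and compares only the variants that both tools scored.

```python
def exact_match_rate(pos_a: dict[str, int], pos_b: dict[str, int]) -> float:
    """Fraction of shared variants whose predicted splice-site positions are identical."""
    shared = pos_a.keys() & pos_b.keys()  # variants scored by both implementations
    if not shared:
        raise ValueError("no variants in common")
    matches = sum(1 for vid in shared if pos_a[vid] == pos_b[vid])
    return matches / len(shared)


# Hypothetical positions (offsets from the variant) for two implementations.
spliceai = {"var1": 12, "var2": -3, "var3": 40, "var4": 7}
reimpl = {"var1": 12, "var2": -3, "var3": 38, "var4": 7, "var5": 0}
print(exact_match_rate(spliceai, reimpl))  # prints 0.75
```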
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
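Row count is unknown, which makes it hard to judge suitability before download.
The single file is only 5.5 KB, far too small to hold the 200,000+ benchmarked variants, so it most likely contains summary metrics rather than variant-level data.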
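Because column semantics and row count are undocumented, a reasonable first step after download is to list the header fields and count the data rows. The sketch below assumes the single file is delimited text with a header row, which the card does not confirm. `summarize_rows` and `summarize_csv` are hypothetical helpers built on Python's standard `csv` module.

```python
import csv
from pathlib import Path
from typing import Iterable


def summarize_rows(lines: Iterable[str], delimiter: str = ",") -> tuple[list[str], int]:
    """Return the header fields and the number of data rows in delimited text."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    columns = list(reader.fieldnames or [])  # first row is treated as the header
    row_count = sum(1 for _ in reader)
    return columns, row_count


def summarize_csv(path: str, delimiter: str = ",") -> tuple[list[str], int]:
    # newline="" is the file-opening mode the csv module documentation recommends.
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return summarize_rows(handle, delimiter)
```

For example, `summarize_csv("metrics.csv")` (hypothetical filename) returns the column names and data-row count. Pass `delimiter="\t"` if the file turns out to be tab-separated.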
Provenance
Source
figshare
Collection Method
Benchmarking study comparing splice prediction algorithms.
Freshness
Last updated 2026-05-13 17:42:59; freshness should be verified.