Name: Performance Metrics for Riepe Benchmark: SpliceAI Algorithm Comparison
Creator: Nathan Fortier
Published: 2026-05-13T17:42:59
License: CC-BY-4.0
Keywords: Splicing Prediction, Machine Learning Evaluation, Benchmark, Healthcare, Tabular, Spliceai Benchmark, Genomic Variants, Excel

Description

This dataset reports performance metrics from a benchmark of SpliceAI and its open-source reimplementations. The benchmark used six evaluation datasets: a curated set of 1,316 validated variants and five other datasets totaling over 200,000 variants. Created by Nathan Fortier and last updated in May 2026, it compares the original SpliceAI with OpenSpliceAI, CI-SpliceAI, and a legacy ensemble baseline. Reported metrics include balanced accuracy and exact splice-site match rates across different variant classes.

Use Cases

Benchmarking deep learning models for splice-altering variant prediction against the six evaluation datasets described above.
Calibrating score thresholds for deep intronic variants, given the finding that optimal thresholds are an order of magnitude lower than standard recommendations.
Comparing positional agreement between SpliceAI implementations, using the reported exact splice-site match rates exceeding 90%.
Evaluating how algorithm performance diverges between canonical and deep intronic variants.
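
The threshold-calibration use case can be sketched as a sweep over candidate cutoffs that keeps the one with the highest balanced accuracy. This is a minimal illustration, not the benchmark's own procedure. The function names, the toy scores, and the candidate thresholds are all hypothetical, and the sketch assumes one prediction score per variant in [0, 1] with binary labels (1 = splice-altering).

```python
def balanced_accuracy(labels, predictions):
    """Mean of sensitivity and specificity for binary labels (1 = splice-altering)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    # Guard against an empty class so the sketch never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def best_threshold(scores, labels, candidates):
    """Return (threshold, balanced accuracy) for the best candidate cutoff.

    Ties are resolved in favor of the candidate listed first.
    """
    best = None
    for threshold in candidates:
        predictions = [1 if s >= threshold else 0 for s in scores]
        ba = balanced_accuracy(labels, predictions)
        if best is None or ba > best[1]:
            best = (threshold, ba)
    return best


# Toy data, invented for illustration only (not taken from the dataset).
scores = [0.02, 0.04, 0.06, 0.30, 0.01, 0.03, 0.00, 0.01]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
candidates = [0.02, 0.05, 0.2, 0.5]

threshold, ba = best_threshold(scores, labels, candidates)
print(f"best threshold: {threshold}, balanced accuracy: {ba:.3f}")
```

In this toy example the lowest candidate wins because most positive variants have small scores, which mirrors the kind of effect the deep intronic finding describes. A real calibration would use held-out variants and a finer grid of candidates.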

Strengths

Includes a curated set of 1,316 validated variants and five other datasets, one with 99,601 variants.
Compares three deep learning algorithms and a legacy ensemble baseline across six distinct evaluation contexts.
Reports specific performance metrics, including balanced accuracies ranging from 0.841 to 0.977 and exact splice-site match rates exceeding 90%.
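
The source does not define the exact splice-site match rate precisely. One plausible reading, assumed here, is the fraction of variants for which two implementations predict the affected splice site at the same position. The sketch below follows that assumption; the function name and the toy positions are hypothetical.

```python
def exact_match_rate(positions_a, positions_b):
    """Fraction of variants where both implementations report the same position.

    Each list holds one predicted splice-site position per variant,
    in the same variant order.
    """
    if len(positions_a) != len(positions_b):
        raise ValueError("both lists must cover the same variants")
    if not positions_a:
        raise ValueError("no variants to compare")
    matches = sum(1 for a, b in zip(positions_a, positions_b) if a == b)
    return matches / len(positions_a)


# Toy positions relative to the variant, invented for illustration only.
reference_positions = [-2, 5, 0, 13, -1, 7, 2, -9, 4, 1]
reimplementation_positions = [-2, 5, 0, 13, -1, 7, 2, -9, 4, 3]

rate = exact_match_rate(reference_positions, reimplementation_positions)
print(f"exact match rate: {rate:.0%}")
```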

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes suitability harder to assess before download.
The file is very small at 5.5 KB, indicating limited scope.

Provenance

Source: figshare
Collection Method: Benchmarking study comparing splice prediction algorithms.
Freshness: Last updated 2026-05-13 17:42:59; check the source for newer versions.

Data is provided in XLS format.
