CI-SpliceAI Benchmark: Performance Metrics for Splice-Altering Variant Prediction
by Nathan Fortier · Updated 23d ago
5.5 KB · 1 file
Description
Performance metrics compare the original SpliceAI tool with two open-source implementations and a legacy ensemble baseline across six variant datasets. The data includes results for 1,316 validated variants, 213 splice-assay variants, 99,601 variants from the SPiP study, 242 deep intronic pathogenic variants, and over 111,000 ClinVar-derived variants. Nathan Fortier published this benchmark on figshare in May 2026.
Use Cases
Benchmarking splice prediction algorithms based on performance metrics across multiple variant datasets.
Calibrating score thresholds for deep intronic variants based on reported optimal thresholds.
Comparing positional agreement between SpliceAI implementations based on reported splice-site match rates.
Analyzing performance divergence on canonical versus deep intronic variants based on balanced accuracy results.
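The threshold-calibration and balanced-accuracy use cases above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own method: the function names, the rule that a score at or above the threshold counts as a positive call, and the toy scores and labels are all assumptions made up for the example and are not taken from the dataset.

```python
from typing import Sequence


def balanced_accuracy(scores: Sequence[float],
                      labels: Sequence[bool],
                      threshold: float) -> float:
    """Mean of sensitivity and specificity at a given score threshold.

    Assumption: a variant is called splice-altering when score >= threshold.
    """
    tp = fn = tn = fp = 0
    for score, is_positive in zip(scores, labels):
        predicted = score >= threshold
        if is_positive and predicted:
            tp += 1
        elif is_positive:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def best_threshold(scores: Sequence[float],
                   labels: Sequence[bool],
                   candidates: Sequence[float]) -> float:
    """Return the candidate threshold with the highest balanced accuracy."""
    return max(candidates, key=lambda t: balanced_accuracy(scores, labels, t))


# Hypothetical toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [True, True, False, False]
chosen = best_threshold(toy_scores, toy_labels, [0.05, 0.5, 0.95])
```

Balanced accuracy is a reasonable metric to compare across datasets with very different class ratios, since it weights the positive and negative classes equally regardless of how many variants each contains.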
Strengths
Benchmark spans six distinct datasets, including a curated set of 1,316 validated variants.
Includes a large-scale evaluation of 99,601 variants from the SPiP splicing prediction study.
Performance metrics are provided for deep intronic variants, a challenging class comprising 242 manually curated pathogenic examples.
Results show high positional agreement, with exact splice-site match rates exceeding 90% across event types.
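For readers who want to reproduce a positional-agreement figure on their own predictions, one plausible reading of an exact splice-site match rate is the fraction of variants for which two implementations report the same predicted splice-site position. The sketch below assumes that definition; the function name and the example positions are hypothetical and do not come from the dataset.

```python
from typing import Sequence


def exact_match_rate(positions_a: Sequence[int],
                     positions_b: Sequence[int]) -> float:
    """Fraction of variants where both tools predict the same position.

    Assumption: the two sequences are aligned, one entry per variant.
    """
    if len(positions_a) != len(positions_b):
        raise ValueError("position lists must be the same length")
    if not positions_a:
        return 0.0
    matches = sum(1 for a, b in zip(positions_a, positions_b) if a == b)
    return matches / len(positions_a)


# Hypothetical predicted positions from two implementations.
tool_a = [100, 205, 310, 42]
tool_b = [100, 205, 311, 42]
rate = exact_match_rate(tool_a, tool_b)
```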
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes suitability harder to assess before download.
At 5.5 KB the file is very small and likely contains summary statistics rather than raw per-variant predictions.
Provenance
Source
figshare
Collection Method
Likely compiled from comparative benchmarking analysis of splice prediction tools.
Freshness
Last updated 2026-05-13 17:42:58; freshness should be verified.
Data is in XLS (legacy Excel) format; opening it requires spreadsheet software or a library that can read .xls files.