Compression Method Performance Metrics for Paired-End Sequencing File Formats
by Noam Teyssier · Updated 8d ago
9.5 KB · 1 file
Available on 1 platform
Description
This dataset by Noam Teyssier compares compression methods across file formats for paired-end sequencing records. For each entry it reports time in seconds, size in megabytes, and a combined score, along with bit size, whether execution was parallel, bandwidth, and speedup. It was last updated on May 28, 2026.
Use Cases
Benchmarking compression algorithm speed and efficiency based on reported time and size metrics.
Evaluating the impact of parallel processing on compression performance based on the 'Parallel' column.
Comparing storage footprint across file formats using the 'Bit Size' and 'Size' columns.
Analyzing throughput for data pipelines using the 'Bandwidth' (Gbp/s) metric.
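The benchmarking and parallel-processing use cases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's method: the column names (`method`, `parallel`, `time_s`, `size_mb`), the sample rows, and the choice of a row named `baseline` as the reference are all hypothetical, since the dataset's column-level documentation is not available. The dataset's own 'Speedup' and combined score columns may be defined differently.

```python
import csv
import io

# Hypothetical sample rows. The column names and values are illustrative
# assumptions and are NOT taken from the actual dataset.
SAMPLE = """method,parallel,time_s,size_mb
baseline,no,40.0,1000.0
method_a,no,20.0,400.0
method_a,yes,5.0,400.0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with typed values."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "method": row["method"],
            "parallel": row["parallel"] == "yes",
            "time_s": float(row["time_s"]),
            "size_mb": float(row["size_mb"]),
        })
    return rows


def add_relative_metrics(rows, baseline_method="baseline"):
    """Add speedup and size ratio relative to the chosen baseline row.

    speedup    = baseline time / row time  (higher is faster)
    size_ratio = row size / baseline size  (lower is smaller)
    """
    base = next(r for r in rows if r["method"] == baseline_method)
    return [
        {
            **r,
            "speedup": base["time_s"] / r["time_s"],
            "size_ratio": r["size_mb"] / base["size_mb"],
        }
        for r in rows
    ]


rows = add_relative_metrics(load_rows(SAMPLE))
for r in rows:
    mode = "parallel" if r["parallel"] else "serial"
    print(f'{r["method"]:<10} {mode:<9} '
          f'speedup={r["speedup"]:.2f} size_ratio={r["size_ratio"]:.2f}')
```

To apply this to the real file, replace `SAMPLE` with the downloaded file's contents and adjust the column names once the actual header has been inspected.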
Strengths
Provides a direct comparison of multiple performance metrics, including time, size, bandwidth, and speedup.
Includes a defined combined score metric (Equation 4) for normalized evaluation.
Explicitly states the unit of measurement for each key metric (seconds, megabytes, Gbp/s).
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes suitability harder to assess before download.
The dataset is very small at 9.5 KB, indicating limited scope.
Provenance
Source
figshare
Freshness
Last updated 2026-05-28 17:42:51; freshness should be verified.