MSBench is a large dataset containing mass spectrometry data for bottom-up proteomics. The dataset was authored by Yu Gao and is hosted on Harvard Dataverse, with a last recorded update in May 2026. Its primary purpose is to serve as training data for foundational models in the field.
Use Cases
- Training foundational AI models on mass spectrometry data, the purpose stated in the dataset description
- Benchmarking computational proteomics methods against bottom-up mass spectrometry data
- Developing spectral prediction or peptide identification tools from the same data
Strengths
- Dataset is described as 'large', suggesting a substantial volume of data for model training
- Data is intended specifically as training data for foundational models, so the dataset has a clearly targeted purpose
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download
- Column-level documentation is absent; field semantics must be inferred after download
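Because field semantics have to be inferred after download, a first pass over the columns can be sketched as below. This is a minimal sketch that assumes at least one downloaded file is delimited text with a header row; the source does not state the actual MSBench file format. The column names in the sample (`peptide`, `charge`, `precursor_mz`) and the helpers `guess_type` and `profile_columns` are hypothetical and serve only as illustration.

```python
import csv
import io
from collections import Counter


def guess_type(value: str) -> str:
    """Classify one raw cell as 'empty', 'int', 'float', or 'str'."""
    text = value.strip()
    if text == "":
        return "empty"
    try:
        int(text)
        return "int"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        return "str"


def profile_columns(handle, delimiter=",", max_rows=1000):
    """Return {column name: Counter of guessed types} for the first max_rows rows."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    profile = {name: Counter() for name in (reader.fieldnames or [])}
    for row_number, row in enumerate(reader):
        if row_number >= max_rows:
            break
        for name in profile:
            # Short rows yield None for missing cells; treat them as empty.
            profile[name][guess_type(row.get(name) or "")] += 1
    return profile


# Hypothetical sample standing in for a downloaded file; the real
# MSBench column names are not documented in the source.
sample = io.StringIO(
    "peptide,charge,precursor_mz\n"
    "PEPTIDEK,2,464.7\n"
    "SAMPLER,3,\n"
)
for name, counts in profile_columns(sample).items():
    print(name, dict(counts))
```

Columns that mix types or contain many empty cells are the ones most likely to need manual inspection before use in training.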
Provenance
- Source: Harvard Dataverse
- Collection Method: Likely contains experimental mass spectrometry data from bottom-up proteomics workflows
- Time Range: not specified
- Freshness: Last updated 2026-05-06 21:09:48; freshness should be verified
- Geography: not specified