A collection of malware samples with extracted static and dynamic features, sourced from the VxHeaven repository and VirusTotal. The dataset is intended for cybersecurity research and machine learning model development. The original compilation and feature extraction were performed by contributors to the UCI Machine Learning Repository.
Use Cases
- Classify malware families using static features like file size, header information, and imported libraries.
- Predict malicious behavior by analyzing dynamic features such as API calls, registry changes, and network activity.
- Train anomaly detection models to identify novel malware based on deviations from feature distributions.
- Compare the effectiveness of static versus dynamic analysis features for malware detection tasks.
- Perform feature selection to identify the most discriminative attributes for efficient malware classifiers.
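As a rough illustration of the feature-selection use case, the sketch below ranks features with a simple two-class Fisher-style score (squared difference of class means divided by the sum of class variances). The rows, the feature names (`file_size_kb`, `num_imports`, `registry_writes`), and the `label` key are all invented for this example; the dataset's actual column names are not documented here.

```python
from statistics import mean, pvariance

# Hypothetical toy rows: names and values are invented for illustration
# and are NOT taken from the dataset. label 1 = malicious, 0 = benign.
rows = [
    {"file_size_kb": 120.0, "num_imports": 4.0, "registry_writes": 9.0, "label": 1},
    {"file_size_kb": 300.0, "num_imports": 6.0, "registry_writes": 11.0, "label": 1},
    {"file_size_kb": 150.0, "num_imports": 5.0, "registry_writes": 10.0, "label": 1},
    {"file_size_kb": 280.0, "num_imports": 20.0, "registry_writes": 1.0, "label": 0},
    {"file_size_kb": 130.0, "num_imports": 22.0, "registry_writes": 0.0, "label": 0},
    {"file_size_kb": 310.0, "num_imports": 18.0, "registry_writes": 2.0, "label": 0},
]


def fisher_score(rows, feature, label_key="label"):
    """Squared gap between class means, divided by the summed class variances."""
    pos = [r[feature] for r in rows if r[label_key] == 1]
    neg = [r[feature] for r in rows if r[label_key] == 0]
    between = (mean(pos) - mean(neg)) ** 2
    within = pvariance(pos) + pvariance(neg)
    # A feature with zero variance in both classes separates them perfectly
    # if the means differ, and carries no signal if they are equal.
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(rows, label_key="label"):
    """Return (feature, score) pairs sorted from most to least discriminative."""
    features = [k for k in rows[0] if k != label_key]
    scores = {f: fisher_score(rows, f, label_key) for f in features}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_features(rows)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

On this toy table the two count-style features score far higher than file size, because file size overlaps heavily between the two classes. A univariate score like this ignores interactions between features, so it is best treated as a first filter rather than a final selection.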
Strengths
- Includes both static and dynamic analysis features, providing a multi-view representation of malware.
- Aggregates samples from two established sources, VxHeaven and VirusTotal, which should increase sample diversity.
Limitations
- The number of samples, the number of features, and the temporal coverage are not documented.
- Class imbalance and label noise are possible, as both are common in crowdsourced malware collections.
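Because class balance is not documented, it may be worth measuring it before training. The following is a minimal sketch using only the standard library; the family names and counts are hypothetical placeholders, not values from the dataset.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts, per-class shares, and the majority/minority ratio."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return counts, shares, imbalance_ratio


# Hypothetical labels, invented for illustration only.
labels = ["trojan"] * 8 + ["worm"] * 3 + ["ransomware"] * 1

counts, shares, ratio = class_balance(labels)
for label, n in counts.most_common():
    print(f"{label}: {n} samples ({shares[label]:.1%})")
print(f"majority/minority ratio: {ratio:.1f}")
```

A large ratio suggests that plain accuracy will be misleading and that stratified splits or per-class metrics are a safer starting point. This check says nothing about label noise, which would need a separate audit.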
Provenance
- Source: UCI Machine Learning Repository, with data originally from VxHeaven and VirusTotal.
- Collection Method: Malware samples were collected and analyzed to extract static (file-based) and dynamic (behavioral) features.
- Time Range: not specified
- Freshness: not specified
- Geography: not specified