Description
The dataset contains 38,467 records representing approximately 30,000 unique binary executables, totaling ~33.41 GB. It was created by mjbommar and last updated on November 14, 2025. It is designed for machine learning research in binary analysis, malware detection, and program understanding.
Use Cases
Training malware detection models based on binary executable features.
Research in program understanding and binary analysis using machine learning.
Benchmarking analysis tools on a multi-platform collection of executables.
Studying the characteristics of software from diverse sources like Linux, Windows, and malware collections.
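As a rough illustration of the first and third use cases, the sketch below derives a few format-agnostic features from raw executable bytes. It assumes each record can be read as a `bytes` object, which this card does not confirm. The function names (`sniff_format`, `byte_entropy`, `byte_histogram`) are hypothetical helpers, not part of the dataset.

```python
import math
from collections import Counter
from typing import List


def sniff_format(data: bytes) -> str:
    """Guess the container format from the leading magic bytes."""
    if data[:4] == b"\x7fELF":
        return "elf"
    if data[:2] == b"MZ":
        # DOS header used by PE files; a full check would follow e_lfanew
        # to the "PE\0\0" signature.
        return "pe"
    return "unknown"


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def byte_histogram(data: bytes) -> List[float]:
    """Normalized 256-bin histogram of byte values."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero on empty input
    return [counts.get(value, 0) / total for value in range(256)]
```

Features like these are only a starting point. Entropy near 8 bits per byte often points to compressed or packed content, but it is not a malware indicator on its own.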
Strengths
Contains 38,467 records representing a substantial collection of ~30,000 unique binary executables.
Totals approximately 33.41 GB in size, indicating a significant volume of data.
Sourced from diverse platforms including Linux distributions, Windows, and malware collections like SOREL-20M and Malware Bazaar.
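Because the record count (38,467) exceeds the number of unique executables (~30,000), it may help to group records by content hash before building train and test splits. The sketch below assumes the raw bytes of each record are available. If the dataset already ships a hash column, that column could be used instead. `group_by_digest` is a hypothetical helper name.

```python
import hashlib
from typing import Dict, Iterable, List


def group_by_digest(blobs: Iterable[bytes]) -> Dict[str, List[int]]:
    """Map each SHA-256 hex digest to the indices of the records sharing it."""
    groups: Dict[str, List[int]] = {}
    for index, blob in enumerate(blobs):
        digest = hashlib.sha256(blob).hexdigest()
        groups.setdefault(digest, []).append(index)
    return groups


# Toy input standing in for executable contents.
groups = group_by_digest([b"alpha", b"beta", b"alpha"])
duplicates = {d: idx for d, idx in groups.items() if len(idx) > 1}
```

Keeping all records with the same digest on one side of a split avoids leaking identical binaries between training and evaluation.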
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
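The 38,467 records outnumber the ~30,000 unique executables, so some binaries likely appear in more than one record; how records map to binaries should be checked when assessing suitability.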
Last updated 2025-11-14 12:56:16; freshness should be verified.
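Since field semantics are undocumented, a first pass after download could simply tally which fields exist and which value types they hold. The sketch below assumes records can be iterated as dictionary-like rows. The field names in the sample are invented for illustration and are not the dataset's actual columns.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Set


def summarize_fields(records: Iterable[Mapping[str, Any]]) -> Dict[str, Set[str]]:
    """Collect the set of Python type names observed for each field."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        for name, value in record.items():
            seen[name].add(type(value).__name__)
    return dict(seen)


# Hypothetical rows; real field names must be read from the data itself.
sample = [
    {"sha256": "ab12", "size": 1024, "label": None},
    {"sha256": "cd34", "size": 2048, "label": "benign"},
]
summary = summarize_fields(sample)
```

Fields that show more than one type, or that include `NoneType`, are good candidates for closer inspection before any modeling.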
Provenance
Source
Diverse sources including Linux distributions, Windows operating systems, the SOREL-20M malware dataset, and the Malware Bazaar collection.
Collection Method
Collected from the platforms and malware collections listed under Source; the collection procedure itself is not described.
Freshness
Last updated 2025-11-14 12:56:16.
License
License is unknown; terms of use must be verified before use.