This dataset is a longitudinal, multimodal benchmark for robust drift detection in Android malware. It is hosted on Kaggle, but the available metadata does not state its size, creation date, or authorship. Its primary purpose is to serve as a testbed for evaluating how robust machine learning models are to concept drift in the malware domain.
Use Cases
- Benchmarking drift detection algorithms based on the longitudinal nature of the data.
- Training multimodal classifiers for Android malware using the multiple data types suggested by the description.
- Evaluating model robustness against evolving malware threats over time.
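A time-aware evaluation of the kind these use cases imply can be sketched as follows. This is a minimal illustration, not the benchmark's own protocol: the field names `period`, `features`, and `label` are hypothetical, the samples are synthetic toy values rather than rows from the dataset, and the fixed-threshold `predict` function stands in for a real classifier trained on the earlier data.

```python
from collections import defaultdict


def temporal_split(samples, cutoff):
    """Split samples by time: train is strictly before cutoff, test is the rest."""
    train = [s for s in samples if s["period"] < cutoff]
    test = [s for s in samples if s["period"] >= cutoff]
    return train, test


def accuracy_by_period(samples, predict):
    """Return {period: accuracy} so degradation over time is visible."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for s in samples:
        totals[s["period"]] += 1
        hits[s["period"]] += int(predict(s["features"]) == s["label"])
    return {p: hits[p] / totals[p] for p in sorted(totals)}


# Synthetic toy data (NOT from the dataset): the single feature stops
# separating the classes in 2022, mimicking concept drift.
samples = [
    {"period": 2020, "features": [0.9], "label": 1},
    {"period": 2020, "features": [0.1], "label": 0},
    {"period": 2021, "features": [0.8], "label": 1},
    {"period": 2021, "features": [0.2], "label": 0},
    {"period": 2022, "features": [0.3], "label": 1},
    {"period": 2022, "features": [0.2], "label": 0},
]

train, test = temporal_split(samples, cutoff=2021)


def predict(features):
    """Stand-in model (assumption): a fixed threshold on the first feature."""
    return int(features[0] > 0.5)


per_period = accuracy_by_period(test, predict)
print(per_period)
```

Splitting by time rather than at random keeps future samples out of training, and reporting accuracy per period instead of one aggregate number shows when performance starts to decay.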
Strengths
- Designed specifically for the challenging task of concept drift detection in security.
- Incorporates multiple data modalities, which may provide a more comprehensive view of malware characteristics.
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- Row count is unknown, which makes it difficult to assess suitability before download.
- Column-level documentation is absent; field semantics must be inferred after download.
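Because row counts and field semantics have to be established after download, a first inspection pass might look like the sketch below. It assumes the data includes at least one CSV file, which the metadata does not confirm. The column names `sha256`, `first_seen`, and `label` in the demo are hypothetical placeholders, and the in-memory text stands in for a real downloaded file.

```python
import csv
import io


def summarize_csv(fileobj):
    """Report column names, row count, and empty-cell counts per column."""
    reader = csv.DictReader(fileobj)
    columns = list(reader.fieldnames or [])
    row_count = 0
    empty_counts = {c: 0 for c in columns}
    for row in reader:
        row_count += 1
        for c in columns:
            # Missing trailing fields come back as None; blank cells as "".
            if row.get(c) in (None, ""):
                empty_counts[c] += 1
    return {
        "columns": columns,
        "row_count": row_count,
        "empty_counts": empty_counts,
    }


# Hypothetical stand-in for a downloaded file; replace with open(path, newline="").
demo = io.StringIO(
    "sha256,first_seen,label\n"
    "abc,2020-01-01,1\n"
    "def,,0\n"
)
summary = summarize_csv(demo)
print(summary)
```

Streaming the file row by row keeps memory use flat, which matters when the size of the download is not known in advance.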