A collection of 20 publicly available classification datasets, selected for diversity in dimensionality, sample size, class distribution, and application domain. Gabriel Lima compiled these datasets to evaluate a Model-Agnostic Multivariate Separability Index; they are provided in CSV format. The collection was last updated on May 4, 2026.
Use Cases
- Benchmarking feature selection algorithms on datasets with varying levels of class overlap.
- Evaluating model performance across heterogeneous domains, using datasets from healthcare, agriculture, and signal recognition.
- Analyzing feature redundancy and noise in classification tasks under the statistical conditions described for the collection.
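To illustrate how per-feature class overlap might be quantified when benchmarking feature selection, here is a minimal Python sketch of a multi-class Fisher discriminant ratio. It is a standard, simpler stand-in and is not the Model-Agnostic Multivariate Separability Index the collection was compiled for. The functions `fisher_ratio` and `rank_features` are hypothetical, and the sketch assumes feature values have already been parsed as numbers.

```python
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """Multi-class Fisher discriminant ratio for one feature.

    Size-weighted spread of the class means around the overall mean,
    divided by the size-weighted within-class variance.
    Larger values suggest less class overlap on this feature.
    """
    overall = mean(values)
    between = 0.0
    within = 0.0
    for c in sorted(set(labels)):
        xs = [v for v, y in zip(values, labels) if y == c]
        n = len(xs)
        between += n * (mean(xs) - overall) ** 2
        within += n * pvariance(xs)
    # A feature with zero within-class variance separates its classes perfectly.
    return between / within if within > 0 else float("inf")


def rank_features(rows, labels):
    """Return (feature_index, ratio) pairs, most separable feature first."""
    n_features = len(rows[0])
    scores = [
        (j, fisher_ratio([row[j] for row in rows], labels))
        for j in range(n_features)
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)


rows = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.1, 6.0]]
labels = ["a", "a", "b", "b"]
print(rank_features(rows, labels))
```

Because the score is computed one feature at a time, it ignores interactions between features, which is exactly what a multivariate separability measure is intended to capture.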
Strengths
- Contains 20 datasets spanning multiple fields, including healthcare, agriculture, and signal recognition.
- Datasets are selected to ensure diversity in dimensionality, sample size, class distribution, and application domain.
- Files are provided in CSV format and are ready for direct use in machine learning tasks.
Limitations
- Row counts are not reported, which makes it harder to judge each dataset's suitability before download.
- Column-level documentation is absent; field semantics must be inferred after download.
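Since row counts and column semantics have to be inferred after download, a quick inventory pass over the files can help. The sketch below uses only the Python standard library. It assumes each CSV is comma-delimited, UTF-8 encoded, has a header row, and stores the class label in the last column; the collection does not confirm any of these conventions. `summarize_csv` and `summarize_directory` are hypothetical helpers.

```python
import csv
from collections import Counter
from pathlib import Path


def summarize_csv(path, label_column=-1):
    """Count data rows, header columns, and class frequencies in one CSV.

    Assumes a header row and a label in `label_column` (last by default).
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file is empty
        counts = Counter()
        n_rows = 0
        for row in reader:
            if not row:  # skip blank lines
                continue
            counts[row[label_column]] += 1
            n_rows += 1
    return {
        "file": Path(path).name,
        "rows": n_rows,
        "columns": len(header),
        "class_counts": dict(counts),
    }


def summarize_directory(directory):
    """Summarize every *.csv file in `directory`, sorted by file name."""
    return [summarize_csv(p) for p in sorted(Path(directory).glob("*.csv"))]
```

For example, `summarize_directory("datasets/")` (a hypothetical download folder) returns one summary per file. If a dataset stores its label elsewhere, pass a different `label_column` to `summarize_csv`.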
Provenance
- Source: figshare
- Collection method: Compiled from 20 publicly available classification datasets.
- Freshness: Last updated 2026-05-04 11:53:06; freshness should be verified.