A dataset curated for the TabArena Tabular ML IID Study, intended for evaluating predictive machine learning models on independent and identically distributed (IID) tabular data. The data originate from a 2013 study by Mansouri et al. on quantitative structure-activity relationship (QSAR) models for the ready biodegradability of chemicals. The original source is licensed under CC BY 4.0.
Use Cases
- Benchmarking classification algorithms on IID tabular data, the setting this dataset was curated for.
- Developing QSAR models for predicting chemical biodegradability based on molecular descriptors.
- Studying how molecular structure relates to environmental fate, specifically whether a chemical is readily biodegradable.
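Because the rows are treated as IID, a random stratified k-fold split is a reasonable default when benchmarking classifiers on this data. The sketch below assigns row indices to folds class by class using only the Python standard library, so each fold keeps roughly the overall class balance. The function name `stratified_kfold_indices` is hypothetical, and the labels in the usage example are toy values, not taken from the dataset.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, k=5, seed=0):
    """Return k folds, each a sorted list of row indices.

    Rows of each class are shuffled and dealt round-robin across the
    folds, so every fold keeps roughly the overall class proportions.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class, key=str):
        rows = by_class[label]
        rng.shuffle(rows)
        for i, idx in enumerate(rows):
            folds[(i + offset) % k].append(idx)
        # Continue the round-robin where the previous class stopped.
        offset = (offset + len(rows)) % k
    return [sorted(fold) for fold in folds]


if __name__ == "__main__":
    # Toy binary labels (hypothetical): six positives, four negatives.
    toy = [1] * 6 + [0] * 4
    for fold in stratified_kfold_indices(toy, k=2):
        print(fold, [toy[i] for i in fold])
```

Each fold can serve once as the held-out test set while the remaining folds are used for training; fixing `seed` keeps the split reproducible across the models being compared.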
Strengths
- The dataset is released under the permissive CC BY 4.0 license.
- Feature names have been semantically curated by the TabArena team.
- The original research is peer-reviewed, and the source data is referenced by a DOI.
Limitations
- Row count, column count, and file size are not documented, which makes it harder to judge suitability before downloading.
- Column-level documentation is absent; field semantics must be inferred after download.
- Several features are flagged as numeric-ordinal, but it is unclear whether they should be treated as categorical.
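Since column semantics have to be inferred after download, a quick profile of each column can help decide how to encode the numeric-ordinal features. The sketch below, using only the Python standard library, flags numeric columns whose values are all whole numbers with few distinct levels as candidates for ordinal or categorical treatment. The function name `profile_columns`, the `max_levels=10` threshold, and the toy CSV are illustrative assumptions, not part of the TabArena release.

```python
import csv
import io


def profile_columns(rows, max_levels=10):
    """Summarise each column of a list of dicts (e.g. from csv.DictReader).

    Returns {column: {"n_distinct", "numeric", "maybe_ordinal"}}. A column
    is flagged maybe_ordinal when every value parses as a whole number and
    it has at most `max_levels` distinct values (a heuristic threshold).
    """
    if not rows:
        return {}
    summary = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        try:
            nums = [float(v) for v in values]
            numeric = True
        except ValueError:
            nums, numeric = [], False
        # Count distinct parsed numbers so "1" and "1.0" are treated as equal.
        distinct = set(nums) if numeric else set(values)
        integral = numeric and all(x.is_integer() for x in nums)
        summary[col] = {
            "n_distinct": len(distinct),
            "numeric": numeric,
            "maybe_ordinal": integral and len(distinct) <= max_levels,
        }
    return summary


if __name__ == "__main__":
    # Toy CSV with hypothetical column names; real names come from the download.
    sample = "count_a,descriptor_b,label\n0,1.25,1\n2,3.50,0\n1,0.75,1\n"
    rows = list(csv.DictReader(io.StringIO(sample)))
    for name, info in profile_columns(rows).items():
        print(name, info)
```

The flag is only a hint: a low-cardinality integer column may be a genuine count, where ordinal encoding fits, or a code with no natural order, where one-hot encoding is safer.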
Provenance
- Source
- Mansouri, Kamel, et al. 'Quantitative structure-activity relationship models for ready biodegradability of chemicals.' Journal of Chemical Information and Modeling 53.4 (2013): 867-878.
- Collection Method
- Curated from an original data source (https://doi.org/10.24432/C5H60M) by the TabArena team.
- Time Range
- 2013
- Freshness
- Dataset year is 2013; last update date is unknown.
- Geography
- Not specified