A dataset of 13,637 microbial genomes with predicted habitat classifications for deep marine, shallow marine, or terrestrial environments. The labels were generated using an XGBoost classifier trained on a manually curated set of 865 genomes, achieving an average AUC of 0.987. The dataset was created by Zhanghan Ni and last updated in May 2026.
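The description reports an average AUC for a three-class problem but does not say how the average was taken. One common reading is a macro-averaged one-vs-rest AUC: compute a binary AUC for each habitat against the other two, then take the unweighted mean. The sketch below illustrates that reading in plain Python. The function names, the class ordering, and the toy probabilities are all hypothetical and are not taken from the dataset or from the author's notebook.

```python
from typing import List, Sequence

# Habitat classes named in the dataset description; the ordering is an assumption.
CLASSES = ["deep marine", "shallow marine", "terrestrial"]


def binary_auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win (Mann-Whitney formulation)."""
    pos = [s for is_pos, s in zip(labels, scores) if is_pos]
    neg = [s for is_pos, s in zip(labels, scores) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[str],
                  proba: Sequence[Sequence[float]],
                  classes: Sequence[str] = CLASSES) -> float:
    """Unweighted mean of the one-vs-rest AUCs, one per class."""
    per_class: List[float] = []
    for k, cls in enumerate(classes):
        labels = [y == cls for y in y_true]      # this class vs. the rest
        scores = [row[k] for row in proba]       # predicted probability of this class
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / len(per_class)


# Toy, made-up predictions for six genomes (not real data).
y_true = ["deep marine", "deep marine",
          "shallow marine", "shallow marine",
          "terrestrial", "terrestrial"]
proba = [
    [0.80, 0.15, 0.05],
    [0.40, 0.50, 0.10],
    [0.30, 0.45, 0.25],
    [0.20, 0.70, 0.10],
    [0.10, 0.20, 0.70],
    [0.05, 0.05, 0.90],
]

macro_auc = macro_ovr_auc(y_true, proba)
print(f"macro one-vs-rest AUC: {macro_auc:.3f}")
```

If the reported figure was instead averaged across cross-validation folds or weighted by class size, the per-class step stays the same and only the final averaging changes.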
The dataset is released under the CC-BY-4.0 license. Files are provided in IPYNB, CSV, and SAV formats, so opening all of them requires software that can read each format.