ProbioSML is a machine learning-derived genomic dataset containing 1,072 non-redundant protein-coding sequences. It was created by Diego Lucas Neres Rodrigues through pangenomic analysis and supervised machine learning of bacterial genomes from taxa frequently reported as probiotics and reference gut-associated bacteria. The dataset, last updated in April 2026, is publicly available under a CC-BY-4.0 license.
The data is provided in XLSX format, so reading it requires software that can open Excel files. The features are described as genomic patterns associated with probiotic taxa, not as causal determinants of probiotic functionality.
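Because an XLSX file is a ZIP archive of XML parts, its contents can be inspected without Excel. The following is a minimal sketch, using only the Python standard library, that lists the worksheet names in a workbook. The function name `list_sheet_names` and the file name `ProbioSML.xlsx` are hypothetical; the dataset's actual file name and sheet layout are not stated here, and the sketch assumes the workbook uses the common (transitional) SpreadsheetML namespace.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the main SpreadsheetML parts in most XLSX files.
# Assumption: the workbook is not saved in "Strict Open XML" mode,
# which uses a different namespace URI.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_sheet_names(xlsx_source):
    """Return the worksheet names declared in xl/workbook.xml.

    xlsx_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx_source) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    # Each <sheet> element under <sheets> carries the visible sheet name.
    return [sheet.get("name") for sheet in root.findall("m:sheets/m:sheet", NS)]


# Hypothetical usage; replace the path with the downloaded file:
# print(list_sheet_names("ProbioSML.xlsx"))
```

Once the sheet names are known, a spreadsheet library or Excel-compatible application can load the relevant sheet for analysis.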