ProbioSML is a genomic dataset of 1,072 non-redundant protein-coding sequences derived from comparative analyses of bacterial genomes. Created by Diego Lucas Neres Rodrigues and released in 2026, it was generated using pangenomic analysis combined with supervised machine learning approaches such as Random Forest and Support Vector Machine. It includes gene presence-absence matrices and functional annotations for taxa frequently reported as probiotics and for reference gut-associated bacteria.
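To illustrate what a gene presence-absence matrix encodes, here is a minimal Python sketch. The genome names, gene identifiers, and the function `presence_absence_matrix` are hypothetical and are not taken from ProbioSML; the dataset's actual files and layout may differ.

```python
def presence_absence_matrix(genomes):
    """Build a binary matrix from a mapping of genome name -> set of gene IDs.

    Returns the sorted list of all genes (the columns) and a dict mapping
    each genome to a row of 0/1 values, one per gene.
    """
    # Columns: every gene observed in at least one genome, in sorted order.
    genes = sorted(set().union(*genomes.values()))
    # Rows: 1 if the genome carries the gene, 0 otherwise.
    matrix = {
        name: [1 if gene in content else 0 for gene in genes]
        for name, content in genomes.items()
    }
    return genes, matrix


# Hypothetical toy input, for illustration only.
toy_genomes = {
    "genome_A": {"gene1", "gene2", "gene3"},
    "genome_B": {"gene2", "gene3"},
    "genome_C": {"gene1", "gene4"},
}

genes, matrix = presence_absence_matrix(toy_genomes)
print("genome", *genes)
for name, row in matrix.items():
    print(name, *row)
```

Each row is a binary feature vector of the kind that supervised classifiers such as Random Forest or Support Vector Machine can take as input.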
The primary file format is PDF, so the contents may require extraction or conversion before direct computational use.