Fifty-eight complete chloroplast genomes and two hypervariable nuclear genes were used to identify 21 Oryza species and to evaluate DNA barcodes. The study found that 17% of the 53 seed accessions tested, which came from rice seed banks or field collections, were mislabeled. The dataset was created by Wen Zhang of the Chinese Academy of Sciences.
Use Cases
- Train species classification models based on chloroplast genome and nuclear gene sequences.
- Develop bioinformatics pipelines for evaluating the reliability of different DNA barcodes.
- Verify the taxonomic identity of seed bank accessions to improve germplasm management.
- Study phylogenetic relationships between Oryza species, such as O. glaberrima and O. barthii.
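The accession-verification use case can be illustrated with a minimal sketch. It assumes each accession has a labeled species and a barcode sequence, and it assigns each accession to the closest reference by k-mer Jaccard similarity. This is a simplified stand-in, not the phylogenomic analysis used in the study; the function names, the toy sequences, and the species labels are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def kmer_set(seq: str, k: int = 4) -> Set[str]:
    """Return the set of all k-mers in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_species(query: str, references: Dict[str, str], k: int = 4) -> str:
    """Return the reference species whose k-mer set best matches the query."""
    q = kmer_set(query, k)
    return max(references, key=lambda sp: jaccard(q, kmer_set(references[sp], k)))


def mislabeled(
    accessions: List[Tuple[str, str, str]],
    references: Dict[str, str],
    k: int = 4,
) -> List[Tuple[str, str, str]]:
    """Return (accession_id, label, predicted) where label and prediction differ."""
    flagged = []
    for acc_id, label, seq in accessions:
        predicted = closest_species(seq, references, k)
        if predicted != label:
            flagged.append((acc_id, label, predicted))
    return flagged


# Toy, made-up sequences for illustration only; not real Oryza barcodes.
REFERENCES = {
    "species_A": "ACGTACGTTTGACCA",
    "species_B": "GGCATTCAGGCTTAA",
}
ACCESSIONS = [
    ("acc1", "species_A", "ACGTACGTTTGACCT"),
    ("acc2", "species_A", "GGCATTCAGGCTTAC"),  # labeled A but resembles B
]
FLAGGED = mislabeled(ACCESSIONS, REFERENCES)
print(FLAGGED)
```

In this toy run only `acc2` is flagged, because its sequence shares k-mers with the `species_B` reference rather than with its labeled species. A real pipeline would likely need alignment or tree-based placement, since closely related species may differ at very few sites.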
Strengths
- Includes data from 58 complete chloroplast genomes and two hypervariable nuclear genes.
- Evaluates the performance of 13 different DNA barcodes, identifying a chloroplast genome super barcode as the most reliable.
- Provides a concrete finding that 17% of tested seed accessions were mislabeled.
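The summary reports the mislabeling rate only as a percentage. As a rough check, the sketch below lists every whole-number count out of 53 accessions that would round to 17%. It assumes the reported figure was rounded to the nearest integer percent, which the source does not state; the function name is hypothetical.

```python
from typing import List


def counts_matching_percentage(total: int, reported_pct: int) -> List[int]:
    """Return every integer count whose share of `total` rounds to `reported_pct` percent."""
    return [n for n in range(total + 1) if round(100 * n / total) == reported_pct]


# 53 accessions tested, 17% reported as mislabeled.
CANDIDATES = counts_matching_percentage(53, 17)
print(CANDIDATES)  # [9]
```

Under that rounding assumption, 9 of 53 accessions (about 16.98%) is the only count consistent with the reported 17%.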
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not documented, so the dataset's size and its suitability for a given task cannot be judged before download.
- Sampling is limited to specific seed banks and field collections, so the data may carry biases from those sources.
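Because neither the columns nor the row count are documented, a first step after download is usually to inspect the file. The sketch below assumes the data is a delimited text file with a header row, which the source does not confirm. The function name and the sample column names (`accession_id`, `label`) are hypothetical placeholders.

```python
import csv
import io
from typing import Dict, IO, List


def summarize_table(handle: IO[str], max_examples: int = 3) -> Dict[str, object]:
    """Count data rows and collect a few distinct example values per column."""
    reader = csv.DictReader(handle)
    examples: Dict[str, List[str]] = {name: [] for name in (reader.fieldnames or [])}
    row_count = 0
    for row in reader:
        row_count += 1
        for name, value in row.items():
            # Skip overflow fields (key None) and empty or missing values.
            if name not in examples or not value:
                continue
            seen = examples[name]
            if value not in seen and len(seen) < max_examples:
                seen.append(value)
    return {"row_count": row_count, "columns": list(examples), "examples": examples}


# Hypothetical stand-in for the downloaded file; real column names are unknown.
SAMPLE = io.StringIO("accession_id,label\nacc1,species_A\nacc2,species_B\n")
SUMMARY = summarize_table(SAMPLE)
print(SUMMARY["row_count"], SUMMARY["columns"])
```

For a real file, the same function can be given a handle from `open(path, newline="")`. If the data turns out to be FASTA or another sequence format, a sequence parser would be needed instead.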
Provenance
- Source: Chinese Academy of Sciences
- Collection Method: Phylogenomic analysis applied to seed accessions from rice seed banks or field collections.