This dataset contains 75,491 image-QA (question-answer) pairs depicting microscopic organisms, specifically diatoms and fungal spores. It covers 95 genera and is released under the CC-BY 4.0 license. It was created for a hackathon event on the Kaggle platform.
Use Cases
- Train visual question answering (VQA) models on images of microscopic organisms.
- Benchmark model performance on fine-grained biological classification tasks across 95 genera.
- Develop educational tools for microbiology based on annotated image-QA pairs.
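For the benchmarking use case, per-genus accuracy is often more informative than a single overall score, since genera may be unevenly represented. The following is a minimal sketch, assuming each evaluation record carries a genus label, a model answer, and a reference answer. The record layout, the function name `per_genus_accuracy`, and the exact-match scoring rule are all assumptions for illustration, not part of the dataset's documentation.

```python
from collections import defaultdict


def per_genus_accuracy(records):
    """Return ({genus: accuracy}, macro_average) from (genus, predicted, expected) triples.

    The triple layout is hypothetical; the dataset does not document its columns.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for genus, predicted, expected in records:
        total[genus] += 1
        # Exact match after trimming and lowercasing; an assumed scoring rule.
        if predicted.strip().lower() == expected.strip().lower():
            correct[genus] += 1
    accuracy = {g: correct[g] / total[g] for g in total}
    # Macro average weights every genus equally, regardless of sample count.
    macro = sum(accuracy.values()) / len(accuracy) if accuracy else 0.0
    return accuracy, macro


# Toy records for illustration only; not drawn from the dataset.
sample = [
    ("Navicula", "Navicula", "Navicula"),
    ("Navicula", "Nitzschia", "Navicula"),
    ("Alternaria", "alternaria ", "Alternaria"),
]
accuracy, macro = per_genus_accuracy(sample)
print(accuracy, macro)
```

Macro averaging is used here so that rare genera count as much as common ones; a micro average (total correct over total records) may be preferable if overall throughput matters more.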
Strengths
- 75,491 image-QA pairs provide a substantial volume for model training.
- Covers 95 genera spanning two distinct groups, diatoms and fungal spores.
- Explicitly licensed under the permissive CC-BY 4.0 license.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Data may reflect bias inherent to the specific hackathon's collection scope and methods.
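Because column-level documentation is absent, a first step after download is usually to profile each field. The sketch below assumes the annotations ship as a CSV-like table; the column names (`image_path`, `question`, `answer`), the sample rows, and the helper `summarize_columns` are hypothetical stand-ins, since the real file layout is not documented.

```python
import csv
import io


def summarize_columns(rows, max_samples=3):
    """Summarize each column: non-empty count, distinct count, a few sample values."""
    summary = {}
    for row in rows:
        for name, value in row.items():
            info = summary.setdefault(name, {"non_empty": 0, "values": set()})
            # Treat missing and empty-string cells as absent.
            if value not in (None, ""):
                info["non_empty"] += 1
                info["values"].add(value)
    return {
        name: {
            "non_empty": info["non_empty"],
            "distinct": len(info["values"]),
            "samples": sorted(info["values"])[:max_samples],
        }
        for name, info in summary.items()
    }


# Stand-in for a downloaded metadata file; column names and rows are hypothetical.
fake_csv = io.StringIO(
    "image_path,question,answer\n"
    "img/0001.png,Which genus is shown?,Navicula\n"
    "img/0002.png,Which genus is shown?,\n"
)
report = summarize_columns(list(csv.DictReader(fake_csv)))
for name, stats in report.items():
    print(name, stats)
```

In practice the `io.StringIO` stand-in would be replaced by an opened file from the download. Columns with few distinct values are likely categorical labels (such as genus), while columns with near-unique values are likely identifiers or file paths.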
Provenance
- Source: Kaggle
- Collection Method: Created for a hackathon event; the specific collection method is unknown.