Mendel Dataset: Bilingual Genetics Instruction-Response Pairs

Name: Mendel Dataset: Bilingual Genetics Instruction-Response Pairs
Creator: taylonmcfly
Published: 2026-02-07T10:27:56
Keywords: Text, Bilingual Text, Biology Education, Instruction Following

Description

Mendel Dataset is a bilingual (English and Russian) collection for training language models on Mendelian genetics. It contains instruction-input-output triples in Alpaca format that describe genotypes, phenotypes, and inheritance rules for animals. It was created by taylonmcfly and last updated on Hugging Face in February 2026.
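As a rough illustration of what an Alpaca-format triple looks like in practice, the sketch below turns one record into a training prompt using the widely used Alpaca prompt wording. The field names `instruction`, `input`, and `output` follow the usual Alpaca convention and are an assumption here, since the dataset's columns are not documented on this page; the example record is hypothetical and not taken from the dataset.

```python
# Alpaca-style prompt templates, with and without an input field.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(record: dict) -> str:
    """Format one record as a prompt; the model is trained to emit `output`."""
    # Treat a missing, null, or whitespace-only input as "no input".
    if (record.get("input") or "").strip():
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


# Hypothetical record, written for illustration only (not from the dataset).
example = {
    "instruction": "Predict the offspring genotypes for this cross.",
    "input": "Parents: Aa x Aa",
    "output": "1 AA : 2 Aa : 1 aa",
}

print(build_prompt(example) + example["output"])
```

Whether the actual records use these exact field names, and whether `input` is ever empty, should be checked after download.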

Use Cases

Fine-tuning language models for genetics question answering on the instruction-response pairs.
Training models to generate Punnett square predictions from the described inheritance rules.
Developing bilingual biology-education chatbots using the English and Russian content.
Benchmarking models on structured genetics reasoning using the Alpaca-format examples.
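For the Punnett square use case, a small reference implementation can help check model outputs. The sketch below handles only a single-gene cross with complete dominance; the function names `punnett_square` and `phenotype_ratio` are illustrative, and nothing here is taken from the dataset itself, whose inheritance rules may be more involved.

```python
from collections import Counter
from itertools import product


def punnett_square(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for a single-gene cross, e.g. 'Aa' x 'Aa'."""
    if len(parent1) != 2 or len(parent2) != 2:
        raise ValueError("each parent genotype must have exactly two alleles")
    # Each parent contributes one allele. Sorting puts the uppercase allele
    # first, so 'aA' and 'Aa' are counted as the same genotype.
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))


def phenotype_ratio(genotypes: Counter) -> dict:
    """Assumes complete dominance: any uppercase allele shows the dominant trait."""
    ratio = {"dominant": 0, "recessive": 0}
    for genotype, count in genotypes.items():
        key = "dominant" if any(ch.isupper() for ch in genotype) else "recessive"
        ratio[key] += count
    return ratio


cross = punnett_square("Aa", "Aa")
print(dict(cross))             # {'AA': 1, 'Aa': 2, 'aa': 1}
print(phenotype_ratio(cross))  # {'dominant': 3, 'recessive': 1}
```

A checker like this could be compared against a model's predicted ratios during evaluation, provided the dataset's examples use the same single-letter allele notation, which is an assumption.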

Strengths

Dataset is bilingual, containing examples in both English and Russian.
Examples are structured in the Alpaca format, which is a common standard for instruction-following data.
License is explicitly stated as Apache-2.0, providing clear usage rights.

Limitations

Dataset size is categorized as 'n<1K', indicating it contains fewer than 1,000 examples.
Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
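Because field semantics and data quality have to be inferred after download, a quick profiling pass is a reasonable first step. The sketch below summarizes which fields appear, how often they are empty, and a rough English/Russian split based on the presence of Cyrillic characters. It assumes the records have already been loaded as a list of dictionaries; the `sample` records are hypothetical stand-ins, and the language check is a heuristic, not a real language detector.

```python
from collections import Counter


def has_cyrillic(text: str) -> bool:
    """Heuristic: True if any character falls in the basic Cyrillic block."""
    return any("\u0400" <= ch <= "\u04ff" for ch in text)


def summarize(records: list[dict]) -> dict:
    """Report field coverage, empty-value counts, and a rough language split."""
    field_counts = Counter()
    empty_counts = Counter()
    languages = Counter()
    for record in records:
        for key, value in record.items():
            field_counts[key] += 1
            if not str(value or "").strip():
                empty_counts[key] += 1
        text = " ".join(str(v or "") for v in record.values())
        # Any Cyrillic character marks the record as Russian; mixed-language
        # records would need a closer look.
        languages["ru" if has_cyrillic(text) else "en"] += 1
    return {
        "fields": dict(field_counts),
        "empty": dict(empty_counts),
        "languages": dict(languages),
    }


# Hypothetical records for illustration only (not from the dataset).
sample = [
    {"instruction": "Predict the offspring genotypes.",
     "input": "Parents: Aa x Aa", "output": "1 AA : 2 Aa : 1 aa"},
    {"instruction": "Определи генотипы потомства при скрещивании Aa x Aa.",
     "input": "", "output": "1 AA : 2 Aa : 1 aa"},
]

print(summarize(sample))
```

Running this over the full download should show whether the columns match the usual Alpaca names and how evenly the two languages are represented.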

Provenance

Source: Hugging Face
Last updated: 2026-02-07 18:59:51 (verify against the source for freshness)
License: Apache-2.0

Quality Score

Overall: D39
Description: 46
Source: 36
Reputation: 39
Access: 26

Community

Downloads: 15
Likes: 1
Views: 0

Dataset Info

Author: taylonmcfly
Created: Feb 7, 2026
Updated: Feb 7, 2026
Last synced: May 3, 2026
