Infinity-Instruct-2 is a synthesized instruction-following dataset covering Chemistry, Physics, and Mathematics. It was created by lhpku20010120 for supervised fine-tuning of large language models and includes at least 20,000 baseline and 20,000 synthesized samples for Chemistry. The dataset was last updated on 2026-04-01.
Use Cases
- Supervised fine-tuning of LLMs on synthesized scientific instructions.
- Benchmarking model performance on instruction-following tasks in Chemistry, Physics, and Mathematics.
- Training models to generate or process domain-specific scientific text.
Strengths
- Includes at least 20,000 baseline and 20,000 synthesized samples for Chemistry.
- Curated and synthesized specifically for supervised fine-tuning rather than repurposed from a general-purpose corpus.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row counts for the Physics and Mathematics components are not reported, which makes it hard to assess suitability for those domains.
- Description metadata is limited; actual data quality requires manual inspection after download.
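Because field semantics have to be inferred after download, a first pass over the records can be sketched as below. This is a minimal illustration, not the dataset's documented schema: the function name `summarize_fields` and the sample field names (`instruction`, `output`, `source`) are hypothetical placeholders, and loading is left out because the file layout is not described here. It assumes the records have already been read into a list of dictionaries.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, Mapping


def summarize_fields(records: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Report, for each field seen, its value types, missing count, and one example."""
    type_counts: Dict[str, Counter] = defaultdict(Counter)
    examples: Dict[str, Any] = {}
    total = 0
    for row in records:
        total += 1
        for key, value in row.items():
            type_counts[key][type(value).__name__] += 1
            # Keep the first non-empty value as a representative example.
            if key not in examples and value not in (None, "", [], {}):
                examples[key] = value

    summary: Dict[str, Dict[str, Any]] = {}
    for key, counts in type_counts.items():
        # A field counts as missing when it is absent from a row or set to None.
        present = sum(counts.values()) - counts.get("NoneType", 0)
        summary[key] = {
            "types": dict(counts),
            "missing": total - present,
            "example": examples.get(key),
        }
    return summary


# Hypothetical rows for illustration only; the real field names may differ.
sample = [
    {"instruction": "Balance H2 + O2 -> H2O.", "output": "2 H2 + O2 -> 2 H2O", "source": None},
    {"instruction": "Define molarity.", "output": "Moles of solute per liter of solution."},
]

for name, info in summarize_fields(sample).items():
    print(name, info)
```

Running the same summary separately on the baseline and synthesized subsets, and on each subject, would also give the row counts that the metadata does not report.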
Provenance
- Source
  - Hugging Face
- Collection Method
  - Curated and synthesized from unspecified sources.
- Time Range
  - Not specified
- Freshness
  - Last updated 2026-04-01 14:42:34; freshness should be verified.
- Geography
  - Not specified