The IfGPT Dataset is developed within the project IfGPT: Infrastructure for Fine-tuning Pre-trained Large Language Models. The project aims to establish a freely accessible infrastructure for selecting and pre-processing large datasets for Bulgarian, as well as tailored data for particular industries. The dataset is authored by DCL-IBL and was last updated on Hugging Face in June 2026.
Use Cases
- Fine-tuning large language models on the Bulgarian-language data mentioned in the description
- Pre-processing text for industry-specific applications, drawing on the tailored data the description mentions
- Building dataset selection infrastructure for NLP tasks, in line with the project's stated objectives
Strengths
- The dataset is part of a project with a clear objective: establishing infrastructure for LLM fine-tuning
- Focuses on Bulgarian-language data, which may be harder to find than data for higher-resource languages
- The last-updated date is stated explicitly: 2026-06-03
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Row count is unknown, which makes it hard to judge in advance whether the dataset is large enough for a given task
- Description metadata is limited; actual data quality requires manual inspection after download
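Since the column layout and row count are undocumented, a first pass after download could profile the records directly. The following is a minimal sketch, assuming the data has already been loaded into a list of dictionaries (one per row). The function `profile_records` and the sample field names `text` and `domain` are hypothetical and are not taken from the dataset itself.

```python
from collections import Counter


def profile_records(records):
    """Report the row count and, per field, observed value types and missing values.

    Assumes `records` is a list of dicts; the real file layout is not
    documented on the card, so this shape is an assumption.
    """
    fields = {}
    for row in records:
        for key, value in row.items():
            stats = fields.setdefault(key, {"types": Counter(), "present": 0})
            if value is None or value == "":
                continue  # count None and empty strings as missing
            stats["present"] += 1
            stats["types"][type(value).__name__] += 1
    total = len(records)
    return {
        "rows": total,
        "fields": {
            name: {
                "types": dict(stats["types"]),
                # rows where the field is absent, None, or empty
                "missing": total - stats["present"],
            }
            for name, stats in fields.items()
        },
    }


# Hypothetical sample rows, for illustration only.
sample = [
    {"text": "Добър ден", "domain": "news"},
    {"text": "", "domain": "legal"},
    {"text": "Здравей", "domain": None},
]
summary = profile_records(sample)
print(summary["rows"])
print(summary["fields"]["text"])
```

A profile like this only suggests what each field holds; the field semantics still need to be confirmed by reading a sample of rows.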
Provenance
- Source: DCL-IBL
- Freshness: Last updated 2026-06-03 10:36:29; confirm against the Hugging Face page before relying on this date
- Geography: Bulgarian language focus suggests primary geographic relevance to Bulgaria
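One rough way to act on the freshness note is to compute how many days have passed since the recorded update. This is a sketch under stated assumptions: the timestamp is copied from the card, its time zone is not given and is assumed here to be UTC, and `days_since_update` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Timestamp as stated on the card; the time zone is an assumption (UTC).
LAST_UPDATED = "2026-06-03 10:36:29"


def days_since_update(stamp, now=None):
    """Return whole days between `stamp` and `now` (defaults to the current time)."""
    updated = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - updated).days


# Fixed reference time so the example is reproducible.
reference = datetime(2026, 7, 3, 10, 36, 29, tzinfo=timezone.utc)
print(days_since_update(LAST_UPDATED, reference))
```

A negative result means the recorded timestamp lies after the reference time, which would itself be worth checking.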