ST-CORE-TOKENS is a tokenized dataset developed by SKT AI Labs for Indian language models. Its description calls it 'ultra-refined' and 'high-density' and says it contains distilled logic. The dataset is hosted on Hugging Face and was last updated on April 8, 2026. Its stated purpose is to enhance the cognitive logic and advanced capabilities of Indian large language models.
Use Cases
- Fine-tuning language models based on the described high-density tokenization.
- Training models for advanced cognitive logic tasks based on the dataset's stated purpose.
- Developing or benchmarking Indian large language models based on the dataset's regional focus.
Strengths
- Dataset is described as 'ultra-refined' and 'high-density'.
- Last update recorded as April 8, 2026, suggesting recent maintenance.
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count, file formats, and license are unknown, which may limit suitability assessment.
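
Since the limitations above call for inspection after download, a first pass could look something like the sketch below. It counts rows and tallies the Python types seen in each field, which gives a starting point for inferring field semantics. The function name `summarize_records` and the sample field names (`input_ids`, `source`) are hypothetical; the real column names are undocumented.

```python
from collections import Counter, defaultdict
from itertools import islice
from typing import Any, Dict, Iterable


def summarize_records(
    records: Iterable[Dict[str, Any]], max_rows: int = 1000
) -> Dict[str, Any]:
    """Count rows and tally the value types observed in each field.

    Only the first `max_rows` records are read, so this also works on
    large or streamed sources.
    """
    type_counts: Dict[str, Counter] = defaultdict(Counter)
    rows = 0
    for record in islice(records, max_rows):
        rows += 1
        for field, value in record.items():
            type_counts[field][type(value).__name__] += 1
    return {
        "rows_inspected": rows,
        "fields": {name: dict(counts) for name, counts in type_counts.items()},
    }


# Hypothetical sample records; the actual schema is an assumption.
sample = [
    {"input_ids": [101, 2023, 102], "source": "a"},
    {"input_ids": [101, 102], "source": None},
]
summary = summarize_records(sample)
print(summary)
```

With the Hugging Face `datasets` library, the records could plausibly come from `load_dataset(repo_id, split="train", streaming=True)`, which yields one dictionary per row. Both the repository id and the split name are assumptions here, since the card states neither.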
Provenance
- Source: SKT AI Labs
- Collection Method: Developed by SKT AI Labs; specific gathering method is unknown.
- Time Range: Not specified
- Freshness: Last updated 2026-04-08 19:59:22
- Geography: Dataset is described with a focus on India and 'Sovereign Indian Intelligence'.
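
To turn the last-updated timestamp into a concrete age, a minimal sketch follows. It assumes the timestamp is in UTC, which the card does not state, and the helper name `days_since_update` and the reference date are illustrative only.

```python
from datetime import datetime, timezone

LAST_UPDATED = "2026-04-08 19:59:22"  # last-updated value recorded for the dataset


def days_since_update(last_updated: str, now: datetime) -> int:
    """Return whole days elapsed between the last update and `now`."""
    # Assumption: the timestamp is UTC; no timezone is documented.
    updated = datetime.strptime(last_updated, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    return (now - updated).days


# Arbitrary reference date chosen for illustration.
reference = datetime(2026, 5, 8, 19, 59, 22, tzinfo=timezone.utc)
age_days = days_since_update(LAST_UPDATED, reference)
print(age_days)
```

In practice `reference` would be replaced with `datetime.now(timezone.utc)` to measure staleness at the time of use.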