ST-CORE-TOKENS: Ultra-Refined Tokenized Dataset for Indian Language Models | DataSalon

Name: ST-CORE-TOKENS: Ultra-Refined Tokenized Dataset for Indian Language Models
Creator: sKT-Ai-Labs
Published: 2026-04-08T19:38:59
Keywords: Tokenized Data, Text, Cognitive Logic, Language Model, AI Training

by sKT-Ai-Labs · Updated 3mo ago

Available on 1 platform

Description

ST-CORE-TOKENS is a tokenized dataset developed by SKT AI Labs for Indian language models and described as ultra-refined and high-density. The dataset, also described as containing distilled logic, is hosted on Hugging Face and was last updated on April 8, 2026. Its stated purpose is to enhance the cognitive logic and advanced capabilities of Indian large language models.

Use Cases

Fine-tuning language models based on the described high-density tokenization.
Training models for advanced cognitive logic tasks based on the dataset's stated purpose.
Developing or benchmarking Indian large language models based on the dataset's regional focus.

Strengths

Dataset is described as 'ultra-refined' and 'high-density'.
Last update recorded as April 8, 2026, suggesting recent maintenance.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count, file formats, and license are unknown, which may limit suitability assessment.
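
Because row count and field semantics have to be worked out after download, a small profiling pass is one way to start the manual inspection. The sketch below is only an illustration: it assumes the downloaded files have already been loaded into Python dictionaries (the actual file format is unknown), and the function name summarize_fields and the field names input_ids, lang, and source are hypothetical, not taken from the dataset.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def summarize_fields(records: Iterable[Mapping[str, Any]], max_examples: int = 3) -> dict:
    """Count rows and, per field, record non-null counts, value types, and sample values."""
    row_count = 0
    fields: dict = {}

    for record in records:
        row_count += 1
        for name, value in record.items():
            # Create the per-field entry the first time the field is seen.
            info = fields.setdefault(
                name, {"non_null": 0, "types": Counter(), "examples": []}
            )
            if value is None:
                continue  # explicit nulls count as missing
            info["non_null"] += 1
            info["types"][type(value).__name__] += 1
            # Keep a few distinct example values for eyeballing field semantics.
            if len(info["examples"]) < max_examples and value not in info["examples"]:
                info["examples"].append(value)

    for info in fields.values():
        # A field is "missing" in a row if it is absent or explicitly null.
        info["missing"] = row_count - info["non_null"]
        info["types"] = dict(info["types"])

    return {"row_count": row_count, "fields": fields}


if __name__ == "__main__":
    # Hypothetical records standing in for rows loaded from the downloaded files.
    sample = [
        {"input_ids": [101, 7592, 102], "lang": "hi"},
        {"input_ids": [101, 2088], "lang": None},
        {"input_ids": [101], "source": "web"},
    ]
    summary = summarize_fields(sample)
    print("rows:", summary["row_count"])
    for name, info in summary["fields"].items():
        print(name, info)
```

A report like this does not replace documentation, but it shows which fields are always present, which are sparse, and what type each holds, which is usually enough to decide whether a closer look is worthwhile.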

Provenance

Source: SKT AI Labs
Collection Method: Developed by SKT AI Labs; specific gathering method is unknown.
Time Range: Not specified
Freshness: Last updated 2026-04-08 19:59:22
Geography: Dataset is described with a focus on India and 'Sovereign Indian Intelligence'.

Quality Score

D40

Community

2 likes

0 views

Dataset Info

Author: sKT-Ai-Labs
Created: Apr 8, 2026
Updated: Apr 8, 2026
Last synced: Apr 16, 2026
