Crow 8B: Cleaned Training Data for Language Model Fine-Tuning | DataSalon

Transportation & Mobility

Name: Crow 8B: Cleaned Training Data for Language Model Fine-Tuning
Creator: Crownelius
Published: 2026-02-26T06:13:34
Keywords: Text Generation, Prompt Completion, Text, AI Training Data, LLM Training

Available on 1 platform

Description

615,000 tokens of cleaned text data used for training the Crow 8B language model. The dataset was created by Crownelius and last updated on Hugging Face in March 2026. It consists of prompt-completion pairs with an average of 6.65 tokens per row.

Use Cases

Fine-tuning language models based on the provided prompt-completion structure.
Benchmarking model performance on text generation tasks using the described token counts.
Studying data-cleaning techniques for AI training datasets, since the title describes the data as cleaned.
Analyzing cost efficiency for model training using the provided token and cost metrics.
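As a minimal sketch of the fine-tuning use case, the snippet below joins one prompt-completion pair into a single training string. The field names `prompt` and `completion`, the newline separator, and the helper `format_example` are all assumptions for illustration; the listing does not document the actual column names, so check them after download.

```python
def format_example(row, prompt_key="prompt", completion_key="completion", sep="\n"):
    """Join one prompt-completion pair into a single training string.

    The key names are hypothetical defaults, not the dataset's documented schema.
    """
    prompt = row[prompt_key].strip()
    completion = row[completion_key].strip()
    return f"{prompt}{sep}{completion}"


if __name__ == "__main__":
    # Made-up sample row, used only to show the call shape.
    sample = {"prompt": "Translate to French: cat", "completion": "chat"}
    print(format_example(sample))
```

If the real columns are named differently, pass them through `prompt_key` and `completion_key` rather than editing the function.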

Strengths

Contains 615,000 tokens of cleaned text data.
Provides specific cost metrics, with an estimated total generation cost of $3.08.
Has a consistent structure with an average of 1.00 turns and 6.65 tokens per row.
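The reported figures can be combined into a rough estimate of the row count and unit cost. The sketch below assumes that the 6.65 tokens-per-row figure is a simple mean over all rows and that the $3.08 cost covers the same 615,000 tokens; the listing confirms neither, and the function names are illustrative.

```python
TOTAL_TOKENS = 615_000       # reported total tokens
AVG_TOKENS_PER_ROW = 6.65    # reported average tokens per row
TOTAL_COST_USD = 3.08        # reported estimated generation cost


def implied_row_count(total_tokens, avg_tokens_per_row):
    """Estimate the row count, assuming the average is a plain mean over rows."""
    if avg_tokens_per_row <= 0:
        raise ValueError("avg_tokens_per_row must be positive")
    return round(total_tokens / avg_tokens_per_row)


def cost_per_million_tokens(total_cost_usd, total_tokens):
    """Scale the reported total cost to a per-million-token rate."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_cost_usd / total_tokens * 1_000_000


if __name__ == "__main__":
    rows = implied_row_count(TOTAL_TOKENS, AVG_TOKENS_PER_ROW)
    rate = cost_per_million_tokens(TOTAL_COST_USD, TOTAL_TOKENS)
    print(f"implied rows: {rows}, cost per million tokens: ${rate:.2f}")
```

Under those assumptions the figures imply roughly 92,000 rows and about $5 per million tokens. Treat both as estimates until the actual row count is verified.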

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count, file formats, and license information are unknown.
Description metadata is limited; actual data quality requires manual inspection after download.
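Because column documentation is missing, a first inspection step after download might be to tally which fields appear and what value types they hold. The sketch below works on rows already loaded as Python dictionaries; the helper `summarize_columns` and the sample rows are hypothetical and say nothing about the dataset's real schema.

```python
from collections import Counter, defaultdict


def summarize_columns(rows):
    """Count the value types seen for each field across an iterable of dict rows."""
    type_counts = defaultdict(Counter)
    for row in rows:
        for key, value in row.items():
            type_counts[key][type(value).__name__] += 1
    # Convert to plain dicts so the summary prints and compares cleanly.
    return {key: dict(counts) for key, counts in type_counts.items()}


if __name__ == "__main__":
    # Made-up rows standing in for downloaded records.
    sample_rows = [
        {"prompt": "Hello", "completion": "Hi"},
        {"prompt": "Bye", "completion": None},
    ]
    print(summarize_columns(sample_rows))
```

Fields that mix types or contain `NoneType` values are good candidates for closer manual review.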

Provenance

Source: Hugging Face
Collection Method: Likely generated or cleaned for training the Crow 8B model.
Freshness: Last updated 2026-03-15 07:03:32; freshness should be verified.

License is unknown, which may restrict commercial or research use.

Quality Score

D33

Community

8 downloads

3 likes

0 views

Dataset Info

Author: Crownelius
Created: Feb 26, 2026
Updated: Mar 15, 2026
Last synced: Jun 26, 2026
