Description
A private finetuning dataset for language models totaling 100,666,803 tokens, uploaded to Hugging Face by Crownelius. The data consists of 11,678,493 prompt tokens and 88,988,310 completion tokens (about 88% completions), with an average of 621.5 tokens per row. It was last updated on March 15, 2026.
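The figures in the description can be cross-checked with a little arithmetic. The sketch below confirms that the prompt and completion counts sum to the stated total, and bounds the row count implied by the per-row average. It assumes the 621.5 average is rounded to one decimal place; the function name `implied_row_bounds` is illustrative and not part of the listing.

```python
import math

# Figures quoted in the description.
PROMPT_TOKENS = 11_678_493
COMPLETION_TOKENS = 88_988_310
TOTAL_TOKENS = 100_666_803
AVG_TOKENS_PER_ROW = 621.5  # assumed to be rounded to one decimal place


def implied_row_bounds(total_tokens: int, avg_per_row: float,
                       half_step: float = 0.05) -> tuple[int, int]:
    """Bound the row count given an average rounded to +/- half_step."""
    low = math.floor(total_tokens / (avg_per_row + half_step))
    high = math.ceil(total_tokens / (avg_per_row - half_step))
    return low, high


# The prompt and completion counts should add up to the stated total.
totals_match = PROMPT_TOKENS + COMPLETION_TOKENS == TOTAL_TOKENS
completion_share = COMPLETION_TOKENS / TOTAL_TOKENS
row_low, row_high = implied_row_bounds(TOTAL_TOKENS, AVG_TOKENS_PER_ROW)

print(f"totals match: {totals_match}")
print(f"completion share: {completion_share:.1%}")
print(f"implied rows: {row_low:,} to {row_high:,}")
```

The implied row range is only an estimate derived from the rounded average; the actual row count should be confirmed after download.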
Use Cases
Finetuning language models on the provided prompt-completion pairs.
Analyzing token distributions and training costs based on the provided token and cost statistics.
Benchmarking model performance on private conversational or instruction-following data, a format suggested by the prompt-completion structure.
Strengths
Contains 100,666,803 total tokens, providing a substantial volume of text for model training.
Includes detailed token statistics, such as 11,678,493 prompt tokens and 88,988,310 completion tokens.
Cost metrics are provided, estimated at $456.62 using OpenRouter pricing.
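To put the cost figure in context, the estimate can be converted to a blended rate per million tokens. This is a rough sketch: the listing does not say which model's rates were used or how prompt and completion tokens were priced separately, so the single blended rate and the helper `estimate_subset_cost` are assumptions made for illustration.

```python
# Figures quoted in the listing.
TOTAL_TOKENS = 100_666_803
ESTIMATED_COST_USD = 456.62

# Blended rate: one average price across prompt and completion tokens.
blended_usd_per_million = ESTIMATED_COST_USD / (TOTAL_TOKENS / 1_000_000)


def estimate_subset_cost(tokens: int,
                         usd_per_million: float = blended_usd_per_million) -> float:
    """Scale the blended rate to a hypothetical subset of the data."""
    return tokens / 1_000_000 * usd_per_million


print(f"blended rate: ${blended_usd_per_million:.2f} per million tokens")
print(f"10M-token subset: ${estimate_subset_cost(10_000_000):.2f}")
```

Because completions make up most of the tokens and are typically priced differently from prompts, a subset with a different prompt-to-completion mix would not scale linearly at this rate.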
Limitations
Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported directly, which may limit suitability assessment; the stated average of 621.5 tokens per row implies roughly 162,000 rows.
Provenance
Source
Crownelius on Hugging Face.
Freshness
Last updated 2026-03-15 07:03:16 (time zone not stated); freshness should be verified before use.