Description
GLM-5.0-8000x-formatted-fixed is a dataset of formatted text interactions, likely for training or evaluating language models. The dataset contains 4,090,360 total tokens, comprising 512,812 prompt tokens and 3,577,548 completion tokens, with an average of 261.87 tokens per row. It was uploaded by Crownelius to Hugging Face and was last updated on March 15, 2026.
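As a quick consistency check on the figures above, the following Python sketch confirms that prompt and completion tokens sum to the stated total and derives an implied row count. The row count is an inference, not a value reported for the dataset; it assumes the 261.87 average is simply total tokens divided by rows.

```python
# Figures as reported in the description above.
PROMPT_TOKENS = 512_812
COMPLETION_TOKENS = 3_577_548
TOTAL_TOKENS = 4_090_360
AVG_TOKENS_PER_ROW = 261.87

# The two components should add up to the stated total.
components_match = PROMPT_TOKENS + COMPLETION_TOKENS == TOTAL_TOKENS

# Assumption: average = total tokens / rows, so rows ~ total / average.
implied_rows = round(TOTAL_TOKENS / AVG_TOKENS_PER_ROW)

# Share of all tokens that are completions.
completion_share = COMPLETION_TOKENS / TOTAL_TOKENS

print(components_match, implied_rows, f"{completion_share:.1%}")
# → True 15620 87.5%
```

If the average was computed some other way (for example, over completions only), the implied row count would differ, so it should be confirmed against the downloaded data.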
Use Cases
Fine-tuning language models based on the provided prompt-completion pairs.
Benchmarking model performance on text generation tasks using the structured interactions.
Analyzing token distribution and cost efficiency for AI training pipelines.
Studying the characteristics of single-turn conversational data for model training.
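The token-distribution use case above could be approached along these lines. This is a minimal sketch: the field names `prompt` and `completion` are hypothetical (the dataset's columns are undocumented), and whitespace splitting is only a crude stand-in for the tokenizer that produced the reported counts.

```python
from statistics import mean, median

def token_stats(rows, prompt_key="prompt", completion_key="completion"):
    """Summarize per-row token counts for prompt-completion pairs.

    Whitespace splitting approximates tokenization, and the key names
    are hypothetical; adjust both to match the actual data.
    """
    totals = [
        len(row[prompt_key].split()) + len(row[completion_key].split())
        for row in rows
    ]
    return {
        "rows": len(totals),
        "total_tokens": sum(totals),
        "mean": mean(totals),
        "median": median(totals),
        "max": max(totals),
    }

# Toy rows for illustration only; not taken from the dataset.
sample = [
    {"prompt": "Name a prime number", "completion": "Seven is a prime number"},
    {"prompt": "Say hello", "completion": "Hello there"},
]
stats = token_stats(sample)
print(stats)
```

Swapping the whitespace split for a real tokenizer would be needed before comparing results with the reported 261.87 average.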
Strengths
Dataset contains 4,090,360 total tokens, providing a substantial volume of text data.
Cost metrics are explicitly provided, with a total generation cost estimated at $8.60 USD.
The average of 261.87 tokens per row reflects moderately long entries; completions account for roughly 87% of all tokens.
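The cost and token figures above imply a blended unit cost, which the short calculation below works out. It treats prompt and completion tokens as equally priced, which is an assumption; the source gives only the total.

```python
TOTAL_TOKENS = 4_090_360
TOTAL_COST_USD = 8.60

# Blended cost per million tokens, ignoring any prompt/completion price split.
cost_per_million = TOTAL_COST_USD / TOTAL_TOKENS * 1_000_000
print(f"${cost_per_million:.2f} per million tokens")
# → $2.10 per million tokens
```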
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
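Because field semantics have to be inferred after download, a small helper like the one below can give a first look at what each column holds. It is a sketch that works on any list of dictionaries; the field names in the toy rows are hypothetical and are not the dataset's actual schema.

```python
from collections import defaultdict

def summarize_columns(rows, sample_size=100):
    """Report each field's observed value types and how often it appears."""
    seen = defaultdict(lambda: {"types": set(), "count": 0})
    sample = rows[:sample_size]
    for row in sample:
        for key, value in row.items():
            seen[key]["types"].add(type(value).__name__)
            seen[key]["count"] += 1
    return {
        key: {
            "types": sorted(info["types"]),
            "coverage": info["count"] / len(sample),
        }
        for key, info in seen.items()
    }

# Toy rows for illustration only; not taken from the dataset.
toy_rows = [
    {"prompt": "hi", "completion": "hello", "cost": 0.001},
    {"prompt": "bye", "completion": None},
]
summary = summarize_columns(toy_rows)
print(summary)
```

Fields with mixed types or coverage below 1.0 are the ones most worth inspecting by hand.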
Provenance
Source
Crownelius via Hugging Face.
Collection Method
Likely generated or formatted for AI model training, as suggested by the token and cost statistics.
Freshness
Last updated 2026-03-15 07:02:34 (time zone not stated); check the Hugging Face repository for later revisions.
License
License is unknown; users should verify permissions before use.