The dataset 'hallu_detect_dataset_megascience_14k' is hosted on Kaggle. Judging by its name, it likely contains text examples for detecting hallucinations in large language model outputs, and the '14k' suffix suggests roughly 14,000 entries, though the available metadata does not confirm the row count. The author, organization, and collection methodology are not provided in the available metadata.
Use Cases
- Train a classifier to distinguish factual from hallucinated text (inferred from domain, verify after download)
- Benchmark the performance of different LLMs on hallucination detection tasks (inferred from domain, verify after download)
- Analyze patterns and common failure modes in model-generated text (inferred from domain, verify after download)
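For the benchmarking use case, a detector's predictions are commonly scored with precision, recall, and F1 against reference labels. The sketch below is a minimal, dependency-free illustration. It assumes the dataset exposes a binary label where 1 means "hallucinated", which is not confirmed by the metadata, and the function name `binary_prf` is hypothetical.

```python
def binary_prf(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for one positive class.

    Assumption: labels are binary and `positive` marks hallucinated text.
    Check the real label encoding after downloading the dataset.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```

Reporting all three values, rather than accuracy alone, helps when hallucinated and factual examples are unevenly represented, which cannot be ruled out here until the label distribution has been inspected.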
Strengths
- Published on Kaggle, a major platform for data science resources.
- The title suggests a scale of approximately 14,000 data points.
Limitations
- Metadata is minimal; actual content requires verification after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Exact row count, license, and last update date are not stated in the available metadata.
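Several of these gaps can be closed with a quick inspection after download. The following sketch uses only the Python standard library and assumes the data ships as a UTF-8 CSV file with a header row. The file format, the file name in the usage comment, and the `label` column are all assumptions for illustration, not facts from the metadata.

```python
import csv
from collections import Counter


def summarize_csv(path, label_column=None):
    """Report column names, row count, and optional label distribution.

    Assumption: `path` points to a UTF-8 CSV file with a header row.
    `label_column` is a hypothetical column name; pass the real one
    once the header has been inspected.
    """
    label_counts = Counter()
    row_count = 0
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        for row in reader:
            row_count += 1
            if label_column is not None and label_column in row:
                label_counts[row[label_column]] += 1
    return {
        "columns": columns,
        "row_count": row_count,
        "label_counts": dict(label_counts),
    }


# Example usage (hypothetical file and column names):
# summary = summarize_csv("hallu_detect_dataset.csv", label_column="label")
# print(summary["columns"], summary["row_count"], summary["label_counts"])
```

Comparing the reported row count against the "14k" in the title, and reading the column names, covers the row-count and field-semantics checks. The license and last update date still have to be read from the Kaggle dataset page.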