This curated repository, maintained by davanstrien and updated as of January 2026, is a central index of synthetic text datasets and generation tools. It aggregates resources built for training and evaluating large language models (LLMs) on artificially generated data. The collection is organized as an "awesome list" on GitHub: a directory of links to external resources rather than a single unified file.
Because this is a meta-resource (a directory) rather than a single dataset, users must check the license and terms of service of each linked dataset before using it.