This dataset, released in August 2023, is derived from The Pile with copyrighted content removed to respect authors' rights. Created by monology, it is intended for training large language models without infringing copyright. It was built by removing content from specific copyrighted sources such as Books3 and BookCorpus2.
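The source-based removal described above can be pictured as a simple filter over records. The following is only a minimal sketch, not the creator's actual procedure: it assumes each record carries its source subset under `meta["pile_set_name"]` (the layout used by The Pile's published JSON lines), and the exclusion list contains only the two sources named here, whereas the real list may be longer. The function name `drop_excluded` is hypothetical.

```python
from typing import Iterable, Iterator

# Assumption: only the two sources named in the description are listed here;
# the actual set of removed sources may be larger.
EXCLUDED_SOURCES = frozenset({"Books3", "BookCorpus2"})


def drop_excluded(
    records: Iterable[dict],
    excluded: frozenset = EXCLUDED_SOURCES,
) -> Iterator[dict]:
    """Yield only the records whose source subset is not in `excluded`."""
    for record in records:
        # Assumption: the subset name is stored under meta["pile_set_name"].
        source = record.get("meta", {}).get("pile_set_name")
        if source not in excluded:
            yield record


# Tiny made-up sample, for illustration only.
sample = [
    {"text": "a", "meta": {"pile_set_name": "Wikipedia (en)"}},
    {"text": "b", "meta": {"pile_set_name": "Books3"}},
    {"text": "c", "meta": {"pile_set_name": "ArXiv"}},
    {"text": "d", "meta": {"pile_set_name": "BookCorpus2"}},
]
kept = list(drop_excluded(sample))
```

Filtering on a per-record source label keeps the sketch streaming-friendly, since no record needs to be held in memory beyond the one being inspected.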
License information is unknown; consult the dataset page on Hugging Face for the full description and any usage terms. An uncopyrighted version of RedPajama is mentioned as planned, but no release date is given.