Available on 1 platform
This dataset contains 600 million news articles from the Common Crawl archive, processed from 2016 to June 2024. The data has been cleaned and deduplicated, and it includes language detection for articles in over 100 languages. The dataset was created by kareenamehta and is hosted on Hugging Face.
For license information, refer to Common Crawl's Terms of Use.