A mixture of 2,909,551 Chinese news articles drawn from the SogouCA and SogouCS corpora, categorized into 5 classes. The dataset was created by Xiang Zhang, Junbo Zhao, and Yann LeCun, with Chinese characters converted to Pinyin. Each article's classification label is derived from the domain name in its URL.
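Labelling by URL domain amounts to a lookup from an article's host name to a class. The sketch below illustrates that idea only. The host names and class names in `DOMAIN_TO_LABEL` are hypothetical placeholders, because this description does not list the actual domains or the five class names used to build the dataset.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical mapping: the real host names and class names used to build
# the dataset are not given in this description.
DOMAIN_TO_LABEL = {
    "sports.example.com": "sports",
    "finance.example.com": "finance",
    "ent.example.com": "entertainment",
    "auto.example.com": "automobile",
    "it.example.com": "technology",
}


def label_from_url(url: str) -> Optional[str]:
    """Return the class label for an article URL, or None if its host is unmapped."""
    # urlparse(...).hostname is lower-cased and excludes any port number.
    host = urlparse(url).hostname
    if host is None:
        return None
    return DOMAIN_TO_LABEL.get(host)
```

For example, `label_from_url("http://sports.example.com/2008/a.html")` returns `"sports"`, while a URL on an unmapped host returns `None`. Under this scheme, articles whose domain is not in the mapping would simply be left out of the labelled set.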
The license is attributed to the Courant Institute of Mathematical Sciences and Kaggle; review the specific terms before use. Because the text is stored as Pinyin rather than Chinese characters, distinct characters that share a pronunciation become indistinguishable, which may affect semantic analysis.
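The information loss from Pinyin conversion can be shown with a few homophones. The sketch below uses a tiny, hand-picked table with tones written as trailing digits. That tone convention is an assumption for illustration and is not stated in this description. Even with tones kept, several different characters map to the same Pinyin syllable.

```python
from collections import defaultdict

# Toy, hand-picked table (an assumption for illustration): characters and
# their tone-numbered Pinyin. A real converter needs a full dictionary and
# word segmentation to handle characters with multiple readings.
TOY_PINYIN = {
    "市": "shi4",   # city / market
    "事": "shi4",   # matter, affair
    "是": "shi4",   # to be
    "经": "jing1",  # pass through, classic
    "京": "jing1",  # capital
}


def to_pinyin(text: str) -> str:
    """Convert each character in text to Pinyin using the toy table."""
    return " ".join(TOY_PINYIN[ch] for ch in text)


def collisions(table: dict) -> dict:
    """Group characters by Pinyin and keep only syllables shared by 2+ characters."""
    groups = defaultdict(list)
    for ch, py in table.items():
        groups[py].append(ch)
    return {py: chars for py, chars in groups.items() if len(chars) > 1}
```

Here `to_pinyin("市")` and `to_pinyin("事")` both yield `"shi4"`, so a model trained on the Pinyin text cannot tell these characters apart from spelling alone and must rely on surrounding context.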