The ShizhenGPT pre-training dataset contains over 5 billion tokens of Traditional Chinese Medicine (TCM) text drawn from websites and books, along with a large-scale image-text dataset. It was created by FreedomIntelligence and last updated in September 2025.
The full dataset description and details are available only on the external Hugging Face dataset page.