Over 5 billion tokens of Traditional Chinese Medicine (TCM) text, sourced from websites and books, form the largest existing TCM corpus. FreedomIntelligence released this multimodal dataset for pre-training the ShizhenGPT model. It was last updated in September 2025.
License information is unknown. Users must refer to the linked paper and GitHub repository for full dataset details and usage terms.