Available on 1 platform
NexaSci Scientific Research Tokenized is a large-scale text corpus for AI pretraining, containing 10 billion tokens of scientific research material. The dataset is maintained by AethronPhantom and was last updated in May 2026. It includes the current production reservoir and archived legacy builds.
The license is unknown; users should verify the terms of use before using the dataset.