syjiang's Non-coding RNA Pretraining Corpus contains 47,154,121 unique ncRNA sequences totaling 32.48 billion nucleotides. The dataset was aggregated from five complementary repositories, deduplicated, and clustered to support balanced sampling. It was last updated on 2026-06-22.
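The deduplication and cluster-balanced sampling mentioned above can be illustrated with a minimal sketch. This is not the corpus authors' pipeline: the function names `deduplicate` and `balanced_sample`, the exact-match, case-insensitive duplicate rule, the `cluster_of` mapping, and the toy records are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def deduplicate(records):
    """Keep the first record for each exact sequence.

    Assumption: duplicates are exact string matches after upper-casing.
    The real corpus may use a different criterion.
    """
    seen = set()
    unique = []
    for seq_id, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((seq_id, key))
    return unique


def balanced_sample(records, cluster_of, n, seed=0):
    """Draw n records: pick a cluster uniformly, then a member uniformly.

    This gives small clusters the same chance of being drawn as large ones.
    `cluster_of` is a hypothetical mapping from sequence ID to cluster label.
    """
    clusters = defaultdict(list)
    for seq_id, seq in records:
        clusters[cluster_of[seq_id]].append((seq_id, seq))
    rng = random.Random(seed)
    labels = sorted(clusters)
    return [rng.choice(clusters[rng.choice(labels)]) for _ in range(n)]


# Toy records (hypothetical, not taken from the corpus).
records = [
    ("a", "ACGU"),
    ("b", "acgu"),  # duplicate of "a" once case is normalized
    ("c", "GGCC"),
    ("d", "GGCA"),
    ("e", "UUUU"),
]
cluster_of = {"a": 0, "c": 1, "d": 1, "e": 2}

unique = deduplicate(records)
sample = balanced_sample(unique, cluster_of, n=6)
```

Sampling a cluster first and a member second is one simple way to keep large sequence families from dominating training batches. Other weighting schemes are equally plausible.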
The license is unknown; users should verify the terms before use.