VietVault is a large-scale Vietnamese language corpus curated from Common Crawl dumps. It contains 80 GB of text, cleaned and filtered for Vietnamese, drawn from dumps dated between 2013 and 2023. The dataset was created by nampdn-ai and last updated on 2026-05-12.
The license is unknown, which may restrict commercial or research use.