PangeanicYueJa is a parallel corpus of 55,000 Cantonese-Japanese sentence pairs, sampled from a larger collection of approximately 3.08 million pairs (roughly 1.8% of the total). Pangeanic created the corpus and released it on Hugging Face; the last recorded update was in June 2026. The corpus is intended for training and evaluating machine translation systems and multilingual language models.
The license is unknown; users should verify the terms of use before downloading.