A gold-standard benchmark for document alignment across Sinhala, Tamil, and English. It contains manually annotated document pairs crawled from four Sri Lankan news websites: Army, Hiru, ITN, and Newsfirst.
The full description and data structure are available on the Hugging Face dataset page.
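As an illustration of how a gold-standard set of aligned document pairs is commonly used, the sketch below scores a system's predicted pairs against manually annotated ones with precision, recall, and F1. The function name `score_alignment`, the document identifiers, and the pair representation are all hypothetical; the dataset's actual column names and file layout are not assumed here and should be checked on the dataset page.

```python
from typing import Dict, Iterable, Set, Tuple

# A pair of document identifiers, e.g. (sinhala_doc_id, english_doc_id).
# This representation is an assumption made for illustration only.
Pair = Tuple[str, str]


def score_alignment(gold: Iterable[Pair], predicted: Iterable[Pair]) -> Dict[str, float]:
    """Compare predicted document pairs with gold pairs by exact match."""
    gold_set: Set[Pair] = set(gold)
    pred_set: Set[Pair] = set(predicted)

    # A predicted pair counts as correct only if it appears in the gold set.
    true_pos = len(gold_set & pred_set)

    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up identifiers, not taken from the dataset.
gold_pairs = [("si_001", "en_001"), ("si_002", "en_002"), ("si_003", "en_003")]
predicted_pairs = [("si_001", "en_001"), ("si_002", "en_009")]

scores = score_alignment(gold_pairs, predicted_pairs)
print(scores)
```

Exact-match scoring over sets is only one possible choice; it treats each pair as either fully right or fully wrong and ignores duplicates in the predictions.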