Description
M2DS v1.0 is a multilingual multi-document summarisation dataset built from BBC news articles and professionally written BBC summaries. The dataset covers five languages: English, Japanese, Korean, Sinhala, and Tamil. It was created by KushanH and last updated on the Hugging Face platform in April 2026.
Use Cases
Train multi-document summarization models, the task the dataset was built for.
Benchmark cross-lingual summarization performance across the five included languages.
Fine-tune language-specific summarizers using the professionally written BBC summaries.
Study abstractive summarization techniques on news-domain multi-document inputs.
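For the benchmarking use case, one way to compare the five languages is to score system summaries against the BBC references and average the scores per language. The sketch below is a minimal illustration under stated assumptions, not an official evaluation script: it uses a simplified unigram-overlap F1 (in the spirit of ROUGE-1, not the reference implementation) with whitespace tokenization, which does not hold for Japanese and would need a language-aware tokenizer there. The record keys `lang`, `reference`, and `prediction` are hypothetical, since the dataset's column names are undocumented.

```python
from collections import Counter, defaultdict


def unigram_f1(reference: str, prediction: str) -> float:
    """Simplified unigram-overlap F1 (ROUGE-1-like) on whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    pred_counts = Counter(prediction.lower().split())
    # Multiset intersection counts each shared token at most min(ref, pred) times.
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def mean_f1_by_language(records):
    """Average the score per language code.

    Each record is a dict with the hypothetical keys 'lang', 'reference',
    and 'prediction'; the real M2DS field names may differ.
    """
    scores = defaultdict(list)
    for rec in records:
        scores[rec["lang"]].append(
            unigram_f1(rec["reference"], rec["prediction"])
        )
    return {lang: sum(vals) / len(vals) for lang, vals in scores.items()}
```

Reporting one number per language, rather than a single pooled average, keeps a strong English score from masking weaker results in lower-resource languages such as Sinhala or Tamil.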
Strengths
Includes professionally written summaries from the BBC, suggesting high-quality reference texts.
Covers five distinct languages, enabling multilingual research.
Specifically designed for the multi-document summarization task.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, so it is hard to judge before download whether the dataset is large enough for a given task.
Data may reflect the geographic and editorial bias inherent to its single source, the BBC.
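Because neither the columns nor the row count are documented, a first step after download is usually to profile the records directly. The following is a minimal sketch, assuming the data has already been loaded as an iterable of dictionaries (for example, rows read from a local file); the function name `profile_records` and the field names in the usage comment are hypothetical and not part of the dataset.

```python
from collections import defaultdict


def profile_records(records, max_example_len=60):
    """Return the row count and, per field, observed types, fill count, and an example."""
    types = defaultdict(set)
    non_empty = defaultdict(int)
    examples = {}
    row_count = 0
    for rec in records:
        row_count += 1
        for key, value in rec.items():
            types[key].add(type(value).__name__)
            # Treat None and empty containers/strings as missing values.
            if value not in (None, "", [], {}):
                non_empty[key] += 1
                # Keep the first non-empty value seen, truncated for display.
                examples.setdefault(key, repr(value)[:max_example_len])
    fields = {
        key: {
            "types": sorted(types[key]),
            "non_empty": non_empty[key],
            "example": examples.get(key),
        }
        for key in types
    }
    return {"rows": row_count, "fields": fields}


# Usage with made-up rows (field names are illustrative only):
# profile_records([{"documents": ["a", "b"], "summary": "s"}])
```

The function makes a single pass over the data, so it also answers the row-count question; fields whose fill count is well below the row count are worth checking before relying on them.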
Provenance
Source
BBC news articles and summaries.
Collection Method
Built from BBC source material; details of the collection method are unspecified.
Time Range
Not specified.
Freshness
Last updated 2026-04-09 01:46:48; freshness should be verified.
Geography
Not specified.
License is unknown; the terms of use must be verified before the data is used.