Legco Speech: Hong Kong Legislative Council Audio with 20,471 Hours of Segmented Speech

Name: Legco Speech: Hong Kong Legislative Council Audio with 20,471 Hours of Segmented Speech
Creator: laubonghaudoi
Published: 2026-02-24T07:36:43
Keywords: Cantonese, Audio, Parliamentary Proceedings, Audio Processing, Speech Recognition

By laubonghaudoi. Updated 4mo ago.

Description

The dataset contains 22,196 hours of raw audio from Hong Kong Legislative Council meetings, processed into 20,471 hours of segmented speech. Created by laubonghaudoi, it is split into a raw subset and a segmented subset. It was last updated on 2026-02-26.
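As a quick sanity check on the two figures above, the share of raw audio that survives segmentation can be computed directly. This is only arithmetic on the published totals; the variable and function names are illustrative, and reading the difference as audio dropped by segmentation is an assumption rather than something the dataset card states.

```python
# Totals quoted in the description above.
RAW_HOURS = 22_196
SEGMENTED_HOURS = 20_471


def retained_fraction(raw_hours: float, segmented_hours: float) -> float:
    """Return the fraction of raw audio kept after segmentation."""
    return segmented_hours / raw_hours


# Hours present in the raw subset but not in the segmented subset.
removed_hours = RAW_HOURS - SEGMENTED_HOURS
fraction = retained_fraction(RAW_HOURS, SEGMENTED_HOURS)

print(f"Retained: {fraction:.1%}, removed: {removed_hours:,} hours")
```

Roughly 92% of the raw audio is retained, so the segmented subset is the natural choice for ASR training, while the raw subset keeps the remaining audio available for VAD work.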

Use Cases

Training Cantonese speech recognition models on the large volume of segmented audio.
Analyzing parliamentary speech patterns and discourse using the transcribed subtitles.
Developing voice activity detection (VAD) systems using the raw and segmented audio subsets.
Studying formal Cantonese language use and political terminology from the legislative proceedings.

Strengths

Large scale with over 20,000 hours of processed audio.
Documents a clear processing pipeline covering download, VAD segmentation, and transcription with Qwen3-ASR-1.7B.
Provides both raw and segmented subsets for different research needs.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Data may reflect geographic and institutional bias inherent to its source, the Hong Kong Legislative Council.
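Because column-level documentation is absent, one way to start inferring field semantics after download is to summarize which value types each field takes across a sample of records. The sketch below is a minimal, library-free illustration; the field names and rows in `sample` are hypothetical and do not describe the dataset's real schema.

```python
from collections import defaultdict


def summarize_fields(records):
    """Map each field name to the sorted list of Python type names seen."""
    seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


# Hypothetical rows for illustration only; the real field names
# must be checked against the downloaded files.
sample = [
    {"audio_path": "a.opus", "start_ms": 0, "end_ms": 4200, "text": "..."},
    {"audio_path": "b.opus", "start_ms": 4200, "end_ms": 9100, "text": None},
]

summary = summarize_fields(sample)
for name, types in summary.items():
    print(name, types)
```

A field that reports more than one type (here, `text` is sometimes missing) is a good candidate for closer manual inspection before training.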

Provenance

Source: Hong Kong Legislative Council meeting recordings from YouTube.
Collection Method: Audio was downloaded, converted to 16 kHz Opus, segmented with fsmn-vad, and transcribed into Cantonese subtitles with Qwen3-ASR-1.7B; transcription errors were then corrected with regex.
Time Range: Not specified
Freshness: Last updated 2026-02-26 07:09:41; freshness should be verified.
Geography: Hong Kong
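The collection steps listed above can be sketched in a few small helpers. This is an illustration under stated assumptions, not the author's actual pipeline: the mono downmix, the `[start_ms, end_ms]` segment format, the file names, and the whitespace cleanup rule are all hypothetical, since the dataset card does not give the real commands or regex patterns. The ffmpeg command is only built as an argument list, not executed.

```python
import re


def opus_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg argument list that resamples to 16 kHz Opus."""
    # "-ac 1" (mono) is an assumption; the card only states 16 kHz Opus.
    return ["ffmpeg", "-i", src, "-ar", "16000", "-ac", "1",
            "-c:a", "libopus", dst]


def speech_hours(segments_ms) -> float:
    """Sum VAD segments given as [start_ms, end_ms] pairs into hours."""
    total_ms = sum(end - start for start, end in segments_ms)
    return total_ms / 3_600_000


def clean_subtitle(text: str) -> str:
    """Hypothetical regex fix: collapse whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


cmd = opus_command("meeting.mp4", "meeting.opus")
hours = speech_hours([[0, 1_800_000], [2_000_000, 3_800_000]])
cleaned = clean_subtitle("  多謝   主席  ")

print(" ".join(cmd))
print(hours, cleaned)
```

Summing segment durations this way is also how a total such as the segmented-hours figure could be reproduced from VAD output, assuming the segment timestamps are available.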

Quality Score

Overall: C44
Description: 51
Source: 41
Reputation: 49
Access: 26

Community

Downloads: 3.9K
Likes: 2
Views: 0

Dataset Info

Author: laubonghaudoi
Created: Feb 24, 2026
Updated: Feb 26, 2026
Last synced: May 19, 2026
