Chinese-LiPS is a multimodal dataset for audio-visual speech recognition in Mandarin Chinese. It combines speech, video, and textual transcriptions to enhance automatic speech recognition performance, particularly in educational contexts. The dataset was created by BAAI and was last updated on 2025-11-18.
Use Cases
- Training audio-visual speech recognition models on synchronized speech, video, and text data.
- Improving speech recognition robustness in noisy environments by combining lip-reading and audio features.
- Developing educational technology tools that draw on instructional speech and presentation slide content.
Strengths
- Multimodal design integrates speech, video, and text transcriptions.
- Specifically designed for Mandarin Chinese and educational scenarios.
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count and total duration are not stated, which makes it hard to assess suitability before download.
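Since the fields, row count, and total duration all have to be established after download, a first pass over the data might look like the sketch below. It is a minimal illustration using only the Python standard library. The function name `summarize_records`, the sample rows, and the field names (`audio_path`, `text`, `duration_s`) are hypothetical and are not taken from the dataset, whose actual schema is undocumented.

```python
from collections import defaultdict


def summarize_records(records, duration_field=None):
    """Infer a rough schema and basic totals from an iterable of dict rows."""
    field_types = defaultdict(set)
    row_count = 0
    total_duration = 0.0
    for row in records:
        row_count += 1
        # Record every Python type observed for each field name.
        for name, value in row.items():
            field_types[name].add(type(value).__name__)
        if duration_field is not None:
            value = row.get(duration_field)
            # Only sum real numbers; skip missing or malformed durations.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                total_duration += value
    return {
        "rows": row_count,
        "schema": {name: sorted(types) for name, types in field_types.items()},
        "total_duration": total_duration if duration_field else None,
    }


# Hypothetical rows standing in for whatever the downloaded files contain.
sample = [
    {"audio_path": "a.wav", "text": "你好", "duration_s": 2.5},
    {"audio_path": "b.wav", "text": "谢谢", "duration_s": 3.0},
    {"audio_path": "c.wav", "text": None, "duration_s": 1.5},
]
summary = summarize_records(sample, duration_field="duration_s")
print(summary)
```

A field that reports more than one type (here `text`, which is sometimes missing) is a quick signal that its semantics need closer manual inspection.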
Provenance
- Source: BAAI
- Freshness: Last updated 2025-11-18 16:41:01; freshness should be verified.
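One way to act on the recorded timestamp is to flag the card as stale once it passes an age threshold, then re-check the source. The sketch below assumes the timestamp is in UTC (the card does not state a time zone) and uses an arbitrary 180-day threshold. The function name `is_stale` and the reference date are illustrative choices only.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_updated, now, max_age_days=180):
    """Return True if `last_updated` is more than `max_age_days` before `now`."""
    # Assumption: the card's timestamp is UTC; the source does not say.
    updated = datetime.strptime(last_updated, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    return (now - updated) > timedelta(days=max_age_days)


card_timestamp = "2025-11-18 16:41:01"
# Fixed reference date so the example is reproducible; in practice,
# pass datetime.now(timezone.utc) instead.
reference = datetime(2026, 1, 1, tzinfo=timezone.utc)
stale = is_stale(card_timestamp, reference)
print(stale)
```

This only measures the age of the recorded date. Confirming that the date still matches the dataset's current revision requires checking the source directly.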