Description
A large, structured dataset of Vietnamese medical conversations, designed to address the scarcity of high-quality, realistic multi-turn doctor-patient consultations. Created by HoangHa, it targets Vietnamese-first clinical dialogue modeling and supports bilingual transfer with English data. It was last updated on May 13, 2026.
Use Cases
Training multi-turn dialogue models on the structure of doctor-patient consultations.
Fine-tuning language models for Vietnamese clinical conversation understanding.
Developing bilingual (Vietnamese-English) transfer learning models for medical NLP.
Benchmarking the performance of conversational AI in realistic medical consultation scenarios.
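As a rough illustration of the first use case, multi-turn consultations are commonly flattened into (history, reply) pairs before training. The sketch below assumes each conversation is a list of turns with `role` and `text` keys. These field names, the `build_training_pairs` helper, and the sample turns are all hypothetical, since the dataset's real schema is not documented.

```python
from typing import Dict, List, Tuple

# Hypothetical turn shape: {"role": "patient" | "doctor", "text": "..."}
Turn = Dict[str, str]


def build_training_pairs(
    turns: List[Turn], target_role: str = "doctor"
) -> List[Tuple[List[Turn], str]]:
    """Return one (history, reply) pair for each turn spoken by target_role."""
    pairs = []
    for i, turn in enumerate(turns):
        # A target-role turn at index 0 has no preceding context, so skip it.
        if turn["role"] == target_role and i > 0:
            pairs.append((turns[:i], turn["text"]))
    return pairs


# Invented sample conversation, not taken from the dataset.
example = [
    {"role": "patient", "text": "Tôi bị đau đầu ba ngày nay."},
    {"role": "doctor", "text": "Bạn có bị sốt không?"},
    {"role": "patient", "text": "Không, tôi không sốt."},
    {"role": "doctor", "text": "Bạn nên nghỉ ngơi và uống đủ nước."},
]
pairs = build_training_pairs(example)
```

Each doctor turn becomes a target, with every earlier turn as its context, so a four-turn consultation like the one above yields two training pairs. The role labels and keys would need to be mapped to whatever the downloaded files actually contain.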
Strengths
Explicitly designed to be large, closing a gap in available Vietnamese medical dialogue data.
Focuses on realistic multi-turn doctor-patient consultations, so the dialogue structure should resemble real clinical exchanges.
Supports bilingual transfer with English data, which opens the door to cross-lingual applications.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count, file formats, and license information are unknown, which makes it hard to assess suitability before download.
Data may carry geographic or linguistic bias from whichever source the Vietnamese medical conversations were collected from.
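Because field semantics have to be inferred after download, a quick first pass is to summarize which fields appear and what types they hold. The sketch below works on any list of records already loaded as Python dictionaries. The `summarize_fields` helper and the sample rows (`id`, `turns`, `specialty`) are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def summarize_fields(records: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Report, per field, the observed Python type names and the share of records containing it."""
    types = defaultdict(set)
    counts = defaultdict(int)
    total = 0
    for record in records:
        total += 1
        for key, value in record.items():
            types[key].add(type(value).__name__)
            counts[key] += 1
    # With no records, types is empty and no division happens.
    return {
        key: {"types": sorted(types[key]), "coverage": counts[key] / total}
        for key in types
    }


# Invented rows standing in for downloaded records; real field names are unknown.
rows = [
    {"id": 1, "turns": [], "specialty": "neurology"},
    {"id": 2, "turns": [], "specialty": None},
    {"id": 3, "turns": []},
]
summary = summarize_fields(rows)
```

Fields with mixed types or coverage below 1.0 are the ones most worth inspecting by hand before building a training pipeline on top of them.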
Provenance
Source
Hugging Face
Collection Method
Likely collected or curated from medical consultation sources, but the specific method is not detailed.
Freshness
Last updated 2026-05-13 13:50:39; freshness should be verified.
Geography
Primarily Vietnam, based on the focus on Vietnamese language data.
License
Unknown; users must verify permissions before use.