This dataset contains 400 full-length, high-definition talking-face videos, split into 81-frame clips and paired with audio embeddings. It was curated by global-optima-research and last updated on June 4, 2025. It is intended for talking-head generation and multimodal avatar synthesis.
Use Cases
- Train models for talking-head generation based on the provided high-definition video clips.
- Develop video captioning systems using the 81-frame clips as temporal units.
- Perform multimodal avatar synthesis by leveraging the paired video and audio embeddings.
Strengths
- Contains 400 original high-definition videos.
- Videos are preprocessed into 81-frame clips for temporal modeling.
- Includes extracted audio embeddings for multimodal tasks.
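The 81-frame clip length can be illustrated with a small sketch of how a video's frame indices might be cut into clips. This is not the curators' preprocessing code: the description does not say whether clips overlap or what happens to leftover frames, so non-overlapping clips and a dropped remainder are assumptions here, and `clip_ranges` is a hypothetical helper name.

```python
from typing import List, Tuple

CLIP_LEN = 81  # frames per clip, as stated in the dataset description


def clip_ranges(
    num_frames: int, clip_len: int = CLIP_LEN, drop_last: bool = True
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, end exclusive.

    Assumption: clips are consecutive and non-overlapping. With
    drop_last=True, a trailing clip shorter than clip_len is discarded.
    """
    if num_frames < 0 or clip_len <= 0:
        raise ValueError("num_frames must be >= 0 and clip_len must be > 0")
    ranges = []
    for start in range(0, num_frames, clip_len):
        end = min(start + clip_len, num_frames)
        if end - start < clip_len and drop_last:
            break  # only the final clip can be short
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # A hypothetical 200-frame video yields two full clips.
    print(clip_ranges(200))
```

Passing `drop_last=False` keeps the short trailing clip, which may be preferable if the remainder is padded downstream.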
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The row count is not reported, so the dataset's size and suitability are hard to assess before download.
- Description metadata is limited; actual data quality requires manual inspection after download.
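Because field semantics have to be inferred after download, a first pass could simply tabulate which fields appear and what value types they hold. The sketch below assumes the rows have already been loaded as Python dictionaries; `summarize_fields` is a hypothetical helper, and the field names in the example are invented for illustration, not taken from the dataset.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def summarize_fields(rows: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each field name to the sorted type names observed for its values."""
    seen = defaultdict(set)
    for row in rows:
        for key, value in row.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


if __name__ == "__main__":
    # Hypothetical rows; the real column names are undocumented.
    sample = [
        {"clip_id": "a_000", "audio_emb": [0.1, 0.2]},
        {"clip_id": "a_001", "audio_emb": None},
    ]
    for name, types in summarize_fields(sample).items():
        print(name, types)
```

A field that shows more than one type, such as a list in some rows and `None` in others, is a hint that missing values need handling before training.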
Provenance
- Source: global-optima-research
- Collection Method: Curated and preprocessed version of the HDTF dataset.
- Time Range: not specified
- Freshness: Last updated 2025-06-04 06:31:41.
- Geography: not specified