SpeakerCard-1M: Evidence-Grounded Speaker Traits from VoxCeleb

Name: SpeakerCard-1M: Evidence-Grounded Speaker Traits from VoxCeleb
Creator: JYP2024
Published: 2026-06-02T04:20:43
Keywords: VoxCeleb, Speaker Verification, Acoustic Analysis, Speech Processing, Audio, Natural Language Processing

Description

SpeakerCard-1M is a speaker-centric corpus built on the VoxCeleb1 and VoxCeleb2 datasets. JYP2024 created it with a tool-first, LLM-last pipeline: ten acoustic probes first extract evidence, and a constrained LLM then verbalizes that evidence into a structured schema. The dataset was last updated on June 3, 2026.
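
As a rough illustration of the tool-first, LLM-last idea, the Python sketch below runs simple probes first and only then turns their output into a labeled field. Everything in it is hypothetical: the card does not list the ten probes or the schema fields, so the two probes, the `loudness` field, the 0.5 threshold, and the `verbalize` stub (a stand-in for the constrained LLM) are assumptions made for the example.

```python
from typing import Callable, Dict, List

Evidence = Dict[str, float]


def probe_energy(samples: List[float]) -> Evidence:
    """Hypothetical probe: mean absolute amplitude of the samples."""
    if not samples:
        return {"energy": 0.0}
    return {"energy": sum(abs(s) for s in samples) / len(samples)}


def probe_zero_crossings(samples: List[float]) -> Evidence:
    """Hypothetical probe: fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return {"zero_crossing_rate": crossings / max(len(samples) - 1, 1)}


# Tool-first: every probe runs before any text is generated.
PROBES: List[Callable[[List[float]], Evidence]] = [probe_energy, probe_zero_crossings]

# LLM-last: the verbalizer may only emit values from a fixed vocabulary.
ALLOWED_VALUES = {"loudness": ("quiet", "loud")}


def collect_evidence(samples: List[float]) -> Evidence:
    """Merge the output of all probes into one evidence dictionary."""
    evidence: Evidence = {}
    for probe in PROBES:
        evidence.update(probe(samples))
    return evidence


def verbalize(evidence: Evidence) -> Dict[str, object]:
    """Stub for the constrained LLM: map evidence to an allowed label."""
    label = "loud" if evidence["energy"] >= 0.5 else "quiet"  # assumed threshold
    if label not in ALLOWED_VALUES["loudness"]:
        raise ValueError(f"label {label!r} is outside the allowed vocabulary")
    # Keep the evidence next to the label so the field stays traceable.
    return {"loudness": label, "evidence": evidence}
```

Calling `verbalize(collect_evidence(samples))` returns the label together with the numbers that justify it, which is the sense in which a field is evidence-grounded in this sketch.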

Use Cases

Train speaker verification models based on structured speaker traits like gender and accent.
Analyze correlations between acoustic probes and speaker states like emotion and speaking rate.
Benchmark multi-factor speaker characterization systems based on the schema separating stable traits from utterance-level states.
Develop constrained language models for verbalizing acoustic evidence into structured fields.

Strengths

Built on established VoxCeleb1 and VoxCeleb2 datasets, providing a known foundation.
Implements a structured schema separating stable speaker traits from utterance-level states.
Employs a defined pipeline with ten acoustic probes for evidence extraction.
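
One way to picture a schema that separates stable traits from utterance-level states is the Python sketch below. The class and field names are hypothetical; only the example traits (gender, accent) and states (emotion, speaking rate) come from this card, and the actual columns, value sets, and units are not documented here.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class UtteranceState:
    """State that can change from one utterance to the next."""
    utterance_id: str
    emotion: Optional[str] = None
    speaking_rate: Optional[float] = None  # unit is an assumption, e.g. syllables/s


@dataclass
class SpeakerCard:
    """Stable speaker-level traits plus the per-utterance states."""
    speaker_id: str
    gender: Optional[str] = None
    accent: Optional[str] = None
    utterances: List[UtteranceState] = field(default_factory=list)

    def mean_speaking_rate(self) -> Optional[float]:
        """Average the speaking rate over utterances that report one."""
        rates = [u.speaking_rate for u in self.utterances if u.speaking_rate is not None]
        return sum(rates) / len(rates) if rates else None


# Illustrative values only; they are not taken from the dataset.
example = SpeakerCard(
    speaker_id="spk-0001",
    gender="female",
    accent="Irish",
    utterances=[
        UtteranceState("utt-1", emotion="neutral", speaking_rate=4.0),
        UtteranceState("utt-2", emotion="happy", speaking_rate=6.0),
        UtteranceState("utt-3"),  # no state estimates for this utterance
    ],
)
```

Keeping the states in a nested list means a trait such as accent is stored once per speaker, while emotion and speaking rate can vary per utterance; `asdict(example)` flattens the whole card into plain dictionaries for serialization.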

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes suitability harder to assess before download.
Description metadata is limited; actual data quality requires manual inspection after download.
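
Because column documentation and row count are missing, a first inspection pass after download might look like the Python sketch below. It assumes the records are available as JSON Lines (one JSON object per line); this card does not state the file format, so that assumption and the `profile_records` helper are both hypothetical.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Set


def profile_records(lines: Iterable[str]) -> Dict[str, object]:
    """Count rows and report, per key, how often it appears and with which types."""
    rows = 0
    key_counts: Counter = Counter()
    key_types: Dict[str, Set[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        rows += 1
        for key, value in record.items():
            key_counts[key] += 1
            key_types.setdefault(key, set()).add(type(value).__name__)
    columns = {
        key: {"present": key_counts[key], "types": sorted(key_types[key])}
        for key in key_counts
    }
    return {"rows": rows, "columns": columns}
```

A call such as `profile_records(open(path, encoding="utf-8"))` (with `path` pointing at the downloaded file) gives the row count and a per-column summary; columns with a low `present` count or mixed types are the ones whose semantics most need manual checking.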

Provenance

Source: VoxCeleb1 and VoxCeleb2 datasets.
Collection Method: Created via a tool-first, LLM-last pipeline with acoustic probes and a constrained LLM.
Time Range: not specified
Freshness: Last updated 2026-06-03 11:22:43; freshness should be verified.
Geography: not specified

Quality Score

Overall: D37
Description: 42
Source: 36
Reputation: 39
Access: 26

Community

13 downloads
1 like
0 views

Dataset Info

Author: JYP2024
Created: Jun 2, 2026
Updated: Jun 3, 2026
Last synced: Jul 2, 2026
