DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

NLP & Text

Vox Classica: ~73 Hours of Human-Read Classical Latin Audio

Name: Vox Classica: ~73 Hours of Human-Read Classical Latin Audio
Creator: Ken-Z
Published: 2025-07-25T22:55:02
Keywords: Machine Learning, Language Corpus, Audio, Large Scale, Natural Language Processing, Latin

by Ken-Z · Updated May 14, 2026

Available on 1 platform

Description

Vox Classica is a large-scale, ML-ready speech corpus of approximately 73 hours of human-read Classical Latin, segmented into sentence-level clips. It was built to fill a specific gap: no publicly available human-read Latin corpus was previously large enough for model training. The dataset was curated by Kaiyuan Zhao and published by Ken-Z.

Use Cases

Train speech recognition models on the human-read Latin sentences.
Evaluate speech synthesis models using the segmented audio clips.
Develop language learning applications leveraging the Classical Latin audio.
Research phonetic and prosodic features of Classical Latin from recorded speech.
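For the speech recognition use case, models are commonly compared by word error rate (WER): the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The Python sketch below is a generic implementation, not part of the dataset or any tooling shipped with it. The two transcripts are hypothetical (the opening line of the Aeneid, with one word altered), not clips drawn from Vox Classica, and text normalization (case, punctuation, orthographic variants) is deliberately left out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substituted word out of eight.
reference = "arma virumque cano troiae qui primus ab oris"
hypothesis = "arma virumque canto troiae qui primus ab oris"
print(word_error_rate(reference, hypothesis))  # 0.125
```

Keeping only two rows of the dynamic-programming table keeps memory proportional to the hypothesis length, which is ample for sentence-level clips.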

Strengths

Approximately 73 hours of audio provides a substantial volume for model training.
Segmentation into sentence-level clips suggests structured, ML-ready data.
The dataset was explicitly designed to fill a gap in publicly available human-read Latin corpora.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
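The row (clip) count is not reported, which makes it harder to judge suitability before download.
Freshness should be verified: the last update (2026-05-14) postdates the original publication (2025-07-25), so the downloaded version may differ from the initial release.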
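Because the clip count and field layout are undocumented, one option is to measure them locally after download. The sketch below assumes, hypothetically, that the clips are available as individual uncompressed WAV files under a single directory; the listing does not state the storage format, and if the audio ships in another container (for example Parquet shards) the counting logic would need to change. It counts clips and sums their durations, which also allows a check of the ~73-hour figure.

```python
import wave
from pathlib import Path


def summarize_wav_clips(root: str) -> tuple[int, float]:
    """Return (clip_count, total_hours) for all .wav files under root.

    Assumes uncompressed PCM WAV clips, one sentence per file (hypothetical
    layout; the dataset listing does not document its storage format).
    """
    clip_count = 0
    total_seconds = 0.0
    # rglob walks subdirectories too; on POSIX systems the pattern is
    # case-sensitive, so files ending in ".WAV" would be skipped.
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as clip:
            # Duration in seconds = frame count / frames per second.
            total_seconds += clip.getnframes() / clip.getframerate()
        clip_count += 1
    return clip_count, total_seconds / 3600.0
```

Usage, with a placeholder directory name: `count, hours = summarize_wav_clips("vox_classica_wavs")`. A total far from 73 hours would suggest the local copy is incomplete or differs from the listed version.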

Provenance

Source: Ken-Z (author) via Hugging Face.
Collection Method: Human-read audio, curated and aligned by Kaiyuan Zhao.
Freshness: Last updated 2026-05-14 17:54:44.

Quality Score

C43


Community

3.4K downloads

8 likes

0 views

Dataset Info

Author: Ken-Z
Created: Jul 25, 2025
Updated: May 14, 2026
Last synced: Jun 16, 2026
