EMOVA-Alignment-7M: Multimodal Pre-training Data for Vision, Speech, and Language

Name: EMOVA-Alignment-7M: Multimodal Pre-training Data for Vision, Speech, and Language
Creator: Emova-ollm
Published: 2024-12-14T04:34:32
Keywords: Pre Training, Vision Language, Multimodal Alignment, Computer Vision, Speech Language, Audio, OCR, Multimodal

by Emova-ollmUpdated 1y ago

Available on 1 platform

Sign in to view source links and access this dataset

Description

EMOVA-Alignment-7M is a dataset curated for omni-modal pre-training, including vision-language and speech-language alignment. It was created by Emova-ollm using open-sourced image-text pre-training datasets, OCR datasets, and 2,000 hours of ASR and TTS data. The dataset page was last updated on 2025-03-14.

Use Cases

Train vision-language models based on the described image-text pre-training data.
Develop speech-language alignment models based on the 2,000 hours of ASR and TTS data.
Fine-tune OCR systems using the described OCR datasets.
Conduct multimodal pre-training research combining visual, textual, and auditory modalities.
Benchmark omni-modal AI systems on tasks requiring cross-modal understanding.

Strengths

Integrates 2,000 hours of ASR and TTS data for speech-language alignment.
Combines multiple data sources including image-text and OCR datasets for multimodal coverage.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count and total dataset size are unknown, which may limit suitability assessment.
Data may reflect source bias inherent to the aggregated open-sourced datasets.

Provenance

Source: Emova-ollm
Collection Method: Curated from open-sourced image-text pre-training datasets, OCR datasets, and ASR/TTS data.
Freshness: Last updated 2025-03-14 13:21:17; freshness should be verified.

License is unknown; terms of use must be verified before application.

Audio Multimodal Pre Training Vision Language Multimodal Alignment Computer Vision Speech Language OCR

Related Datasets

Quality Score

C41

Description

51

Source

39

Reputation

38

Access

26

Community

2.4K downloads

6 likes

0 views

Dataset Info

Author: Emova-ollm
Created: Dec 14, 2024
Updated: Mar 14, 2025
Last synced: May 26, 2026

Access

26

Community

2.4K downloads

6 likes

0 views

Dataset Info

Author: Emova-ollm
Created: Dec 14, 2024
Updated: Mar 14, 2025
Last synced: May 26, 2026

EMOVA-Alignment-7M: Multimodal Pre-training Data for Vision, Speech, and Language

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info