Name: Public Domain Art Images With English And Japanese Captions
Creator: Mitsua
Published: 2024-12-15T13:55:05
Keywords: Task Categories: image-to-text, text-to-image; Language: en; Modality: image, text; Size Categories: 100K<n<1M; Library: webdataset, datasets, mlcroissant; Format: WebDataset; License: CC BY 4.0; Region: US; Legal

Description

Art Museums PD 440K is a dataset for training text-to-image and multimodal models. It contains images and captions sourced from public domain or CC0-licensed materials. The English captions were translated into Japanese using the ElanMT model, which was trained on a licensed corpus. The dataset was created by Mitsua and last updated on February 13, 2025.

Use Cases

Train text-to-image models using public domain art images and their associated English captions.
Develop multilingual image captioning systems leveraging the English and Japanese text pairs.
Fine-tune multimodal models on art history imagery with minimized copyright concerns.

Strengths

All images and texts are sourced from public domain or CC0-licensed materials, minimizing copyright concerns.
Includes bilingual text data: English captions paired with Japanese translations.
The dataset was updated after release: it was published on December 15, 2024, and last updated on February 13, 2025.

Limitations

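The exact number of rows and columns and the total dataset size are not stated in this summary. The name and the 100K<n<1M size category suggest roughly 440,000 samples, but that figure is not enough for a precise scalability assessment.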
Specific data sources, image resolutions, and the scope of art history coverage are not detailed in this summary.
The quality of the Japanese captions depends on the ElanMT translation model. This summary says only that the model was trained on a licensed corpus and gives no further detail on its training data.

Provenance

Source: Images and texts sourced from public domain or CC0-licensed materials.
Collection Method: English captions were translated into Japanese using the ElanMT model, which was trained on a licensed corpus.
Freshness: Last updated on 2025-02-13.

Users should review the full dataset description on the Hugging Face page for complete details on sources and structure, as specific column information and sample data are unavailable in this summary.
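
Because the keywords list WebDataset, the data is presumably distributed as tar shards in which files sharing a base name form one sample. The sketch below illustrates that general convention using only the Python standard library. The file extensions (`jpg`, `json`) and the caption keys (`caption_en`, `caption_ja`) are hypothetical placeholders, not confirmed field names for this dataset, and the demo shard is built in memory with made-up content.

```python
import io
import json
import tarfile
from collections import defaultdict


def group_samples(tar_bytes: bytes) -> dict[str, dict[str, bytes]]:
    """Group tar members into samples keyed by base name (WebDataset convention)."""
    samples: dict[str, dict[str, bytes]] = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # "dir/000001.jpg" -> key "dir/000001", field "jpg"
            directory, _, base = member.name.rpartition("/")
            stem, _, field = base.partition(".")
            key = f"{directory}/{stem}" if directory else stem
            samples[key][field] = tar.extractfile(member).read()
    return dict(samples)


def build_demo_shard() -> bytes:
    """Build a tiny in-memory shard; names and captions are invented for illustration."""
    files = {
        "000001.jpg": b"<image bytes>",
        "000001.json": json.dumps(
            {"caption_en": "A landscape painting", "caption_ja": "風景画"}
        ).encode("utf-8"),
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


samples = group_samples(build_demo_shard())
meta = json.loads(samples["000001"]["json"])
# Hypothetical keys: check the real field names on the dataset page.
caption_pair = (meta["caption_en"], meta["caption_ja"])
```

Grouping by base name keeps each image together with its English and Japanese captions, which is the pairing that the captioning and text-to-image use cases rely on. The actual shard layout and field names should be confirmed on the Hugging Face page before adapting this.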
