Tw Drug Labels Vision

Name: Tw Drug Labels Vision
Creator: twinkle-ai
Published: 2026-05-02T16:45:32
Keywords: Traditional Chinese, Multimodal Dataset, Drug Regulatory Data, Computer Vision, Pharmaceutical Labels, Natural Language Processing, Taiwan Healthcare, Multimodal

by twinkle-aiUpdated 1mo ago

Available on 1 platform

Sign in to view source links and access this dataset

Description

44,663 traditional Chinese pharmaceutical label documents from Taiwan's Food and Drug Administration (TFDA). The dataset was created by twinkle-ai and last updated on 2026-05-03. Each record contains rendered WebP images of all PDF pages and structured data extracted into a 17-field JSON schema.

Use Cases

Fine-tuning language models based on structured pharmaceutical label data.
Training vision-language models based on rendered document images paired with structured text.
Building document question-answering systems for drug information retrieval.
Developing tools for traditional Chinese medical NLP tasks using the provided corpus.

Strengths

Contains 44,663 records of Taiwanese drug labels.
Provides multimodal data with both rendered WebP images and structured JSON data for each document.
Follows a unified 17-field JSON schema for structured extraction.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Last updated 2026-05-03 02:21:40; freshness should be verified.

Provenance

Source: Taiwan Food and Drug Administration (TFDA) drug license query system.
Collection Method: PDFs were downloaded from URLs listed in a government open data Excel file, rendered to WebP images, and processed with OCR and structured extraction.
Time Range: null
Freshness: Last updated 2026-05-03 02:21:40.
Geography: Taiwan

null

Multimodal Traditional Chinese Multimodal Dataset Drug Regulatory Data Computer Vision Pharmaceutical Labels Natural Language Processing Taiwan Healthcare

Related Datasets

Quality Score

D38

Description

42

Source

36

Reputation

41

Access

26

Community

17 downloads

2 likes

0 views

Dataset Info

Author: twinkle-ai
Created: May 2, 2026
Updated: May 3, 2026
Last synced: Jun 1, 2026

Access

26

Community

17 downloads

2 likes

0 views

Dataset Info

Author: twinkle-ai
Created: May 2, 2026
Updated: May 3, 2026
Last synced: Jun 1, 2026

Tw Drug Labels Vision

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info