Name: Hindi Visual Question Answering Dataset with Balanced Clusters
Creator: damerajee
Published: 2024-06-02T07:38:24
Keywords: Size Categories: 1K<n<10K, Library: polars, Hindi Language, Library: dask, Task Categories: Visual Question Answering, Modality: text, Library: mlcroissant, Modality: image, Library: datasets, License: CC BY 4.0, Parquet, Region: US, Clustering, Visual Question Answering, Synthetic, Multimodal, Embeddings

Description

Hindi VQA is a visual question answering dataset in Hindi. It was filtered for better balance and then processed in three steps: sentence embeddings were generated with a pre-trained transformer model, the embeddings were clustered with KMeans, and t-SNE was applied for visualization. damerajee uploaded the dataset to Hugging Face on June 2, 2024.
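
The clustering and visualization steps described above can be sketched roughly as follows. This is a minimal illustration, not the creator's actual pipeline: the source does not name the embedding model, the number of clusters, or any t-SNE settings, so `n_clusters`, `seed`, the perplexity rule, and the random stand-in embeddings are all assumptions. In practice the embeddings would come from a sentence-embedding model applied to the Hindi text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE


def cluster_and_project(embeddings, n_clusters=5, seed=0):
    """Cluster embeddings with KMeans and project them to 2-D with t-SNE.

    n_clusters and seed are illustrative defaults, not values from the source.
    """
    embeddings = np.asarray(embeddings, dtype=np.float32)

    # Assign each embedding to one of n_clusters groups.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(embeddings)

    # t-SNE requires perplexity < number of samples, so shrink it for small inputs.
    perplexity = min(30.0, (len(embeddings) - 1) / 3)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    coords = tsne.fit_transform(embeddings)
    return labels, coords


# Random vectors stand in for real sentence embeddings (assumption for the demo).
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(60, 16))
labels, coords = cluster_and_project(fake_embeddings, n_clusters=3)
```

The returned `labels` can be used to color the 2-D `coords` in a scatter plot, which is one common way to inspect how answers group together.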

Use Cases

Training visual question answering models based on Hindi-language image-text pairs.
Analyzing clusters of similar answers based on sentence embeddings.
Visualizing the distribution of answer types using dimensionality reduction techniques.
Benchmarking model performance on a balanced Hindi VQA dataset.

Strengths

The dataset was explicitly filtered to be more balanced.
Sentence embeddings were generated using a pre-trained transformer model.
Clustering and visualization techniques (KMeans and t-SNE) were applied to the processed data.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The exact row count is not stated; the size-category tag places it between 1,000 and 10,000 rows.
The description metadata is brief, so actual data quality requires manual inspection after download.
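
Because the schema and row count have to be checked by hand, a small helper like the one below may be a useful first step after download. It is a sketch under stated assumptions: rows are assumed to be available as a list of dictionaries, and the field names `question` and `answer` are hypothetical placeholders, since the source does not document the actual columns.

```python
from collections import Counter


def summarize_rows(rows):
    """Report the row count and, per column, observed value types and missing values."""
    summary = {"num_rows": len(rows), "columns": {}}
    for row in rows:
        for name, value in row.items():
            col = summary["columns"].setdefault(
                name, {"types": Counter(), "missing": 0}
            )
            if value is None:
                col["missing"] += 1
            else:
                col["types"][type(value).__name__] += 1
    return summary


# Hypothetical rows; the real column names must be read from the downloaded files.
sample_rows = [
    {"question": "यह क्या है?", "answer": "बिल्ली"},
    {"question": "रंग क्या है?", "answer": None},
]
report = summarize_rows(sample_rows)
```

Running this over the full dataset gives a quick view of which fields exist, what types they hold, and how many values are missing, before any modeling work begins.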

Provenance

Source: Hugging Face
Collection Method: Filtered and processed from an original dataset; embeddings generated with a pre-trained transformer model.
Freshness: Last updated 2024-06-02 07:54:06; freshness should be verified.
