Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data
1,551 datasets
Viet-Chart-VQA-images is a dataset hosted on Kaggle. The title suggests it contains images paired with questions and answers, likely for training or evaluating Visual Question Answering models. The dataset's content, scale, and provenance require verification after download.
Kaggle hosts a dataset titled 'Multimodal_skin_lesion'. The dataset likely contains data related to skin lesions, possibly including images and other data types. The author, organization, and specific details are unknown.
Instruction tuning data for large language models, hosted on Kaggle. The dataset's size, format, and content details are not provided in the metadata. Its apparent purpose is to support supervised fine-tuning that aligns model outputs with human instructions.
A multimodal dataset focused on emotion recognition, published on Kaggle. The dataset likely contains data from multiple modalities such as text, audio, or images, aligned for emotion analysis. Specific details on volume, collection method, and authorship are not provided in the available metadata.
Pre-computed baseline exposures for the GPT-Neo-1.3B language model. The dataset is hosted on Kaggle and appears to contain metrics related to privacy or model behavior. The specific data format, size, and creation details are not provided in the metadata.
A dataset for emotion recognition, likely containing multiple data modalities such as text, audio, or images. It is hosted on Kaggle and may be associated with a pre-trained model. The specific volume, source, and creation date are not detailed in the available metadata.
BLIP Model is a dataset or model artifact related to the BLIP (Bootstrapping Language-Image Pre-training) framework, hosted on Kaggle. The specific content, such as pre-training data, model weights, or fine-tuning examples, is not detailed in the available metadata. Its origin and creation date are unknown.
my_vqa_dataset12 is a dataset about Visual Question Answering (VQA). It is published on Kaggle. The dataset's specific content, size, and authorship are unknown.
Universe_multimodal_cleaned is a dataset published on Kaggle. The title suggests it contains cleaned, multimodal data, likely combining multiple data types such as text, images, or audio. Specific details on its size, origin, and creation date are not provided in the available metadata.
Agri VLM Dataset is a multimodal dataset hosted on Kaggle that likely contains agricultural imagery paired with textual descriptions. Its size, content details, and creation date are not provided in the available metadata. It appears intended for training and evaluating vision-language models on agricultural concepts.
Instruction tuning data for fine-tuning large language models on Arabic language tasks. The dataset is hosted on Kaggle, but its specific size, creation date, and authorship are not provided in the available metadata. Columns and sample data are unknown, limiting immediate assessment of its content and structure.
A dataset titled 'Multimodal_csv' is available on Kaggle. The dataset's specific content, size, and origin are not detailed in the provided metadata. Further verification is required to confirm the exact nature and composition of the multimodal elements.
A dataset titled 'ttv_sp_llava_final', published on Kaggle. The title suggests it is a final version of data related to the LLaVA (Large Language-and-Vision Assistant) model, likely containing multimodal content for vision-language tasks. Metadata is minimal; the specific content, size, and origin require verification after download.
Chart VQA likely contains images of charts and graphs paired with natural language questions and answers. The dataset is hosted on Kaggle, a platform for data science competitions and projects. Specific details on volume, creation date, and authorship are not provided in the available metadata.
AUTOPILOT VQA Heatmaps is listed with 661 files that likely relate to Visual Question Answering (VQA), a task combining computer vision and natural language processing. The dataset appears to focus on heatmap visualizations, which are often used to interpret model attention. It is published on Kaggle, but the author, creation date, and specific content details are not provided in the metadata.
Trained-vlm-config is a dataset hosted on Kaggle. The title suggests it contains configuration files or parameters for a trained Vision-Language Model. The dataset's specific contents, scale, and authorship are not detailed in the provided metadata.
A dataset for aligning large language models with human preferences. It is hosted on Kaggle, but its size, authorship, and creation date are not provided in the metadata. It likely contains pairs of instructions and responses with preference rankings.
ORCA RLHF is a dataset hosted on Kaggle, likely related to training large language models using reinforcement learning from human feedback. The dataset's specific content, size, and structure are not detailed in the provided metadata. Its origin and creation methodology are also unspecified.
A Kaggle-hosted dataset focused on emotions. The dataset likely contains multimodal data, such as text, audio, or images, related to emotional states. Its specific content, size, and creation details are not provided.
MedVLM-Src is a dataset published on Kaggle. The title suggests it contains source data for training or evaluating medical vision-language models. The dataset's specific content, scale, and origin require verification after download.