DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Datacomp Small Clip: CLIP Embeddings for the DataComp Small Benchmark

Name: Datacomp Small Clip: CLIP Embeddings for the DataComp Small Benchmark
Category: Multimodal & LLM
Creator: fondant-ai
Published: 2024-03-05T11:27:40
Keywords: Image, Video, Parquet, FAISS, Embeddings; Libraries: polars, dask, mlcroissant, datasets; Task categories: image-to-text, image-to-image; Modalities: text, image; Size category: 10M<n<100M; License: CC BY 4.0; Region: us

Available on 1 platform

Description

This dataset contains 12.8 million image URLs and their corresponding CLIP embeddings, derived from the datacomp_small benchmark. It was processed with the Fondant framework into a production-ready format for multimodal machine learning tasks, so the raw images do not need to be stored.

Use Cases

Build image retrieval systems by indexing the CLIP embeddings for vector search.
Analyze dataset distribution and identify outliers using the embedding vectors.
Train lightweight linear probes for image classification using the pre-extracted CLIP features.
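The retrieval and outlier-analysis use cases both come down to comparing embedding vectors. The sketch below is a minimal, dependency-free illustration that assumes cosine similarity as the metric (common for CLIP embeddings, although the page does not specify one). It uses brute-force search as a stand-in for a FAISS index, which is what you would normally want at 12.8 million rows, and it scores outliers by cosine distance from the mean embedding, which is one simple choice among many. The toy 3-dimensional vectors and the `img_*` IDs are hypothetical; real rows would carry the dataset's CLIP embeddings.

```python
import math

# Hypothetical toy data: real rows would hold the dataset's CLIP embeddings,
# whose dimensionality is not stated on this page.
EMBEDDINGS = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.8, 0.2, 0.1],
    "img_d": [0.0, 0.0, 1.0],  # deliberately far from the others
}


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_top_k(query, embeddings, k):
    """Brute-force search: return the k (id, similarity) pairs closest to query."""
    q = normalize(query)
    scored = [(key, dot(q, normalize(vec))) for key, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def centroid_outlier_scores(embeddings):
    """Cosine distance of each normalized vector from the normalized mean vector."""
    normed = {key: normalize(vec) for key, vec in embeddings.items()}
    dim = len(next(iter(normed.values())))
    mean = [sum(v[i] for v in normed.values()) / len(normed) for i in range(dim)]
    centroid = normalize(mean)
    return {key: 1.0 - dot(v, centroid) for key, v in normed.items()}


if __name__ == "__main__":
    print(cosine_top_k([1.0, 0.0, 0.0], EMBEDDINGS, k=2))
    scores = centroid_outlier_scores(EMBEDDINGS)
    print(max(scores, key=scores.get))  # -> img_d
```

Replacing `cosine_top_k` with an approximate nearest-neighbour index keeps the same shape of workflow: embed a query (for example with the text encoder of whichever CLIP model produced these vectors), look up the top-k image rows, and follow their URLs.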

Strengths

Contains 12.8 million rows of image URLs and high-dimensional CLIP embeddings.
Based on the datacomp_small subset of the DataComp benchmark for multimodal learning.
Processed and formatted using the Fondant framework for streamlined data engineering and sharing.
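Because only URLs and embeddings are stored, the dataset's footprint is dominated by the embedding column. The following back-of-the-envelope estimate assumes 768-dimensional vectors (the output size of CLIP ViT-L/14; the page does not say which CLIP model was used) and ignores the URL column and Parquet compression.

```python
# Rough raw size of the embedding column alone.
NUM_ROWS = 12_800_000
EMBEDDING_DIM = 768  # assumption: not stated on the dataset page


def embedding_bytes(num_rows: int, dim: int, bytes_per_value: int) -> int:
    """Raw size in bytes of a dense num_rows x dim matrix."""
    return num_rows * dim * bytes_per_value


for name, width in (("float32", 4), ("float16", 2)):
    size = embedding_bytes(NUM_ROWS, EMBEDDING_DIM, width)
    print(f"{name}: {size / 1e9:.1f} GB")
```

Under these assumptions the script prints about 39.3 GB for float32 and 19.7 GB for float16; with 512-dimensional vectors (as produced by CLIP ViT-B/32) both figures shrink by a third.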

Quality Score: D34

Community

680 downloads, 13 likes, 0 views

Dataset Info

Author: fondant-ai
Created: Mar 5, 2024
Updated: Mar 7, 2024
Last synced: May 28, 2026