Over 1.5 billion rows of educational web passages filtered from the FineWeb dataset. Each passage includes cleaned text, metadata, and 384-dimensional text embeddings. The dataset is published by lance-format in the Lance format, which is optimized for retrieval-heavy AI workloads.
Because the data is stored in the Lance format, reading it may require Lance-compatible tooling.
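As a rough illustration of what retrieval over 384-dimensional embeddings involves, the sketch below ranks stored vectors by cosine similarity to a query vector using only the Python standard library. The `load_sample` helper is hypothetical: it assumes the third-party `lance` package (installed as `pylance`) is available, that you have a dataset URI (not given in this listing), and that the columns are named `text` and `embedding`, none of which is confirmed by the description above. A brute-force scan like this would not scale to 1.5 billion rows; it only shows the idea.

```python
import math
from typing import List, Sequence, Tuple

EMBEDDING_DIM = 384  # dimensionality stated in the dataset description


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (row_index, score) pairs for the k most similar embeddings."""
    scored = [(i, cosine_similarity(query, e)) for i, e in enumerate(embeddings)]
    # Highest similarity first; ties keep their original row order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def load_sample(uri: str, n: int = 1000):
    """Hypothetical helper: read the first n rows of a Lance dataset.

    Assumptions (not confirmed by the listing): the `lance` package is
    installed, `uri` points at the dataset, and the columns are named
    "text" and "embedding".
    """
    import lance  # third-party; imported lazily so the rest runs without it

    ds = lance.dataset(uri)
    return ds.to_table(columns=["text", "embedding"], limit=n)
```

In practice, a Lance-aware tool would typically run the nearest-neighbour search against the stored data rather than pulling rows into Python lists; the pure-Python ranking here is only meant to make the scoring step concrete.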