Loading...
Loading...
Available on 1 platform
Sign in to view source links and access this dataset
MinishLab released Semhash in January 2026 to provide a framework for fast multimodal semantic deduplication and filtering. The project utilizes model2vec and vicinity-based hashing to identify near-duplicate records across text and image datasets.
Requires model2vec and vicinity libraries for implementation; released under the MIT license.