This dataset contains 148 English-centric and 1,465 non-English-centric language pairs of parallel text, mined by Meta AI using the stopes library and LASER3 encoders. The complete dataset is approximately 450 GB in size and was released by AllenAI in 2022. It is built from metadata for mined bitext and supersedes previous CCMatrix versions.
The license is unknown and must be verified before use.