NarraDolma provides a large-scale narrative characterization of the Dolma pretraining corpus. It contains approximately 3 million passages drawn from about 785,000 unique documents across all 12 Dolma sub-corpora, each labeled with a fine-grained narrative feature vector. The dataset was created by teagrjohnson and is intended as a resource for studying how narrative qualities are distributed in web-scale data.
The license is not specified, which may restrict how the dataset can be used.