Description
126,380 story metadata records and 525,650 chapter text entries from the adult-fanfiction.org public archive. The data is organized into four parquet tables and covers 22 archive subdomains and 16,597 listing pages. It was created by trentmkley and last updated on 2026-05-15.
Use Cases
Train text generation models based on a large corpus of narrative prose.
Analyze metadata patterns and story length distributions across 22 archive subdomains.
Study fanfiction tropes and themes using the story metadata and chapter text.
Conduct computational literary analysis on a large-scale collection of informal writing.
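As a minimal sketch of the story-length analysis listed above, the snippet below sums chapter word counts per story and summarizes the resulting distribution. It assumes the chapters table has already been loaded (for example with pandas.read_parquet) and reduced to (story id, chapter text) pairs. The pair layout and the helper names story_word_counts and length_summary are hypothetical, because the actual column names are undocumented. Word counts here are simple whitespace splits, not a linguistic tokenizer.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, Tuple


def story_word_counts(chapters: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Sum whitespace-delimited word counts per story.

    `chapters` is a hypothetical iterable of (story_id, chapter_text) pairs;
    the real column names in the chapters table must be inferred after download.
    """
    totals: Dict[str, int] = defaultdict(int)
    for story_id, text in chapters:
        totals[story_id] += len(text.split())
    return dict(totals)


def length_summary(counts: Dict[str, int]) -> Dict[str, float]:
    """Return simple distribution statistics over per-story word counts."""
    values = sorted(counts.values())
    if not values:
        raise ValueError("no stories to summarize")
    return {
        "stories": len(values),
        "min": values[0],
        "median": median(values),
        "mean": mean(values),
        "max": values[-1],
    }


if __name__ == "__main__":
    # Toy records standing in for rows of the chapters table.
    sample = [("s1", "one two three"), ("s1", "four five"), ("s2", "alpha beta")]
    print(length_summary(story_word_counts(sample)))
```

Aggregating per story before summarizing keeps multi-chapter stories from being counted as several short works.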
Strengths
Contains 525,650 individual text chapters, providing substantial raw material for NLP tasks.
Organizes data into four distinct tables (archives, pages, stories, chapters) for structured analysis.
Includes metadata for 126,380 stories, allowing for quantitative study of story attributes.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
Provenance
Source
Public adult-fanfiction.org archive subdomains.
Collection Method
Likely gathered via web crawling.
Freshness
Last updated 2026-05-15 01:28:11; freshness should be verified.