Description
Omnibook is a collection of general culture texts sourced from Project Gutenberg. It includes complete works, historical scientific texts, philosophy, political thought, poetry, drama, and essays. The dataset was created by ThingAI and was last updated on 2026-05-11.
Use Cases
Train or fine-tune language models on the historical scientific texts in the collection.
Analyze stylistic patterns in poetry and drama from the included literary works.
Study the evolution of philosophical and political thought using the collection of essays and texts.
Strengths
Texts are manually proofread by Project Gutenberg, which suggests high fidelity and little OCR noise.
Near-identical editions are deduplicated to reduce redundancy.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which makes it hard to assess suitability before download.
Freshness should be verified as the last update timestamp is from the future (2026-05-11).
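Because column-level documentation is absent, a quick pass over the downloaded records can help infer field semantics. The following is a minimal sketch, assuming the records can be loaded as Python dictionaries; the field names `title` and `text` in the usage example are hypothetical and not confirmed by the dataset.

```python
from collections import defaultdict


def summarize_fields(records):
    """Map each field name to the value types seen and one sample value.

    `records` is any iterable of dicts (for example, parsed JSON lines).
    """
    types = defaultdict(set)
    samples = {}
    for record in records:
        for name, value in record.items():
            types[name].add(type(value).__name__)
            # Keep the first value seen for each field as an example.
            samples.setdefault(name, value)
    return {
        name: {"types": sorted(types[name]), "sample": samples[name]}
        for name in types
    }


# Hypothetical records; the real field names must be read from the download.
rows = [
    {"title": "A", "text": "x"},
    {"title": None, "text": "y"},
]
summary = summarize_fields(rows)
```

A field that shows more than one type (here `title` is sometimes missing) is a hint that values may need cleaning before use.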
Provenance
Source
Project Gutenberg
Collection Method
Texts are unmodified from original sources except for removal of boilerplate headers/footers and deduplication.
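The collection method is not documented in more detail. As an illustration only, the two cleaning steps could look like the sketch below. It assumes the usual `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF ...` marker lines found in Project Gutenberg plain-text files, and it swaps in a simple word-shingle Jaccard comparison for deduplication; the function names and the 0.8 threshold are assumptions, not the dataset creator's actual pipeline.

```python
import re

# Marker lines in Project Gutenberg plain-text files. The wording varies
# slightly between files ("THE" vs "THIS"), so both are accepted.
_START = re.compile(
    r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
_END = re.compile(
    r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_boilerplate(text):
    """Return the text between the START and END markers, if present."""
    start = _START.search(text)
    end = _END.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end else len(text)
    if finish < begin:
        # Markers in an unexpected order: leave the text as it is.
        return text.strip()
    return text[begin:finish].strip()


def shingles(text, k=5):
    """Set of k-word tuples used as a rough fingerprint of the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(texts, threshold=0.8):
    """Keep the first text of each group whose similarity reaches threshold."""
    kept, kept_shingles = [], []
    for text in texts:
        current = shingles(text)
        if all(jaccard(current, seen) < threshold for seen in kept_shingles):
            kept.append(text)
            kept_shingles.append(current)
    return kept
```

The pairwise comparison is quadratic in the number of texts, so a collection of any real size would more likely use a hashing scheme such as MinHash; the sketch only shows the idea.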
Freshness
Last updated 2026-05-11 17:12:16
License information is unknown and should be verified before use.