Description

This structured, quantitative dataset represents the Hebrew Bible (Tanakh) as extracted from the Leningrad Codex. It provides numerical data points, such as word frequencies and verse metrics, in a machine-readable CSV format for computational analysis. The dataset was created by Guy Shaked of TwoHillsLab Dataverse and was last updated on April 10, 2026.

Use Cases

Perform stylometric authorship analysis using word frequency patterns.
Analyze textual structure statistically using verse and chapter metrics.
Train models for computational linguistics tasks on the machine-readable text representation.
Compare linguistic features across biblical books using systematic character and word counts.
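As an illustration of the stylometric and cross-book comparison use cases, the sketch below computes Burrows' Delta, a common stylometric distance, from per-book word counts. The dataset's column layout is undocumented, so this assumes the word frequencies have already been loaded into a hypothetical mapping of book name to {word: count}; the function names are also hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def relative_frequencies(counts):
    """Convert raw per-book word counts into relative frequencies."""
    freqs = {}
    for book, words in counts.items():
        total = sum(words.values())
        freqs[book] = {w: c / total for w, c in words.items()} if total else {}
    return freqs


def burrows_delta(counts, top_n=100):
    """Pairwise Burrows' Delta between books over the top_n most frequent words.

    counts: assumed (hypothetical) shape {book_name: {word: count}}.
    Returns {(book_a, book_b): delta}; smaller means more similar.
    """
    # Pool counts across all books and keep the most frequent words as features.
    pooled = Counter()
    for words in counts.values():
        pooled.update(words)
    features = [w for w, _ in pooled.most_common(top_n)]

    freqs = relative_frequencies(counts)
    books = sorted(freqs)

    # z-score each feature across books (population std); skip zero-variance words.
    z = {b: {} for b in books}
    for w in features:
        values = [freqs[b].get(w, 0.0) for b in books]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        if std == 0:
            continue
        for b, v in zip(books, values):
            z[b][w] = (v - mean) / std

    used = [w for w in features if w in z[books[0]]]
    if not used:
        raise ValueError("no word varies across books; Delta is undefined")

    # Delta = mean absolute difference of z-scores over the retained features.
    return {
        (a, b): sum(abs(z[a][w] - z[b][w]) for w in used) / len(used)
        for a, b in combinations(books, 2)
    }
```

Smaller Delta values indicate more similar word-frequency profiles. This version uses the population standard deviation and drops words that do not vary across books; published variants of Delta differ on such details, so treat it as a starting point rather than a reference implementation.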

Strengths

Data is derived from the authoritative Leningrad Codex source.
Covers the complete Tanakh.
Data is structured in a portable, machine-readable CSV format (UTF-8).

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count and file size are not stated, which makes it hard to judge suitability before download.
Descriptive metadata is limited; actual data quality can only be judged by inspecting the data after download.
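Because column semantics, row count, and file size are undocumented, a quick profiling pass after download can fill those gaps. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and the path is whatever the downloaded CSV was saved as. It reads with the utf-8-sig codec so that an optional byte-order mark is stripped while plain UTF-8 still decodes normally.

```python
import csv
import os


def _is_number(text):
    """Return True if the text parses as a float (ints and decimals included)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_csv(path, sample_size=3):
    """Summarize a CSV file: row count, byte size, and per-column statistics."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        stats = {c: {"non_empty": 0, "numeric": 0, "samples": []} for c in columns}
        rows = 0
        for row in reader:
            rows += 1
            for c in columns:
                # Short rows yield None for missing fields; treat them as empty.
                value = (row.get(c) or "").strip()
                if not value:
                    continue
                s = stats[c]
                s["non_empty"] += 1
                if _is_number(value):
                    s["numeric"] += 1
                # Keep a few distinct example values to help infer field meaning.
                if len(s["samples"]) < sample_size and value not in s["samples"]:
                    s["samples"].append(value)
    return {"rows": rows, "bytes": os.path.getsize(path), "columns": stats}
```

A column whose numeric count equals its non-empty count is likely a metric such as a frequency or length; columns with text samples are likely identifiers such as book names or words.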

Provenance

Source: Codex Leningradensis (Leningrad Codex)
Collection Method: Systematic extraction of numerical data points from the source text.
Time Range: Covers the complete Tanakh (Hebrew Bible).
Freshness: Last updated 2026-04-10 16:03:22; freshness should be verified.
Geography: Ancient Near Eastern religious texts.

License is unknown; terms of use must be verified before use.

Tags: Tabular, Computational Linguistics, Text Analysis, Stylometry, Religious Texts, Hebrew Bible

Hebrew Bible Numerical Data Based on the Leningrad Codex