Document Review Data for Office and PDF Title Extraction

Name: Document Review Data for Office and PDF Title Extraction
Creator: mannycooper
Published: 2026-05-27T07:51:47
Keywords: Title Extraction, Text, Document Processing, Office Documents

Description

This private dataset, created by mannycooper for a review application, contains 4,431 document samples, 4,000 of which are source-only samples labeled with facts and titles. It was last updated on June 2, 2026. Its primary purpose is to support the development of an Office and PDF title extraction tool.

Use Cases

Training a named entity recognition model to identify document titles based on labeled facts and titles.
Evaluating the performance of automated title extraction algorithms for Office and PDF documents.
Building a review application to manually verify or correct machine-generated document metadata.
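Evaluating title extraction requires a rule for comparing predicted titles with the labeled ones. The sketch below shows one common choice, not a metric defined by this dataset: normalized exact match plus SQuAD-style token-overlap F1. The normalization rules (NFKC, case-folding, whitespace collapsing) and the (predicted, gold) pair format are assumptions.

```python
import re
import unicodedata
from collections import Counter


def normalize_title(text: str) -> str:
    """Assumed normalization: Unicode NFKC, case-folding, collapsed whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"\s+", " ", text).strip()


def token_f1(predicted: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold title."""
    pred_tokens = normalize_title(predicted).split()
    gold_tokens = normalize_title(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; only one empty scores zero.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs):
    """Mean normalized exact match and mean token F1 over (predicted, gold) pairs."""
    if not pairs:
        raise ValueError("no pairs to evaluate")
    exact = sum(normalize_title(p) == normalize_title(g) for p, g in pairs)
    f1 = sum(token_f1(p, g) for p, g in pairs)
    return {"exact_match": exact / len(pairs), "token_f1": f1 / len(pairs)}
```

For example, `evaluate([("Annual Report  2025", "annual report 2025"), ("Draft", "Project Charter")])` returns 0.5 for both metrics: the first pair matches after normalization and the second shares no tokens. Token F1 gives partial credit when an extractor returns a title with extra or missing words, which exact match alone would score as a plain miss.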

Strengths

Contains 4,000 source-only samples with explicit fact and title labels, providing a foundation for supervised learning.
The dataset is curated specifically for document title extraction.
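The listing does not mention predefined train and evaluation splits. The following is a minimal sketch for supervised learning that assumes each sample carries a stable identifier (the `id` field name is hypothetical). It assigns samples to splits by hashing that identifier, so the assignment stays the same across re-downloads and reorderings.

```python
import hashlib


def assign_split(sample_id: str, eval_fraction: float = 0.1) -> str:
    """Deterministically assign a sample to 'train' or 'eval' by hashing its ID."""
    if not 0.0 <= eval_fraction <= 1.0:
        raise ValueError("eval_fraction must be between 0 and 1")
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    # Keep the top 53 bits so the bucket is an exact float in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "eval" if bucket < eval_fraction else "train"


def split_samples(samples, id_field: str = "id", eval_fraction: float = 0.1):
    """Partition sample dicts into train and eval lists; `id_field` is hypothetical."""
    splits = {"train": [], "eval": []}
    for sample in samples:
        splits[assign_split(str(sample[id_field]), eval_fraction)].append(sample)
    return splits
```

Hashing the ID rather than shuffling with a random seed keeps a sample's split fixed even if rows are added or reordered in a later update.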

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
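Apart from the stated sample counts, the row count and split composition are unknown, and the remaining 431 samples (4,431 minus the 4,000 labeled ones) are not described, which makes suitability harder to assess.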
Description metadata is limited; actual data quality requires manual inspection after download.
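Because field semantics must be inferred after download, a quick profile of the records is a reasonable first step. The sketch below assumes rows are Python dicts. Iterating a Hugging Face `datasets.Dataset` yields dicts, and its `features` attribute reports the declared types. The field names in the test rows (`source`, `title`, `facts`) are hypothetical. Counting non-missing title values is also a quick way to check the stated 4,000 labeled samples.

```python
from collections import Counter, defaultdict


def profile_fields(rows):
    """Summarize each field: observed value types and how often it is missing or empty."""
    rows = list(rows)  # allow a single-pass iterable to be scanned twice
    keys = set()
    for row in rows:
        keys.update(row)
    type_counts = defaultdict(Counter)
    missing = Counter()
    for row in rows:
        for key in keys:
            value = row.get(key)
            if value is None or value == "":
                missing[key] += 1
            else:
                type_counts[key][type(value).__name__] += 1
    return {
        key: {"types": dict(type_counts[key]), "missing": missing[key], "total": len(rows)}
        for key in sorted(keys)
    }
```

A field that is present in only some rows, or that mixes types, is a sign that different sample groups (such as the labeled and unlabeled subsets) use different schemas.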

Provenance

Source: Hugging Face
Collection Method: Private dataset for a specific application; collection method not specified.
Freshness: Last updated 2026-06-02 04:44:24; freshness should be verified.

The dataset is marked as private, with a note not to make it public without first clearing the source files; users should verify the license and usage terms.
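A basic freshness check compares the listed "Last updated" timestamp against the current date and an acceptable maximum age. The sketch below parses the timestamp format shown under Provenance. The listing gives no time zone, so treating the value as a naive datetime is an assumption, and the 30-day default threshold is arbitrary. For a live comparison, `huggingface_hub`'s `HfApi().dataset_info(repo_id)` exposes a `last_modified` field. However, the repository ID is not given here, and a private repository requires authentication.

```python
from datetime import datetime, timedelta

# Format of the "Last updated" value in the listing (no time zone stated).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def is_stale(last_updated: str, now: datetime, max_age: timedelta = timedelta(days=30)) -> bool:
    """Return True if `last_updated` is older than `max_age` relative to `now`.

    Both values are treated as naive datetimes in the same (assumed) time zone.
    """
    updated = datetime.strptime(last_updated, TIMESTAMP_FORMAT)
    return now - updated > max_age
```

For example, `is_stale("2026-06-02 04:44:24", now=datetime(2026, 6, 8))` returns `False`, but it returns `True` with `max_age=timedelta(days=5)`, because about five days and 19 hours have elapsed.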

Related Topics: Text, Title Extraction, Document Processing, Office Documents

Quality Score

Overall: D33
Description: 24
Source: 36
Reputation: 49
Access: 26

Community

11.3K downloads

1 like

0 views

Dataset Info

Author: mannycooper
Created: May 27, 2026
Updated: Jun 2, 2026
Last synced: Jun 8, 2026
