JAVEdit-100k: A Text Corpus for Natural Language Processing

Name: JAVEdit-100k: A Text Corpus for Natural Language Processing
Creator: Coraxor
Published: 2026-05-07T07:09:02
Keywords: Text, Natural Language Processing, Javedit, Text Corpus

By Coraxor. Updated 5d ago.

Available on 1 platform

Description

JAVEdit-100k is a dataset hosted on HuggingFace and designated as official by its creators. It was created by researchers from Zhejiang University, Tencent YouTu Lab, and other institutions, and was last updated on June 7, 2026.

Use Cases

Train language models based on the JAVEdit text corpus.
Benchmark text generation or editing tasks using the JAVEdit dataset.
Conduct linguistic analysis on the structure and content of the JAVEdit corpus.

Strengths

Dataset is designated as 'official' by its creators.
Involves researchers from multiple academic and industry institutions including Zhejiang University and Tencent.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
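
Because the row count and field semantics are undocumented, a first pass after download could simply count rows and record which fields and value types appear. The sketch below is one way to do that, not the creators' tooling: the function name `summarize_records` and the sample rows are hypothetical, and the real JAVEdit-100k field names are not known from this listing. Rows loaded with the Hugging Face `datasets` library iterate as dictionaries, so they could be passed in directly; the repository ID and split names are not given here and would need to be looked up.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable, Mapping


def summarize_records(records: Iterable[Mapping[str, Any]]) -> dict:
    """Count rows and, per field, the observed value types and missing values."""
    row_count = 0
    type_counts = defaultdict(Counter)
    for record in records:
        row_count += 1
        for field, value in record.items():
            type_counts[field][type(value).__name__] += 1

    fields = {}
    for field, counts in type_counts.items():
        present = sum(counts.values())
        fields[field] = {
            "types": dict(counts),
            # Missing = field absent from the row, or present with a None value.
            "missing": row_count - present + counts.get("NoneType", 0),
        }
    return {"rows": row_count, "fields": fields}


# Made-up sample rows for illustration; these are NOT the real dataset fields.
sample = [
    {"id": 1, "text": "first example"},
    {"id": 2, "text": None},
    {"id": 3},
]
summary = summarize_records(sample)
print(summary["rows"])
print(summary["fields"]["text"])
```

The function makes a single pass and keeps only per-field counters, so it should also work on a streamed iterator without holding the whole corpus in memory.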

Provenance

Source: Coraxor on HuggingFace.
Freshness: Last updated 2026-06-07 15:04:33; freshness should be verified.
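
One way to verify freshness is to compare the timestamp recorded here against the modification time reported by the hosting platform. The sketch below assumes both timestamps are in the same timezone (this listing does not state one); the helper name `is_catalog_stale`, the one-hour tolerance, and the upstream values are illustrative assumptions, not data from the source.

```python
from datetime import datetime, timedelta


def is_catalog_stale(
    catalog_updated: str,
    upstream_modified: datetime,
    tolerance: timedelta = timedelta(hours=1),
) -> bool:
    """Return True when the upstream copy changed after the catalog's timestamp."""
    recorded = datetime.strptime(catalog_updated, "%Y-%m-%d %H:%M:%S")
    return upstream_modified - recorded > tolerance


# Timestamp copied from the Provenance entry; the upstream values are made up.
recorded = "2026-06-07 15:04:33"
print(is_catalog_stale(recorded, datetime(2026, 6, 7, 15, 4, 33)))
print(is_catalog_stale(recorded, datetime(2026, 6, 20, 9, 0, 0)))
```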

Quality Score

Overall: D (39)
Description: 42
Source: 36
Reputation: 50
Access: 26

Community

435 downloads
10 likes
0 views

Dataset Info

Author: Coraxor
Created: May 7, 2026
Updated: Jun 7, 2026
Last synced: Jun 13, 2026
