Nuclear-SlimORCA-Dedup: Deduplicated Text Dataset

Available on 1 platform

Description

Nuclear-SlimORCA-Dedup is a dataset published on Kaggle. Its title suggests nuclear-related subject matter and a deduplication step, likely applied to a text corpus. The available metadata does not state the dataset's content, size, or origin.

Use Cases

Train a language model on deduplicated nuclear science text (inferred from domain, verify after download)
Benchmark text deduplication algorithms (inferred from domain, verify after download)
Analyze thematic content within a curated corpus (inferred from domain, verify after download)

Strengths

Published on Kaggle

Limitations

Metadata is minimal; actual content requires verification after download
Column-level documentation is absent; field semantics must be inferred after download
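
Because the fields are undocumented, a first pass after download could list which fields appear and count exact duplicate records. The sketch below is one minimal way to do that, not a documented procedure for this dataset. It assumes the data can be read as one JSON object per line; the listing does not state the file format, and the helper names `summarize_records` and `load_jsonl` and the sample records are hypothetical.

```python
import hashlib
import json
from collections import Counter


def summarize_records(records):
    """Report field coverage and exact-duplicate count for a list of dict records."""
    field_counts = Counter()
    seen = set()
    duplicates = 0
    for record in records:
        # Count how many records carry each field name.
        field_counts.update(record.keys())
        # Serialize with sorted keys so key order does not affect the fingerprint.
        canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return {
        "rows": len(records),
        "fields": dict(field_counts),
        "exact_duplicates": duplicates,
    }


def load_jsonl(path):
    """Read one JSON object per line. The JSON Lines format is an assumption."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical sample standing in for the downloaded file.
sample = [
    {"text": "a", "source": "x"},
    {"source": "x", "text": "a"},
    {"text": "b"},
]
print(summarize_records(sample))
```

A nonzero duplicate count would only show that exact repeats remain; near-duplicates with small wording differences would need a fuzzier comparison than this hash check.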

Provenance

Source: Kaggle

Related Topics

Text, Nuclear, Deduplication, Orca

Related Datasets

Quality Score

D16

Description: 8
Source: 17
Reputation: 18
Access: 31

Community

0 views

Dataset Info

Last synced: May 28, 2026
