This dataset annotates 11 categories of confidential named entities for extraction from Japanese text. It is designed for Supervised Fine-Tuning (SFT) of LFM2-family models, for example with LoRA, and was created by akiFQC. It was last updated on 2026-06-06.
Use Cases
- Fine-tune language models to recognize named entities in the 11 confidential categories.
- Build systems that redact or classify sensitive information in Japanese business documents.
- Benchmark model performance on extracting specific entity types like email addresses and account identifiers from Japanese text.
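For benchmarking, one common choice is micro-averaged precision, recall and F1 over exact entity matches. The sketch below assumes that both gold annotations and model outputs can be reduced to (category, surface text) pairs per document. The category names in the example (`email_address`, `account_id`) are hypothetical placeholders, because the card does not list the 11 labels or the output format.

```python
from collections import Counter

# An entity is a (category, surface_text) pair. Category names used with this
# function are hypothetical; the dataset's actual labels are not documented.
Entity = tuple[str, str]


def entity_prf(gold: list[list[Entity]], pred: list[list[Entity]]) -> dict[str, float]:
    """Micro-averaged precision, recall and F1 over exact (category, text) matches."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same documents")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_counts, p_counts = Counter(g), Counter(p)
        # Multiset intersection, so duplicate mentions are matched at most once each.
        overlap = sum((g_counts & p_counts).values())
        tp += overlap
        fp += sum(p_counts.values()) - overlap
        fn += sum(g_counts.values()) - overlap
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = [[("email_address", "taro@example.co.jp"), ("account_id", "AC-1234")]]
pred = [[("email_address", "taro@example.co.jp"), ("account_id", "AC-9999")]]
print(entity_prf(gold, pred))
```

Exact string matching is strict. A model that returns a correct entity with slightly different whitespace counts as both a false positive and a false negative, so normalizing strings before scoring may be worth considering.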
Strengths
- Defines a specific task of extracting 11 categories of confidential entities from Japanese text.
- Designed for a concrete application: Supervised Fine-Tuning (SFT) of LFM2-family models.
Limitations
- Descriptive metadata is sparse; data quality must be checked manually after download.
- The row count is not documented, which makes it hard to judge whether the dataset is large enough for a given use.
- Column-level documentation is absent; field semantics must be inferred after download.
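Because neither the row count nor the column semantics are documented, a quick profile after download can help. When the data is loaded with the Hugging Face `datasets` library, `Dataset.num_rows` and `Dataset.features` report the row count and declared schema directly. The plain-Python sketch below is a fallback for rows already parsed into dictionaries, for example from a JSONL file. It also records which value types actually occur under each key. The field names in the example (`text`, `entities`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable


def profile_rows(rows: Iterable[dict[str, Any]]) -> tuple[int, dict[str, Counter]]:
    """Count rows and tally the Python type names seen under each key."""
    n_rows = 0
    types: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        n_rows += 1
        for key, value in row.items():
            types[key][type(value).__name__] += 1
    return n_rows, dict(types)


# Hypothetical rows; the real field names must be discovered after download.
rows = [
    {"text": "a", "entities": []},
    {"text": "b", "entities": None},
    {"text": "c"},
]
n, cols = profile_rows(rows)
print(n, cols)
```

If a key's type counts sum to less than the row count, that key is missing from some rows. Mixed types under one key, such as `list` and `NoneType`, also point to fields whose meaning needs a closer look.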
Provenance
- Source: Hugging Face
- Freshness: last updated 2026-06-06 23:48:28; verify that this is still current.