Description

This dataset contains soft labels generated by the cross-encoder/nli-deberta-v3-small model on the combined SNLI and MultiNLI datasets. It is intended for knowledge distillation into smaller, more efficient Natural Language Inference (NLI) models. Each JSONL record contains a premise, a hypothesis, a hard label, and a probability distribution over entailment, neutral, and contradiction.
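As a rough sketch of what reading one record might look like, the snippet below parses a single JSONL line and checks that the soft labels form a valid probability distribution. The field names `premise`, `hypothesis`, and `label`, and the example values, are assumptions made for illustration. The real keys should be confirmed against the downloaded file. The class order follows the description above.

```python
import json
import math

# Assumed class order, following the description: entailment, neutral, contradiction.
LABELS = ("entailment", "neutral", "contradiction")

def parse_record(line: str, tol: float = 1e-3) -> dict:
    """Parse one JSONL line and validate its soft-label distribution.

    The field names used here are hypothetical; adjust them to the real schema.
    """
    rec = json.loads(line)
    probs = [float(p) for p in rec["soft_labels"]]
    if len(probs) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} probabilities, got {len(probs)}")
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    if not math.isclose(sum(probs), 1.0, abs_tol=tol):
        raise ValueError("probabilities must sum to 1")
    return {
        "premise": rec["premise"],
        "hypothesis": rec["hypothesis"],
        "label": rec["label"],  # passed through unchanged; its encoding is not documented
        "soft_labels": dict(zip(LABELS, probs)),
    }

# Made-up example line, not taken from the dataset.
example = json.dumps({
    "premise": "A man is playing a guitar.",
    "hypothesis": "A person is making music.",
    "label": "entailment",
    "soft_labels": [0.9, 0.08, 0.02],
})
record = parse_record(example)
```

Mapping the array onto named classes at load time makes any mismatch between the assumed and actual label order easier to spot during inspection.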

Use Cases

Knowledge distillation for NLI models, using the provided soft-label probability distributions as training targets.
Training student models to mimic the behavior of a larger teacher model, using the cross-encoder-generated labels.
Analyzing model confidence and prediction uncertainty in NLI tasks using the soft-label scores.
Fine-tuning smaller, more efficient models for deployment using the SNLI- and MultiNLI-derived training signal.
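The distillation and confidence-analysis use cases above can be illustrated with a minimal, dependency-free sketch. It assumes the student produces three logits in the same class order as the teacher's soft labels. The function names are hypothetical, and soft-target cross-entropy is only one common choice of distillation objective, not necessarily the one the dataset author intended.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(teacher_probs, student_logits):
    """Cross-entropy of the student's distribution against the teacher's soft labels."""
    student_probs = softmax(student_logits)
    eps = 1e-12  # guards against log(0)
    return -sum(t * math.log(s + eps) for t, s in zip(teacher_probs, student_probs))

def entropy(probs):
    """Shannon entropy in nats; higher values mean a less confident teacher."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Illustrative values only, not taken from the dataset.
teacher = [0.7, 0.2, 0.1]
loss_uniform_student = soft_cross_entropy(teacher, [0.0, 0.0, 0.0])
loss_matching_student = soft_cross_entropy(teacher, [math.log(p) for p in teacher])
teacher_uncertainty = entropy(teacher)
```

The loss is smallest when the student reproduces the teacher's distribution, at which point it equals the teacher's entropy. In practice the same quantities would normally be computed in batches with a tensor library.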

Strengths

Labels are generated by a specific, named model (cross-encoder/nli-deberta-v3-small).
Source data is derived from two established NLI benchmarks: SNLI and MultiNLI.
Label order for the soft_labels array is explicitly defined in the description.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Row count is unknown, which may limit suitability assessment.
Column-level documentation is absent; field semantics must be inferred after download.
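Because the row count and field semantics are undocumented, a first inspection pass after download might look like the sketch below. It counts non-empty rows and tallies which fields and value types occur in a sample. The helper name `summarize_jsonl` and the in-memory sample records are illustrative assumptions, not part of the dataset.

```python
import io
import json
from collections import Counter

def summarize_jsonl(lines, sample_size=1000):
    """Count non-empty rows and tally (field, type) pairs over the first `sample_size` rows."""
    n_rows = 0
    field_types = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        n_rows += 1
        if n_rows <= sample_size:
            for key, value in json.loads(line).items():
                field_types[(key, type(value).__name__)] += 1
    return n_rows, field_types

# Stand-in for the real file; these records are made up.
sample = io.StringIO(
    '{"premise": "p1", "hypothesis": "h1", "label": 0, "soft_labels": [0.8, 0.15, 0.05]}\n'
    '{"premise": "p2", "hypothesis": "h2", "label": 2, "soft_labels": [0.1, 0.2, 0.7]}\n'
)
n_rows, field_types = summarize_jsonl(sample)
```

With the real file, the same function can be given an open text file handle in place of the `io.StringIO` stand-in, since it only iterates over lines.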

Provenance

Source: Animised on Hugging Face.
Collection Method: Soft labels generated by applying the cross-encoder/nli-deberta-v3-small model to the SNLI and MultiNLI datasets.
Freshness: Last updated 2026-06-05 04:50:29; freshness should be verified.

Data is in JSONL format (one JSON object per line) and must be parsed line by line. The license is unknown and should be verified before use.

Tags: Text, Natural Language Inference, NLP, Training, Soft Labels, Knowledge Distillation, Synthetic

NLI Soft Labels Final: Soft Labels for Natural Language Inference Distillation
