Description
Soft labels generated by the cross-encoder/nli-deberta-v3-small model on the combined SNLI and MultiNLI datasets. The dataset is intended for knowledge distillation into smaller, more efficient Natural Language Inference models. Each JSONL record contains a premise, hypothesis, hard label, and a probability distribution for entailment, neutral, and contradiction.
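As a minimal sketch of what a record check might look like, the snippet below parses one JSONL line and verifies that the soft labels form a probability distribution. The field names `premise`, `hypothesis`, and `label` are assumptions (only `soft_labels` is named on this page), the label order is assumed to follow the order in which the description lists the classes, and the sample record is invented for illustration. Confirm all of these against the downloaded data.

```python
import json
import math

# Assumed order, copied from the order the description lists the classes in.
LABEL_ORDER = ("entailment", "neutral", "contradiction")


def parse_record(line):
    """Parse one JSONL line and check that soft_labels looks like a distribution."""
    record = json.loads(line)
    probs = record["soft_labels"]  # other field names below are assumptions
    if len(probs) != len(LABEL_ORDER):
        raise ValueError(f"expected {len(LABEL_ORDER)} probabilities, got {len(probs)}")
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=1e-3):
        raise ValueError("soft_labels is not a probability distribution")
    return record


# Hypothetical record, made up for illustration only.
sample_line = json.dumps({
    "premise": "A man is playing a guitar.",
    "hypothesis": "A person is making music.",
    "label": 0,
    "soft_labels": [0.92, 0.07, 0.01],
})

record = parse_record(sample_line)
best_index = max(range(len(LABEL_ORDER)), key=lambda i: record["soft_labels"][i])
top_class = LABEL_ORDER[best_index]
```

If the actual files use different keys or a different class order, only `LABEL_ORDER` and the key lookups need to change.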
Use Cases
Knowledge distillation for NLI models based on the provided soft label probability distributions.
Training student models to mimic the predictions of the teacher model (cross-encoder/nli-deberta-v3-small) using its soft labels.
Analyzing model confidence and prediction uncertainty in NLI tasks using the soft label scores.
Fine-tuning smaller, more efficient models for deployment using the SNLI and MultiNLI-derived training signal.
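The distillation and confidence-analysis use cases above can be sketched in plain Python. This is one common formulation, not necessarily what the dataset author intended: a weighted sum of the KL divergence from the teacher's soft labels to the student's predictions and the cross-entropy on the hard label, plus Shannon entropy as a simple uncertainty score. The weight `alpha`, the function names, and the example numbers are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a less confident teacher."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def distillation_loss(student_logits, teacher_probs, hard_label, alpha=0.5, eps=1e-12):
    """alpha * KL(teacher || student) + (1 - alpha) * cross-entropy on the hard label."""
    student_probs = softmax(student_logits)
    kl = sum(
        t * (math.log(t + eps) - math.log(s + eps))
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )
    cross_entropy = -math.log(student_probs[hard_label] + eps)
    return alpha * kl + (1 - alpha) * cross_entropy


# Hypothetical values for a single example.
teacher = [0.7, 0.2, 0.1]
loss = distillation_loss([2.0, 0.5, -1.0], teacher, hard_label=0)
uncertainty = entropy(teacher)
```

In practice the same loss would be written with a tensor library and averaged over a batch; the per-example version here is only meant to make the arithmetic explicit.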
Strengths
Labels are generated by a specific, named model (cross-encoder/nli-deberta-v3-small).
Source data is derived from two established NLI benchmarks: SNLI and MultiNLI.
Label order for the soft_labels array is explicitly defined in the description.
Limitations
Description metadata is limited; actual data quality requires manual inspection after download.
Row count is not stated, which makes it hard to judge whether the dataset is large enough for a given task.
Column-level documentation is absent; field semantics must be inferred after download.
Provenance
Source
Animised on Hugging Face.
Collection Method
Soft labels generated by applying the cross-encoder/nli-deberta-v3-small model to the SNLI and MultiNLI datasets.
Freshness
Last updated 2026-06-05 04:50:29; freshness should be verified.
Data is in JSONL format (one JSON object per line) and must be parsed line by line. The license is unknown and should be verified before use.
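Reading a JSONL file might look like the sketch below, which yields one dictionary per non-empty line and reports the line number of any malformed entry. The helper name `iter_jsonl` and the demo file contents are hypothetical; the real file name and fields must be taken from the download.

```python
import json
import os
import tempfile


def iter_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc


# Demo on a throwaway file with made-up rows.
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(json.dumps({"premise": "p1", "hypothesis": "h1"}) + "\n")
    tmp.write("\n")
    tmp.write(json.dumps({"premise": "p2", "hypothesis": "h2"}) + "\n")
    demo_path = tmp.name

rows = list(iter_jsonl(demo_path))
os.remove(demo_path)
```

Streaming with a generator keeps memory use flat, which matters here because the row count is not known in advance.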