DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Nq Item Id Llm Compressed Hardneg 3Shot V4 | DataSalon


Media & Communication


Name: Nq Item Id Llm Compressed Hardneg 3Shot V4
Creator: Lala8383
Published: 2026-05-11T10:38:40
Keywords: Question Answering, Conversational Data, Text, Llm Training, Natural Language Processing


Available on 1 platform

Description

According to its description, this dataset contains 417,748 examples of tokenized conversational text intended for training large language models. It was created by Hugging Face user Lala8383 and last updated in May 2026. An analysis of a 1,000-example sample shows an average of 892.62 tokens per example, with a minimum of 539 and a maximum of 1,772 tokens.
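Statistics like these can be recomputed after download to check them against the published figures. The sketch below is a minimal example, assuming each example stores its token IDs in a field named `input_ids` (a hypothetical name, since the columns are undocumented) and that the sample is simply the first 1,000 examples; the page does not say how its sample was drawn.

```python
from statistics import mean
from typing import Any, Iterable


def token_stats(
    examples: Iterable[dict[str, Any]],
    field: str = "input_ids",  # hypothetical field name; verify after download
    sample_size: int = 1000,
):
    """Return (average, minimum, maximum) token counts over the first sample_size examples."""
    lengths = []
    for i, example in enumerate(examples):
        if i >= sample_size:
            break
        # Token count of one example = length of its token-ID sequence.
        lengths.append(len(example[field]))
    if not lengths:
        raise ValueError("no examples to analyze")
    return round(mean(lengths), 2), min(lengths), max(lengths)


# Usage with toy rows standing in for real dataset examples.
toy = [{"input_ids": [1, 2, 3]}, {"input_ids": [4]}, {"input_ids": [5, 6]}]
avg, shortest, longest = token_stats(toy)
print(avg, shortest, longest)
```

A random sample, rather than the first N rows, would give a less biased estimate if the data is sorted in any way.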

Use Cases

Fine-tuning language models for question answering, assuming the 'hardneg' (hard negative) and '3shot' (three-shot) components of the dataset name reflect its training structure.
Benchmarking tokenization strategies for LLM inputs, using the reported token statistics as a reference point.
Training retrieval systems for open-domain QA, assuming the 'item id' and 'hardneg' components of the name indicate retrieval targets and hard negatives.
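Hard negatives are commonly used in contrastive retrieval training. As an illustration only (the dataset's actual fields and intended objective are undocumented), the sketch below computes an InfoNCE-style loss for one query against one positive and several hard negatives. The function names, the dot-product scoring, and the default temperature are all assumptions, not anything stated for this dataset.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity score between two embedding vectors (plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(
    query: Sequence[float],
    positive: Sequence[float],
    hard_negatives: Sequence[Sequence[float]],
    temperature: float = 0.05,  # assumed value, not taken from the dataset
) -> float:
    """Cross-entropy of selecting the positive among [positive] + hard_negatives."""
    # Index 0 holds the positive; the remaining logits are the hard negatives.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in hard_negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_partition = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_partition - logits[0]


# Usage: a negative that scores close to the positive yields a larger loss.
easy = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], temperature=1.0)
hard = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.9, 0.1]], temperature=1.0)
print(f"easy={easy:.4f} hard={hard:.4f}")
```

The "hard" negative produces a higher loss because it is harder to separate from the positive, which is why such negatives give a stronger training signal than random ones.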

Strengths

A substantial collection of 417,748 training examples.
Detailed token statistics are provided, including average (892.62), min (539), and max (1,772) tokens per example.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
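The row count has not been independently verified; the figure of 417,748 examples comes from the dataset description.
Freshness should be verified: the listed last-update date (2026-05-11) appeared to be in the future when this summary was generated.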
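Because column-level documentation is missing, one way to infer field semantics after download is to tally the value types seen in each field across a sample of records. The sketch below assumes records are loaded as plain Python dicts; the field names in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Any, Iterable


def infer_schema(records: Iterable[dict[str, Any]]) -> dict[str, set[str]]:
    """Map each field name to the set of value types observed across the records."""
    schema: dict[str, set[str]] = defaultdict(set)
    for record in records:
        for key, value in record.items():
            type_name = type(value).__name__
            # For non-empty lists, include the element type of the first item.
            if isinstance(value, list) and value:
                type_name = f"list[{type(value[0]).__name__}]"
            schema[key].add(type_name)
    return dict(schema)


# Usage with hypothetical field names; the real columns are undocumented.
sample = [
    {"item_id": "q1", "input_ids": [101, 2054, 102], "label": 1},
    {"item_id": "q2", "input_ids": [101, 2129, 102], "label": None},
]
for field, types in infer_schema(sample).items():
    print(field, sorted(types))
```

A field that shows more than one type (such as `label` above, which is sometimes missing) is a good first place to look when working out what each column means.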

Provenance

Source: Hugging Face user Lala8383.
Collection Method: Likely generated or processed by a large language model, as suggested by the title and description.
Freshness: Last updated 2026-05-11 10:39:08

License is unknown; users should verify usage rights before downloading or using the data.


Quality Score: C42

Score components: Description, Source, Reputation

Community: 1 like, 0 views

Dataset Info

Author: Lala8383
Created: May 11, 2026
Updated: May 11, 2026
Last synced: May 18, 2026
