Sign in to view source links and access this dataset
Description
25,000 synthetic examples in OpenAI-compatible chat format designed for supervised fine-tuning. Created by WithinUsAI in May 2026, this dataset aims to help LLMs mirror the behavior, style, tone, and capabilities of Microsoft Copilot.
Use Cases
Supervised Fine-Tuning (SFT) of LLMs based on the described OpenAI-compatible chat format.
Alignment training to mimic a specific AI assistant's behavior and tone as described.
Benchmarking model responses against a synthetic reference style.
Generating training data for conversational AI agents based on the structured message arrays.
Strengths
25,000 examples provide a substantial volume for training.
Structured in a standardized OpenAI-compatible chat format with system, user, and assistant messages.
Explicitly designed for a specific purpose: fine-tuning LLMs to mirror Microsoft Copilot.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Data is synthetic, which may not fully capture the nuances of real-world interactions.
Provenance
Source
WithinUsAI
Collection Method
Synthetic generation, likely distilled from interactions with or outputs of Microsoft Copilot.
Freshness
Last updated 2026-05-14 17:26:08; freshness should be verified.
License is listed as MIT in the description but 'unknown' in the input metadata; verification is recommended.