Sign in to view source links and access this dataset
Description
Lumi Data is a training corpus for Lumi, an AI voice companion designed for elderly users with dementia and Alzheimer's disease. The dataset was built for the AMD x Lablab.ai Hackathon and contains processed conversational data. It includes 12,928 training and 1,437 validation samples in a ChatML-formatted base, plus 12,375 rewritten and 8,540 filtered training samples.
Use Cases
Training a conversational AI for elderly care based on the ChatML-formatted dialogue data.
Fine-tuning language models for empathetic and domain-specific responses based on the EQ-Matrix domain rewrite.
Developing and validating dialogue systems for users with cognitive impairments based on the filtered, format-validated subset.
Strengths
Contains a structured pipeline with three distinct processed layers: base, rewritten, and filtered data.
Provides specific sample counts for each split and layer, such as 12,928 training and 1,437 validation samples.
Data is formatted for machine learning use, specifically in ChatML format.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count for the full dataset is unknown, which may limit suitability assessment.
The source and composition of the three public datasets used are not detailed in the provided metadata.
Provenance
Source
Processed from three public datasets, author YUGOROU.
Collection Method
Built for the AMD x Lablab.ai Hackathon; pipeline includes ChatML formatting, domain rewriting, and validation filtering.
Time Range
null
Freshness
Last updated 2026-05-07 16:41:15; freshness should be verified.
Geography
null
License is unknown; terms of use must be verified before application.