Description

Lumi Data is a training corpus for Lumi, an AI voice companion designed for elderly users with dementia and Alzheimer's disease. The dataset was built for the AMD x Lablab.ai Hackathon and contains processed conversational data. It includes 12,928 training and 1,437 validation samples in a ChatML-formatted base, plus 12,375 rewritten and 8,540 filtered training samples.

Use Cases

Training a conversational AI for elderly care based on the ChatML-formatted dialogue data.
Fine-tuning language models for empathetic and domain-specific responses based on the EQ-Matrix domain rewrite.
Developing and validating dialogue systems for users with cognitive impairments based on the filtered, format-validated subset.

Strengths

Contains a structured pipeline with three distinct processed layers: base, rewritten, and filtered data.
Provides specific sample counts for each split and layer, such as 12,928 training and 1,437 validation samples.
Data is formatted for machine learning use, specifically in ChatML format.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count for the full dataset is unknown, which may limit suitability assessment.
The source and composition of the three public datasets used are not detailed in the provided metadata.

Provenance

Source: Processed from three public datasets, author YUGOROU.
Collection Method: Built for the AMD x Lablab.ai Hackathon; pipeline includes ChatML formatting, domain rewriting, and validation filtering.
Time Range: null
Freshness: Last updated 2026-05-07 16:41:15; freshness should be verified.
Geography: null

License is unknown; terms of use must be verified before application.

Text Audio Conversational Ai Healthcare Hackathon Ai Companion Elderly Care Dementia

Lumi Data: Training Corpus for an AI Voice Companion for Elderly Users

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info