Hive is a dataset for data-efficient, query-based universal sound separation, developed by researchers from Tsinghua University, Shanda AI, and Johns Hopkins University. Its size tag places it between 1 million and 10 million audio samples, and it is designed for audio-to-audio tasks.
Use Cases
- Train models for universal sound separation using semantically consistent audio queries.
- Benchmark query-based audio separation algorithms on a dataset tagged for audio modality.
- Develop data-efficient learning methods for sound separation using the dataset's structured audio mixtures.
- Analyze separation performance across different semantic categories inherent in the query design.
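For the benchmarking use case, a separation score is needed to compare an estimated source against its reference. The dataset description does not name an evaluation metric, so the sketch below assumes scale-invariant signal-to-distortion ratio (SI-SDR), a metric commonly used in source separation. The function name `si_sdr` and the synthetic sine signals are illustrative only and are not part of Hive.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length mono signals."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have equal length")
    # Remove the mean so a DC offset does not affect the score.
    mean_e = sum(estimate) / len(estimate)
    mean_r = sum(reference) / len(reference)
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]
    # Project the estimate onto the reference to get the scaled target.
    scale = sum(a * b for a, b in zip(e, r)) / (sum(b * b for b in r) + eps)
    target = [scale * b for b in r]
    # Whatever is left after removing the target counts as distortion.
    noise = [a - t for a, t in zip(e, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


# Synthetic stand-ins for a reference source and an interfering source
# (assumption: real use would load audio clips from the dataset instead).
n = 100
reference = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
interferer = [math.sin(2 * math.pi * 13 * i / n) for i in range(n)]
mixture = [a + b for a, b in zip(reference, interferer)]

score_mixture = si_sdr(mixture, reference)        # unprocessed mixture
score_rescaled = si_sdr([0.5 * x for x in reference], reference)  # scaled copy
```

Because the metric is scale-invariant, a rescaled copy of the reference scores far higher than the unprocessed mixture. Grouping such scores by query category would support the per-category analysis mentioned above.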
Strengths
- Large scale: the size tag places the dataset between 1 million and 10 million samples.
- Designed for a specific, advanced audio task: query-based universal sound separation.
- Associated with a 2026 arXiv paper, indicating recent research activity.
Limitations
- The exact sample count, clip duration, audio format, and licensing terms are not documented in the available metadata.
- The dataset's geographic or demographic representativeness is unknown, potentially limiting generalizability.
Provenance
- Source: ShandaAI, Tsinghua University, Johns Hopkins University.
- Collection Method: Created for research on semantically consistent, query-based sound separation; details unspecified.
- Time Range: Not specified.
- Freshness: Last updated in February 2026.
- Geography: Region tag indicates 'us', but specific coverage is unknown.