Multimodal video and audio recordings, each annotated with a single violence class label. The dataset provides synchronized visual and auditory streams to support the development of automated violence detection models.
Use Cases
- Train a multimodal neural network using the audio and video features to identify violent events
- Benchmark the performance of single-modality versus multimodal models using the single-label annotations
- Develop fusion layers that integrate visual and auditory signals for single-label event recognition
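The fusion and benchmarking use cases above can be illustrated with a minimal late-fusion sketch. It assumes each modality already has its own model that emits per-class logits; the function names (`late_fusion`, `predict`, `accuracy`), the three-class setup, and the logit values are hypothetical placeholders and are not taken from the dataset.

```python
from math import exp


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(video_logits, audio_logits, video_weight=0.5):
    """Weighted average of per-modality class probabilities."""
    if len(video_logits) != len(audio_logits):
        raise ValueError("both modalities must score the same classes")
    video_probs = softmax(video_logits)
    audio_probs = softmax(audio_logits)
    return [
        video_weight * v + (1.0 - video_weight) * a
        for v, a in zip(video_probs, audio_probs)
    ]


def predict(scores):
    """Single-label decision: index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Made-up logits for three clips and three hypothetical classes.
video_logits = [[2.0, 0.1, 0.1], [0.4, 0.3, 0.1], [0.1, 0.2, 3.0]]
audio_logits = [[1.5, 0.2, 0.0], [0.0, 2.5, 0.1], [0.3, 0.1, 0.2]]
labels = [0, 1, 2]

video_preds = [predict(v) for v in video_logits]
audio_preds = [predict(a) for a in audio_logits]
fused_preds = [
    predict(late_fusion(v, a)) for v, a in zip(video_logits, audio_logits)
]

for name, preds in [
    ("video only", video_preds),
    ("audio only", audio_preds),
    ("late fusion", fused_preds),
]:
    print(f"{name}: accuracy = {accuracy(preds, labels):.2f}")
```

Late fusion is only one option; a learned fusion layer over intermediate features would replace the fixed `video_weight` with trainable parameters, while the same accuracy comparison still applies.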
Strengths
- Includes both visual ('Look') and auditory ('Listen') data streams for every instance
- Uses a single-label annotation format, so each instance maps to exactly one violence class
- Targets a specific domain: multimodal learning for aggressive event detection