Loghub provides raw system logs from three major domains: HDFS, BGL, and OpenStack. The collection is intended for research into parser-free anomaly detection methods. The dataset's author, organization, and specific size are not detailed in the provided metadata.
Use Cases
- Developing unsupervised anomaly detection models based on raw log text.
- Benchmarking parser-free log analysis techniques across different system types.
- Training machine learning models to identify failures in distributed systems like HDFS and OpenStack.
- Researching cross-domain log representation learning for transfer learning scenarios.
Strengths
- Focuses on a specific research problem: parser-free anomaly detection.
- Aggregates logs from three distinct and widely-used system domains.
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count and file size are unknown, which may limit suitability assessment.
Provenance
- Source
- Kaggle
- Collection Method
- Likely aggregated from publicly available log sources for research purposes.
- Time Range
- null
- Freshness
- Last update date is unknown; freshness unverified.
- Geography
- null