Description
ALQAC datasets provide resources for the Automated Legal Question Answering Competition. The repository contains yearly datasets from 2023 to 2025, including laws, training, and test data, and is maintained by nguyenlab. The dataset page was last updated on 2026-04-22.
Use Cases
Training models for automated legal question answering based on the provided competition structure.
Benchmarking NLP systems on legal text comprehension using the yearly test data.
Fine-tuning language models on domain-specific legal corpora based on the included law texts.
Strengths
Contains data structured for a specific competition spanning multiple years (2023-2025).
Includes supplementary datasets such as Zalo and 2022 training data in the additional_data directory.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, so dataset size and suitability cannot be assessed before download.
Description metadata is limited; actual data quality requires manual inspection after download.
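Because field names and row counts are undocumented, a first inspection pass after download can help. The following is a minimal sketch, not the maintainers' tooling: it assumes the files have already been downloaded to a local directory and that at least some of them are JSON, which the dataset page does not confirm. The function name summarize_json_fields and the example path are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_json_fields(root):
    """Map each JSON file under root to (record_count, Counter of field names).

    Assumption: files are plain JSON; the dataset page does not state the format.
    """
    summary = {}
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Treat a top-level list as a list of records; wrap anything else as one record.
        records = data if isinstance(data, list) else [data]
        fields = Counter()
        for record in records:
            if isinstance(record, dict):
                # Count how many records carry each top-level field name.
                fields.update(record.keys())
        summary[path.relative_to(root).as_posix()] = (len(records), fields)
    return summary


if __name__ == "__main__":
    # "path/to/local/copy" is a placeholder for wherever the files were saved.
    for name, (count, fields) in summarize_json_fields("path/to/local/copy").items():
        print(name, count, dict(fields))
```

Fields that appear in fewer records than the record count are optional or inconsistently populated, which is worth noting before training or benchmarking on the data.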
Provenance
Source
nguyenlab on Hugging Face
Collection Method
Likely compiled for the ALQAC competition.
Time Range
Data spans from at least 2022 to 2025.
Freshness
Last updated 2026-04-22 02:51:39; freshness should be verified.
Geography
Geography is not specified in the provided description.
License
License is unknown; users should verify permissions before use.