Description
A controlled ambiguity benchmark for exact CFG/PCFG parsing built for the paper 'Symplectic Inside--Outside Atlas for Ambiguous Grammars'. The dataset is intended for chart-level diagnostics, ambiguity stress tests, weighted deduction experiments, and parser-evaluation studies, illustrating that parse count alone does not determine ambiguity geometry. It was created by Lightcap and was last updated on June 15, 2026.
Use Cases
Running chart-level diagnostics on parsing algorithms, using the benchmark's controlled ambiguity patterns.
Stress-testing parsers on the controlled ambiguous examples.
Running weighted deduction experiments for probabilistic context-free grammars (PCFGs).
Evaluating parsers in studies that contrast raw parse count with how ambiguity is concentrated.
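As a rough illustration of the weighted deduction use case, the sketch below runs a CKY inside pass over a toy grammar in Chomsky normal form. The grammar `S -> S S | a`, its weights, and the function name `inside` are assumptions made for this example; they are not taken from the dataset or the paper. With all rule weights set to 1 the pass counts parses, and with rule probabilities it returns the total probability of the sentence.

```python
def inside(tokens, lexical, binary, start="S"):
    """Inside (CKY) pass for a grammar in Chomsky normal form.

    lexical maps (A, word) -> weight; binary maps (A, B, C) -> weight.
    Returns the summed weight of all derivations of `tokens` from `start`.
    """
    n = len(tokens)
    chart = {}  # (i, j, A) -> summed weight of A spanning tokens[i:j]
    # Width-1 spans: apply lexical rules A -> word.
    for i, tok in enumerate(tokens):
        for (a, word), w in lexical.items():
            if word == tok:
                chart[i, i + 1, a] = chart.get((i, i + 1, a), 0) + w
    # Wider spans: combine two adjacent sub-spans with rules A -> B C.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), w in binary.items():
                    inner = chart.get((i, k, b), 0) * chart.get((k, j, c), 0)
                    if inner:
                        chart[i, j, a] = chart.get((i, j, a), 0) + w * inner
    return chart.get((0, n, start), 0)


tokens = ["a"] * 4

# Hypothetical toy grammar S -> S S | a; unit weights count derivations.
parse_count = inside(tokens, {("S", "a"): 1}, {("S", "S", "S"): 1})

# Same grammar with assumed rule probabilities (they sum to 1 for S).
sentence_prob = inside(tokens, {("S", "a"): 0.6}, {("S", "S", "S"): 0.4})

print(parse_count)    # 5 binary bracketings of four tokens
print(sentence_prob)  # total probability mass over those parses
```

The same chart recursion serves both runs; only the rule weights change, which is the usual way to move between counting and probabilistic experiments in a deduction system.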
Strengths
Built for a specific research paper ('Symplectic Inside--Outside Atlas for Ambiguous Grammars'), which gives it a clear academic purpose.
Targets a narrowly defined task (exact CFG/PCFG parsing) with controlled examples, which should help keep experiments consistent.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment for large-scale experiments.
Description metadata is limited; actual data quality requires manual inspection after download.
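Because the columns are undocumented and the row count is unknown, a first inspection pass after download might look like the sketch below. It assumes the data has already been loaded as an iterable of dictionaries, one per row; the helper name `summarize_records` and the placeholder fields `field_a` and `field_b` are hypothetical and do not reflect the dataset's actual schema.

```python
from collections import defaultdict


def summarize_records(records):
    """Return (row_count, {column: sorted list of Python type names seen})."""
    types = defaultdict(set)
    row_count = 0
    for row in records:
        row_count += 1
        for column, value in row.items():
            types[column].add(type(value).__name__)
    return row_count, {col: sorted(names) for col, names in types.items()}


# Placeholder rows standing in for the downloaded data (not real records).
sample = [
    {"field_a": "x y", "field_b": 3},
    {"field_a": "x", "field_b": None},
]

row_count, schema = summarize_records(sample)
print(row_count)
print(schema)
```

A column that reports more than one type, or includes `NoneType`, is a good first candidate for manual inspection.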
Provenance
Source
Lightcap (author), via Hugging Face.
Collection Method
Created as a benchmark for a specific research paper; exact collection method is not detailed.
Freshness
Last updated 2026-06-15 21:54:27; freshness should be verified.
License
Unknown; users should verify terms of use before downloading.