This dataset pairs 39,260 English sentences from broadcast conversations, newswire, weblogs, and web forums with Abstract Meaning Representation (AMR) graphs. The semantic treebank was developed by the Linguistic Data Consortium, SDL/Language Weaver, the University of Colorado, and the University of Southern California's Information Sciences Institute. Each AMR graph represents the meaning of a whole sentence using PropBank frames, semantic roles, within-sentence coreference, named entities, modality, and negation.
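As a rough illustration of what such a graph encodes, the sketch below reads one AMR written in PENMAN notation, the usual text serialization for AMR, and pulls out PropBank-style frames, reentrant variables (how AMR expresses within-sentence coreference), and negation. The example graph is a textbook-style illustration, not a sentence from this dataset, and its frame sense numbers are illustrative. The on-disk layout of the release is not documented here, so the assumption that each graph is available as a PENMAN string is exactly that, an assumption. The `summarize` helper and its regular expressions are hypothetical and handle only simple graphs.

```python
import re

# Illustrative AMR in PENMAN notation for "The boy does not want to go."
# (textbook-style example; NOT taken from this dataset).
AMR = """
(w / want-01
   :polarity -
   :ARG0 (b / boy)
   :ARG1 (g / go-01
      :ARG0 b))
"""

# "(variable / concept" introduces a node.
NODE_RE = re.compile(r"\(\s*([^\s/()]+)\s*/\s*([^\s()]+)")
# ":role target" where the target is a bare token, i.e. not a nested
# node "(...)" and not a quoted string.
EDGE_RE = re.compile(r"(:[A-Za-z][\w-]*)\s+([^\s()\":]+)")


def summarize(amr: str) -> dict:
    """Extract frames, reentrancies, and negation from a simple AMR string.

    Hypothetical helper for illustration only: it does not handle role-like
    text inside quoted strings or other edge cases a real parser covers.
    """
    nodes = dict(NODE_RE.findall(amr))  # variable -> concept
    # PropBank frames are concepts ending in a numeric sense tag, e.g. want-01.
    frames = sorted(c for c in nodes.values() if re.fullmatch(r".+-\d+", c))
    reentrancies = []
    negated = False
    for role, target in EDGE_RE.findall(amr):
        if role == ":polarity" and target == "-":
            negated = True  # AMR marks negation with ":polarity -"
        elif target in nodes:
            # A bare variable reuses an existing node: a reentrancy.
            reentrancies.append((role, target))
    return {"frames": frames, "reentrancies": reentrancies, "negated": negated}


if __name__ == "__main__":
    print(summarize(AMR))
```

Here the reused variable `b` shows that the boy is both the one who wants and the one who goes. For real work, a dedicated PENMAN parser is likely to be more robust than regular expressions.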
Use Cases
- Train semantic parsers based on the AMR graph annotations.
- Benchmark models for within-sentence coreference resolution using the annotated coreference links.
- Develop named entity recognition systems leveraging the named entity annotations.
- Study the representation of modality and negation in semantic graphs.
- Evaluate model performance across text genres, such as broadcast conversations and web forums.
Strengths
- 39,260 annotated sentences provide a substantial resource for training and evaluation.
- Covers multiple genres (broadcast conversations, newswire, weblogs, web forums), which may support genre robustness.
- Annotations follow a detailed framework covering PropBank frames, semantic roles, coreference, named entities, modality, and negation.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The last update date is unknown, so the freshness of the data cannot be verified.
- Data may reflect geographic or source bias inherent to the specific DARPA programs and news sources used.
Provenance
- Source: Linguistic Data Consortium (LDC), SDL/Language Weaver, University of Colorado, University of Southern California Information Sciences Institute.
- Collection Method: Manual annotation of sentences from the specified sources to create AMR graphs.