A trained spaCy part-of-speech tagging pipeline for sixteenth-century Older Scots. The model was trained on pre-tagged data from Bushnell (2021), and the release includes Python scripts for applying it to a 1-million-word historical corpus. It was authored by Beth Beattie and is hosted on Harvard Dataverse.
Use Cases
- Part-of-speech tagging of Older Scots texts with the CLAWS7 tagset named in the dataset description.
- Applying a pre-trained NLP model to a historical corpus using the provided Python scripts.
- Linguistic analysis of sixteenth-century Older Scots features using the model's annotations.
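The use cases above can be sketched roughly as follows. This is a minimal illustration, not the scripts shipped with the release: the model directory `path/to/older_scots_model`, the input file `corpus.txt`, and the helper names `load_pipeline`, `tag_texts`, and `tag_frequencies` are all hypothetical, and it assumes the pipeline writes its CLAWS7 labels to spaCy's fine-grained `token.tag_` attribute, which should be checked after download.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def load_pipeline(model_dir: str):
    """Load a trained spaCy pipeline from a local directory (hypothetical path)."""
    import spacy  # imported lazily so the helpers below do not require spaCy

    return spacy.load(model_dir)


def tag_texts(nlp, texts: Iterable[str]) -> Iterator[List[Tuple[str, str]]]:
    """Yield one list of (token, tag) pairs per input text.

    Assumption: the tagger stores its labels in the fine-grained `tag_` field.
    """
    for doc in nlp.pipe(texts):
        yield [(token.text, token.tag_) for token in doc]


def tag_frequencies(tagged: Iterable[List[Tuple[str, str]]]) -> Counter:
    """Count how often each tag occurs across all tagged texts."""
    counts: Counter = Counter()
    for pairs in tagged:
        counts.update(tag for _, tag in pairs)
    return counts


if __name__ == "__main__":
    # Both paths are placeholders; substitute the downloaded model and corpus.
    nlp = load_pipeline("path/to/older_scots_model")
    lines = Path("corpus.txt").read_text(encoding="utf-8").splitlines()
    tagged = list(tag_texts(nlp, (line for line in lines if line.strip())))
    for tag, count in tag_frequencies(tagged).most_common(10):
        print(tag, count)
```

Using `nlp.pipe` rather than calling `nlp` on each line lets spaCy batch the texts, which tends to matter for a corpus of around a million words.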
Strengths
- Trained on pre-tagged data from Bushnell (2021).
- Includes Python scripts for applying the model to a 1-million-word corpus.
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is unknown, which may limit suitability assessment.
Provenance
- Source: Harvard Dataverse
- Collection Method: Trained on pre-tagged data from Bushnell (2021).
- Time Range: Sixteenth century
- Freshness: Last updated 2026-06-25 11:46:21; freshness should be verified.