Vol. 4 No. 1 (2025)
Articles

Three-Step Parsing in Kanien’kéha

Max Blackburn
McGill University

Published 2025-04-11

Abstract

Morphological parsing is the task of decomposing a surface linguistic form into its underlying morphological components. It is an essential part of studying morphologically complex languages, where surface words can obscure the underlying linguistic structure. Many such languages are endangered and/or under-resourced, which limits the use of the data-driven methods more common in NLP that could otherwise automate this task. As a result, most parsers for such languages are implemented as symbolic finite-state machines. One of the most common architectures for finite-state models of morphology uses a two-step process, which first defines grammatical morpheme sequences and then applies a series of morphophonological rules to derive a surface form (Beemer et al., 2020; Koskenniemi, 1986). I propose a three-step architecture that separates morphophonological alternations from phonological ones. I claim that this separation constrains the power of the model in a way that mirrors the predictions of theoretical linguistics about the structure of morphophonology. To demonstrate these claims, I implement a three-step parser for verbs in Kanien’kéha, a morphologically complex and highly endangered language. I argue that the results show that this constrained architecture is still powerful enough to model the language and describe some theoretical findings about its structure. I also discuss the theoretical and practical questions raised by the choice of model architecture.
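The three-step idea can be illustrated with a minimal Python sketch. It is not the article's implementation: the article uses finite-state machines, whereas this sketch uses plain regular-expression rewrites and parses by generating every form and comparing it to the input. The morphemes, tags, and both rules are invented for illustration and are not Kanien’kéha data, and the choice to erase morpheme boundaries between steps 2 and 3 is an assumption about how the separation might be enforced.

```python
import re
from itertools import product

# Toy lexicon: invented forms and tags, NOT real Kanien'kéha morphemes.
PREFIXES = {"1SG": "k", "2SG": "s"}
ROOTS = {"ROOT_A": "ata", "ROOT_B": "ko"}
SUFFIXES = {"HAB": "s", "PFV": "sa"}


def step1_morphotactics():
    """Step 1: enumerate grammatical morpheme sequences (prefix-root-suffix)."""
    for p, r, s in product(PREFIXES, ROOTS, SUFFIXES):
        tags = (p, r, s)
        # '+' marks a morpheme boundary in the underlying form.
        form = "+".join((PREFIXES[p], ROOTS[r], SUFFIXES[s]))
        yield tags, form


def step2_morphophonology(form):
    """Step 2: rules that may refer to morpheme boundaries ('+')."""
    # Hypothetical rule: insert 'e' when two consonants meet at a boundary.
    form = re.sub(r"([^aeiou+])\+(?=[^aeiou+])", r"\1+e", form)
    # Boundaries are erased here, so step 3 cannot refer to them.
    return form.replace("+", "")


def step3_phonology(form):
    """Step 3: purely phonological rules over segments only."""
    # Hypothetical rule: 's' becomes 'z' between vowels.
    return re.sub(r"(?<=[aeiou])s(?=[aeiou])", "z", form)


def generate(underlying):
    """Derive a surface form from a boundary-marked underlying form."""
    return step3_phonology(step2_morphophonology(underlying))


def parse(surface):
    """Return every tag sequence whose derived surface form matches the input."""
    return [tags for tags, form in step1_morphotactics()
            if generate(form) == surface]
```

In this sketch the constraint is structural: the rule in step 3 never sees a `+`, so any alternation that depends on morpheme identity or boundaries has to be stated in step 2. A finite-state implementation would instead compose three transducers and invert the result for parsing.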