
Learning Cross-Dialectal Morphophonology with Syllable Structure Constraints

  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citations

Abstract

We investigate learning surface forms from underlying morphological forms for low-resource language varieties. We concentrate on learning explicit rules with the aid of learned syllable structure constraints, an approach that outperforms neural methods on this small-data task and provides interpretable output. Evaluating across one relatively high-resource and two related low-resource Arabic dialects, we find that a model trained only on the high-resource dialect achieves decent performance on the low-resource dialects, which is useful when no low-resource training data is available. The best results are obtained when our system is trained only on the low-resource dialect data, without augmentation from the related higher-resource dialect. We discuss the impact of syllable structure constraints and the strengths and weaknesses of data augmentation and transfer learning from a related dialect.

Original language: English
Title of host publication: VarDial 2025 - 12th Workshop on NLP for Similar Languages, Varieties and Dialects, Proceedings of the Workshop
Editors: Yves Scherrer, Tommi Jauhiainen, Nikola Ljubesic, Preslav Nakov, Jorg Tiedemann, Marcos Zampieri
Publisher: Association for Computational Linguistics (ACL)
Pages: 157-167
Number of pages: 11
ISBN (Electronic): 9798891762084
State: Published - 2025
Event: 12th Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial 2025 - co-located with the 31st International Conference on Computational Linguistics, COLING 2025 - Abu Dhabi, United Arab Emirates
Duration: Jan 19 2025 → …

Publication series

Name: VarDial 2025 - 12th Workshop on NLP for Similar Languages, Varieties and Dialects, Proceedings of the Workshop

Conference

Conference: 12th Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial 2025 - co-located with the 31st International Conference on Computational Linguistics, COLING 2025
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 01/19/25 → …
