
Improving word segmentation by simultaneously learning phonotactics

  • University of Delaware

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

9 Scopus citations

Abstract

The most accurate unsupervised word segmentation systems currently available (Brent, 1999; Venkataraman, 2001; Goldwater, 2007) use a simple unigram model of phonotactics. While this simplifies some of the calculations, it overlooks cues that infant language acquisition researchers have shown to be useful for segmentation (Mattys et al., 1999; Mattys and Jusczyk, 2001). Here we explore the utility of bigram and trigram phonotactic models by extending Brent's (1999) MBDP-1 algorithm. The results show that the improved MBDP-Phon model outperforms other unsupervised word segmentation systems (e.g., Brent, 1999; Venkataraman, 2001; Goldwater, 2007).
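The contrast the abstract draws between a unigram phonotactic model and bigram or trigram models can be illustrated with a minimal sketch. This is not the MBDP-Phon implementation: the class name `PhonotacticModel`, the `#` word-boundary symbol, the one-character-per-phoneme encoding, and add-one smoothing are all illustrative assumptions. The sketch shows that a unigram model assigns the same probability to any reordering of a word's phonemes, whereas a bigram or trigram model can favour phoneme sequences attested in the lexicon it was estimated from.

```python
from collections import Counter
from math import isclose, log

# Hypothetical word-boundary symbol; words are assumed to be strings
# with exactly one character per phoneme.
BOUNDARY = "#"


def ngrams(word, n):
    """Yield (context, phoneme) pairs for a word padded with boundary symbols."""
    padded = [BOUNDARY] * (n - 1) + list(word) + [BOUNDARY]
    for i in range(n - 1, len(padded)):
        yield tuple(padded[i - n + 1:i]), padded[i]


class PhonotacticModel:
    """Illustrative add-one smoothed n-gram model over phonemes.

    n=1 is a unigram model, n=2 bigram, n=3 trigram. The smoothing
    choice is an assumption made for this sketch only.
    """

    def __init__(self, n, lexicon):
        self.n = n
        self.joint = Counter()    # counts of (context, phoneme)
        self.context = Counter()  # counts of context
        self.inventory = {BOUNDARY}
        for word in lexicon:
            self.inventory.update(word)
            for ctx, ph in ngrams(word, n):
                self.joint[ctx, ph] += 1
                self.context[ctx] += 1

    def prob(self, ctx, ph):
        """Add-one smoothed P(phoneme | context) over the phoneme inventory."""
        v = len(self.inventory)
        return (self.joint[ctx, ph] + 1) / (self.context[ctx] + v)

    def log_prob(self, word):
        """Log probability of a whole word, including its final boundary."""
        return sum(log(self.prob(ctx, ph)) for ctx, ph in ngrams(word, self.n))


lexicon = ["ab", "abab", "ab"]
uni = PhonotacticModel(1, lexicon)
bi = PhonotacticModel(2, lexicon)
print(isclose(uni.log_prob("ab"), uni.log_prob("ba")))  # True: order is ignored
print(bi.log_prob("ab") > bi.log_prob("ba"))  # True: "#a", "ab", "b#" are attested
```

In this toy lexicon the unigram model cannot distinguish "ab" from "ba", while the bigram model prefers "ab" because its phoneme transitions (including those at word boundaries) occur in the lexicon; this kind of sequential cue is what a unigram model overlooks.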

Original language: English
Title of host publication: CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning
Publisher: Association for Computational Linguistics (ACL)
Pages: 65-72
Number of pages: 8
ISBN (Print): 1905593481, 9781905593484
State: Published - 2008
Event: 12th Conference on Computational Natural Language Learning, CoNLL 2008 - Manchester, United Kingdom
Duration: Aug 16 2008 to Aug 17 2008

Publication series

Name: CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning

Conference

Conference: 12th Conference on Computational Natural Language Learning, CoNLL 2008
Country/Territory: United Kingdom
City: Manchester
Period: 08/16/08 to 08/17/08

