TY - GEN
T1 - Improving word segmentation by simultaneously learning phonotactics
AU - Blanchard, Daniel
AU - Heinz, Jeffrey
PY - 2008
Y1 - 2008
AB - The most accurate unsupervised word segmentation systems that are currently available (Brent, 1999; Venkataraman, 2001; Goldwater, 2007) use a simple unigram model of phonotactics. While this simplifies some of the calculations, it overlooks cues that infant language acquisition researchers have shown to be useful for segmentation (Mattys et al., 1999; Mattys and Jusczyk, 2001). Here we explore the utility of using bigram and trigram phonotactic models by enhancing Brent's (1999) MBDP-1 algorithm. The results show the improved MBDP-Phon model outperforms other unsupervised word segmentation systems (e.g., Brent, 1999; Venkataraman, 2001; Goldwater, 2007).
UR - https://www.scopus.com/pages/publications/77952548445
U2 - 10.3115/1596324.1596336
DO - 10.3115/1596324.1596336
M3 - Conference contribution
AN - SCOPUS:77952548445
SN - 1905593481
SN - 9781905593484
T3 - CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning
SP - 65
EP - 72
BT - CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning
PB - Association for Computational Linguistics (ACL)
T2 - 12th Conference on Computational Natural Language Learning, CoNLL 2008
Y2 - 16 August 2008 through 17 August 2008
ER -