Abstract
We propose a tone recognition approach that employs linear-chain Conditional Random Fields (CRFs) to model tone variation due to intonation effects. We implement three linear-chain CRFs that model intonation effects at phrase-, sentence-, and story-level boundaries, where we show that standard recognition techniques degrade and common normalization approaches do not help. All three linear-chain CRFs outperform the baseline unigram model, with the biggest improvement found in recognizing 3rd tones (4% in overall accuracy). In particular, Phrase Bigram CRFs show a drastic 39% improvement in recognizing 3rd tones located at initial boundaries. This improvement indicates that the position-specific modeling of initial tones in bigram CRFs captures intonation effects better than the baseline unigram model.
| Original language | English |
|---|---|
| Pages (from-to) | 2289-2292 |
| Number of pages | 4 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| State | Published - 2011 |
| Event | 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011 - Florence, Italy. Duration: Aug 27 2011 → Aug 31 2011 |
Keywords
- Broad context
- Conditional random fields
- Prosody
- Tone recognition
Fingerprint
Dive into the research topics of 'Modeling broad context for tone recognition with Conditional Random Fields'. Together they form a unique fingerprint.