
Building sentiment lexicons for all major languages

  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

167 Scopus citations

Abstract

Sentiment analysis in a multilingual world remains a challenging problem, because developing language-specific sentiment lexicons is an extremely resource-intensive process. Such lexicons remain a scarce resource for most languages. In this paper, we address this lexicon gap by building high-quality sentiment lexicons for 136 major languages. We integrate a variety of linguistic resources to produce an immense knowledge graph. By appropriately propagating from seed words, we construct sentiment lexicons for each component language of our graph. Our lexicons have a polarity agreement of 95.7% with published lexicons, while achieving an overall coverage of 45.2%. We demonstrate the performance of our lexicons in an extrinsic analysis of 2,000 distinct historical figures' Wikipedia articles in 30 languages. Despite cultural differences and the intended neutrality of Wikipedia articles, our lexicons show an average sentiment correlation of 0.28 across all language pairs.
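The abstract compresses the method into a single phrase, "propagating from seed words" over a multilingual knowledge graph, and reports two intrinsic metrics, polarity agreement and coverage. The sketch below illustrates one simple way such propagation and scoring could work. It is not the authors' implementation: the `(language, word)` node representation, the majority-sign update rule, the `max_hops` limit, the exact metric definitions as coded, and the toy words are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical representation: a node is a (language, word) pair, and an edge
# links two nodes that some linguistic resource relates (e.g. a translation).

def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def propagate(graph, seeds, max_hops=3):
    """Spread +1/-1 polarity outward from seed nodes, one hop at a time.

    Assumed rule: an unlabeled node next to the current frontier takes the
    sign of the sum of its labeled neighbours' polarities; ties stay unlabeled.
    """
    labels = dict(seeds)
    frontier = set(seeds)
    for _ in range(max_hops):
        candidates = {n for f in frontier for n in graph[f] if n not in labels}
        new_labels = {}
        for node in candidates:
            score = sum(labels[n] for n in graph[node] if n in labels)
            if score != 0:
                new_labels[node] = 1 if score > 0 else -1
        if not new_labels:
            break  # nothing new could be labeled, so stop early
        labels.update(new_labels)
        frontier = set(new_labels)
    return labels

def agreement_and_coverage(ours, reference):
    """Compare a {word: polarity} lexicon with a published reference lexicon.

    Assumed definitions: agreement is the share of words in both lexicons
    with the same polarity; coverage is the share of reference words we label.
    """
    if not reference:
        return 0.0, 0.0
    shared = ours.keys() & reference.keys()
    agreement = (sum(ours[w] == reference[w] for w in shared) / len(shared)) if shared else 0.0
    coverage = len(shared) / len(reference)
    return agreement, coverage

if __name__ == "__main__":
    # Toy, illustrative data only; not taken from the paper.
    seeds = {("en", "good"): 1, ("en", "bad"): -1}
    edges = [
        (("en", "good"), ("es", "bueno")),
        (("en", "bad"), ("es", "malo")),
        (("es", "bueno"), ("es", "buen")),
        (("es", "malo"), ("es", "mal")),
    ]
    labels = propagate(build_graph(edges), seeds)
    spanish = {w: p for (lang, w), p in labels.items() if lang == "es"}
    reference = {"bueno": 1, "malo": -1, "feliz": 1, "triste": -1}
    print(agreement_and_coverage(spanish, reference))  # prints (1.0, 0.5)
```

Restricting each round to neighbours of newly labeled nodes makes the labels spread outward from the seeds in breadth-first order, so a word's polarity is decided by the closest labeled words rather than by distant parts of the graph.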

Original language: English
Title of host publication: Long Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 383-389
Number of pages: 7
ISBN (Print): 9781937284732
State: Published, 2014
Event: 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, Baltimore, MD, United States
Duration: Jun 22 2014 to Jun 27 2014

Publication series

Name: 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014: Proceedings of the Conference
Volume: 2

Conference

Conference: 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014
Country/Territory: United States
City: Baltimore, MD
Period: 06/22/14 to 06/27/14
