
Freshman or fresher? Quantifying the geographic variation of language in online social media

  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

42 Scopus citations

Abstract

In this paper we present a new computational technique to detect and analyze statistically significant geographic variation in language. While previous approaches have primarily focused on lexical variation between regions, our method identifies words that demonstrate semantic and syntactic variation as well. We extend recently developed techniques for neural language models to learn word representations which capture differing semantics across geographical regions. In order to quantify this variation and ensure robust detection of true regional differences, we formulate a null model to determine whether observed changes are statistically significant. Our method is the first such approach to explicitly account for random variation due to chance while detecting regional variation in word meaning. To validate our model, we study and analyze two different massive online data sets: millions of tweets from Twitter as well as millions of phrases contained in the Google Book Ngrams. Our analysis reveals interesting facets of language change across countries.

Original language: English
Title of host publication: Proceedings of the International AAAI Conference on Web and Social Media, ICWSM 2016
Publisher: AAAI Press
Pages: 615-618
Number of pages: 4
Edition: 1
ISBN (Electronic): 9781577357582
State: Published - 2016
Event: 10th International AAAI Conference on Web and Social Media, ICWSM 2016 - Cologne, Germany
Duration: May 17 2016 - May 20 2016

Publication series

Name: Proceedings of the 10th International Conference on Web and Social Media, ICWSM 2016
Number: 1
Volume: 10

Conference

Conference: 10th International AAAI Conference on Web and Social Media, ICWSM 2016
Country/Territory: Germany
City: Cologne
Period: 05/17/16 - 05/20/16
