
The remarkable benefit of user-level aggregation for lexical-based population-level predictions

  • Salvatore Giorgi
  • Daniel Preoţiuc-Pietro
  • Anneke Buffone
  • Daniel Rieman
  • Lyle H. Ungar
  • H. Andrew Schwartz
  • University of Pennsylvania

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

30 Scopus citations

Abstract

Nowcasting based on social media text promises to provide unobtrusive and near real-time predictions of community-level outcomes. These outcomes typically concern people, but the data is often aggregated without regard to the individual users in each community's Twitter population. This paper describes a simple yet effective method for building community-level models using Twitter language aggregated by user. Results on four different U.S. county-level tasks, spanning demographic, health, and psychological outcomes, show large and consistent improvements in prediction accuracy (e.g., from Pearson r = .73 to .82 for median income prediction, or from r = .37 to .47 for life satisfaction prediction) over the standard approach of aggregating all tweets. We make our aggregated and anonymized community-level data, derived from 37 billion tweets (over 1 billion of which were mapped to counties), available for research.
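The contrast between the two aggregation strategies can be illustrated with a minimal sketch. This is not the authors' implementation: the feature (relative word frequency), the function names `tweet_level` and `user_level`, and the toy data are all assumptions made for illustration. It only shows the general idea that pooling all tweets lets prolific users dominate a county's language profile, while aggregating to users first gives each user equal weight.

```python
from collections import Counter, defaultdict


def relative_freqs(tokens):
    """Map each word to its share of the given token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def tweet_level(tweets):
    """Standard approach (assumed form): pool every token in a county,
    ignoring which user wrote it."""
    pooled = defaultdict(list)
    for county, _user, tokens in tweets:
        pooled[county].extend(tokens)
    return {county: relative_freqs(toks) for county, toks in pooled.items()}


def user_level(tweets):
    """User-level aggregation (assumed form): compute features per user,
    then average those user features within each county."""
    per_user = defaultdict(list)
    for county, user, tokens in tweets:
        per_user[(county, user)].extend(tokens)

    sums = defaultdict(lambda: defaultdict(float))
    n_users = Counter()
    for (county, _user), toks in per_user.items():
        n_users[county] += 1
        for word, freq in relative_freqs(toks).items():
            sums[county][word] += freq

    return {
        county: {word: s / n_users[county] for word, s in words.items()}
        for county, words in sums.items()
    }


# Hypothetical toy data: (county, user, tokens). User u1 tweets far more than u2.
tweets = [
    ("A", "u1", ["pizza", "pizza", "pizza", "pizza"]),
    ("A", "u1", ["pizza", "pizza", "pizza", "pizza"]),
    ("A", "u2", ["rain", "pizza"]),
]

by_tweet = tweet_level(tweets)
by_user = user_level(tweets)
print(by_tweet["A"]["pizza"], by_user["A"]["pizza"])  # 0.9 0.75
```

In the toy data, the heavy poster u1 pushes the pooled frequency of "pizza" to 0.9, whereas averaging over the two users gives 0.75, a profile closer to the typical resident than to the most active account.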

Original language: English
Title of host publication: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
Editors: Ellen Riloff, David Chiang, Julia Hockenmaier, Jun'ichi Tsujii
Publisher: Association for Computational Linguistics
Pages: 1167-1172
Number of pages: 6
ISBN (Electronic): 9781948087841
State: Published - 2018
Event: 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 - Brussels, Belgium
Duration: Oct 31 2018 → Nov 4 2018

Publication series

Name: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018

Conference

Conference: 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
Country/Territory: Belgium
City: Brussels
Period: 10/31/18 → 11/4/18
