Using social media to track geographic variability in language about diabetes: Infodemiology analysis

  • Heather Griffis
  • David A. Asch
  • H. Andrew Schwartz
  • Lyle Ungar
  • Alison M. Buttenheim
  • Frances K. Barg
  • Nandita Mitra
  • Raina M. Merchant
  • University of Pennsylvania

Research output: Contribution to journal, Article, peer-review

14 Scopus citations

Abstract

Background: Social media posts about diabetes could reveal patients' knowledge, attitudes, and beliefs as well as approaches for better targeting of public health messages and care management.

Objective: This study aimed to characterize the language of Twitter users' posts regarding diabetes and describe the correlation of themes with the county-level prevalence of diabetes.

Methods: A retrospective study of diabetes-related tweets identified from a random sample of approximately 37 billion tweets from the United States from 2009 to 2015 was conducted. We extracted diabetes-specific tweets and used machine learning to identify statistically significant topics of related terms. Topics were combined into themes and compared with the prevalence of diabetes by US counties and further compared with geography (US Census Divisions). Pearson correlation coefficients are reported for each topic and relationship with prevalence.

Results: A total of 239,989 tweets from 121,494 unique users included the term diabetes. The themes emerging from the topics included unhealthy food and drink, treatment, symptoms/diagnoses, risk factors, research, recipes, news, health care, management, fundraising, diet, communication, and supplements/remedies. The theme of unhealthy foods most positively correlated with geographic areas with high prevalence of diabetes (r=0.088), whereas tweets related to research most negatively correlated (r=−0.162) with disease prevalence. Themes and topics about diabetes differed in overall frequency across the US geographical divisions, with the East South Central and South Atlantic states having a higher frequency of topics referencing unhealthy food (r range=0.073-0.146; P<.001).

Conclusions: Diabetes-related tweets originating from counties with high prevalence of diabetes have different themes than tweets originating from counties with low prevalence of diabetes. Interventions could be informed from this variation to promote healthy behaviors.
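
The abstract reports Pearson correlation coefficients between topic usage and county-level diabetes prevalence. As a minimal sketch of that statistic only (not the authors' pipeline), the Python below computes r for two equal-length lists. The function name `pearson_r`, the variable names, and the toy numbers are all hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy inputs, one entry per county (NOT data from the study):
# share of diabetes tweets using a given topic, and diabetes prevalence in %.
topic_share = [0.10, 0.20, 0.30, 0.40]
prevalence = [8.0, 9.0, 11.0, 12.0]

r = pearson_r(topic_share, prevalence)
print(f"r = {r:.3f}")
```

In this framing, each topic would yield one coefficient across counties; a positive value means the topic appears more often where prevalence is higher, and a negative value means the opposite.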

Original language: English
Article number: 14431
Journal: JMIR Diabetes
Volume: 5
Issue number: 1
State: Published - Jan 2020

Keywords

  • Diabetes
  • Epidemiology
  • Infodemiology
  • Prevalence
  • Social media
  • Twitter
