TY - GEN
T1 - Domain Adaptation from User-level Facebook Models to County-level Twitter Predictions
AU - Rieman, Daniel
AU - Schwartz, H. Andrew
AU - Jaidka, Kokil
AU - Ungar, Lyle
N1 - Publisher Copyright: ©2017 AFNLP.
PY - 2017
Y1 - 2017
N2 - Several studies have demonstrated how language models of user attributes, such as personality, can be built by using the Facebook language of social media users in conjunction with their responses to psychology questionnaires. It is challenging to apply these models to make general predictions about attributes of communities, such as personality distributions across US counties, because doing so requires 1. coping with the potential unavailability of the original training data because of privacy and ethical regulations, 2. adapting Facebook language models to Twitter language without retraining the model, and 3. adapting from users to county-level collections of tweets. We propose a two-step algorithm, Target Side Domain Adaptation (TSDA), for such domain adaptation when no labeled Twitter/county data is available. TSDA corrects for the different word distributions between Facebook and Twitter and for the varying word distributions across counties by adjusting target-side word frequencies; no changes to the trained model are made. In the case of predicting the Big Five county-level personality traits, TSDA outperforms a state-of-the-art domain adaptation method and gives county-level predictions that have fewer extreme outliers, higher year-to-year stability, and higher correlation with county-level outcomes.
AB - Several studies have demonstrated how language models of user attributes, such as personality, can be built by using the Facebook language of social media users in conjunction with their responses to psychology questionnaires. It is challenging to apply these models to make general predictions about attributes of communities, such as personality distributions across US counties, because doing so requires 1. coping with the potential unavailability of the original training data because of privacy and ethical regulations, 2. adapting Facebook language models to Twitter language without retraining the model, and 3. adapting from users to county-level collections of tweets. We propose a two-step algorithm, Target Side Domain Adaptation (TSDA), for such domain adaptation when no labeled Twitter/county data is available. TSDA corrects for the different word distributions between Facebook and Twitter and for the varying word distributions across counties by adjusting target-side word frequencies; no changes to the trained model are made. In the case of predicting the Big Five county-level personality traits, TSDA outperforms a state-of-the-art domain adaptation method and gives county-level predictions that have fewer extreme outliers, higher year-to-year stability, and higher correlation with county-level outcomes.
UR - https://www.scopus.com/pages/publications/105019644280
M3 - Conference contribution
AN - SCOPUS:105019644280
T3 - 8th International Joint Conference on Natural Language Processing - Proceedings of the IJCNLP 2017, System Demonstrations
SP - 764
EP - 773
BT - 8th International Joint Conference on Natural Language Processing - Proceedings of the IJCNLP 2017
PB - Association for Computational Linguistics (ACL)
T2 - 8th International Joint Conference on Natural Language Processing, IJCNLP 2017
Y2 - 27 November 2017 through 1 December 2017
ER -