TY - GEN
T1 - A comparative study of demographic attribute inference in Twitter
AU - Chen, Xin
AU - Wang, Yu
AU - Agichtein, Eugene
AU - Wang, Fusheng
N1 - Publisher Copyright: © Copyright 2015, Association for the Advancement of Artificial Intelligence. All rights reserved.
PY - 2015
Y1 - 2015
AB - Social media platforms have become a major gateway for receiving and analyzing public opinion. Understanding users can provide invaluable contextual information about their social media posts and significantly improve traditional opinion analysis models. Demographic attributes such as ethnicity, gender, and age have been extensively applied to characterize social media users. While studies have shown that user groups formed by demographic attributes can hold coherent opinions on political issues, these attributes are often not explicitly stated by users in their profiles. Previous work has demonstrated the effectiveness of different user signals, such as users' posts and names, in determining demographic attributes. Yet these efforts mostly evaluate linguistic signals from users' posts and train models on artificially balanced datasets. In this paper, we propose a comprehensive list of user signals: self-descriptions and posts aggregated from users' friends and followers, users' profile images, and users' names. We provide a side-by-side comparative study of these signals on the tasks of inferring three major demographic attributes, namely ethnicity, gender, and age. We use realistic unbalanced datasets, whose demographic makeup is similar to that of Twitter, for model training and evaluation experiments. Our experiments indicate that self-descriptions provide the strongest signal for ethnicity and age inference and clearly improve overall performance when combined with tweets. For gender inference, profile images achieve the highest precision score, with an overall score close to the best result in our setting. This suggests that signals in self-descriptions and profile images have the potential to facilitate demographic attribute inference in Twitter and are promising for future investigation.
UR - https://www.scopus.com/pages/publications/84960947033
M3 - Conference contribution
AN - SCOPUS:84960947033
T3 - Proceedings of the 9th International Conference on Web and Social Media, ICWSM 2015
SP - 590
EP - 593
BT - Proceedings of the 9th International Conference on Web and Social Media, ICWSM 2015
PB - AAAI Press
T2 - 9th International Conference on Web and Social Media, ICWSM 2015
Y2 - 26 May 2015 through 29 May 2015
ER -