TY - GEN
T1 - Vector-based similarity measurements for historical figures
AU - Chen, Yanqing
AU - Perozzi, Bryan
AU - Skiena, Steven
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - Historical interpretation benefits from identifying analogies among famous people: Who are the Lincolns, Einsteins, Hitlers, and Mozarts? We investigate several approaches to converting approximately 600,000 historical figures into vector representations in order to quantify their similarity according to their Wikipedia pages. We adopt an effective reference standard based on the number of shared human-annotated Wikipedia categories and use it to demonstrate the performance of our similarity detection algorithms. In particular, we investigate four different unsupervised approaches to representing the semantic associations of individuals: (1) TF-IDF, (2) weighted average of distributed word embeddings, (3) LDA topic analysis, and (4) Deepwalk embedding from page links. All proved effective, but Deepwalk embedding yielded an overall accuracy of 91.33% in our evaluation of uncovering historical analogies. Combining LDA and Deepwalk yielded even higher performance.
KW - Deepwalk
KW - People similarity
KW - Vector representations
UR - https://www.scopus.com/pages/publications/84951826801
U2 - 10.1007/978-3-319-25087-8_17
DO - 10.1007/978-3-319-25087-8_17
M3 - Conference contribution
AN - SCOPUS:84951826801
SN - 9783319250861
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 179
EP - 190
BT - Similarity Search and Applications - 8th International Conference, SISAP 2015, Proceedings
A2 - Connor, Richard
A2 - Amato, Giuseppe
A2 - Falchi, Fabrizio
A2 - Gennaro, Claudio
PB - Springer Verlag
T2 - 8th International Conference on Similarity Search and Applications, SISAP 2015
Y2 - 12 October 2015 through 14 October 2015
ER -