TY - GEN
T1 - Access
T2 - 19th International World Wide Web Conference, WWW2010
AU - Bautin, Mikhail
AU - Ward, Charles B.
AU - Patil, Akshay
AU - Skiena, Steven S.
PY - 2010
Y1 - 2010
AB - The social sciences strive to understand the political, social, and cultural world around us, but have been impaired by limited access to the quantitative data sources enjoyed by the hard sciences. Careful analysis of Web document streams holds enormous potential to solve longstanding problems in a variety of social science disciplines through massive data analysis. This paper introduces the TextMap Access system, which provides ready access to a wealth of interesting statistics on millions of people, places, and things across a number of interesting web corpora. Powered by a flexible and scalable distributed statistics computation framework using Hadoop, continually updated corpora include newspapers, blogs, patent records, legal documents, and scientific abstracts; well over a terabyte of raw text and growing daily. The Lydia TextMap Access system, available through http://www.textmap.com/access, provides instant access for students and scholars through a convenient web user interface. We describe the architecture of the TextMap Access system and its impact on current research in political science, sociology, and business/marketing.
KW - blog analysis
KW - hadoop
KW - news analysis
KW - social sciences
UR - https://www.scopus.com/pages/publications/77954599071
U2 - 10.1145/1772690.1772889
DO - 10.1145/1772690.1772889
M3 - Conference contribution
AN - SCOPUS:77954599071
SN - 9781605587998
T3 - Proceedings of the 19th International Conference on World Wide Web, WWW '10
SP - 1229
EP - 1232
BT - Proceedings of the 19th International Conference on World Wide Web, WWW '10
Y2 - 26 April 2010 through 30 April 2010
ER -