
Effective Scalable and Integrative Geocoding for Massive Address Datasets

  • Sina Rashidian
  • Xinyu Dong
  • Amogh Avadhani
  • Prachi Poddar
  • Fusheng Wang
  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

9 Scopus citations

Abstract

With the increased accessibility of large-scale open data, public health studies can take advantage of integrative spatial big data to raise spatial resolution to the community or neighborhood level. One critical piece of information for such studies is the large number of patient addresses, which are private and highly sensitive. Geocoding such massive sets of private addresses poses major challenges for public health researchers. Many geocoders provide only Web APIs, which require sending private addresses over the Internet, and this is not feasible for such sensitive data. Commercial geocoders charge high licensing fees and often limit daily usage, which becomes a major hurdle for researchers. Scalability is another major challenge for large-scale address datasets. In this paper, we present EaserGeocoder, a novel open source geocoder for effectively geocoding massive address datasets. EaserGeocoder takes an integrative approach, using multiple references built from open address data sources contributed by governments or communities. It uses machine learning to automatically select the best answer from the candidates produced by the multiple references. The system achieves high scalability through parallel processing. Our comparative studies demonstrate that EaserGeocoder outperforms open source geocoders and is comparable to commercial ones in terms of both accuracy and error. It provides a cost-effective and feasible solution for large-scale public health studies.
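The abstract does not specify EaserGeocoder's features, model, or reference datasets, so the following is only a minimal sketch of the general idea: several references each return candidate matches, a scorer picks the best one, and addresses are processed in parallel. Everything named here is hypothetical: the `Candidate` fields, the hand-set logistic weights (a stand-in for a trained classifier, not the paper's model), and the toy reference functions `ref_parcels` and `ref_streets`.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import partial
from typing import Callable, List, Optional


@dataclass
class Candidate:
    source: str             # which (hypothetical) reference produced this match
    lat: float
    lon: float
    text_similarity: float  # 0..1 similarity between input and matched address
    match_level: int        # hypothetical scale: 3 = parcel, 2 = street, 1 = ZIP/city


# Hand-set weights standing in for a trained classifier (assumption).
WEIGHTS = {"text_similarity": 4.0, "match_level": 1.5}
BIAS = -6.0

# A reference is any function mapping an address string to candidates.
Reference = Callable[[str], List[Candidate]]


def score(c: Candidate) -> float:
    """Logistic score: an estimated probability that the candidate is correct."""
    z = (BIAS
         + WEIGHTS["text_similarity"] * c.text_similarity
         + WEIGHTS["match_level"] * c.match_level)
    return 1.0 / (1.0 + math.exp(-z))


def geocode_one(address: str, references: List[Reference]) -> Optional[Candidate]:
    """Pool candidates from every reference and return the highest-scoring one."""
    candidates = [c for ref in references for c in ref(address)]
    return max(candidates, key=score, default=None)


def geocode_batch(addresses: List[str], references: List[Reference],
                  workers: int = 4) -> List[Optional[Candidate]]:
    """Geocode addresses in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(partial(geocode_one, references=references), addresses))


# Toy references for illustration only (made-up coordinates and scores).
def ref_parcels(address: str) -> List[Candidate]:
    if "MAIN ST" in address.upper():
        return [Candidate("parcels", 40.91, -73.12, 0.95, 3)]
    return []


def ref_streets(address: str) -> List[Candidate]:
    return [Candidate("streets", 40.90, -73.11, 0.80, 2)]


if __name__ == "__main__":
    results = geocode_batch(["100 Main St", "5 Elm Rd"], [ref_parcels, ref_streets])
    for r in results:
        print(r.source if r else None)
```

A thread pool keeps the sketch short. For CPU-bound matching in Python, `concurrent.futures.ProcessPoolExecutor` is usually the better fit for parallelism, and the `functools.partial` over top-level functions used here remains picklable for that swap.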

Original language: English
Title of host publication: GIS
Subtitle of host publication: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems
Editors: Siva Ravada, Erik Hoel, Roberto Tamassia, Shawn Newsam, Goce Trajcevski
Publisher: Association for Computing Machinery
ISBN (Print): 9781450354905
State: Published, Nov 7 2017
Event: 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2017, Redondo Beach, United States
Duration: Nov 7 2017 to Nov 10 2017

Publication series

Name: GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems
Volume: 2017-November

Conference

Conference: 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2017
Country/Territory: United States
City: Redondo Beach
Period: 11/7/17 to 11/10/17

Keywords

  • Geocoding
  • Geographic Information System
  • Text Searching

