SparkGIS: Resource Aware Efficient In-Memory Spatial Query Processing

  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

46 Scopus citations

Abstract

Much effort has been devoted to supporting high performance spatial queries on large volumes of spatial data in distributed spatial computing systems, especially in the MapReduce paradigm. Recent works have focused on extending spatial MapReduce frameworks to leverage the high performance in-memory distributed processing capabilities of systems such as Spark. However, the performance advantage comes with the requirement of having enough memory and comprehensive configuration. When these requirements are not met, processing falls back to disk IO, defeating the purpose of such systems, or in the worst case runs out of memory and fails the job. The problem is aggravated further for spatial processing since the underlying in-memory systems are oblivious to spatial data features and characteristics. In this paper we present SparkGIS - an in-memory oriented spatial data querying system for high throughput and low latency spatial query handling by adapting Apache Spark’s distributed processing capabilities. It supports basic spatial queries including containment, spatial join and k-nearest neighbor, and allows extending these to complex query pipelines. SparkGIS mitigates skew in distributed processing by supporting several dynamic partitioning algorithms suitable for a rich set of contemporary application scenarios. Multilevel global and local, pre-generated and on-demand in-memory indexes allow SparkGIS to prune input data and apply compute intensive operations only to a subset of relevant spatial objects. Finally, SparkGIS employs dynamic query rewriting to gracefully manage large spatial query workflows that exceed available distributed resources. Our comparative evaluation has shown that the performance of SparkGIS is on par with contemporary Spark based platforms for relatively smaller queries and that it outperforms them for larger data and memory intensive workflows through dynamic query rewriting and efficient spatial data management.
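
The partition-then-prune idea in the abstract can be pictured with a small sketch. This is not SparkGIS code: it runs on a single machine without Spark, a uniform grid stands in for the dynamic partitioning algorithms (which the abstract does not name), and `cells_for`, `intersects`, and `grid_join` are hypothetical helper names chosen for illustration. Boxes are assumed to be axis-aligned tuples `(xmin, ymin, xmax, ymax)`.

```python
from collections import defaultdict
from itertools import product


def cells_for(box, cell):
    """Return the grid cells (ix, iy) that an axis-aligned box overlaps."""
    xmin, ymin, xmax, ymax = box
    xs = range(int(xmin // cell), int(xmax // cell) + 1)
    ys = range(int(ymin // cell), int(ymax // cell) + 1)
    return product(xs, ys)


def intersects(a, b):
    """Exact test: do two axis-aligned boxes share at least one point?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def grid_join(left, right, cell=10.0):
    """Spatial join of two box lists using a uniform grid as a coarse index."""
    # Coarse (global) step: bucket the right-hand boxes by grid cell.
    index = defaultdict(list)
    for j, box in enumerate(right):
        for c in cells_for(box, cell):
            index[c].append(j)

    # Fine (local) step: run the exact test only on boxes sharing a cell.
    pairs = set()  # a set, because a pair can share more than one cell
    for i, box in enumerate(left):
        for c in cells_for(box, cell):
            for j in index.get(c, ()):
                if intersects(box, right[j]):
                    pairs.add((i, j))
    return sorted(pairs)
```

Calling `grid_join(left, right)` returns sorted `(left_index, right_index)` pairs; boxes that share no grid cell are never compared, which is the pruning effect the abstract attributes to its indexes.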

Original language: English
Title of host publication: GIS
Subtitle of host publication: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems
Editors: Siva Ravada, Erik Hoel, Roberto Tamassia, Shawn Newsam, Goce Trajcevski
Publisher: Association for Computing Machinery
ISBN (Print): 9781450354905
DOIs
State: Published - Nov 7 2017
Event: 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2017 - Redondo Beach, United States
Duration: Nov 7 2017 → Nov 10 2017

Publication series

Name: GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems
Volume: 2017-November

Conference

Conference: 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM SIGSPATIAL GIS 2017
Country/Territory: United States
City: Redondo Beach
Period: 11/7/17 → 11/10/17

Keywords

  • In-Memory processing
  • MapReduce
  • Spark
  • Spatial processing
