
Understanding the Intrinsic Characteristics of Spatial Partitioning in Distributed Spatial Join

  • Shandong University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Spatial join has become a frequently used yet resource-intensive operation in geospatial applications, driven by the increasing volume and complexity of geospatial data. With Hadoop and Spark becoming the de facto standard platforms for distributed computing, scalable spatial data processing is primarily achieved by partitioning the input space to form parallel units on these platforms. Effective spatial data partitioning is critical for task parallelization and load balancing, but it faces significant challenges due to data skew and the geometric and topological complexity of spatial objects, particularly in supporting spatial joins. This paper examines the interplay among query performance, spatial data partitioning, query types, data, and system characteristics. We qualitatively and quantitatively analyze the features of representative partitioning algorithms that impact overall query performance. Along with these analyses, we propose a data sampling-based approach for selecting optimized partitioning strategies. Extensive experiments on large and complex datasets using MapReduce frameworks are conducted to validate the correctness of our analysis and the effectiveness of our optimization approach.
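The abstract says partitioning strategies are selected from a data sample but does not give the selection model, so the following is only a minimal Python sketch under assumed choices, not the paper's method. It assumes objects are represented by bounding rectangles, compares two hypothetical candidates (a uniform grid and a quadtree built from the sample), and uses a hypothetical cost: load imbalance (maximum partition load over mean load) multiplied by the replication factor (average number of partitions each sampled object overlaps, because objects crossing a partition boundary are duplicated for the join). All function names are illustrative.

```python
# Hypothetical sketch: NOT the paper's algorithm. The candidate strategies and
# the cost function (imbalance x replication) are assumptions for illustration.
import random
from typing import Dict, List, Tuple

# Each spatial object is approximated by its bounding rectangle (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def intersects(a: Rect, b: Rect) -> bool:
    """True if two rectangles overlap or touch (closed intervals)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def grid_partitions(space: Rect, side: int) -> List[Rect]:
    """Uniform side x side grid over the space, ignoring the data distribution."""
    xmin, ymin, xmax, ymax = space
    dx, dy = (xmax - xmin) / side, (ymax - ymin) / side
    return [(xmin + i * dx, ymin + j * dy, xmin + (i + 1) * dx, ymin + (j + 1) * dy)
            for i in range(side) for j in range(side)]


def quadtree_partitions(space: Rect, sample: List[Rect], capacity: int,
                        max_depth: int = 8) -> List[Rect]:
    """Split any cell holding more than `capacity` sampled object centers into
    four quadrants; the resulting leaf cells tile the whole space."""
    leaves: List[Rect] = []

    def split(cell: Rect, centers: List[Tuple[float, float]], depth: int) -> None:
        if len(centers) <= capacity or depth == max_depth:
            leaves.append(cell)
            return
        xmin, ymin, xmax, ymax = cell
        mx, my = (xmin + xmax) / 2, (ymin + ymax) / 2
        quads: List[Tuple[Rect, List[Tuple[float, float]]]] = [
            ((xmin, ymin, mx, my), []), ((mx, ymin, xmax, my), []),
            ((xmin, my, mx, ymax), []), ((mx, my, xmax, ymax), []),
        ]
        for cx, cy in centers:
            # Quadrant index 0..3: +1 for the east half, +2 for the north half.
            quads[int(cx >= mx) + 2 * int(cy >= my)][1].append((cx, cy))
        for quad, quad_centers in quads:
            split(quad, quad_centers, depth + 1)

    split(space, [((r[0] + r[2]) / 2, (r[1] + r[3]) / 2) for r in sample], 0)
    return leaves


def evaluate(partitions: List[Rect], sample: List[Rect]) -> Tuple[float, float]:
    """Return (load imbalance, replication factor) of a partitioning on the sample.
    An object overlapping several partitions is counted once in each of them."""
    loads = [0] * len(partitions)
    for obj in sample:
        for i, cell in enumerate(partitions):
            if intersects(obj, cell):
                loads[i] += 1
    copies = sum(loads)
    mean = copies / len(partitions)
    imbalance = max(loads) / mean if mean > 0 else float("inf")
    replication = copies / len(sample)
    return imbalance, replication


def choose_strategy(space: Rect, data: List[Rect], sample_rate: float = 0.01,
                    n_parts: int = 16, seed: int = 0) -> Tuple[str, Dict[str, float]]:
    """Pick the candidate with the lowest assumed cost on a random sample."""
    rng = random.Random(seed)
    k = max(1, int(len(data) * sample_rate))
    sample = rng.sample(data, k)
    candidates = {
        "grid": grid_partitions(space, int(n_parts ** 0.5)),
        "quadtree": quadtree_partitions(space, sample, capacity=max(1, k // n_parts)),
    }
    scores: Dict[str, float] = {}
    for name, parts in candidates.items():
        imbalance, replication = evaluate(parts, sample)
        scores[name] = imbalance * replication
    return min(scores, key=scores.get), scores
```

On heavily skewed input, for example most objects packed into one corner of the space, the uniform grid places nearly all sampled objects in a single cell, so the sample-driven quadtree should score lower; on near-uniform data the two candidates score closer together. The sketch also ignores that the quadtree may yield more partitions than `n_parts`; a fuller cost model would account for partition count, query type, and cluster resources.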

Original language: English
Title of host publication: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024
Editors: Wei Ding, Chang-Tien Lu, Fusheng Wang, Liping Di, Kesheng Wu, Jun Huan, Raghu Nambiar, Jundong Li, Filip Ilievski, Ricardo Baeza-Yates, Xiaohua Hu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 403-412
Number of pages: 10
ISBN (Electronic): 9798350362480
State: Published - 2024
Event: 2024 IEEE International Conference on Big Data, BigData 2024 - Washington, United States
Duration: Dec 15 2024 to Dec 18 2024

Publication series

Name: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024
ISSN (Print): 2639-1589
ISSN (Electronic): 2573-2978

Conference

Conference: 2024 IEEE International Conference on Big Data, BigData 2024
Country/Territory: United States
City: Washington
Period: 12/15/24 to 12/18/24

Keywords

  • distributed processing
  • spatial partitioning

