TY - GEN
T1 - Linear Probing Revisited
T2 - 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021
AU - Bender, Michael A.
AU - Kuszmaul, Bradley C.
AU - Kuszmaul, William
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - The linear-probing hash table is one of the oldest and most widely used data structures in computer science. However, linear probing famously comes with a major draw-back: as soon as the hash table reaches a high memory utilization, elements within the hash table begin to cluster together, causing insertions to become slow. This phenomenon, now known as primary clustering, was first captured by Donald Knuth in 1963; at a load factor of 1-1/x, the expected time per insertion is Θ(x 2}), rather than the more desirable Θ(x). We show that there is more to the story than the classic analysis would seem to suggest. It turns out that small design decisions in how deletions are implemented have dramatic effects on the asymptotic performance of insertions. If these design decisions are made correctly, then even a hash table that is continuously at a load factor 1-Θ(1/x) can achieve average insertion time tildeO(x). A key insight is that the tombstones left behind by deletions cause a surprisingly strong 'anti-clustering' effect, and that when insertions and deletions are one-for-one, the anti-clustering effects of deletions actually overpower the clustering effects of insertions. We also present a new variant of linear probing, which we call graveyard hashing, that completely eliminates primary clustering on any sequence of operations. If, when an operation is performed, the current load factor is 1-1/x for some x, then the expected cost of the operation is O(x). One corollary is that, in the external-memory model with a data block size of B, graveyard hashing offers the following remarkable guarantee: at any load factor 1-1/x satisfying x=o(B), graveyard hashing achieves 1 +o(1) expected block transfers per operation. Past external-memory hash tables have only been able to offer a 1 +o(1) guarantee when the block size B is at least Ω(x 2}). Our results come with actionable lessons for both theoreticians and practitioners, in particular, that well-designed use of tombstones can completely change the asymptotic landscape of how the linear probing behaves (and if there are no deletions).
AB - The linear-probing hash table is one of the oldest and most widely used data structures in computer science. However, linear probing famously comes with a major draw-back: as soon as the hash table reaches a high memory utilization, elements within the hash table begin to cluster together, causing insertions to become slow. This phenomenon, now known as primary clustering, was first captured by Donald Knuth in 1963; at a load factor of 1-1/x, the expected time per insertion is Θ(x 2}), rather than the more desirable Θ(x). We show that there is more to the story than the classic analysis would seem to suggest. It turns out that small design decisions in how deletions are implemented have dramatic effects on the asymptotic performance of insertions. If these design decisions are made correctly, then even a hash table that is continuously at a load factor 1-Θ(1/x) can achieve average insertion time tildeO(x). A key insight is that the tombstones left behind by deletions cause a surprisingly strong 'anti-clustering' effect, and that when insertions and deletions are one-for-one, the anti-clustering effects of deletions actually overpower the clustering effects of insertions. We also present a new variant of linear probing, which we call graveyard hashing, that completely eliminates primary clustering on any sequence of operations. If, when an operation is performed, the current load factor is 1-1/x for some x, then the expected cost of the operation is O(x). One corollary is that, in the external-memory model with a data block size of B, graveyard hashing offers the following remarkable guarantee: at any load factor 1-1/x satisfying x=o(B), graveyard hashing achieves 1 +o(1) expected block transfers per operation. Past external-memory hash tables have only been able to offer a 1 +o(1) guarantee when the block size B is at least Ω(x 2}). Our results come with actionable lessons for both theoreticians and practitioners, in particular, that well-designed use of tombstones can completely change the asymptotic landscape of how the linear probing behaves (and if there are no deletions).
KW - amortized analysis
KW - graveyard hashing
KW - linear probing
KW - primary clustering
KW - tombstones
UR - https://www.scopus.com/pages/publications/85127191514
U2 - 10.1109/FOCS52979.2021.00115
DO - 10.1109/FOCS52979.2021.00115
M3 - Conference contribution
AN - SCOPUS:85127191514
T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
SP - 1171
EP - 1182
BT - Proceedings - 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science, FOCS 2021
PB - IEEE Computer Society
Y2 - 7 February 2022 through 10 February 2022
ER -