Skip to main navigation Skip to search Skip to main content

An Upper Bound and Linear-Space Queries on the LZ-End Parsing

  • University of California at Berkeley

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

26 Scopus citations

Abstract

Lempel-Ziv (LZ77) compression is the most commonly used lossless compression algorithm. The basic idea is to greedily break the input string into blocks (called "phrases"), every time forming as a phrase the longest prefix of the unprocessed part that has an earlier occurrence. In 2010, Kreft and Navarro introduced a variant of LZ77 called LZ-End, that additionally requires the previous occurrence of each phrase to end at the boundary of an already existing phrase. Due to its excellent practical performance as a compression algorithm and a compressed index, they conjectured that it achieves a compression that can be provably upper-bounded in terms of the LZ77 size. Despite the recent progress in understanding such relation for other compression algorithms (e.g., the run-length encoded Burrows-Wheeler transform), no such result is known for LZ-End. We prove that for any string of length n, the number ze of phrases in the LZ-End parsing satisfies ze = O(z log2 n), where z is the number of phrases in the LZ77 parsing. This puts LZ-End among the strongest dictionary compressors and solves a decade-old open problem of Kreft and Navarro. Using our techniques we also derive bounds for other variants of LZ-End and with respect to other compression measures. Our second contribution is a data structure that implements random access queries to the text in O(ze) space and O(polylog n) time. This is the first linear-size structure on LZ-End that efficiently implements such queries. All previous data structures either incur a logarithmic penalty in the space or have slow queries. We also show how to extend these techniques to support longest-common-extension (LCE) queries.

Original languageEnglish
Title of host publicationACM-SIAM Symposium on Discrete Algorithms, SODA 2022
PublisherAssociation for Computing Machinery
Pages2847-2866
Number of pages20
ISBN (Electronic)9781611977073
DOIs
StatePublished - 2022
Event33rd Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2022 - Alexander, United States
Duration: Jan 9 2022Jan 12 2022

Publication series

NameProceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
Volume2022-January
ISSN (Print)1071-9040
ISSN (Electronic)1557-9468

Conference

Conference33rd Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2022
Country/TerritoryUnited States
CityAlexander
Period01/9/2201/12/22

Fingerprint

Dive into the research topics of 'An Upper Bound and Linear-Space Queries on the LZ-End Parsing'. Together they form a unique fingerprint.

Cite this