Abstract
Computation over compressed data is a new paradigm in the design of algorithms and data structures that can reduce space usage and speed up computation by orders of magnitude. One of the most frequently employed compression frameworks, capturing many practical compression methods (such as the Lempel-Ziv family, dictionary methods, and others), is grammar compression. In this framework, a string T of length N is represented as a context-free grammar of size n whose language contains only the string T. In this paper, we focus on studying the limitations of these techniques. Previous work focused on proving lower bounds for algorithms and data structures operating over grammars constructed using algorithms that achieve the approximation ratio ρ = O(polylog N) (since finding the smallest grammar representation is NP-hard, every polynomial-time grammar compressor can be viewed as an approximation algorithm). Unfortunately, for many grammar compressors we either have ρ = ω(polylog N) or it is not known whether ρ = O(polylog N) holds. In their seminal paper, Charikar, Lehman, Liu, Panigrahy, Prabhakaran, Sahai, and Shelat [IEEE Trans. Inf. Theory 2005] studied seven popular grammar compression algorithms: RePair, Greedy, LongestMatch, Sequential, Bisection, LZ78, and α-Balanced. Only one of them (α-Balanced) is known to achieve ρ = O(polylog N). In this paper, we develop the first technique for proving lower bounds for data structures and algorithms on grammars that is fully general and does not depend on the approximation ratio ρ of the used grammar compressor. Our first set of results concerns compressed data structures. In 2013, Verbin and Yu proved that implementing random access to T using a grammar constructed by an algorithm with ρ = O(polylog N) requires Ω(log N/log log N) time in the worst case. This lower bound applies to any structure using O(npolylog N) space and matches the existing upper bounds. We prove that this lower bound holds also for RePair, Greedy, LongestMatch, Sequential, and Bisection, while Ω(log log N) time is required for random access to LZ78. Our lower bounds apply to any structure using O(npolylog N) space and match the existing upper bounds. Moreover, we show that our technique generalizes to the class of global algorithms (that includes, e.g., the RePair algorithm), i.e., the lower bound Ω(log N/log log N) applies to the whole class. This makes a significant step forward in a long-standing open problem of analyzing global algorithms. Our second set of results concerns compressed computation, i.e., computation that runs in time that depends on the size of the input in compressed form. Recently, Abboud, Backurs, Bringmann, and Künnemann [FOCS 2017 and NeurIPS 2020] proved numerous limitations of compressed computation under popular conjectures (such as SETH, k-Clique, k-OV, and k-SUM). Similarly as above, however, their framework also displays a dependence on ρ. For example, their results imply that, assuming the k-Clique Conjecture, there is no algorithm to solve CFG Parsing (for which the best algorithm has a time complexity of O(Nω), where ω is the exponent of matrix multiplication) on grammars constructed using Bisection (which satisfies ρ = Θ(e N1/2)) that runs in O(nc · Nω−ϵ) time for constants ϵ > 0 and c < 2ϵ. Using our new techniques, we improve these and other conditional lower bounds. For example, for CFG parsing on Bisection, we rule out algorithms with runtime O(nc · Nω−ϵ) for all constants ϵ > 0 and c > 0.
| Original language | English |
|---|---|
| Pages | 3376-3392 |
| Number of pages | 17 |
| DOIs | |
| State | Published - 2024 |
| Event | 35th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2024 - Alexandria, United States Duration: Jan 7 2024 → Jan 10 2024 |
Conference
| Conference | 35th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2024 |
|---|---|
| Country/Territory | United States |
| City | Alexandria |
| Period | 01/7/24 → 01/10/24 |
Fingerprint
Dive into the research topics of 'Grammar Boosting: A New Technique for Proving Lower Bounds for Computation over Compressed Data'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver