TY - GEN
T1 - Hybrid indexing revisited
AU - Hactor, F.
AU - Kempa, Dominik
AU - Puglisi, Simon J.
N1 - Publisher Copyright:
Copyright © 2018 by SIAM.
PY - 2018
Y1 - 2018
N2 - Hybrid indexing is a recent approach to text indexing that allows the space-usage of conventional text indexes (e.g., su x trees, su x arrays, FM-indexes) to scale well with the text size, n, when z, the size of the Lempel-Ziv parsing of the text, is small relative to n. The price for this improved scalability is that an upper bound M on the pattern length that can be searched for must be declared at index construction time. Because the size of the resulting index contains an O(Mz) term, M must be kept reasonably small, though it has been shown that M ? 100 leads to acceptable performance in some genomic applications. However, despite its promise, the practical performance of hybrid indexing relative to other compressed index data structures is poorly understood. This paper addresses that need, detailing experiments that show hybrid indexing — when carefully implemented — to be significantly smaller and faster than alternative approaches on a broad range of data of di erent levels of compressibility. We also describe practical extensions to hybrid indexing that obviate the restriction on M, supporting search for patterns of arbitrary length.
AB - Hybrid indexing is a recent approach to text indexing that allows the space-usage of conventional text indexes (e.g., su x trees, su x arrays, FM-indexes) to scale well with the text size, n, when z, the size of the Lempel-Ziv parsing of the text, is small relative to n. The price for this improved scalability is that an upper bound M on the pattern length that can be searched for must be declared at index construction time. Because the size of the resulting index contains an O(Mz) term, M must be kept reasonably small, though it has been shown that M ? 100 leads to acceptable performance in some genomic applications. However, despite its promise, the practical performance of hybrid indexing relative to other compressed index data structures is poorly understood. This paper addresses that need, detailing experiments that show hybrid indexing — when carefully implemented — to be significantly smaller and faster than alternative approaches on a broad range of data of di erent levels of compressibility. We also describe practical extensions to hybrid indexing that obviate the restriction on M, supporting search for patterns of arbitrary length.
UR - https://www.scopus.com/pages/publications/85041396827
U2 - 10.1137/1.9781611975055.1
DO - 10.1137/1.9781611975055.1
M3 - Conference contribution
AN - SCOPUS:85041396827
T3 - Proceedings of the Workshop on Algorithm Engineering and Experiments
SP - 1
EP - 8
BT - 2018 Proceedings of the 20th Workshop on Algorithm Engineering and Experiments, ALENEX 2018
A2 - Venkatasubramanian, Suresh
A2 - Pagh, Rasmus
PB - Society for Industrial and Applied Mathematics Publications
T2 - 20th Workshop on Algorithm Engineering and Experiments, ALENEX 2018
Y2 - 7 January 2018 through 8 January 2018
ER -