Project Details
Description
This research project aims to understand and develop systems for
maintaining superlinear indexes for streaming data. A superlinear
index provides search capability over an abstract space that cannot
easily be linearized (totally ordered). In contrast, a linear index,
typified by a B-tree, supports point and range queries on totally
ordered data.
Examples of superlinear indexes include multidimensional indexes,
which can be over a geometric domain, such as geographic data, or
which can be over multiple linear indexes; and full text queries,
which can include searching for a particular word or substring.
The superlinear indexes found in today's databases cannot support high
rates of insertion. On traditional mechanical disk drives, the
existing superlinear indexes can only support about one hundred
insertions per second in the worst case. For many important
applications, that is too slow, and so database users often avoid
superlinear indexing. Even traditional linear indexes based on
B-trees cannot support the high insertion rates demanded by many
databases.
This research investigates streaming superlinear indexes, that is,
indexes that efficiently support full text or multidimensional
queries, and can be updated at speeds that are related to disk
bandwidth rather than seeks per second.
Among the significant research issues are the following: (1) design
efficient files structures for streaming superlinear indexes; (2)
investigate how streaming superlinear indexes might pave the way to
improved file systems; (3) determine whether cache-oblivious
algorithms technology can enhance streaming superlinear indexes; and
(4) program complex data structures for transactions and recovery.
If successful, this research will show how to build filesystems that
achieve dramatically better performance than today's B-tree-based
filesystems, how to maintain rich geometrical data and
multidimensional nongeographical databases in real time, and how to
maintain full-text searchable databases in real time. For example,
some of today's file systems try to maintain an full-text index to
find strings in files quickly, but these systems often fall behind at
high data write rates. A streaming superlinear index would allow such
a file system to keep up, and would improve the usability of both
high-end storage systems and relatively small consumer storage systems
that are nonetheless too large to index with today's indexes.
The researchers are developing course materials on streaming indexing
technology which will be made freely available under the MIT
OpenCourseWare initiative (http://ocw.mit.edu).
Further information on this project may be found at the project
web page: http://supertech.csail.mit.edu/superlinear-indexes
| Status | Finished |
|---|---|
| Effective start/end date | 09/1/09 → 08/31/13 |
Funding
- National Science Foundation: $199,993.00
Fingerprint
Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.