
Toward terabyte pattern mining: An architecture-conscious solution

  • Ohio State University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

47 Scopus citations

Abstract

We present a strategy for mining frequent item sets from terabyte-scale data sets on cluster systems. The algorithm embraces the holistic notion of architecture-conscious data mining, taking into account the capabilities of the processor, the memory hierarchy and the available network interconnects. Optimizations have been designed for lowering communication costs using compressed data structures and a succinct encoding. Optimizations for improving cache, memory and I/O utilization using pruning and tiling techniques, and smart data placement strategies are also employed. We leverage the extended memory space and computational resources of a distributed message-passing cluster to design a scalable solution, where each node can extend its meta structures beyond main memory by leveraging 64-bit architecture support. Our solution strategy is presented in the context of FPGrowth, a well-studied and rather efficient frequent pattern mining algorithm. Results demonstrate that the proposed strategy results in near-linear scaleup on up to 48 nodes.

Original language: English
Title of host publication: Proceedings of the 2007 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'07
Pages: 2-12
Number of pages: 11
State: Published - 2007
Event: 2007 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'07 - San Jose, CA, United States
Duration: Mar 14 2007 to Mar 17 2007

Publication series

Name: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Conference

Conference: 2007 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP'07
Country/Territory: United States
City: San Jose, CA
Period: 03/14/07 to 03/17/07

Keywords

  • Data mining
  • Itemset mining
  • Out of core
  • Parallel
