Pattern discovery and cancer gene identification in integrated cancer genomic data

  • Qianxing Mo
  • Sijian Wang
  • Venkatraman E. Seshan
  • Adam B. Olshen
  • Nikolaus Schultz
  • Chris Sander
  • R. Scott Powers
  • Marc Ladanyi
  • Ronglai Shen

  • Memorial Sloan-Kettering Cancer Center
  • Baylor College of Medicine
  • University of Wisconsin-Madison
  • University of California at San Francisco

Research output: Contribution to journal, Article, peer-reviewed

396 Scopus citations

Abstract

Large-scale integrated cancer genome characterization efforts, including The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia, have created unprecedented opportunities to study cancer biology in the context of knowing the entire catalog of genetic alterations. A clinically important challenge is to discover cancer subtypes and their molecular drivers in a comprehensive genetic context. Curtis et al. [Nature (2012) 486(7403):346-352] have recently shown that integrative clustering of copy number and gene expression in 2,000 breast tumors reveals novel subgroups beyond the classic expression subtypes that show distinct clinical outcomes. To extend the scope of integrative analysis to include somatic mutation data from massively parallel sequencing, we propose a framework for joint modeling of discrete and continuous variables that arise from integrated genomic, epigenomic, and transcriptomic profiling. The core idea is motivated by the hypothesis that diverse molecular phenotypes can be predicted by a set of orthogonal latent variables that represent distinct molecular drivers and can thus reveal tumor subgroups of biological and clinical importance. Using the Cancer Cell Line Encyclopedia dataset, we demonstrate that our method can accurately group cell lines by their cell of origin for several cancer types and precisely pinpoint their known and potential cancer driver genes. Our integrative analysis also demonstrates the power to reveal subgroups that are not lineage-dependent but consist of different cancer types driven by a common genetic alteration. Application to The Cancer Genome Atlas colorectal cancer data reveals distinct integrated tumor subtypes, suggesting different genetic pathways in colon cancer progression.

Original language: English
Pages (from-to): 4245-4250
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 110
Issue number: 11
DOIs
State: Published - Mar 12 2013

Keywords

  • Multidimensional data
  • Multivariate generalized linear model
  • Penalized regression
