
Stratified Subsampling Based p-values for Hypothesis Tests in Genomics Research

  • Northwestern University
  • Pondicherry University

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Multiple testing, the testing of more than one hypothesis in a single experiment, is routinely performed in statistical analysis of genome-wide data, for example when testing the association of single-nucleotide polymorphisms (SNPs) with a particular phenotype. A common practice is to apply multiple-testing correction methods to exclude candidate SNPs that could otherwise be spuriously marked as statistically significant. However, in many cases such methods are overly conservative and often yield no significant SNPs at all. In this paper, we summarize commonly used multiple-testing correction procedures and Monte Carlo simulation-based methods. We propose a simple modification to the subsampling-based simulation method that estimates empirical p-values by borrowing the principles of stratified sampling. Using real datasets from The Cancer Genome Atlas (TCGA) data repository, we demonstrate that traditional multiple-testing correction methods yielded few or no significant risk-associated SNPs, whereas the proposed stratified subsampling yielded an appropriate number of significant candidate SNPs. We also show that the proposed modification provides meaningful p-values and makes the test more powerful than simple subsampling without stratification.
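To make the point about conservativeness concrete, the following is a minimal sketch of two widely used correction procedures, Bonferroni and Benjamini-Hochberg. It is an illustration only and does not implement the stratified subsampling method proposed in the paper. The function names and the five raw p-values are hypothetical and are not taken from the article or from TCGA data.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five SNPs (made-up numbers for illustration).
raw = [0.001, 0.012, 0.02, 0.03, 0.6]
alpha = 0.05

bonf_hits = [p <= alpha for p in bonferroni(raw)]
bh_hits = [p <= alpha for p in benjamini_hochberg(raw)]

print("Bonferroni significant:", sum(bonf_hits))
print("Benjamini-Hochberg significant:", sum(bh_hits))
```

With these made-up inputs, Bonferroni retains fewer SNPs than Benjamini-Hochberg at the same threshold. In a genome-wide setting the number of tests is far larger, so the multiplier applied to each p-value grows accordingly.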

Original language: English
Pages (from-to): 209-221
Number of pages: 13
Journal: Statistics and Applications
Volume: 19
Issue number: 1
State: Published - May 2021

Keywords

  • Multiple comparison test
  • p-value
  • Stratified sampling
  • Subsampling
