TY - GEN
T1 - Pre-Phaser
T2 - 10th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2019
AU - Yurovsky, Alisa
AU - Futcher, Bruce
AU - Skiena, Steven
N1 - Publisher Copyright: © 2019 Association for Computing Machinery.
PY - 2019/9/4
Y1 - 2019/9/4
N2 - Precise cell cycle phase identification in scRNA-seq improves differential expression analysis. However, the sparseness of scRNA-seq data contributes to the challenge of combining external cell cycle information with gene expression for precise cell phase identification. Existing techniques select their own set of marker genes for each coarse cell cycle phase, and assign cells with enriched average marker expression to this coarse phase. We observe that precise points along the cell cycle are associated with time course points from microarray experiments that identify cycling genes. In this work, we present the effectiveness of using k-nearest neighbors (kNN) to predict coarse cell cycle phase, and extend the kNN methodology to present the first method to identify precise cell cycle phases by correlating single cell transcript counts for significant cycling genes with time course points. We demonstrate that tuned kNN outperforms existing methods in assigning coarse cell cycle phase, getting an average F1 score of 0.641 on four previously published scRNA-seq datasets. We then describe Pre-Phaser, which establishes a general computational approach for precise cell phase assignment using kNN. Our k-fold cross-validation has an accuracy of 0.872 in precise prediction, and conversion to coarse phase yields an F1 score of 0.716. Our results motivate research into precise cell phase assignment to enable fine-tuned differential expression analyses.
AB - Precise cell cycle phase identification in scRNA-seq improves differential expression analysis. However, the sparseness of scRNA-seq data contributes to the challenge of combining external cell cycle information with gene expression for precise cell phase identification. Existing techniques select their own set of marker genes for each coarse cell cycle phase, and assign cells with enriched average marker expression to this coarse phase. We observe that precise points along the cell cycle are associated with time course points from microarray experiments that identify cycling genes. In this work, we present the effectiveness of using k-nearest neighbors (kNN) to predict coarse cell cycle phase, and extend the kNN methodology to present the first method to identify precise cell cycle phases by correlating single cell transcript counts for significant cycling genes with time course points. We demonstrate that tuned kNN outperforms existing methods in assigning coarse cell cycle phase, getting an average F1 score of 0.641 on four previously published scRNA-seq datasets. We then describe Pre-Phaser, which establishes a general computational approach for precise cell phase assignment using kNN. Our k-fold cross-validation has an accuracy of 0.872 in precise prediction, and conversion to coarse phase yields an F1 score of 0.716. Our results motivate research into precise cell phase assignment to enable fine-tuned differential expression analyses.
KW - Cell Cycle
KW - Nearest Neighbor Classification
KW - RNA-seq
KW - Single Cell
UR - https://www.scopus.com/pages/publications/85073164398
U2 - 10.1145/3307339.3342174
DO - 10.1145/3307339.3342174
M3 - Conference contribution
AN - SCOPUS:85073164398
T3 - ACM-BCB 2019 - Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
SP - 376
EP - 382
BT - ACM-BCB 2019 - Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
PB - Association for Computing Machinery, Inc
Y2 - 7 September 2019 through 10 September 2019
ER -