Mathematics, Statistics, and Computer Science
data sets, gene expression, transcriptomics
Gene set analysis continues to be a popular and powerful approach to evaluating genome-wide transcriptomics data. These methods require a priori grouping of genes into biologically meaningful sets, followed by downstream analyses at the set (instead of gene) level. Gene set analysis methods have been shown to yield more powerful statistical conclusions than single-gene analyses, owing both to reduced multiple testing penalties and to potentially larger observed effects from the aggregation of effects across multiple genes in the set. Traditionally, gene set analysis methods have been applied directly to normalized, log-transformed transcriptomics data. Recently, efforts have been made to transform transcriptomics data to scales that yield more biologically interpretable results. For example, recently proposed models transform log-transformed transcriptomics data to a confidence metric (ranging between 0 and 100%) that a gene is active (roughly speaking, that the gene product is part of an active cellular mechanism). In this manuscript, we demonstrate, on both real and simulated transcriptomics data, that tests for differential expression between sets of genes are typically more powerful when using gene activity state estimates as opposed to log-transformed gene expression data. Our analysis suggests further exploration of techniques to transform transcriptomics data to meaningful quantities for improved downstream inference.
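The two sources of power mentioned in the abstract (fewer tests and aggregation across genes) can be illustrated with a minimal sketch. This is not the authors' method: the mean-based set score, the Bonferroni correction, the function names, and the toy data are all assumptions chosen for illustration only.

```python
from statistics import mean


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def set_scores(expression, gene_sets):
    """Aggregate per-gene values into one score per set and sample.

    expression: dict mapping gene name -> list of per-sample values
    gene_sets:  dict mapping set name  -> list of gene names
    Returns a dict mapping set name -> list of per-sample mean values.
    Simple averaging is an illustrative assumption, not the paper's statistic.
    """
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in expression]
        if not present:
            continue  # skip sets with no measured genes
        n_samples = len(expression[present[0]])
        scores[name] = [
            mean(expression[g][i] for g in present) for i in range(n_samples)
        ]
    return scores


# Hypothetical toy data: three genes measured in four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [3.0, 2.0, 5.0, 4.0],
    "geneC": [0.0, 0.0, 1.0, 1.0],
}
gene_sets = {"set1": ["geneA", "geneB"], "set2": ["geneC"]}

scores = set_scores(expression, gene_sets)

# Testing 50 sets instead of 1000 genes gives a less strict per-test threshold.
gene_level = bonferroni_threshold(0.05, 1000)
set_level = bonferroni_threshold(0.05, 50)
```

The same aggregation could be applied to either log-transformed expression values or gene activity state estimates; the sketch makes no claim about which input performs better.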
Source Publication Title
Proceedings of the Pacific Symposium on Biocomputing
World Scientific Publishing Company
Kamp, Thomas; Adams, Micah; Disselkoen, Craig; and Tintle, Nathan L., "Improved Performance of Gene Set Analysis on Genome-Wide Transcriptomics Data When Using Gene Activity State Estimates" (2017). Faculty Work Comprehensive List. 674.