Mathematics, Statistics, and Computer Science
imputation, dosage, genome-wide association studies
Objective: The use of haplotypes to impute the genotypes of unmeasured single nucleotide variants continues to rise in popularity. Simulation results suggest that the use of the dosage as a one-dimensional summary statistic of imputation posterior probabilities may be optimal both in terms of statistical power and computational efficiency; however, little theoretical understanding is available to explain and unify these simulation results. In our analysis, we provide a theoretical foundation for the use of the dosage as a one-dimensional summary statistic of genotype posterior probabilities from any technology. Methods: We analytically evaluate the dosage, mode and the more general set of all one-dimensional summary statistics of two-dimensional (three posterior probabilities that must sum to 1) genotype posterior probability vectors. Results: We prove that the dosage is an optimal one-dimensional summary statistic under a typical linear disease model and is robust to violations of this model. Simulation results confirm our theoretical findings. Conclusions: Our analysis provides a strong theoretical basis for the use of the dosage as a one-dimensional summary statistic of genotype posterior probability vectors in related tests of genetic association across a wide variety of genetic disease models.
Source Publication Title
Liu K, Luedtke A, and Tintle NL (2013) “Optimal methods for using posterior probabilities in association testing” Human Heredity. 75(1): 2-11. doi:10.1159/000349974