Faculty Work Comprehensive List

Inflated Type I Error Rates When Using Aggregation Methods to Analyze Rare Variants in the 1000 Genomes Project Exon Sequencing Data in Unrelated Individuals: Summary Results from Group 7 at Genetic Analysis Workshop 17

Nathan L. Tintle, Dordt College
Hugues Aschard, Harvard School of Public Health
Inchi Hu, Hong Kong University of Science and Technology
Nora Nock, Case Western Reserve University
Haitian Wang, Hong Kong University of Science and Technology
Elizabeth Pugh, Johns Hopkins University

Document Type

Article

Publication Date

2011

Department

Mathematics, Statistics, and Computer Science

Keywords

population structure, correlated markers, next-generation sequencing, Genetic Analysis Workshop 17, 1000 Genomes Project

Abstract

As part of Genetic Analysis Workshop 17 (GAW17), our group considered the application of novel and standard approaches to the analysis of genotype-phenotype association in next-generation sequencing data. Our group identified a major issue in the analysis of the GAW17 next-generation sequencing data: type I error and false-positive report probability rates higher than those expected based on empirical type I error levels (as high as 90%). Two main causes emerged: population stratification and long-range correlation (gametic phase disequilibrium) between rare variants. Population stratification was expected because of the diverse sample. Correlation between rare variants was attributable to both random causes (e.g., nearly 10,000 of 25,000 markers were private variants, and the sample size was small [n = 697]) and nonrandom causes (more correlation was observed than was expected by random chance). Principal components analysis was used to control for population structure and helped to minimize type I errors, but this was at the expense of identifying fewer causal variants. A novel multiple regression approach showed promise to handle correlation between markers. Further work is needed, first, to identify best practices for the control of type I errors in the analysis of sequencing data and then to explore and compare the many promising new aggregating approaches for identifying markers associated with disease phenotypes.

Comments

This is a pre-publication author manuscript of the following final, published article: Tintle, N., Aschard, H., Hu, I., Nock, N., Wang, H. and Pugh, E. (2011), Inflated type I error rates when using aggregation methods to analyze rare variants in the 1000 Genomes Project exon sequencing data in unrelated individuals: summary results from Group 7 at Genetic Analysis Workshop 17. Genet. Epidemiol., 35: S56–S60. doi: 10.1002/gepi.20650

The definitive version is published by Wiley and available from Wiley Online Library (wileyonlinelibrary.com) at http://onlinelibrary.wiley.com/doi/10.1002/gepi.20650/abstract.

Source Publication Title

Genetic Epidemiology

Publisher

Wiley

Volume

35

Issue

Supplement 1

First Page

S56

DOI

10.1002/gepi.20650

Recommended Citation

Tintle, N., Aschard, H., Hu, I., Nock, N., Wang, H. and Pugh, E. (2011), Inflated type I error rates when using aggregation methods to analyze rare variants in the 1000 Genomes Project exon sequencing data in unrelated individuals: summary results from Group 7 at Genetic Analysis Workshop 17. Genet. Epidemiol., 35: S56–S60. doi: 10.1002/gepi.20650

Included in

Bioinformatics Commons, Genetics and Genomics Commons, Statistics and Probability Commons
