The power of simple statistical techniques in the era of big and complex data: some recent examples from genetic association studies

Author: sds-admin  Published: 2017-05-25

Title: The power of simple statistical techniques in the era of big and complex data: some recent examples from genetic association studies
Speaker: Prof. Lei Sun, University of Toronto
Time: 1:30pm-2:30pm, 2017-6-6 (Tuesday)
Location: Room N205, Zibin Building, Fudan University
Abstract: Genetic association studies aim to identify genetic markers associated with a heritable trait or outcome of interest. The data are typically large and challenging: whole-genome studies scan millions of variables, missing data and measurement errors are often present, individuals from the same family are correlated, and complex genetic etiologies imply complex models. Machine-learning approaches are being developed rapidly across many scientific fields. As a complementary approach, in this talk I present some recent examples where we efficiently and reliably extract information from large-scale genetic association studies by reconsidering some classical statistical techniques in newer settings. We first revisit the well-known Fisher’s method, commonly used in meta-analyses to combine p-values from the same test applied to K independent samples. Here we propose to use it to combine p-values from different tests applied to the same sample, when analyzing multiple genetic variants simultaneously (Derkach, Lawless and Sun 2014, Statistical Science; 2015, Genetic Epidemiology), or when jointly capturing main and interaction effects (Soave et al. 2015, American Journal of Human Genetics). In both settings, we show that there are two classes of complementary tests that are asymptotically independent of each other under a global null hypothesis; this is a desirable feature for analyzing big data. We then revisit simple linear regression and its celebrated extensions in novel contexts. We show that Levene’s scale test for variance heterogeneity can be derived from a two-stage regression framework, which allows us to generalize the test, with ease, to more complex data (Soave and Sun 2017, Biometrics).
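For readers unfamiliar with the two-stage view of Levene's test, the following is a minimal sketch of the classical version only, not the generalization in the cited paper. It assumes a quantitative trait `y` and a categorical grouping `g` (for example, genotype coded 0/1/2); the function name `levene_two_stage` and the toy data are hypothetical. Stage 1 regresses the trait on group indicators, so the fitted values are the group means; stage 2 regresses the absolute residuals on the same indicators and forms the usual one-way F statistic.

```python
from collections import defaultdict


def levene_two_stage(y, g):
    """Classical Levene statistic computed as two regressions on group labels.

    Returns (F, df_between, df_within). Illustrative sketch only.
    """
    # Stage 1: regressing y on group indicators gives the group means as
    # fitted values; keep the absolute residuals d_i = |y_i - mean of group|.
    by_group = defaultdict(list)
    for yi, gi in zip(y, g):
        by_group[gi].append(yi)
    means = {k: sum(v) / len(v) for k, v in by_group.items()}
    d = [abs(yi - means[gi]) for yi, gi in zip(y, g)]

    # Stage 2: regress d on the same group indicators and test the group
    # effect with the one-way ANOVA F statistic.
    d_by_group = defaultdict(list)
    for di, gi in zip(d, g):
        d_by_group[gi].append(di)
    n, k = len(d), len(d_by_group)
    grand_mean = sum(d) / n
    ss_between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in d_by_group.values()
    )
    ss_within = sum(
        (di - sum(v) / len(v)) ** 2 for v in d_by_group.values() for di in v
    )
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, k - 1, n - k


# Toy data (made up): two groups with equal means but different spread.
y = [1.0, 2.0, 3.0, 0.0, 2.0, 4.0]
g = [0, 0, 0, 1, 1, 1]
f_stat, df1, df2 = levene_two_stage(y, g)
print(f_stat, df1, df2)
```

The F statistic would be compared with an F distribution on `(df1, df2)` degrees of freedom. Writing the test as a regression is what makes extensions natural: in principle, covariates or other predictors could enter either stage, although the specific extensions in the talk are not reproduced here.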
If time permits, I will discuss ongoing work, with graduate student Lin Zhang, on how to use a regression model to test Hardy-Weinberg equilibrium; this leads to a new allele-based association test with theoretical insights on its robustness. We also provide supporting evidence from applications including genetic association studies of complications related to type 1 diabetes and cystic fibrosis.
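As background for the first part of the abstract, here is a minimal sketch of the classical Fisher's method for combining K independent p-values. It illustrates the textbook form only; the talk's proposal applies the idea to different tests on the same sample, which relies on the asymptotic independence result mentioned above and is not reproduced here. The function name `fisher_combine` is hypothetical. Under the global null, T = -2 Σ ln p_i follows a chi-square distribution with 2K degrees of freedom, whose upper tail has a closed form when the degrees of freedom are even, so only the standard library is needed.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    Returns (T, combined_p), where T = -2 * sum(log p_i) is referred to a
    chi-square distribution with 2K degrees of freedom.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    half_t = -sum(math.log(p) for p in pvalues)  # this is T / 2

    # Chi-square upper tail with 2k degrees of freedom:
    # P(X > T) = exp(-T/2) * sum_{j=0}^{k-1} (T/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_t / j
        total += term
    return 2.0 * half_t, math.exp(-half_t) * total


# Two moderately small p-values (made-up numbers) combine into stronger evidence.
t_stat, p_comb = fisher_combine([0.05, 0.05])
print(t_stat, p_comb)
```

The independence assumption matters: if the input p-values are positively correlated, the chi-square reference distribution is no longer valid and the combined p-value tends to be too small.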