Abstract
Variables are everywhere, and variable selection is an active area of research across a variety of fields. In this talk I will give an overview of several variable selection methods and applications developed by my research group. Most of the early applications are to genetic data, answering the question of which genes are associated with a disease. The first method, Stochastic search gene suggestion (SSGS), developed in 2006, incorporated genetic structure into the hierarchical priors for model selection. Specifically, we incorporated linkage disequilibrium, a type of genetic correlation. The selection priors were originally designed to model the biological structure of multiple alleles within genetic loci, which was common in the genetic markers of the time, known as microsatellites. SSGS was then updated to analyze more contemporary genetic markers, known as single nucleotide polymorphisms (SNPs), and applied to search for SNPs associated with acute lymphoblastic leukemia (2013). Originally designed for genotyped family trios (both parents and one affected child), SSGS was next extended to case-control data and adapted to larger, computationally intensive data sets by incorporating the EM algorithm to search for posterior modes (2017). Most recently, SSGS evolved into TRIO_RVEMVS, which jointly identifies rare and common variants using trio family data and the EM algorithm (2018-present).