The Rockefeller University | 1230 York Avenue, New York, NY 10065 – 7 October 2021
Based on our recent paper describing the methodology.
Aim: To detect pairs of genotypes from different genes that
significantly discriminate between cases and controls while individual
genotypes have little effect.
Course on Genotype Pattern Mining for Digenic Traits
Date: February 21-25, 2022
Rockefeller University, New York; Weiss Building room 305
Frequent Pattern Mining
Various examples of the joint action of two variants (digenic inheritance) have been published [1;2]. There has been much debate about the definition of epistasis and how to detect it [3-5]. Here, as outlined below, we are simply concerned with diplotypes that have different frequencies in cases and controls. We consider diplotypes of length 2, that is, genotype patterns consisting of two genotypes, each from a different variant. Combined effects of two variants may be assessed in 3 × 3 tables of genotypes, with rows corresponding to the genotypes at one variant and columns to the genotypes at the other variant, one such table for cases and another for controls. The well-known Multifactor Dimensionality Reduction (MDR) method applies a specific machine-learning approach to such tables [6-10], while our approach, GPM, uses a general-purpose frequent pattern mining (FPM) algorithm adapted to find genotype patterns whose frequencies differ between cases and controls.
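To make the two 3 × 3 tables concrete, here is a minimal Python sketch that tabulates genotype pairs separately for cases and controls. It is an illustration only and not part of GPM. The function name `genotype_tables`, the coding of genotypes as 0, 1, or 2 copies of an allele, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def genotype_tables(geno_a, geno_b, is_case):
    """Return (case_table, control_table), each a 3 x 3 list of counts.

    Rows index the genotype at variant A and columns the genotype at
    variant B. Genotypes are assumed to be coded 0, 1, or 2.
    """
    counts = {True: Counter(), False: Counter()}
    for a, b, case in zip(geno_a, geno_b, is_case):
        counts[bool(case)][(a, b)] += 1

    def table(counter):
        # A Counter returns 0 for genotype pairs that were never observed.
        return [[counter[(i, j)] for j in range(3)] for i in range(3)]

    return table(counts[True]), table(counts[False])


# Hypothetical toy data: eight individuals, the first four are cases.
geno_a = [0, 1, 2, 1, 0, 2, 1, 1]
geno_b = [2, 1, 0, 1, 2, 0, 0, 1]
is_case = [1, 1, 1, 1, 0, 0, 0, 0]

case_table, control_table = genotype_tables(geno_a, geno_b, is_case)
for row_cases, row_controls in zip(case_table, control_table):
    print(row_cases, row_controls)
```

Each of the nine cells corresponds to one genotype pattern, so comparing a cell between the two tables compares that pattern's frequency in cases and controls.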
The first FPM approach, the Apriori algorithm, was developed to handle the ever-increasing databases of consumer transactions. It was of interest to learn what consumers tend to buy together so that predictions (so-called association rules) can be made, for example, how likely is a consumer to buy wine when they buy bread and cheese? In our implementation of FPM methods, we focus on individual genotype patterns, that is, sets of two genotypes (diplotypes), one each from a different genomic location (possibly from different genes). A 3 × 3 table of genotypes exhibits nine genotype patterns. The specific FPM algorithm used is fpgrowth, and we built code around it so that it works in a case-control setting. The whole approach is embedded in a straightforward permutation framework [13;14]. As applied to case-control studies, our implementation predicts, based on the presence of specific genotype patterns, whether an individual is likely to be or become a case. These methods are particularly important in situations where single variants show little or no disease association, in which case it would be very difficult or downright impossible to detect digenic genotype patterns associated with disease by standard statistical methods. We previously developed an approach to harness Frequent Pattern Mining for assessing the combined effects of two DNA variants on disease, recently updated this approach with a modern FPM engine, and implemented it in a computer program, GPM, for Genotype Pattern Mining.
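The idea of a permutation framework around genotype patterns can be sketched in a few lines of Python. This is not the GPM program and does not use fpgrowth: for a single pair of variants it simply enumerates the observed genotype patterns, takes the largest case-control difference in pattern frequency as the test statistic, and shuffles the case/control labels to obtain an empirical p-value. The function names, the choice of test statistic, and the toy data are assumptions made for this illustration.

```python
import random
from collections import Counter


def max_pattern_difference(patterns, labels):
    """Largest absolute case-control difference in pattern frequency."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    case_counts = Counter(p for p, y in zip(patterns, labels) if y)
    ctrl_counts = Counter(p for p, y in zip(patterns, labels) if not y)
    return max(
        abs(case_counts[p] / n_case - ctrl_counts[p] / n_ctrl)
        for p in set(patterns)
    )


def permutation_p_value(patterns, labels, n_perm=1000, seed=0):
    """Empirical p-value for the maximum difference over all patterns.

    Shuffling the labels keeps the numbers of cases and controls fixed
    while breaking any link between genotype pattern and disease status.
    """
    rng = random.Random(seed)
    observed = max_pattern_difference(patterns, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point rounding.
        if max_pattern_difference(patterns, shuffled) >= observed - 1e-12:
            hits += 1
    # Count the observed data as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: each pattern is (genotype at A, genotype at B).
patterns = [(1, 1)] * 6 + [(0, 0)] * 4 + [(1, 1)] * 1 + [(0, 0)] * 9
labels = [1] * 10 + [0] * 10  # 1 = case, 0 = control

observed, p_value = permutation_p_value(patterns, labels)
print(observed, p_value)
```

Because the statistic is the maximum over all patterns, the resulting p-value already accounts for having examined several patterns at once. Extending the same idea to many variant pairs is where an efficient pattern-mining engine becomes necessary.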
In this course, we will largely focus on detecting combined effects of two DNA variants on disease. The course is being planned for in-person attendance. Should this turn out to be impossible, we may hold the course virtually by Zoom. Details will be forthcoming. If you are interested in attending, please send me an email so I can put you on our attendance list. Please note that you cannot enter the Rockefeller campus without being vaccinated against Covid-19.
The course will be taught by Profs. Jurg Ott (Rockefeller University, New York), Taesung Park (Seoul National University, Korea), and Atsuko Okazaki (Juntendo University, Tokyo, Japan), with a guest lecture by Prof. Suzanne Leal (Columbia University, New York). Prof. Park is Professor of Statistics at Seoul National University and has published in the area of pattern mining.
Costs for the course are $950 for academics and $1,900 for non-academics. An initial deposit of $100 will be required, refundable until December 31, 2021. Payment details will be provided shortly. At this point, no money is due, but you may want to reserve your spot on the participant list (first come, first served) by sending me an email. You will be notified when a deposit is due.
As in previous courses, there will be lectures followed by exercises. Most exercises can be done in Windows, but a small number of programs run only in Linux. We will provide accounts on our Linux servers. Course participants are expected to bring their own Windows laptops, perhaps with dual-boot installed so the laptop can be booted up in Windows or Linux (Kubuntu preferred). If you want to prepare for the course, a recently published review on FPM methods is very useful.