Interactions for pairs of DNA variants

Jurg Ott, Rockefeller University, New York
10 Feb 2020

Introduction

These analysis programs test for statistical interaction between a pair of variants, where one variant comes from one gene and the other variant from a different gene. Given a number of genes, each with several variants, we want to evaluate all possible pairs of genes and all pairs of variants subject to this restriction.

In our publication [1], there are 5 genes and, thus, 10 gene pairs. The number of SNPs varied from gene to gene, giving a total of 840,899 pairs of SNPs. Here, we work with a sample dataset. Originally, this was a case-control dataset for schizophrenia [2] with 255,053 genome-wide SNPs. We used the same genes as in our publication [1]. These genes relate to hyperlipidemia rather than schizophrenia, but we simply wanted to prepare a toy dataset. Here is the resulting data layout, with a total of 28 SNPs:

Gene      Chrom   Range of positions in toy dataset   Number of SNPs
LDLRAP1   1       25,865,531 - 25,893,371             3
PCSK9     1       55,499,783 - 55,537,080             5
APOB      2       21,208,207 - 21,274,333             5
LDLR      19      11,195,179 - 11,237,675             11
APOE      19      45,403,173 - 45,418,176             4
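
As a quick check on the size of the toy analysis, the following minimal sketch counts the gene pairs and the between-gene SNP pairs implied by the data layout above. It assumes, as stated in the Introduction, that only SNP pairs with one SNP in each of two different genes are counted. The function name count_pairs is hypothetical and is not part of GenePairsPerm.

```python
from itertools import combinations

# SNP counts per gene, copied from the data layout above.
snps_per_gene = {"LDLRAP1": 3, "PCSK9": 5, "APOB": 5, "LDLR": 11, "APOE": 4}


def count_pairs(snp_counts):
    """Return (number of gene pairs, number of between-gene SNP pairs)."""
    gene_pairs = list(combinations(snp_counts, 2))
    # For each gene pair, every SNP of one gene is paired with every SNP of the other.
    snp_pairs = sum(snp_counts[a] * snp_counts[b] for a, b in gene_pairs)
    return len(gene_pairs), snp_pairs


n_gene_pairs, n_snp_pairs = count_pairs(snps_per_gene)
print(n_gene_pairs, n_snp_pairs)  # prints: 10 294
```

For the toy dataset this gives 10 gene pairs and 294 SNP pairs.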

To run the toy dataset with our analysis program, GenePairsPerm, we need to prepare a parameter file, genepairs.param, which will be read by the program. An example parameter file is as follows:
schizoT2genes	Input file names, schizoT2genes.tped and schizoT2genes.tfam
LDLRAP1 1 3
PCSK9 4 8
APOB 9 13
LDLR 14 24
APOE 25 28 need blank line below

1 Test number, 1 = trend test, 2 = geno test
10000 Number of permutations
0 min. chisquare for output
0.5 empty, value of empty cells in contingency tables, might be 0
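
To make the layout of the parameter file explicit, here is a minimal sketch of how such a file could be read. It is not the parser used by GenePairsPerm. It assumes that the first whitespace-separated token on each line is the value and the rest is a comment, that each gene line holds a gene name followed by the first and last SNP index, and that the gene list ends at the first blank line. The function name parse_param_text and the dictionary keys are hypothetical.

```python
def parse_param_text(text):
    """Parse parameter-file text laid out as in the example above (assumed layout)."""
    lines = text.splitlines()
    # First token of the first line: base name of the .tped and .tfam input files.
    base_name = lines[0].split()[0]
    genes = []
    i = 1
    # Gene lines: name, first SNP index, last SNP index; the list ends at a blank line.
    while i < len(lines) and lines[i].strip():
        name, first, last = lines[i].split()[:3]
        genes.append((name, int(first), int(last)))
        i += 1
    # Remaining non-blank lines: the first token of each line is the value.
    values = [ln.split()[0] for ln in lines[i:] if ln.strip()]
    return {
        "base_name": base_name,
        "genes": genes,
        "test": int(values[0]),         # 1 = trend test, 2 = geno test
        "n_perm": int(values[1]),       # number of permutations
        "min_chisq": float(values[2]),  # min. chisquare for output
        "empty": float(values[3]),      # value of empty cells in contingency tables
    }


example = """schizoT2genes\tInput file names, schizoT2genes.tped and schizoT2genes.tfam
LDLRAP1 1 3
PCSK9 4 8
APOB 9 13
LDLR 14 24
APOE 25 28 need blank line below

1 Test number, 1 = trend test, 2 = geno test
10000 Number of permutations
0 min. chisquare for output
0.5 empty, value of empty cells in contingency tables, might be 0
"""

params = parse_param_text(example)
for name, first, last in params["genes"]:
    # last - first + 1 should match the number of SNPs listed for the gene.
    print(name, last - first + 1)
```

Under these assumptions, the index ranges reproduce the SNP counts of the data layout (3, 5, 5, 11 and 4, for a total of 28).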

To run the GenePairsPerm program, you may double-click on its name or, preferably, run it from a Windows or Linux command window (terminal). When the program finishes, you will find the following files:

GPair1Test1.out ... GPair10Test1.out: For a given gene pair, these files list results for each SNP pair analyzed, where Test1 refers to the trend test and Test2 would refer to the genotype test.

gpairsLimitItest1.txt: This file contains the null data, restricted to entries whose chisqD exceeds the minimum chi-square given in the parameter file.

permgenepairsItest1.log: This log file presents information on the current run.

permgenepairsItest1.log.null: This file contains all null data; it may be used to establish significance limits for chi-square.

seed.txt: This file contains a seed for the random permutation analysis and is updated after every run.
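
One common way to turn null data into a significance limit is to take an empirical quantile of the null chi-square values. The sketch below illustrates this idea only. The layout of permgenepairsItest1.log.null is not described here, so the code works on a plain list of numbers; that the null data reduce to one chi-square value per permutation is an assumption. The function name empirical_limit and the numbers are hypothetical, made up for illustration.

```python
import math


def empirical_limit(null_values, alpha=0.05):
    """Return the empirical (1 - alpha) quantile of the null values.

    At most a fraction alpha of the null values exceed the returned limit.
    """
    ordered = sorted(null_values)
    n = len(ordered)
    k = math.ceil((1 - alpha) * n)  # rank (1-based) of the quantile
    return ordered[max(k, 1) - 1]


# Made-up null chi-square values, for illustration only.
null_chisq = [2.1, 0.4, 5.9, 1.3, 3.8, 0.9, 7.2, 2.6, 4.4, 1.7]
limit = empirical_limit(null_chisq, alpha=0.05)
print(limit)  # prints: 7.2
```

An observed chi-square larger than this limit would then be considered significant at the chosen level. In practice, the list would hold one value per permutation, for example 10,000 values for the parameter file shown above.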


References

[1] Okazaki A, et al: Main and interaction effects between variants underlying hyperlipidemia (in preparation)

[2] Ott J, Macciardi F, Shen Y, Carta MG, Murru A, Triunfo R, Robledo R, Rinaldi A, Contu L, Siniscalco M: Pilot study on schizophrenia in Sardinia. Hum Hered 2010;70:92-96