Jurg Ott / 4 Feb 2013
Rockefeller University New York and
Institute of Psychology, Chinese Academy of Sciences, Beijing
Disease-Associated Genotype Patterns
This document refers to our publication (Long et al 2009) on
genotype patterns (diplotypes) and testing for frequency
differences between case and control individuals. Here is a brief
description of how to use our software, randompat (RP). It is currently
available for Windows PCs, but the source code (included)
may be compiled for Linux PCs with the Free Pascal compiler. A
Ukrainian (Belarusian?) translation of this document is also available.
- Download the software package and, in a suitable folder, unzip
all files. One of these files is a sample dataset, ZeeData.txt,
with 88 SNPs and 779 individuals (cases plus controls). While this
dataset is small, the RP program is preferably run on genome-wide
case-control datasets (SNP markers).
- Open a command window ("DOS box"), cmd. Change directories
until you are in the folder containing the randompat files.
- The program requires two input files, a
parameter file (for example, Zee.par)
and a datafile (for example, ZeeData.txt).
- To run the program with these two input files, enter the
command, randompat Zee.par.
You should then see various intermediate program output and, in the
end, a note saying that the final output has been stored in the file, Zee.out.
- In the current program design, the number m of SNPs
for which genotype patterns can be constructed is limited to m
As mentioned above, there are two input files, a parameter file and a
datafile. The sample parameter file provides a brief
description of how to set up this file. The datafile must have the
structure of a sumstat
file and may or may not contain chromosomal information for the SNPs.
Briefly, rows in the datafile correspond to SNPs and columns represent
individuals, while the body of the file contains genotype codes, for
example, 1 = AA, 2 = AB, 3 = BB, 0 = unknown. The last row contains
indicator codes for disease status, for example, 1 = control, 2 = case
(affected). The last three columns are optional and may specify
chromosome number, position, and marker identifier.
Your data may be in plink
format, in which case you may use the p2s program to convert from plink to sumstat format.
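The datafile layout described above (SNPs in rows, individuals in columns, disease status in the last row) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: whitespace-separated integer codes, no header lines, and no optional chromosome, position, or marker columns. The real sumstat format may differ in these details, and the function name parse_datafile is hypothetical, not part of randompat.

```python
def parse_datafile(text):
    """Split a genotype matrix (SNP rows, individual columns) from the status row.

    Assumes whitespace-separated integer codes and no header or extra columns;
    these are illustrative assumptions, not the documented sumstat format.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) < 2:
        raise ValueError("need at least one SNP row and one status row")
    *snp_rows, status_row = rows          # the last row holds disease status
    genotypes = [[int(code) for code in row] for row in snp_rows]
    status = [int(code) for code in status_row]
    if any(len(row) != len(status) for row in genotypes):
        raise ValueError("every SNP row must have one code per individual")
    return genotypes, status


# Toy data: 2 SNPs, 4 individuals; 1 = AA, 2 = AB, 3 = BB, 0 = unknown.
# Last row: 1 = control, 2 = case.
sample = """\
1 2 3 0
2 2 1 3
1 2 2 1
"""

genotypes, status = parse_datafile(sample)
n_cases = status.count(2)
n_controls = status.count(1)
print(len(genotypes), "SNPs,", n_cases, "cases,", n_controls, "controls")
```

Keeping the status row separate from the genotype rows makes it easy to split any SNP row into case and control genotypes later.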
Interpretation of Output
As described in our publication (Long et al 2009), the randompat program
picks SNPs on the basis of their individual significance for association.
This can be done with the allele test (based on 2x2 tables of alleles)
or the genotype test. Naturally, the order of SNPs picked is generally
different depending on which test statistic is used to pick SNPs. The
number of SNPs for which genotype patterns are formed is an input quantity.
Two parameter files are included in this package, one of which is RPparamZee2.txt.
They differ in the test type used to pick SNPs. The sample dataset
provided here was previously described (Hoh et al 2001).
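As an illustration of the allele test mentioned above, the following Python sketch builds the 2x2 table of allele counts (A versus B, cases versus controls) for one SNP and computes the Pearson chi-square with 1 df. It assumes the genotype codes 1 = AA, 2 = AB, 3 = BB, 0 = unknown and status codes 1 = control, 2 = case; the function names are hypothetical, and the program's own implementation may differ in details such as the Lambda adjustment shown in its output.

```python
import math


def allele_counts(genotypes):
    """Return (A count, B count) for codes 1 = AA, 2 = AB, 3 = BB; 0 is skipped."""
    n_aa = genotypes.count(1)
    n_ab = genotypes.count(2)
    n_bb = genotypes.count(3)
    return 2 * n_aa + n_ab, n_ab + 2 * n_bb


def chi2_1df_p(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def allele_test(snp_row, status):
    """Pearson chi-square (1 df) on the 2x2 table of alleles by disease status."""
    case_gt = [g for g, s in zip(snp_row, status) if s == 2]
    ctrl_gt = [g for g, s in zip(snp_row, status) if s == 1]
    a, b = allele_counts(case_gt)   # A and B alleles in cases
    c, d = allele_counts(ctrl_gt)   # A and B alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:                  # an empty margin: no test possible
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, chi2_1df_p(chi2)


# Toy example with 8 individuals (not taken from ZeeData.txt).
snp_row = [1, 1, 2, 3, 3, 2, 0, 1]
status = [2, 2, 2, 2, 1, 1, 1, 1]
chi2, p = allele_test(snp_row, status)
print(f"chi-square = {chi2:.4f}, p = {p:.4f}")
```

For 1 df the upper tail equals erfc(sqrt(x/2)). Applied to the chi-square value 6.4891 from the sample output, this gives about 1.085E-002, in line with the p-value listed there.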
Running the program on the ZeeData.txt
file produces the following output:
Program RANDOMPAT version 04 Feb 2013
Input file = ZeeData.txt
Current time: 7 Feb 2013 10:37:07h
Pattern is rare when exp #obs < 1.00 in cases or controls
Number of permutations = 5000
SNPs picked by allele test. Lambda = 1.0000
Input file = ZeeData.txt
Number of individuals = 779
Number of SNPs = 88
=== Observed data, best 2 SNPs ===
                        p-value     chr position name
   1   75    6.4891 1.0854E-002      17      113
   2   44    4.7591 2.9144E-002       7      236
779 of 779 individuals showed complete patterns
Observed table of genotype patterns and odds ratios (zero cell entries replaced by 0.5)
Controls Cases Pattern OR 95% CI
91 66 3
101 71 2
73 46 2
104 100 3
21 38 3
29 13 2
12 3 1
13 10 1
2 4 1
sum 440 348 Total = 788
p = 3.2162E-003 for table of genotype patterns
=== Randomized adjusted p-values ===
Test SNP 1 2842/5000 = 5.6840E-001 = 0.5684
Test SNP 2 3493/5000 = 6.9860E-001 = 0.6986
Table 697/5000 = 1.3940E-001 = 0.1394
Initial seed = 72201310377753
Current time: 7 Feb 2013 10:37:13h
For this run, the program picked the two SNPs with smallest p-values
in the allele association test (chi-square with 1 df), then formed
patterns and listed all patterns with an expected number of
observations > 1 in cases and controls each. It turns out that
all possible 3 x 3 = 9 patterns occur in this dataset. The pattern with
strongest disease association, judged by the odds ratio, OR = 2.97, is
1-3 (AA-BB) at test SNPs 1 and 2, respectively. However, due to the
small number of individuals showing this pattern, the confidence
interval is very wide and includes OR = 1. On the other hand, the
pattern showing the strongest association with absence of disease has
1/OR = 3.71. The final significance level of these results, adjusted for
testing multiple SNPs, is p = 0.1394.
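The quantities discussed above can be illustrated with a short Python sketch. It computes an odds ratio for one genotype pattern against all other patterns, replaces zero cells by 0.5 as stated in the output header, and attaches a 95% confidence interval. Two points are assumptions rather than facts from the program: the comparison of a pattern against all remaining patterns, and the use of Woolf's logit method for the interval. The function names and the example counts are hypothetical. The adjusted p-value is the proportion of permutations shown in the output, for example 697/5000.

```python
import math


def pattern_odds_ratio(cases_with, controls_with, n_cases, n_controls):
    """Odds ratio and 95% CI for one pattern versus all other patterns.

    Assumption: Woolf's logit interval; the program may use another method.
    Zero cells are replaced by 0.5, as stated in the program output.
    """
    cells = [cases_with, controls_with,
             n_cases - cases_with, n_controls - controls_with]
    a, b, c, d = [0.5 if x == 0 else float(x) for x in cells]
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, lower, upper


def empirical_p(n_as_extreme, n_permutations):
    """Proportion of permutations at least as extreme as the observed data."""
    return n_as_extreme / n_permutations


# Hypothetical counts: a rare pattern seen in 4 of 100 cases, 2 of 100 controls.
odds_ratio, lower, upper = pattern_odds_ratio(4, 2, 100, 100)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")

# Adjusted p-value for the table, using the counts from the sample output.
p_table = empirical_p(697, 5000)
print(f"adjusted p = {p_table:.4f}")
```

With so few carriers of the pattern, the interval in this toy example spans OR = 1, which mirrors the remark above about wide confidence intervals for rare patterns.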
References
Hoh, J., Wille, A., and Ott, J. 2001. Trimming, weighting, and grouping SNPs in
case-control association studies. Genome
Res 11(12): 2115-2119.
Long, Q., Zhang, Q., and Ott, J. 2009. Detecting disease-associated genotype
patterns. BMC Bioinformatics 10(Suppl 1): S75.