Jurg Ott / 10 August 2021
Rockefeller University, New York
ott@rockefeller.edu
The TLINKAGE programs described here are an extension of the general LINKAGE programs for genetic linkage analysis. The extension consists of allowing for a disease phenotype to be under the control of two loci. The current version (20 Apr 2007) corrected a bug that had an effect only with relatively large numbers of alleles at the marker loci. See also our publication on two-locus (digenic) disease analysis with case-control data.
The two postulated disease loci are typically unlinked (on two different chromosomes), each with two alleles (one normal, one being the disease allele), although there is no such restriction in this implementation. Below, the two disease loci are implemented as "null loci". Each of the disease loci may be linked with a marker or map of markers. Typically, two recombination fractions will be estimated: that between disease locus 1 and marker 1, and that between disease locus 2 and marker 2.
If each of the two disease loci has two alleles and, thus, three genotypes, there is a total of 9 possible genotype patterns at the two loci jointly. Which of these confer susceptibility (are associated with positive penetrance) is often unknown, but specific patterns of interaction between the two disease loci have been described (Risch 1990).
Below, technicalities of implementation of the TLINKAGE programs are provided. The programs are written in Free Pascal and have been compiled for Windows and Linux. Note that the programs will not check whether program constants are exceeded. In case of problems, check the first few lines of program output; they will tell you the maximum parameter values allowed in the programs.
If research papers refer to these programs, the appropriate reference is Lathrop and Ott (1990) (see below).
---------------------------
Program name Corresponds to
---------------------------
TUNK         UNKNOWN
TMLINK       MLINK
TLINKM       LINKMAP
TILINK       ILINK
---------------------------
The TLINKAGE programs implement a new locus type #4, called the null type because it may have no associated phenotype in the pedigree file.
All null loci are associated with the same phenotype. The number of null loci corresponds to the number of loci jointly responsible for a disease phenotype. That number is indicated as the last entry on the first line of the datafile (after the program number). In this version of TLINKAGE, you must have either two null loci or none. If two null loci are defined, there are thus two consecutive locus descriptions for these null loci in the datafile, but they correspond to only one phenotype in the pedfile. That phenotype must be an affection status phenotype, and it may be at any position among the phenotypes (i.e., the null loci are not restricted to be the first two loci in the datafile).
NOTE, however, the following restriction: If the two null loci are loci 1 and 2, their order must be 1 2 and not 2 1. For example, with loci 3 and 4 being markers, orders 1 3 2 4 and 4 1 3 2 are all right but orders 2 3 1 4 and 4 2 3 1 are not. An analogous restriction on order applies when the null loci are numbered other than 1 and 2.
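
The order restriction can be checked mechanically before a run. The following Python sketch is only an illustration and not part of TLINKAGE; the function name `null_order_ok` and its arguments are hypothetical. It reads the "analogous restriction" as meaning that the lower-numbered null locus must come before the higher-numbered one in the locus order, which is an interpretation of the rule above.

```python
def null_order_ok(order, null_loci):
    """Return True if the two null loci keep their input order in `order`.

    order     -- locus order as given in the datafile, e.g. [1, 3, 2, 4]
    null_loci -- input numbers of the two null loci, e.g. (1, 2)
    """
    first, second = sorted(null_loci)
    # Assumed rule: the lower-numbered null locus must appear first.
    return order.index(first) < order.index(second)


# The four orders discussed in the text, with loci 1 and 2 as null loci.
for order in ([1, 3, 2, 4], [4, 1, 3, 2], [2, 3, 1, 4], [4, 2, 3, 1]):
    verdict = "all right" if null_order_ok(order, (1, 2)) else "not allowed"
    print(order, verdict)
```

The first two orders are reported as all right and the last two as not allowed, matching the examples given above.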
In the datafile, the description of a null locus contains only the number of alleles and gene frequencies, e.g.:
4 2       << null locus, number of alleles
0.9 0.1   << allele frequencies
except that after the last null locus, a line specifying the number of liability classes must be present, followed by one or more tables of penetrances (as many tables as there are liability classes). Each such table has a single entry for each genotype combination at the two loci and is arranged as shown in the example below. The numbers to be entered in each table are only the penetrances, in this case the 3 × 3 = 9 numbers in the body of the table (do not enter any of the genotypes such as 1/2 or 2/2).
Repeat this table with different entries for different liability classes. Remember that only the last null locus has an associated phenotype in the pedfile. An example may look as follows:
-----------------------------
First     Second null locus
null      -----------------
locus     1/1    1/2    2/2
-----------------------------
1/1       0      0      0
1/2       0      0.8    0.8
2/2       0      0.8    1
-----------------------------
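
As a rough illustration of how this table maps to the numbers entered in the datafile, the Python sketch below holds the example penetrances in a 3 x 3 nested list (rows: genotypes at the first null locus; columns: genotypes at the second) and prints the nine numbers row by row. All names in it are hypothetical, and the three-decimal, space-separated layout of the printed lines is an assumption modeled on the sample datafiles shown later in this document.

```python
GENOTYPES = ("1/1", "1/2", "2/2")

# Rows: genotype at the first null locus; columns: genotype at the second.
PENETRANCE = [
    [0.0, 0.0, 0.0],
    [0.0, 0.8, 0.8],
    [0.0, 0.8, 1.0],
]


def penetrance_of(first, second, table=PENETRANCE):
    """Look up the penetrance for one two-locus genotype combination."""
    return table[GENOTYPES.index(first)][GENOTYPES.index(second)]


def penetrance_lines(table):
    """Format a 3 x 3 penetrance table as three lines of three numbers."""
    if len(table) != 3 or any(len(row) != 3 for row in table):
        raise ValueError("expected a 3 x 3 table")
    if any(not 0.0 <= p <= 1.0 for row in table for p in row):
        raise ValueError("penetrances must lie between 0 and 1")
    return [" ".join(f"{p:.3f}" for p in row) for row in table]


print(penetrance_of("1/2", "2/2"))
for line in penetrance_lines(PENETRANCE):
    print(line)
```

Only the nine numbers are written out; the genotype labels serve for lookup in the sketch and are never part of the datafile.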
The current 2-locus version of LINKAGE allows analysis of autosomal loci only. Please make sure you do not use these programs for X-chromosomal loci: there is no check in the programs to ensure that you are adhering to this restriction.
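
Because the programs do not check this themselves, a small pre-flight check on the datafile may be worthwhile. The Python sketch below is an illustration only. It assumes that the first line of the datafile starts with the five integers annotated in the sample datafiles later in this document (number of loci, risk locus, sexlinked flag, program code, number of null loci); the function name `check_first_line` is hypothetical, and treating everything after "<<" as a comment is a choice made for this sketch.

```python
def check_first_line(line):
    """Parse the first datafile line and enforce two TLINKAGE restrictions."""
    # In this sketch, anything after "<<" is treated as an annotation.
    fields = line.split("<<")[0].split()
    if len(fields) < 5:
        raise ValueError("expected at least five integers on the first line")
    n_loci, risk_locus, sexlinked, program_code, n_null = map(int, fields[:5])
    if sexlinked != 0:
        raise ValueError("sex-linked (X-chromosomal) loci are not supported")
    if n_null not in (0, 2):
        raise ValueError("the number of null loci must be 0 or 2")
    return {
        "n_loci": n_loci,
        "risk_locus": risk_locus,
        "sexlinked": sexlinked,
        "program_code": program_code,
        "n_null": n_null,
    }


print(check_first_line("3 0 0 5 2 <<< Number of loci, risk locus, sexlinked"))
```

A first line with the sexlinked flag set to 1, or with one null locus, raises a `ValueError` instead of letting the analysis proceed silently.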
The different steps for running the TLINKAGE programs are analogous to those for the LINKAGE programs:
1. Prepare the datafile using the PREPLINK program and modify it according to the rules given above. Copy this file to a file called DATAFILE.DAT.
2. Prepare the pedigree input file.
3. Run this pedigree file through the MAKEPED program (the file TEST.PED mentioned below is an example of a file resulting from MAKEPED). Copy this file to a file called PEDFILE.DAT.
4. Run the TUNK program.
5. Run the TMLINK, TLINKM, or TILINK program.
Instead of the steps indicated above, after preparing input files you may invoke the RUN command, which will copy the relevant infiles and invoke the analysis programs. Input file names must be given on the command line. Example: RUN TESTML.DAT TEST.PED TMLINK.
In releases prior to 13 Feb 1991, two bugs were present in these programs: the programs did not work correctly when individuals with unknown disease status were present, or when more than one liability class was used. Both bugs were fixed by Joseph Terwilliger.
The files TESTML.DAT, TESTLM.DAT, and TESTIL.DAT are sample datafiles for TMLINK, TLINKM, and TILINK, respectively. A test pedigree file (after processing by MAKEPED) is included as TEST.PED. To run the test example for TMLINK, copy the test files to the directory in which you want to carry out the TLINKAGE runs. Then give the following commands at the Windows command prompt:
copy testml.dat datafile.dat
copy test.ped pedfile.dat
tunk
tmlink
In a Linux terminal you would type:
cp testml.dat datafile.dat
cp test.ped pedfile.dat
./tunk
./tmlink
In Windows, these commands may all be executed by one single command (RUN batch file), in this case:
run testml.dat test.ped tmlink
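
The same sequence can also be scripted so that it works on both platforms. The Python sketch below is a hypothetical helper and not part of the TLINKAGE distribution. It copies the two input files to datafile.dat and pedfile.dat and then runs TUNK followed by the chosen analysis program. It assumes that the executables sit in the current directory under the lower-case names used in the commands above (with an `.exe` suffix on Windows).

```python
import os
import shutil
import subprocess

ANALYSIS_PROGRAMS = ("tmlink", "tlinkm", "tilink")


def executable(name):
    """Return the assumed path of a program in the current directory."""
    suffix = ".exe" if os.name == "nt" else ""
    return os.path.join(".", name + suffix)


def plan(datafile, pedfile, program):
    """List the steps as (action, arguments) pairs without running anything."""
    program = program.lower()
    if program not in ANALYSIS_PROGRAMS:
        raise ValueError(f"unknown analysis program: {program}")
    return [
        ("copy", (datafile, "datafile.dat")),
        ("copy", (pedfile, "pedfile.dat")),
        ("run", (executable("tunk"),)),
        ("run", (executable(program),)),
    ]


def execute(steps):
    """Carry out the planned steps; stop at the first failing program."""
    for action, args in steps:
        if action == "copy":
            shutil.copyfile(*args)
        else:
            subprocess.run(list(args), check=True)


# Show the plan for the TMLINK test example; call execute(...) to run it.
for step in plan("testml.dat", "test.ped", "TMLINK"):
    print(step)
```

Separating `plan` from `execute` lets you inspect the steps before anything is copied or run.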
In the regular LINKAGE programs, various
utility programs such as LCP and PREPLINK may be used for the
creation of the input files. These programs have not been adapted for
the TLINKAGE programs.
The sample files mentioned above refer to the situation of two disease loci (one trait) and a single marker locus that is linked with disease locus 2. The other disease locus is assumed to be somewhere else in the genome, unlinked with any tested marker locus. The input datafile for the TMLINK program looks as follows:
3 0 0 5 2 <<< Number of loci, risk locus, sexlinked (if 1), program code, # null loci
0 0.0 0.0 0 << Mut Locus, Mut Rates (male, female), Haplotype Frequencies (if 1)
1 2 3
4 2 <<- null locus, number of alleles
0.97000 0.03000
4 2 <<- null locus, number of alleles
0.99000 0.01000
1 <<- number of liability classes
0.000 0.000 0.000
0.000 0.000 0.000
0.000 0.000 1.000
3 4 << Allele numbers, Number of alleles
0.25000 0.25000 0.25000 0.25000
0 0 << Sex Difference, Interference (If 1 or 2)
0.50000 0.00000 << Recombination Values
2 0.10000 0.40000 << Rec. varied, Increment, Finishing value
On line 1, the program code (5 for TMLINK) is not used by the program (it was used in previous versions of LINKAGE). So, the user must be careful that the structure of the datafile is appropriate for the program he or she is using. The input pedfile (post-MAKEPED!) corresponding to the above datafile looks as shown below. It refers to two parents, one unaffected, the other affected, and two affected children.
1 1 0 0 3 0 0 1 1 1 1 2 Ped: 1 Per: 1
1 2 0 0 3 0 0 2 0 2 3 3 Ped: 1 Per: 2
1 3 1 2 0 4 4 2 0 2 1 3 Ped: 1 Per: 3
1 4 1 2 0 0 0 2 0 2 1 3 Ped: 1 Per: 4
In the file holding the pedigree data, the phenotypes are listed in input order, that is, the order in which the loci are given in the datafile above: the disease phenotype first (a single entry for the two null loci), followed by the marker genotype.
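
For readers who want to inspect such a file programmatically, here is a minimal Python sketch. It is not part of TLINKAGE. The column names are an assumed reading of the usual post-MAKEPED layout (pedigree, person, father, mother, first offspring, next paternal sib, next maternal sib, sex, proband), followed by the affection status (assumed coding: 1 = unaffected, 2 = affected) and the marker alleles. The four sample records are repeated inside the code, one record per line.

```python
SAMPLE = """\
1 1 0 0 3 0 0 1 1 1 1 2 Ped: 1 Per: 1
1 2 0 0 3 0 0 2 0 2 3 3 Ped: 1 Per: 2
1 3 1 2 0 4 4 2 0 2 1 3 Ped: 1 Per: 3
1 4 1 2 0 0 0 2 0 2 1 3 Ped: 1 Per: 4
"""

# Assumed names for the nine leading columns of a post-MAKEPED record.
ID_COLUMNS = (
    "pedigree", "person", "father", "mother", "first_offspring",
    "next_paternal_sib", "next_maternal_sib", "sex", "proband",
)


def parse_record(line):
    """Split one record into ID columns, affection status, marker alleles."""
    # Drop the trailing "Ped: ... Per: ..." annotation before parsing.
    numbers = [int(tok) for tok in line.split("Ped:")[0].split()]
    record = dict(zip(ID_COLUMNS, numbers))
    record["affection"] = numbers[len(ID_COLUMNS)]
    record["marker_alleles"] = numbers[len(ID_COLUMNS) + 1:]
    return record


records = [parse_record(line) for line in SAMPLE.splitlines()]
affected = [r["person"] for r in records if r["affection"] == 2]
print("affected persons:", affected)
```

Under these assumptions, persons 2, 3, and 4 come out as affected, which agrees with the description of one affected parent and two affected children.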
If each of the two disease loci is taken to be linked with a marker locus, the corresponding datafile (for TMLINK) may look as follows, where locus order is "marker 1 - disease 1 - disease 2 - marker 2" (chromosome order):
4 0 0 5 2 <<< Number of loci, risk locus, sexlinked (if 1), program code, # null loci
0 0.0 0.0 0 << Mut Locus, Mut Rates (male, female), Haplotype Frequencies (if 1)
3 1 2 4
4 2 <<- null locus, number of alleles
0.97000 0.03000
4 2 <<- null locus, number of alleles
0.99000 0.01000
1 <<- number of liability classes
0.000 0.000 0.000
0.000 0.000 0.000
0.000 0.000 1.000
3 4 << Allele numbers, Number of alleles (marker 1)
0.25000 0.25000 0.25000 0.25000
3 3 << Allele numbers, Number of alleles (marker 2)
0.25000 0.25000 0.50000
0 0 << Sex Difference, Interference (If 1 or 2)
0.0001 0.50 0.0 << Recombination Values
1 0.10 0.45 << Rec. varied, Increment, Finishing value
The basic version of TLINKAGE described and available for downloading here is relatively slow. Several extensions have been published (Dietter et al. 2004; Wu and Shete 2005; Shete and Zhou 2006) and should perform well, although I have not personally tried them.
Dietter J, Spiegel A, an Mey D, Pflug HJ, Al-Kateb H, Hoffmann K, Wienker TF, Strauch K (2004) Efficient two-trait-locus linkage analysis through program optimization and parallelization: application to hypercholesterolemia. Eur J Hum Genet 12, 542-550
Lathrop GM, Ott J (1990) Analysis of complex diseases under oligogenic models and intrafamilial heterogeneity by the LINKAGE programs. Am J Hum Genet 47, A188 (abstr)
Risch N (1990) Linkage strategies for genetically complex traits. I. Multilocus models. Am J Hum Genet 46, 222-228
Schork NJ, Boehnke M, Terwilliger JD, Ott J (1993) Two trait locus linkage analysis: a powerful strategy for mapping complex genetic traits. Am J Hum Genet 53, 1127-1136
Shete S, Zhou X (2006) TLINKAGE-IMPRINT: a model-based approach to performing two-locus genetic imprinting analysis. Hum Hered 62, 145-156
Wu CC, Shete S (2005) Analysis of genes for alcoholism using two-disease-locus models. BMC Genet 6 (Suppl 1), S149