Jurg Ott / 20 March 2021
ott@rockefeller.edu
1. INTRODUCTION
2. WORKING WITH SEVERAL LOCI
3. PENETRANCES
4. NON-AUTOSOMAL LINKAGE
5. INPUT FILE
   5.1. Problem description
   5.2. Symbols
   5.3. Number of alleles per locus
   5.4. Number of phenotypes per locus
   5.5. Locus type
   5.6. Output options
   5.7. (optional) Recombination fractions
   5.8. Locus description
   5.9. Gene frequencies
   5.10. Mode of inheritance
   5.11. Pedigree information
   5.12. Pedigree data
   5.13. (optional) Identification of "doubled" individuals
   5.14. (optional) Haplotype frequencies
   5.15. (optional) Recombination fractions
   5.16. Direction of further analysis
6. PROGRAM CONSTANTS
7. COMPLEX PEDIGREES
8. MUTATION
9. QUANTITATIVE PHENOTYPES
10. AGE-DEPENDENT PENETRANCE
   10.1 Age classes with different penetrances
   10.2 Age-of-onset distributions
      UNAFFECTED INDIVIDUALS
      AFFECTED INDIVIDUALS
      UNKNOWN DISEASE STATUS
   10.3 Lognormal distribution of age of onset
   10.4 Straight-line curves for age of onset (locus type 3)
11. CALCULATION OF GENETIC RISKS
12. LIKELIHOOD AT A SINGLE LOCUS
13. MENDELIAN INCONSISTENCIES
14. EXAMPLES
The LIPED program (for LIkelihoods in PEDigrees) estimates the
recombination fraction by calculating pedigree likelihoods for various
assumed values of the recombination fraction. The algorithm is based
on Elston and Stewart (1971) with some extensions. Its first
application (to the large Alaska pedigree, Schrott et al. 1972)
resulted in mild evidence for linkage of familial hypercholesterolemia
to the C3 polymorphism (Ott et al. 1974), which was later confirmed by
various authors. This disease locus (LDLR, previously FH and FHC) is
now known to be located on chromosome 19p13.3.
The program contained one error (in the likelihood calculation for
quantitative traits), which was pointed out to me by Dr. Robert
Elston. Details on the theory underlying LIPED and its likelihood
calculations may be found in Thompson (2011).
This manual describes the LIPED computer program for genetic linkage
analysis. Only two loci can be handled at a time, for example, a
disease locus and a marker locus. Originally written in Fortran IV
(Ott 1974), LIPED requires input in fixed format (numbers must be
provided in a fixed number of spaces or columns). The code is
essentially as originally written, with some additions such as proper
treatment of age-of-onset data. This manual describes a slightly
updated version of LIPED suitable for compiling in Fortran 77 (GNU g77
in Windows; gfortran in Linux). In Linux, compile by typing
gfortran -static-libgfortran liped.for
followed by
cp a.out liped
Files included:
liped.for -- Source code of LIPED program
liped.exe, lipedL -- Executable files for Windows, Linux
liped5.dat -- Input file holding 5 examples. It is highly recommended to look carefully at the examples provided!
EX1.DAT -- The first of the sample files in liped5.dat.
agedep.dat -- Example input file with age-dependent penetrance
LIPED.OUT -- Output file resulting from running LIPED5.DAT.
To initiate the program, type LIPED. It will assume that input is furnished in a file called LIPED.DAT, and will write output to LIPED.OUT. To run the 5 examples, copy LIPED5.DAT to LIPED.DAT and then type LIPED. The appropriate literature reference is Ott (1974) or Ott (1976).
Two kinds of loci are
distinguished in the LIPED program: main locus (internal number zero)
and marker loci (numbered from 1 to NMARK). Lod scores can be
computed for any combination of main locus vs. marker locus. With input
item 5.16 (see section 5, Input File, below), any one of the marker
loci can be declared to represent a new main locus so that lod scores
may also be computed among marker loci. If more than one marker locus
is present, the program creates two temporary disk files that will be
deleted on program termination. Note the following restriction: with
a single marker locus, any number of pedigrees may be analyzed in a
single run. However, with more than one marker locus, only a single
pedigree may be analyzed in a run. One way to overcome this
restriction is as follows. If several independent families are
presented to the program as one single pedigree, LIPED will recognize
this and carry out the proper calculations, the resulting lod score
being the sum over the individual families; however, individual lods
for the families will not be recognizable by the user. Analyzing
several independent families as a single large pedigree requires a
substantial amount of memory. If an error occurs (program constants
MNP or MLIST too small), you may have to analyze the families in the
usual manner as separate pedigrees.
Generally, to analyze several pedigrees with phenotypes at more than
two loci, one might proceed as follows. First, one decides on the two
loci to be analyzed. If one of these is the main locus, then the
comparison main locus vs. marker is identified on the output options
line (input item 5.6). Otherwise, all numbers in input item 5.6 are
set equal to 0, and a comparison among markers is defined on a new
line of output options that appears after a line containing 5000,
which immediately follows the pedigree data. Thus far, one has decided
on the 2 loci to be compared. Now, one must tell LIPED where on the
line (in which columns) to read the phenotypes of these loci (see
below).
If
you interrupt LIPED while it is still running and if more than one
marker locus has been defined, scratch files named "ZZ..."
or "for..." will remain on the disk. These files would be
deleted when the program terminates normally. You may simply delete
them.
Penetrance is defined as
the probability of occurrence of a particular phenotype given the
presence of a certain genotype. Accordingly, with respect to a
disease, penetrance is the probability of being affected given a
certain genotype.
Penetrances are needed to describe the relation between genotypes and
phenotypes. In the table below, only full penetrance (values 0 or 1
only) is taken to occur. Assume a locus with 2 alleles, a and b. When
this is a disease locus (dominant or recessive), let a be the disease
allele and consider the phenotypes AFF for affected and NA for
unaffected.
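-----------------------------------------------------------------------
            Dominant,    Recessive,                      Only one
            a > b        a < b        Codominant         allele visible
            ---------    ---------    ---------------    --------------
Genotype    AFF   NA     AFF   NA     aa    ab    bb     a     b
-----------------------------------------------------------------------
a a          1     0      1     0      1     0     0     1     0
a b          1     0      0     1      0     1     0     1     1
b b          0     1      0     1      0     0     1     0     1
-----------------------------------------------------------------------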
To code for X-linked
inheritance in LIPED, tables such as the one above are used to
represent the relation between genotypes and phenotypes. They apply
directly for females. For males, for example, the genotype a/a
is interpreted as a/y (hemizygote), and all lines
corresponding to heterozygote genotypes are disregarded. Therefore,
it is not necessary to distinguish male and female phenotypes. For
example, aa can serve as a phenotype for either sex.
To
analyze loci on the Y-chromosome, it is easiest to code Y-linkage as
a special case of autosomal linkage, but precautions must be taken.
For details, see Ott (1986); note that the methods described in that
reference apply only to full penetrance.
For most input quantities, their location (column numbers) on an input line is fixed and must strictly be adhered to. The input file must consist of the following lines here numbered 5.1 through 5.16. The original LIPED version required input of Fortran format statements but these are omitted here.
Col. 1-2    NMARK, number of marker loci in addition to the main locus. LIPED stops when NMARK<1 is encountered.
Col. 4      = 0 LIPED prints results to screen and to file, liped.out
            = 1 and 2 LIPED prints internal information not generally interpretable
            = 3 LIPED only writes to output file, no screen output
Col. 5      = 0 usual setting
            = 1 to prevent underflows as much as possible. This should be used only when an underflow has occurred since underflows are unlikely with double precision calculations.
Col. 6      = 0 for autosomal loci
            = 1 for loci on the X-chromosome
Col. 7      = 0 if gene frequencies (rather than haplotype frequencies) will be read
            = 1 if haplotype frequencies are to be read. Note that in this case, dummy gene frequencies must still be provided.
Col. 8-20   Mutation rate at main locus (see MUTATION below).
Col. 21-80  Text
In columns    Provide symbol for
-----------------------------------------------------------------
9-12          No parent (e.g., 0)
13-16         Male sex. The first different sex symbol encountered in the pedigree data (line 5.12) will be considered the symbol for female sex.
17-20         Unknown phenotype at main locus (e.g., 0)
21-24, etc.   Unknown phenotype at marker locus 1, etc.
-----------------------------------------------------------------
Note that alleles and phenotypes (also genotypes) are all alphanumeric
even though alleles may be numbered 1, 2, ... For example, genotype
symbols might be bb12 or b1b2 (4 spaces for each of these two symbols,
b = blank).
Col. 1-2  Number of alleles at main locus (locus 0)
Col. 3-4  Number of alleles at marker locus 1, etc.
Col. 1-2  Number of phenotypes at main locus
Col. 3-4  Number of phenotypes at marker locus 1, etc.
Note: For locus types 1, 2, and 3, the program will not read
phenotypes but certain parameters, for example, mean and standard
deviation for a quantitative trait. The number of these parameters is
fixed for each locus type. Here, in line 5.4, you may specify 1 for
number of “phenotypes” (parameters) as the program will not use any
value given here, but a value has to be provided anyway.
5.5.
Locus type
Col. 1-2  locus type for main locus
Col. 3-4  locus type for marker 1, etc.,
where the following values for locus type apply:

-1   Locus with discrete phenotypes and penetrances ranging from 0 through 1.
 0   Locus with discrete phenotypes and penetrances of 0 and 1 only; runs faster than locus type = -1.
 1   Quantitative phenotypes following conditional normal distributions (see section 9). The number of parameters to be entered for “number of phenotypes” is 2: mean and standard deviation.
 2   Locus with age-dependent penetrances following a lognormal distribution (see section 10)
 3   Locus with straight-line age-dependent penetrances.
Col. 2  Output option for main locus versus marker locus 1
Col. 4  Output option for main locus versus marker locus 2, etc.,
where the option values have the following effect (below, rm and rf = male and female recombination fractions, respectively):

-1    to do checks only and compute likelihood at rm = rf = 0.5
 0    when no computation is desired for main locus versus marker locus i
 1    if the values of the recombination fractions are to be read as shown in 5.15 below. In this case, one set (input line) of values rm and rf needs to be provided after each pedigree, for each likelihood to be computed. Note that after line 5.16, additional lines of input of this type may be given to allow for comparisons among marker loci.
 2    to compute lods at rm = rf = 0.0001, 0.001, 0.05, 0.10, 0.20, 0.30, 0.40 (7 values, not counting 0.50)
 3    to compute lods at rm = rf = 0.0001, 0.001, 0.05, 0.10, 0.15, 0.20, ... (11 values, not counting 0.50)
 4-8  to compute lods at values of rm and rf shown below
 9    to read recombination fraction values from input line 5.7 once for all pedigrees and to sum the lod scores over pedigrees. Allowed only with one marker locus.
10    to compute lods at rm = 0.0001, 0.001, ... (as with option 2) whereas rf = 2rm(1 - rm) [Ott, 1991, equation (8.7), p. 175], that is, the female map distance (Haldane) is assumed to be twice the male map distance.
11    to compute lods at rm = 0.0001, ... (as with option 3) whereas rf = 2rm(1 - rm)
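The relation rf = 2rm(1 - rm) used by options 10 and 11 can be checked numerically. The short Python sketch below is not part of LIPED; the function names are illustrative only. It converts recombination fractions to map distances with Haldane's map function and confirms that the female distance comes out as twice the male distance.

```python
import math

def haldane_distance(theta):
    """Map distance (in Morgans) for recombination fraction theta (Haldane)."""
    return -0.5 * math.log(1.0 - 2.0 * theta)

def female_theta(rm):
    """Female recombination fraction as used by output options 10 and 11."""
    return 2.0 * rm * (1.0 - rm)

# rm values of output option 2 (hypothetical driver loop, for illustration)
for rm in (0.0001, 0.001, 0.05, 0.10, 0.20, 0.30, 0.40):
    rf = female_theta(rm)
    ratio = haldane_distance(rf) / haldane_distance(rm)
    print(f"rm = {rm:<7} rf = {rf:.6f}  female/male map distance = {ratio:.6f}")
```

The ratio printed in the last column stays at 2 for every rm value, which is the assumption stated for these two options.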
Below is a representation of the values of rm and rf at which lods
will be computed depending on the option value; instead of 0 as
indicated below, recombination fractions are set to 0.0001.

Option = 4 (22 points)
Grid values for rm and rf: 0, .001, .05, .1, .2, .3, .4, .5
   rm = .50: one point at each of the 8 values of rf
   rm = .40, .30, .20, .10, .05, .001, 0: 2 points in each row

Option = 5 (34 points)
Analogous to above but with 34 points, recombination fractions at r-values
0 .001 .05 .1 .15 .2 .25 .3 .35 .4 .45 .5

Option = 6 (9 points)
rm
.5 | x   x   x
.3 | x   x   x
.1 | x   x   x
   +------------
    0.1 0.3 0.5   rf

Option = 7 (16 points)
rm
.50 | x    x    x    x
.35 | x    x    x    x
.20 | x    x    x    x
.05 | x    x    x    x
    +-------------------
     0.05 0.20 0.35 0.50   rf

Option = 8 (64 points)
 rm
 .500 | x     x     x     x     x     x     x     x
 .400 | x     x     x     x     x     x     x     x
 .300 | x     x     x     x     x     x     x     x
 .200 | x     x     x     x     x     x     x     x
 .100 | x     x     x     x     x     x     x     x
 .050 | x     x     x     x     x     x     x     x
 .001 | x     x     x     x     x     x     x     x
 .0001| x     x     x     x     x     x     x     x
      +------------------------------------------------
        .0001 .001  .05   .1    .2    .3    .4    .5   rf

Option 8 is useful for approximate factorization of joint male and
female lods into sex-specific lods.
This text applies with option = 9 on line 5.6. Each pedigree will be analyzed at the rm,rf-values provided here.
Col. 1- 5  Value for male recombination fraction
Col. 6-10  Value for female recombination fraction.
These two values are each read in 5 spaces with an implied decimal
point between space 1 and 2. For example, an input line for rm = 0.1
and rf = 0.45 may look like this: b1000b4500, or bbb.1bb.45, where b
stands for blank (space). For each likelihood to be computed, one such
line of values must be provided. To terminate the set of
rm,rf-values, enter 60000 as the last line. The maximum number of
lines, including the terminating line, is equal to MT (see section 6).
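To make the fixed-format convention concrete, here is a small Python sketch (not LIPED code; function names are hypothetical) that reads the two 5-column fields of such a line. It assumes the simple rule described above: an explicit decimal point in the field is honored, otherwise the last four digits are taken as the fractional part. Finer points of Fortran input editing, such as blanks embedded between digits, are not modeled.

```python
def read_fixed(field, decimals):
    """Read one fixed-width numeric field with an implied decimal point."""
    text = field.strip()
    if not text:
        return 0.0
    if "." in text:
        # An explicit decimal point overrides the implied one.
        return float(text)
    return int(text) / 10 ** decimals

def read_theta_line(line):
    """Return (rm, rf) from columns 1-5 and 6-10 of an input line."""
    line = line.ljust(10)
    return read_fixed(line[0:5], 4), read_fixed(line[5:10], 4)

def is_terminator(line):
    """True for the line that ends a set of rm,rf-values."""
    return line[0:5] == "60000"

print(read_theta_line(" 1000 4500"))   # the manual's b1000b4500
print(read_theta_line("   .1  .45"))   # the manual's bbb.1bb.45
```

Both example lines yield rm = 0.1 and rf = 0.45.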
The following lines, 5.8 through 5.10, must be
repeated for each locus in the order main locus, marker locus 1,
marker locus 2, etc.
The following items are expected on this line, each occupying 4 spaces:
- name of locus
- symbol for allele 1
- symbol for allele 2, etc.
- symbol for phenotype 1
- symbol for phenotype 2, etc.
If col. 7 on line 5.1 has a value of 1 (haplotype frequencies), the values provided below are irrelevant; however, some (dummy) values still need to be provided. An implied decimal point is between the first and second column of each set of 8 columns.
Col. 1- 8  Population frequency of allele 1
Col. 9-16  Population frequency of allele 2, etc.
As many lines specified below are expected as there are genotypes at the given locus. In the case of X-linkage, this refers to the female genotypes. For males, with X-linkage, a genotype A/A is interpreted as A/y while heterozygote genotypes such as A/a are disregarded. On each line (for each genotype), the following values are expected, each in 4 spaces. For the penetrance values, an implied decimal point is to the right of the 4 spaces. For example, bbb1 will be read as 1.0, and b.95 will be read as 0.95, where b stands for a blank space.
- symbol for first allele
- symbol for second allele; these two define a genotype
- probability of observing phenotype 1 under the given genotype
- probability of observing phenotype 2 under the given genotype, etc.
The above applies to locus types of 0 or -1 (line 5.5). For
quantitative phenotypes (locus type = 1), four items are expected for
each genotype, two alleles (defining the genotype) plus a mean and a
standard deviation, each within 4 spaces. For age-dependent
penetrances (locus types 2 or 3), each line must contain two allele
symbols plus six parameters, i.e., three parameters for females and
three parameters for males (the three sex-specific parameters are
defined in section 10).
The following input lines, 5.11 through 5.14, are to be repeated for
each pedigree, except that input line 5.14 is needed only once, after
the first pedigree.
Col. 1- 4  Number of individuals in pedigree. Count a "doubled" individual as 2 persons (see section 7 on complex pedigrees)
Col. 5- 8  Number of (pairs of) doubled individuals; = 0 for simple pedigrees
Col. 9-68  optional comments
For each individual, the following items must be provided, one line
per individual:

Item                                        max. length
- symbol identifying the individual (ID)    4
- ID for one of the parents *)              4
- ID for the other of the parents *)        4
- symbol for individual's sex               4
- phenotype at main locus                   8
- phenotype at marker locus 1, etc.         8

*) Note that each individual must either have two parents in the
pedigree, or both parents' IDs may be replaced by the symbol for no
parent. If you have information on only one parent, you must provide
an ID for the other parent who will then have unknown phenotypes.
The above limits seem rather restrictive. However, changing them in the source code is tricky as the code had been written to save as much memory as possible and various shortcuts had been implemented to achieve this. So, rather than changing source code it will be simpler to use the Idnum program to replace IDs by consecutive numbers.
Applies only to complex pedigrees (number greater than zero in col.
5-8, section 5.11). For simple pedigrees, no input item 5.13 is
expected. Each item is read with 4 spaces, like any ID. For each pair
of doubled individuals, the following two items are required:
- ID of first member of pair of doubled individuals
- ID of second member of pair of doubled individuals
This information is needed only once, after the first pedigree, and
only with a value of 1 in col. 7, section 5.1. The haplotype
frequencies are read with 8 spaces each, as are the allele
frequencies. These values must be given in the following order.
Consider a main locus and a marker locus where n is the number of
alleles at the marker locus. Then, the order of the haplotype
corresponding to the i-th allele at the main locus and the j-th allele
at the marker locus is given by n(i - 1) + j. As an example with 2
alleles at the main locus and 3 alleles at the marker locus, the
haplotypes are numbered as follows:

       j=1  j=2  j=3
--------------------
i=1     1    2    3
i=2     4    5    6

Note: there is no check that the haplotype frequencies sum to 1.
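The ordering rule n(i - 1) + j is easy to tabulate when preparing an input file. The following Python sketch (an illustration only; the function name is hypothetical) reproduces the numbering for the example with 2 alleles at the main locus and 3 alleles at the marker locus.

```python
def haplotype_order(i, j, n_marker_alleles):
    """Position of the haplotype formed by main-locus allele i and
    marker allele j (both counted from 1) in the input order."""
    return n_marker_alleles * (i - 1) + j

n_main, n_marker = 2, 3
for i in range(1, n_main + 1):
    row = [haplotype_order(i, j, n_marker) for j in range(1, n_marker + 1)]
    print(f"i={i}", row)
```

Because LIPED does not check that the haplotype frequencies sum to 1, it may be worth verifying the sum separately before running the program.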
This information is needed only with output option = 1 (section 5.6).
Then, as many likelihood calculations will be carried out as sets of
recombination fractions (lines) are provided here. Each value is read
with 5 spaces (cf. section 5.7):
Col. 1- 5  value for male recombination fraction
Col. 6-10  value for female recombination fraction.
As the terminating line, enter 60000. For output options > 1,
multiple sets of recombination fractions, separated by 60000, must be
entered.
Col. 1-4  Value to determine what action to take next.

= 5000 if new output options (section 5.6) are to be read (allowed
only if no more than one pedigree is present in this problem), thus
allowing for linkage analyses between marker loci. The new lines are
expected immediately after the 5000 value. On each line, there must be
as many values as there are marker loci. The program will scan these
values and the first marker locus with a value different from zero
will be considered the new main locus. From then on, on that line, the
output option values have the same meaning as in section 5.6.
Multiple lines of output options may follow a single 5000 value. For
example, consider a total of 5 loci, that is, one main locus (locus
no. 0) and 4 marker loci (numbered 1 through 4):

_
_2_2_2_2    ← original output options
_
5000
_1_2_2_2  |
_0_1_2_2  | extra lines of output options
_0_0_1_3  |
9000

Each of these extra lines of output options has one field for each of
the original marker loci. For instance, the following extra line,
_0_1_2_2, would mean: "now take marker locus 2 as the new main locus,
and pair it with marker locus 3 (using option 2), then with marker
locus 4 (using option 2)", which could be extended to all marker loci.
Note that whenever option 1 is specified, a corresponding set of
recombination fractions (section 5.15) is expected immediately after
the line containing option 1. To terminate the set of extra lines,
enter a line with 8000 or 9000 (same meaning as below) in col. 1-4.

= 7000 if a new pedigree is to be read. Then, new lines of pedigree
information (section 5.11) etc. are expected. Note that this is
allowed only when no more than one marker locus is present.

= 8000 if a new problem is to be analyzed. Then, new lines of input
are expected from the beginning (section 5.1).

= 9000 to terminate this run.
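The scanning rule for the extra output-option lines that follow a 5000 value can be sketched in a few lines of Python. This is an illustration of the rule as described in this manual, not LIPED's own code; the function name is hypothetical, and the sketch assumes that each option occupies a 2-column field (columns 1-2 for marker locus 1, columns 3-4 for marker locus 2, and so on), with blanks read as zero.

```python
def interpret_option_line(line, n_markers):
    """Return (new_main_locus, [(marker_locus, option), ...]) for one
    extra line of output options; (None, []) if all values are zero."""
    values = []
    for k in range(n_markers):
        field = line[2 * k: 2 * k + 2].strip()
        values.append(int(field) if field else 0)
    for k, value in enumerate(values):
        if value != 0:
            # The first nonzero field marks the new main locus; the
            # remaining nonzero fields are ordinary output options.
            pairs = [(m + 1, v) for m, v in enumerate(values)
                     if m > k and v != 0]
            return k + 1, pairs
    return None, []

# The three extra lines of the example (blanks instead of underscores)
for line in (" 1 2 2 2", " 0 1 2 2", " 0 0 1 3"):
    print(repr(line), "->", interpret_option_line(line, 4))
```

For the line written as _0_1_2_2 in the example, the sketch reports marker locus 2 as the new main locus, paired with marker loci 3 and 4 under option 2, in agreement with the description above.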
Constants are defined for dimensioning arrays and defining lengths of
character variables. Example values are as follows.

MLIST = 50    headsibs (nuclear families)
MMARK = 30    marker loci in addition to the main locus
MNAL  = 5     alleles at any locus
MNDI  = 5     pairs of doubled individuals
MNFE  = 21    phenotypes at any locus
MNP   = 50    genotype vectors stored in memory
MNPT  = 500   individuals in a pedigree
MT    = 20    pairs of theta values on input line 5.7, including the terminating 60000 line
IDLEN = 20    characters in individual IDs

To change these, simply adjust the values of the constants in the
parameter statements and recompile the program.
The following information is for programmers only and is not needed
for general program use. With the abbreviations,

KK  = MNAL*MNAL
KK1 = MNAL*(MNAL+1)/2,
KK2 = KK*(KK+1)/2,

array dimensions are given as follows (arrays not listed below have
fixed dimensions):

FENO1(MNPT,KK1)   IAD(KK,KK)    LIST(MLIST)    PHI(KK2)
FENO2(MNPT,KK1)   IAU(MMARK)    NAL(MMARK)     PHIS(KK)
GEN(MNAL)         ID(MNPT)      NC(MNPT)       PHPROB(MNFE)
GENO(MNP,KK2)     IGENO(MNP)    NF(MMARK)      THV1(MT)
GF1(MNAL)         ISEX(MNPT)    NM(MNPT)       THV2(MT)
GF2(MNAL)         KONT(MMARK)   NS(MNPT)       THVS(MT)
GVX1(MNFE,KK1)    LDI(2,MNDI)   PHE1(MNFE)     UNK(MMARK)
GVX2(MNFE,KK1)    LGC(MNDI)     PHE2(MNFE)
HOLD(MNDI,KK2)    LGENO(MNPT)   PHEPED(MMARK)

Note: MNFE must have a value of at least 8.
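When changing MNAL, it can help to see how quickly the derived dimensions grow. The Python sketch below simply evaluates the three abbreviations given above; the function name is hypothetical, and the reading of KK, KK1 and KK2 in the comments (haplotypes, single-locus genotypes, joint genotypes) is an interpretation rather than something stated in the source code documentation.

```python
def derived_dimensions(mnal):
    """Evaluate KK, KK1 and KK2 for a given value of MNAL."""
    kk = mnal * mnal                 # presumably: two-locus haplotypes
    kk1 = mnal * (mnal + 1) // 2     # presumably: genotypes at one locus
    kk2 = kk * (kk + 1) // 2         # presumably: joint two-locus genotypes
    return kk, kk1, kk2

for mnal in (2, 3, 5, 8):
    print("MNAL =", mnal, "-> KK, KK1, KK2 =", derived_dimensions(mnal))
```

With the example value MNAL = 5, this gives KK = 25, KK1 = 15 and KK2 = 325, so an array such as GENO(MNP,KK2) grows roughly with the fourth power of MNAL.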
In
a so-called simple pedigree, tracing the inheritance of genes by
going backwards through the generations (upwards in the pedigree)
always leads to the same pair of founder parents. Pedigrees for which
this is not the case are called complex pedigrees. In particular,
pedigrees with the following features are examples of complex
pedigrees: (1) both members of a pair of parents have themselves
parents in the pedigree; (2) consanguinity loop, i.e., parents are
related; (3) marriage loop, e.g., two brothers are married to two
sisters, or an individual has been married twice, the two spouses
being related with each other but not with the individual who married
twice. An example of the last kind is pedigree 1, below, where []
refers to a female and () refers to a male:
Pedigree 1: Marriage loop

            [1]--.--(2)
                 |
         .---------------.
         |               |
        (3)--.--[4]--.--(5)
             |       |
            (6)     (7)
Without special measures, LIPED analyzes only simple pedigrees. The
analysis of complex pedigrees is possible by manipulating the pedigree
in a certain way so that it "appears" to LIPED as a simple pedigree.
This manipulation consists of replacing a particular individual by two
individuals as shown in the example below (pedigree 2), and by
identifying the two individuals actually corresponding to the same
individual in the original pedigree (input item 5.13). Note that such
a "doubling" of individuals is necessary for breaking up loops, and
also whenever both members of a pair of parents have parents in the
pedigree.
When an individual has been "doubled", the number of individuals in
the pedigree must be increased by 1, thus counting a pair of doubled
individuals as two persons. Up to MNDI individuals may be "doubled" so
that, e.g., multiple consanguineous loops can be accommodated. At the
end of a pedigree with doubled individuals, the two individuals
corresponding to the one original person must be identified (input
item 5.13).
Pedigree 2: Example of a pedigree, manipulated for processing by LIPED

Original pedigree                 Modified pedigree, acceptable to LIPED

[1.1]--.--(1.2)--.--[1.3]         [1.1]--.------(1.2)----.--[1.3]
       |         |                       |               |
       |         |                       |               |
     (2.1)--.--[2.2]                  (2.1a)  (2.1b)-.-[2.2]
            |                                        |
            |                                        |
          [3.1]                                    [3.1]
Here are some important notes
regarding "doubling" of individuals:
Individuals can only be doubled, not tripled. For example, an individual who is an offspring and is also married twice with children from each marriage cannot be manipulated as described.
Computation time generally increases drastically with the number of pairs of doubled individuals. When one has a choice among several candidates to be doubled, it is recommended to take an individual with as much phenotypic information as possible in order to exclude as many genotypes as possible. For example, in pedigree 1, above, any one of individuals 3, 4 or 5 could be chosen for doubling. In the presence of one doubled individual, the QLIK routine for calculating the likelihood is executed for each genotype of that individual, except for those genotypes known to be incompatible with the individual's phenotypes or the phenotypes of his or her offspring. Analogously, for several pairs of doubled individuals, QLIK is called a maximum of m times, where m is calculated as follows. Let n be the number of haplotypes at the two loci jointly, i.e., n is the product of the number of alleles at the two loci under consideration. The number of joint genotypes is then given by g = n(n + 1)/2, so that m = g^NDI (g raised to the power NDI), where NDI is the number of pairs of doubled individuals. For example, with 2 and 3 alleles at the respective two loci, one has n = 6 haplotypes and g = 21 genotypes. With NDI = 3 pairs of doubled individuals, QLIK may be called up to m = 9261 times. The present version of PC-LIPED counts these calls and displays them on the screen.
Whenever the genotype of an individual can unequivocally be inferred with certainty (including phase), such an individual may be represented as multiple individuals in the pedigree if necessary, and this individual must not be counted as a so-called doubled individual (treat it as separate multiple individuals). The likelihood will then not be correct but the lod score will be unaffected by such a manipulation. For example, if an individual is known to be A/A at locus 1 and B/b at locus 2, the joint genotype is known to be AB/Ab. Note that for doubly heterozygous individuals it will not generally be possible to make use of this feature even though phase may be known, as there is no easy way to identify phases in LIPED on the basis of phenotypes.
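The upper bound on the number of QLIK calls given in the notes above can be evaluated before a run to judge whether doubling a further individual is affordable. The following Python sketch (illustrative only, with a hypothetical function name) implements the stated formulas n = product of allele numbers, g = n(n + 1)/2 and m = g to the power NDI.

```python
def max_qlik_calls(alleles_locus1, alleles_locus2, n_doubled_pairs):
    """Upper bound on QLIK calls for a pedigree with doubled individuals."""
    n = alleles_locus1 * alleles_locus2      # joint haplotypes
    g = n * (n + 1) // 2                     # joint genotypes
    return g ** n_doubled_pairs

for ndi in range(0, 5):
    print("NDI =", ndi, "-> at most", max_qlik_calls(2, 3, ndi), "calls")
```

For 2 and 3 alleles and NDI = 3, the bound is 9261, as in the example above. The actual number of calls is usually lower because genotypes incompatible with the observed phenotypes are skipped.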
Mutation is allowed for at
the current main locus only and is assumed to occur with a constant
rate from any of the alleles no. 2, 3,... towards the first allele,
with the mutation rate being specified in col. 8-20 of input line
5.1. Backmutation is assumed to be negligible. Also, in the
computation of the likelihood, it is assumed that a mutation occurs
only in one or the other of two parents, but not simultaneously in
both parents.
WARNING: when processing a disease locus
with mutation and subsequently, in the same run, testing marker
versus marker, then the mutation rate keeps applying to the current
main locus unless a new run is carried out with the mutation rate set
equal to zero.
Note that only simple pedigrees can be
processed by LIPED unless special steps are taken to code for a
complex pedigree (see section 7 above).
At
any locus, quantitative rather than qualitative phenotypes can be
read. For a locus with quantitative phenotypes, the following special
rules must be observed.
Input item   Explanation
---------------------------------------------------------
 3           Read one mean and one standard deviation for each genotype, 4 spaces each as for any phenotype
 7           Set the number of phenotypes equal to 2. The program will correct wrong numbers.
 8           Set the locus type equal to 1.
10           Two phenotype symbols will be read by the program but they are not used in any way.
12           After the symbol for the second allele, two items are expected, the mean and the standard deviation of the phenotype distribution given the particular genotype specified by the two alleles.
14           The phenotype values must not occupy more than 4 spaces each.
---------------------------------------------------------
Age-dependent penetrance
refers to the fact that a carrier of a disease gene may not exhibit
the disease at birth but only later in life, that is, the penetrance
(= probability of showing a certain phenotype given a genotype)
depends on the age of an individual.
The easiest way of implementing age-dependent penetrance is by forming
age classes and having different penetrances in these classes. For
affecteds, irrespective of their age, only one class is required
(provided that no phenocopies are allowed for), but unaffecteds must
be grouped into age classes. For example, in a given disease, if all
gene carriers beyond 10 years of age have expressed the disease, a
suitable assumption is that penetrance rises linearly from 0 at age 0
to 100% at age 10, as pictured below:
Penetrance
  |
 1|    -------------
  |   /
  |  /
  | /
  |/
 0+------------------ age
  0    10       20
One might then form 6 classes as follows, where AFF stands for the
'affected' phenotype, and NA1, NA2, etc. stand for unaffected in age
class 1, 2, etc.; NA5 denotes unaffected individuals older than 10
years who are taken to be known not to carry the disease gene. The
disease is assumed dominant, the disease allele being T. Note that the
probability of being unaffected is 1 minus the probability of being
affected.

-------------------------------------------
                     Phenotypes
            -------------------------------
Genotype    AFF  NA1  NA2  NA3  NA4  NA5
-------------------------------------------
T T          1   .88  .63  .38  .13   0
T t          1   .88  .63  .38  .13   0
t t          0    1    1    1    1    1
-------------------------------------------
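The NA1 through NA4 entries for gene carriers can be reproduced with a short calculation. The Python sketch below is an illustration under an assumption that the text does not state explicitly: the ages 0 to 10 are split into four equal classes of 2.5 years, and the linear penetrance curve is evaluated at the midpoint of each class. The function names are hypothetical.

```python
def penetrance_at(age, full_age=10.0):
    """Linear penetrance: 0 at age 0, rising to 1 at full_age."""
    return min(max(age / full_age, 0.0), 1.0)

def unaffected_probabilities(n_classes=4, full_age=10.0):
    """P(unaffected | carrier) at the midpoint of each age class."""
    width = full_age / n_classes
    midpoints = [(k + 0.5) * width for k in range(n_classes)]
    return [1.0 - penetrance_at(mid, full_age) for mid in midpoints]

print(unaffected_probabilities())
```

The four values obtained this way agree with the table entries .88, .63, .38 and .13 to two decimals. A finer grouping (larger n_classes) approximates the straight line more closely at the cost of more phenotype classes.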
Rather than forming age
classes, the distribution of the age at disease onset may be assumed
to follow a certain distribution. In LIPED, two such distributions
are implemented, the lognormal and a straight-line distribution.
Below, F denotes the distribution (cumulative sum) of age at onset
whereas f denotes the corresponding density (histogram).
Whatever
the age of onset distribution used, to represent in a single number
the various pieces of phenotypic information (age at onset, present
age, affection status) at a disease locus, the following conventions
must be observed in LIPED. In principle, the phenotype to be provided
in the input to LIPED is an individual's present age (or age last
seen) or the age at onset, taken with a minus sign for unaffecteds,
and taken to be positive for affecteds. Present age and age at onset
are distinguished as outlined, below.
Unaffected individuals: The phenotype is an individual's present age (or age last seen), taken with a minus sign (the sign distinguishes affecteds from unaffecteds). Example: unaffected, present age is 56, phenotype given in program is -56. If present age is unknown, a guess must be used, for example, based on ages of sibs or parents.
Affected individuals: If actual age at disease onset is unknown, the
phenotype is a person's present age. Example: 56. If present age is
unknown, a guess must be used, based on ages of relatives.
If age at disease onset is known, it is entered into LIPED by the
following coding scheme: The phenotype to be provided is obtained by
adding 500 to the age at onset. Example: age at disease onset is 23
years; phenotype to be provided is 523.
NOTE: Actual age at disease onset is relevant only when disease can
occur under different genotypes with different penetrances. If this is
not so (it usually is not), then present age may be given for all
affecteds.
Unknown disease status: The phenotype is given as 0. Alternatively, on input line 5.2, one may define any other code for unknown phenotype, for example, blank (not recommended).
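The coding conventions above are easy to get wrong when preparing data by hand, so a small helper may be useful. The Python sketch below is not part of LIPED; the function name and the status labels are hypothetical. It returns the single number to be entered as the phenotype, assuming the default code 0 for unknown disease status.

```python
def age_phenotype(status, present_age=None, onset_age=None):
    """Encode disease status and age as one LIPED phenotype value."""
    if status == "unknown":
        return 0
    if status == "unaffected":
        # Present age (or age last seen) with a minus sign.
        return -present_age
    if status == "affected":
        if onset_age is not None:
            # Known age at onset: add 500.
            return 500 + onset_age
        # Unknown age at onset: present age, positive.
        return present_age
    raise ValueError("status must be 'affected', 'unaffected' or 'unknown'")

print(age_phenotype("unaffected", present_age=56))
print(age_phenotype("affected", present_age=56))
print(age_phenotype("affected", onset_age=23))
print(age_phenotype("unknown"))
```

The four calls correspond to the examples in the text and produce -56, 56, 523 and 0.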
It
is often meaningful to assume that age of onset is lognormally
distributed, that is, that LN(age of onset) follows a normal
distribution where LN denotes natural logarithm (a simpler assumption
for age-of-onset distribution is covered in section 10.4, below).
Mean and standard deviation for the lognormal and normal distributions
are defined and connected with each other as follows:

             Age (orig. values)    LN(age)
             (lognormal distr.)    (normal distr.)
-----------------------------------------------
mean         μ                     u
std. dev.    σ                     s
-----------------------------------------------
For ease of presentation, define m = exp(u) and w = exp(s^2). Then one
has:
μ = m √w
σ = m √[w(w - 1)] = μ √(w - 1)
u = 2 LN(μ) - 0.5 LN(μ^2 + σ^2)
s = √[LN(μ^2 + σ^2) - 2 LN(μ)] = √[2{LN(μ) - u}],
where LN denotes natural logarithm. Also, with given mean, μ, of the
raw data, and standard deviation, s, of the transformed data, one
obtains the mean of the transformed data as u = LN(μ) - 0.5 s^2.
Some example values are given in the following table:

----------------------------------------
Original scale    LN scale (normal distr.)
--------------    ------------------------
 μ      σ          u        s
----------------------------------------
20      5         2.97     0.25
20     10         2.88     0.47
20     15         2.77     0.67
40      5         3.68     0.12
40     10         3.66     0.25
40     15         3.62     0.36
----------------------------------------
The
LOGNORM program (included) transforms values u and s into the
corresponding values of μ and σ, and vice versa.
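The conversion formulas given above can be sketched in a few lines of Python. This is not the LOGNORM program itself, only an illustration of the same relations; the function names are hypothetical.

```python
import math
from typing import Tuple


def to_log_scale(mu: float, sigma: float) -> Tuple[float, float]:
    """Convert mean and sd of age (original scale) to u and s of LN(age)."""
    s_squared = math.log(mu ** 2 + sigma ** 2) - 2.0 * math.log(mu)
    u = math.log(mu) - 0.5 * s_squared
    return u, math.sqrt(s_squared)


def to_original_scale(u: float, s: float) -> Tuple[float, float]:
    """Convert u and s of LN(age) to mean and sd of age (original scale)."""
    m = math.exp(u)
    w = math.exp(s ** 2)
    mu = m * math.sqrt(w)
    sigma = mu * math.sqrt(w - 1.0)
    return mu, sigma


# Reproduce the rows of the example table
for mu, sigma in [(20, 5), (20, 10), (20, 15), (40, 5), (40, 10), (40, 15)]:
    u, s = to_log_scale(mu, sigma)
    print(f"{mu:4d} {sigma:4d} {u:6.2f} {s:6.2f}")
```

Applying to_original_scale to the output of to_log_scale returns the starting values, which is a convenient check that a pair (u, s) entered in the input file corresponds to the intended mean and standard deviation of age at onset.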
If
age at onset for an (affected) individual is known, the corresponding
likelihood is simply f(age at onset), where f is the lognormal
density. If age at onset is unknown, then the likelihood is F(age)
where 'age' denotes current age, or age last seen, and F is the
lognormal distribution function. For unaffecteds, the likelihood is
equal to 1 – F(age). If the final penetrance, t, is less than
100% then f and F above are multiplied by t.
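As an illustration of these rules, the following sketch computes the likelihood contribution of a single phenotype code for given values of u, s and t. It is not LIPED's own code. The function names are hypothetical, and treating an unknown phenotype (code 0) as contributing a factor of 1 is an assumption made here for completeness. The sketch requires s > 0.

```python
import math

ONSET_OFFSET = 500  # codes above 500 mean "affected, age at onset known"


def lognormal_pdf(x: float, u: float, s: float) -> float:
    """Lognormal density f(x); u and s are mean and sd of LN(x), s > 0."""
    z = (math.log(x) - u) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2.0 * math.pi))


def lognormal_cdf(x: float, u: float, s: float) -> float:
    """Lognormal distribution function F(x), s > 0."""
    z = (math.log(x) - u) / s
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def likelihood(phenotype: float, u: float, s: float, t: float) -> float:
    """Likelihood of one phenotype code given genotype parameters u, s, t."""
    if phenotype == 0:
        return 1.0  # assumption: unknown status carries no information
    if phenotype < 0:  # unaffected at age -phenotype
        return 1.0 - t * lognormal_cdf(-phenotype, u, s)
    if phenotype > ONSET_OFFSET:  # affected, onset known
        return t * lognormal_pdf(phenotype - ONSET_OFFSET, u, s)
    return t * lognormal_cdf(phenotype, u, s)  # affected, onset unknown


# Example parameters u = 3.35, s = 0.17, t = 0.6
for code in (-56, 56, 523, 0):
    print(code, likelihood(code, 3.35, 0.17, 0.6))
```

For any age, the values for an affected individual with unknown onset and for an unaffected individual of the same age add up to 1, as expected for a penetrance and its complement.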
Lognormal
age-dependent penetrance is modeled in analogy to quantitative
phenotypes (see previous section) except that here, 6 parameters must
be specified (3 for females and 3 for males). For each genotype
(input item 5.3), these are
- the mean, u, of LN(age of onset)
- the standard deviation, s, of LN(age of onset)
- the limiting penetrance, t, when age is very high
for females, followed by the analogous three parameters for males.
Depending on the
values of the 6 parameters given (input item 5.3) for each genotype,
the following 2 situations can be distinguished. Assume a disease
locus with two alleles, a dominant disease allele, D, and a normal
allele, d.
1. Age of onset follows a lognormal
distribution with parameters u and s, where the final penetrance
attained (at high age) is equal to t. For example (parameters taken
to be the same for males and females), one may have on input item
5.3:
   D D   3.35 0.17 1.0   3.35 0.17 1.0   → u = 3.35, s = 0.17, final penetrance 100%
   D d   3.35 0.17 0.6   3.35 0.17 0.6   → susceptible individuals express disease with max. penetrance of 60% when they are very old
   d d   3.0 0.1 0.0   3.0 0.1 0.0   → genotype d/d not susceptible to disease; values of u and s are irrelevant (likelihood is zero for affecteds and 1 for unaffecteds)
2. Penetrance does not
depend on age but is a fixed value (t for affecteds, 1-t for
unaffecteds). To accommodate this situation, set s = 0.0. The value
of u is then irrelevant. For example, one may have
d
d 0.0 0.0 0.01 0.0 0.0 0.01 → t = 0.01, that is, d/d
genotypes express the disease with probability 1%, irrespective of
age (likelihood is 0.01 for affecteds and 0.99 for unaffecteds); the
value of u is irrelevant. This case should be used with great care
since it does not differentiate between age of onset known or
unknown.
In summary, for a locus with age-dependent
(lognormal) penetrance, the following special rules must be
observed.
Input
item   Explanation
-----------------------------------------------------------
 3     Provide six values on each line (genotype): one mean, one standard deviation and one final penetrance for each sex.
 7     Set the number of phenotypes equal to 1 (the program will set the correct number of phenotypes).
 8     Set the locus type equal to 2.
10     Six phenotype symbols will be read by the program, but they are not used in any way.
12     After the symbol for the second allele, six items are expected: mean, standard deviation and final penetrance for females, and the analogous three parameters for males.
14     The value for the phenotype (age) must not occupy more than 4 spaces. A positive age value refers to an affected individual, a negative age figure identifies an unaffected individual. Phenotypes are coded following the rules given in section 10.2, above.
-----------------------------------------------------------
  F
  |
t |         ----------
  |        /.
  |       / .
  |      /  .
  |     /   .
  |    /    .
0 +------------------- age
      A1    A2
F is the probability of being affected, that is, the penetrance (or likelihood) is equal to F for an affected individual (age at onset unknown) and equal to 1 – F for an unaffected individual. According to the figure, above, the age-of-onset curve is defined as

      /  0                       if a ≤ A1
 F = |   t(a – A1)/(A2 – A1)     if A1 < a < A2
      \  t                       if a ≥ A2

where "a" is an individual's present age, or age last seen. If age at onset is known (for an affected individual) then the likelihood (density) is equal to f = t/(A2 – A1) if the age of onset is between A1 and A2, and equal to zero otherwise. If age at onset is considered a random variable, according to the present definition and with t = 1, it follows a uniform distribution with mean (A1 + A2)/2 and standard deviation (A2 – A1)/3.464.
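The straight-line curve and its density can be sketched as follows. The code is only an illustration (function names are hypothetical). It also checks numerically, for t = 1, that the density between A1 and A2 is that of a uniform distribution, whose standard deviation is (A2 – A1)/√12 = (A2 – A1)/3.464.

```python
import math
from typing import Tuple


def penetrance(a: float, a1: float, a2: float, t: float) -> float:
    """Straight-line penetrance F at present age a (requires a1 < a2)."""
    if a <= a1:
        return 0.0
    if a >= a2:
        return t
    return t * (a - a1) / (a2 - a1)


def onset_density(a: float, a1: float, a2: float, t: float) -> float:
    """Density f of age at onset: t/(A2 - A1) between A1 and A2, else 0."""
    return t / (a2 - a1) if a1 < a < a2 else 0.0


def onset_moments(a1: float, a2: float, steps: int = 10000) -> Tuple[float, float]:
    """Mean and sd of age at onset for t = 1, by midpoint-rule integration."""
    h = (a2 - a1) / steps
    mids = [a1 + (i + 0.5) * h for i in range(steps)]
    mean = sum(a * onset_density(a, a1, a2, 1.0) * h for a in mids)
    var = sum((a - mean) ** 2 * onset_density(a, a1, a2, 1.0) * h for a in mids)
    return mean, math.sqrt(var)


print(penetrance(35, 10, 60, 0.6))     # halfway between A1 = 10 and A2 = 60
print(onset_density(23, 10, 60, 1.0))  # density for a known onset at age 23
print(onset_moments(10, 60))           # numerical mean and sd for t = 1
```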
For a locus
of type 3 (straight line age of onset), coding is very similar to the
conventions used for lognormal age of onset (specific instructions
are given below). The phenotypes are the ages of each individual,
taken to be positive for affected individuals and taken with a minus
sign for unaffected individuals. Zero will be interpreted as unknown,
but any other symbol may also be designated to represent unknown
phenotype. For affected individuals with known age at onset, enter a
number equal to 500 plus age at onset as the phenotype (see section
10.2, above).
As in lognormal age-dependent penetrance, 6
parameters must be specified but here, they have the following
meaning. For each genotype (input item 5.3), they are (see graph,
above)
- the age, A1, at which penetrance becomes positive
- the age, A2, at which penetrance reaches its final value
- the limiting penetrance, t, when age is very high
for females, followed by the analogous three quantities for males.
Depending on
the values of the 6 parameters given (input line 5.3) for each
genotype, the following 2 situations can be distinguished. Assume a
disease locus with two alleles, a dominant disease allele, D, and a
normal allele, d.
1. Age of onset follows a straight-line
distribution with parameters A1 and A2, where the final penetrance
attained (at high age) is equal to t. For example (parameter values
taken to be the same for females and for males), one may have on
input item 5.3:
   D D   10 60 1.0   10 60 1.0   → for individuals with D/D genotype, susceptibility to disease starts at age 10 and penetrance reaches its maximum of 100% at age 60.
   D d   10 60 0.6   10 60 0.6   → susceptible individuals express disease with max. penetrance of 60% when they are 60 years or older.
   d d   10 11 0.0   10 11 0.0   → genotype d/d not susceptible to disease; values of A1 and A2 are irrelevant (given genotype d/d, likelihood is zero for affecteds and 1 for unaffecteds).
2.
Penetrance does not depend on age but is a fixed value (t for
affecteds, 1 – t for unaffecteds). To accommodate this
situation, set A2 = 0.0. The value of A1 is then irrelevant. For
example (same parameter values for males and females), one may
have
d d 0.0 0.0 .01 0.0 0.0 .01 →
t = 0.01, that is, d/d genotypes express the disease with probability
1%, irrespective of age (likelihood is 0.01 for affecteds and 0.99
for unaffecteds); the value of A1 is irrelevant. This case should be
used with great care since it does not differentiate between age of
onset known or unknown.
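The two situations just described can be combined in one sketch that maps a phenotype code and the three parameters of one sex (A1, A2, t) to a likelihood. This is an illustration, not LIPED's implementation. The function name is hypothetical, and the treatment of code 0 (unknown) as a factor of 1 is an assumption.

```python
ONSET_OFFSET = 500  # codes above 500 mean "affected, age at onset known"


def type3_likelihood(phenotype: float, a1: float, a2: float, t: float) -> float:
    """Likelihood of one phenotype code under straight-line age of onset."""
    if phenotype == 0:
        return 1.0  # assumption: unknown status carries no information
    if a2 == 0.0:
        # Situation 2: fixed penetrance, age and onset are ignored
        return t if phenotype > 0 else 1.0 - t
    if phenotype > ONSET_OFFSET:
        # Affected with known onset: density t/(A2 - A1) inside (A1, A2)
        onset = phenotype - ONSET_OFFSET
        return t / (a2 - a1) if a1 < onset < a2 else 0.0
    age = abs(phenotype)
    if age <= a1:
        f_age = 0.0
    elif age >= a2:
        f_age = t
    else:
        f_age = t * (age - a1) / (a2 - a1)
    return f_age if phenotype > 0 else 1.0 - f_age


# Genotype D/d from the example above: A1 = 10, A2 = 60, t = 0.6
print(type3_likelihood(35, 10, 60, 0.6))    # affected at 35, onset unknown
print(type3_likelihood(-35, 10, 60, 0.6))   # unaffected at 35
print(type3_likelihood(523, 10, 60, 0.6))   # affected, onset at 23
# Fixed penetrance for d/d: A2 = 0, t = 0.01
print(type3_likelihood(56, 0.0, 0.0, 0.01), type3_likelihood(523, 0.0, 0.0, 0.01))
```

The last line shows why the fixed-penetrance case needs care: an affected individual receives the same likelihood whether or not the age at onset is known.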
In summary, for a locus with
straight-line age-dependent penetrance, the following special rules
must be observed. An example input file, agedep.dat, is
provided in the program package (run it by copying this file to
liped.dat and then typing
liped).
-----------------------------------------------------------
Set the number of phenotypes equal to 1 (the program will set the correct numbers).
Set the locus type equal to 3.
For genotype data, after the symbol for the second allele, six items are expected: starting age (A1), finishing age (A2) and final penetrance (t) for females, and the analogous three parameters for males.
The value for the phenotype (age) must occupy 4 spaces. A positive age value refers to an affected individual, a negative age figure identifies an unaffected individual. Phenotypes are coded following the rules given in section 10.2, above.
-----------------------------------------------------------
To calculate conditional
genotype probabilities for a specific individual, given all the
family data, one must carry out several likelihood computations and
combine their results as follows. For example, consider an individual
with phenotype 'unaffected' and penetrances as given in the table
below, where D is the disease allele and d is the
normal allele at the main
locus.
----------------------------------------
            Penetrance for phenotypes
            ----------------------------
Genotype    affected  unaffected  XDd
----------------------------------------
D D         0.9       0.1         0
D d         0.6       0.4         0.4
d d         0         1           0
----------------------------------------
For
this unaffected individual, one wants to compute the risk that he or
she has genotype D/d. To obtain this risk, one runs
LIPED twice, each time with a different phenotype assigned to this
individual, that is, in run 1, the individual has phenotype
unaffected, and in run 2, the individual has phenotype XDd (see table
above). Denote the resulting likelihoods (not lod scores) by L(ua)
and L(XDd). Then, the risk to this individual of having genotype D/d
is given by L(XDd)/L(ua). Note that other programs, such as the MLINK
program of Dr. Mark Lathrop, can compute genetic risks
directly.
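The ratio L(XDd)/L(ua) can be illustrated with the simplest possible "pedigree", a single founder with no relatives, for which the likelihood is just a sum over genotypes of prior probability times penetrance. This is a sketch under that simplifying assumption. The gene frequency used is a hypothetical value, and in a real analysis the two likelihoods would come from two LIPED runs on the full family.

```python
PENETRANCE = {
    # phenotype: penetrance by genotype, as in the table above
    "affected":   {"DD": 0.9, "Dd": 0.6, "dd": 0.0},
    "unaffected": {"DD": 0.1, "Dd": 0.4, "dd": 1.0},
    "XDd":        {"DD": 0.0, "Dd": 0.4, "dd": 0.0},
}


def founder_likelihood(phenotype: str, p: float) -> float:
    """Likelihood of one isolated founder; p is the frequency of allele D."""
    prior = {"DD": p * p, "Dd": 2.0 * p * (1.0 - p), "dd": (1.0 - p) ** 2}
    return sum(prior[g] * PENETRANCE[phenotype][g] for g in prior)


p = 0.01  # hypothetical gene frequency, for illustration only
l_ua = founder_likelihood("unaffected", p)   # "run 1"
l_xdd = founder_likelihood("XDd", p)         # "run 2"
risk = l_xdd / l_ua
print(f"L(ua) = {l_ua:.6f}, L(XDd) = {l_xdd:.6f}, risk of D/d = {risk:.6f}")
```

The auxiliary phenotype XDd keeps only the D/d term of the unaffected likelihood, so the ratio is the conditional probability of genotype D/d given that the individual is unaffected.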
With X-linked recessive deleterious traits, for
a female founder individual (no parents in pedigree), the prior
probability, q, of being a carrier of the disease gene is a
multiple of the mutation rate, u. For example, in Duchenne
muscular dystrophy (DMD), q = 4u (Murphy and Chase,
"Principles of Genetic Counseling"). In the likelihood
calculation of pedigree data, on the other hand, the prior
probability of a founder's genotype is determined solely by the gene
frequency, p. For example, the prior probability that a
founder is heterozygous is given by 2p(1 – p).
Therefore, to implement the prior probability, q, that a woman
is heterozygous for an X-linked recessive deleterious gene, in the
likelihood calculation, one must choose the gene frequency of the
deleterious gene, p, such that q = 2p(1 –
p) or, approximately, p = q/2 (in DMD, thus, p
= 2u).
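A small numerical sketch shows how close the approximation p = q/2 is to the exact solution of q = 2p(1 – p). The mutation rate below is a hypothetical value chosen only for illustration, and the function name is likewise an assumption.

```python
import math


def carrier_gene_frequency(q: float) -> float:
    """Solve q = 2p(1 - p) for the smaller root p (requires q <= 0.5)."""
    return (1.0 - math.sqrt(1.0 - 2.0 * q)) / 2.0


mutation_rate = 1e-4            # hypothetical value, for illustration only
q = 4 * mutation_rate           # prior carrier probability, as quoted for DMD
p_exact = carrier_gene_frequency(q)
p_approx = q / 2                # approximation used in the text (2u for DMD)
print(p_exact, p_approx)
```

For carrier probabilities of this magnitude the two values differ only in the fourth significant digit, so the approximation is adequate in practice.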
In some applications, the likelihood at a single (disease) locus is needed. For example, one may want to estimate gene frequencies or age-of-onset parameters at a single locus from family data. In LIPED, single-locus calculations are most easily accommodated by defining a dummy second locus with a single allele of frequency 1.
In a pedigree to be
processed by LIPED, any individual must have either both parents in
the pedigree, or be a founder individual, that is, have both parents
unknown (not in pedigree). Note that siblings cannot be recognized as
such unless their parents are also in the pedigree. If parents are
not actually known, they still must be present in the pedigree,
possibly with all phenotypes coded as unknown.
When there
is at least one known recombination in a pedigree but the value of
the recombination fraction is set equal to zero, then the likelihood
will be equal to zero, and the log likelihood equal to -∞. On
output, -∞ is represented as -99.99.
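The effect can be illustrated with a simplified model that is not LIPED's general algorithm: for phase-known meioses, of which some are recombinant, the likelihood is θ^k (1 – θ)^(n – k). The sketch below, with hypothetical function and constant names, returns the code -99.99 whenever the likelihood is zero.

```python
import math

MINUS_INFINITY_CODE = -99.99  # how minus infinity is represented on output


def log10_likelihood(theta: float, meioses: int, recombinants: int) -> float:
    """log10 likelihood for phase-known meioses (simplifying assumption)."""
    nonrecombinants = meioses - recombinants
    likelihood = theta ** recombinants * (1.0 - theta) ** nonrecombinants
    if likelihood == 0.0:
        return MINUS_INFINITY_CODE
    return math.log10(likelihood)


print(log10_likelihood(0.0, 10, 1))   # one known recombinant, theta = 0
print(log10_likelihood(0.0, 10, 0))   # no recombinants, theta = 0
print(log10_likelihood(0.5, 10, 1))   # free recombination
```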
When the
likelihood is equal to zero, either because recombinants are present
while the recombination fraction (θ, theta) is set to zero or
because of a mendelian inconsistency (incompatible genotypes
of some individuals), LIPED will report this with the message, "L(rm,
rf = 0 at rm, rf =...", and
will print the male and female recombination fractions at which the
likelihood is zero, and sequential number and ID code of the
individual at which this was first detected. An incompatibility then
exists among the indicated individual and his or her spouse(s) and
descendants. Note that additional incompatibilities not yet detected may exist in the given pedigree. Running LIPED with θ = 0 in this way helps to find recombinations in a pedigree. In families with loops (e.g., inbreeding), this scheme does not work: LIPED will report a zero likelihood only at θ = 0.5 and cannot pinpoint where it was first detected.
Mendelian inconsistencies at marker loci may be "avoided" by allowing for small misclassification probabilities. For example, for a SNP, the following penetrance scheme will avoid mendelian inconsistencies:
            Phenotype
            ---------------------------
Genotype    11       12       22
---------------------------------------
1/1         0.98     0.015    0.005
1/2         0.01     0.98     0.01
2/2         0.005    0.015    0.98
---------------------------------------
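The following sketch illustrates why such a penetrance scheme removes zero likelihoods. It computes the likelihood of a father-mother-child trio by summing over all genotype combinations, once with error-free penetrances and once with the misclassification probabilities given above. It is an illustration only: the allele frequency of 0.5, the function names and the dictionary layout are assumptions, and LIPED itself performs this computation for general pedigrees.

```python
from itertools import product

GENOTYPES = ("1/1", "1/2", "2/2")

ERROR_MODEL = {  # penetrances allowing for small misclassification
    "1/1": {"11": 0.98, "12": 0.015, "22": 0.005},
    "1/2": {"11": 0.01, "12": 0.98, "22": 0.01},
    "2/2": {"11": 0.005, "12": 0.015, "22": 0.98},
}
STRICT_MODEL = {  # error-free penetrances
    "1/1": {"11": 1.0, "12": 0.0, "22": 0.0},
    "1/2": {"11": 0.0, "12": 1.0, "22": 0.0},
    "2/2": {"11": 0.0, "12": 0.0, "22": 1.0},
}


def prior(genotype: str, p1: float) -> float:
    """Hardy-Weinberg genotype probability; p1 is the frequency of allele 1."""
    return {"1/1": p1 * p1, "1/2": 2 * p1 * (1 - p1), "2/2": (1 - p1) ** 2}[genotype]


def transmission(child: str, father: str, mother: str) -> float:
    """Mendelian probability of the child's genotype given the parents'."""
    pass_1 = {"1/1": 1.0, "1/2": 0.5, "2/2": 0.0}  # P(parent transmits allele 1)
    f, m = pass_1[father], pass_1[mother]
    return {"1/1": f * m, "1/2": f * (1 - m) + (1 - f) * m,
            "2/2": (1 - f) * (1 - m)}[child]


def trio_likelihood(obs_father, obs_mother, obs_child, pen, p1=0.5):
    """Sum over all genotype combinations of prior x transmission x penetrance."""
    total = 0.0
    for gf, gm, gc in product(GENOTYPES, repeat=3):
        total += (prior(gf, p1) * prior(gm, p1) * transmission(gc, gf, gm)
                  * pen[gf][obs_father] * pen[gm][obs_mother] * pen[gc][obs_child])
    return total


# Both parents typed 11, child typed 22: a mendelian inconsistency
print(trio_likelihood("11", "11", "22", STRICT_MODEL))  # zero likelihood
print(trio_likelihood("11", "11", "22", ERROR_MODEL))   # small but positive
```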
In the locus
descriptions, when there are several alleles, the number of possible
phenotypes may become quite large. For the analysis, however, it is
not necessary to list all phenotypes that might possibly occur. One
only needs to identify those phenotypes that are actually present in
at least one family member.
The EX1.DAT file contains input corresponding to a family pedigree with the structure as shown in section 7, "Complex pedigrees" (pedigree 2), above. Two dominant loci are used with two alleles each, where A > a and B > b. The gene frequencies are p = P(A) = 0.4 and q = P(B) = 0.3. Calculation of the pedigree likelihood by first principles yields

   L(θ) = (1 – p)⁵ p (1 – q)³ q² (1 – θ)[1 + q(1 – θ)² + qθ²]/8,

where θ denotes the recombination fraction. With this, one obtains, for example, log[L(0.5)] = –4.16106927 and log[L(0.2)] = –3.93702064 (logarithms to base 10), which agrees with the output given by LIPED. The lod score at θ = 0.2 is thus 0.224.
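The numerical values quoted for example 1 can be checked directly from the closed-form likelihood. The short sketch below evaluates the formula with base-10 logarithms; the function name is hypothetical.

```python
import math


def pedigree_likelihood(theta: float, p: float = 0.4, q: float = 0.3) -> float:
    """Closed-form likelihood of the example 1 pedigree."""
    return ((1 - p) ** 5 * p * (1 - q) ** 3 * q ** 2 * (1 - theta)
            * (1 + q * (1 - theta) ** 2 + q * theta ** 2) / 8)


log_l_05 = math.log10(pedigree_likelihood(0.5))
log_l_02 = math.log10(pedigree_likelihood(0.2))
lod_02 = log_l_02 - log_l_05   # lod score at theta = 0.2
print(f"{log_l_05:.5f} {log_l_02:.5f} {lod_02:.3f}")
```

The lod score is the difference between the log likelihood at the chosen θ and that at θ = 0.5, so all factors not involving θ cancel.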
The EX2.DAT file shows an example with 3 pedigrees and the use of output option 9, ie, summation of lod scores over pedigrees. The second pedigree in this set of 3 pedigrees requires much more computer time per lod score than either pedigree 1 or 3.
Compare linkage relationships among 6 gene markers, here labelled main locus and marker loci 1 through 5. The first comparison made takes more computer time per lod score than the other comparisons. This example shows the use of various output options in combination with locus comparisons.
The EX4.DAT file shows a published pedigree with Norrie's disease (X-linked recessive) and 2 marker loci. Analysis is disease versus each marker and marker versus marker.
Mode of inheritance of disease locus in example 1 is changed such that penetrance rises linearly from age 0 to 10. As no individual is in that age range, the output is the same as in example 1.
Ban Y, Davies TF, Greenberg DA, Concepcion ES, Tomer Y (2002) The influence of human leucocyte antigen (HLA) genes on autoimmune thyroid disease (AITD): results of studies in HLA-DR3 positive AITD families. Clin Endocrinol (Oxf) 57:81-88 [an example of the use of the LIPED program]
Cheung KH, Nadkarni P, Silverstein S, Kidd JR, Pakstis AJ, Miller P, Kidd KK (1996) PhenoDB: an integrated client/server database for linkage and population genetics. Comput Biomed Res 29:327-337 [an example of the use of the LIPED program]
Elston RC, Stewart J (1971) A general model for the analysis of pedigree data. Hum Hered 21:523-542
Ott J (1974) Estimation of the recombination fraction in human pedigrees: efficient computation of the likelihood for human linkage studies. Am J Hum Genet 26:588-597
Ott J, Schrott HG, Goldstein JL, Hazzard WR, Allen FH Jr, Falk CT, Motulsky AG (1974) Linkage studies in a large kindred with familial hypercholesterolemia. Am J Hum Genet 26:598-603
Ott J (1976) A computer program for linkage analysis of general human pedigrees. Am J Hum Genet 28:528-529
Ott J (1986) Y-linkage and pseudoautosomal linkage. Am J Hum Genet 38:891-897
Ott J (1999) Analysis of Human Genetic Linkage, 3rd edition. Johns Hopkins University Press, Baltimore
Schrott HG, Goldstein JL, Hazzard WR, McGoodwin MM, Motulsky AG (1972) Familial hypercholesterolemia in a large kindred. Evidence for monogenic mechanism. Annals of Internal Medicine 76:711-720 [description of the Alaska pedigree]
Terwilliger JD, Ott J (1994) Handbook of Human Genetic Linkage. Johns Hopkins University Press, Baltimore (now available online as a pdf file)
Thompson E (2011) The structure of genetic linkage data: From LIPED to 1 M SNPs. Hum Hered 71:86-96