Jurg Ott / 2 November 2013
ott@rockefeller.edu
Table of Contents
2. WORKING WITH SEVERAL LOCI
5.2. Format for locus symbols, allele symbols and phenotype symbols
5.3. Format to read input item 5.12 (symbols for a pair of alleles and values of penetrances)
5.4. Format for pedigree data
5.5. Symbols, to be read with Format as on input item 5.4
5.6. Number of alleles per locus
5.7. Number of phenotypes per locus
5.8. Locus type, indicated by the variable KONT
5.9. Output options as indicated by the variable IAU(i)
5.9a (optional) Recombination fractions when IAU(1)=9 in input item 5.9
5.11. Gene frequencies (dummy values required with a value of 1 in col. 7 of input item 5.1)
5.14a (optional) Identification of "doubled" individuals
5.14b (optional) Haplotype frequencies
5.15. (optional) Recombination fractions
5.16. Direction of further analysis
10. AGE-DEPENDENT PENETRANCE
10.1 Age classes with different penetrances
10.2 Age-of-onset distributions
10.3 Lognormal distribution of age of onset
10.4 Straight-line curves for age of onset (locus type 3)
11. CALCULATION OF GENETIC RISKS
The LIPED program (for LIkelihoods
in PEDigrees)
estimates the recombination fraction by calculating pedigree
likelihoods for various assumed values of the recombination fraction.
The algorithm is based on Elston and Stewart (1971) with some
extensions. Its first application (to the large Alaska pedigree,
Schrott et al. 1972) resulted in mild evidence for linkage of
familial hypercholesterolemia to the C3 polymorphism (Ott et al.
1974), which was later confirmed by various authors. This disease
locus (LDLR, previously FH and FHC) is now located on chromosome
19p13.3. The program
contained one error (in the likelihood calculation for quantitative
traits), which was pointed out to me by Dr. Robert Elston. Details on
the theory underlying LIPED and its likelihood calculations may be
found in Thompson (2011). LIPED is still used today (see poster
presented at HGM 2012 in Sydney) although rarely and in updated
form.
This manual describes the PC version (June 1995) of the
LIPED computer program for genetic linkage
analysis. Only two loci can be handled at a time, for example, a
disease locus and a marker locus. Originally written in Fortran IV
(Ott 1974), LIPED requires input in fixed format (numbers must be
provided in a fixed number of spaces or columns). The code is
essentially as originally written, with some additions such as proper
treatment of age-of-onset data. Only minor
modifications were made in the latest revision. The program has been
compiled with Microsoft Fortran PowerStation 4.0 for Windows.
KNOWN BUGS: For a pedigree consisting of a single individual, LIPED does
not calculate a likelihood. This "problem" may be avoided
by including two parents with unknown phenotypes. In practice, this
bug is irrelevant for linkage analyses.
Files included:
LIPED.FOR  Source code of LIPED program
LIPED.EXE  Executable code. Max. 12 alleles per locus, 21 phenotypes per locus.
LIPED16.EXE  older program version. Allows for 16 alleles per locus.
LOGNORM.EXE  Program to convert means and standard deviations from normal to lognormal distributions and vice versa, and to compute penetrances and age classes.
LIPED.DAT Input file holding example data.
EXi.DAT  individual example input files, i = 1..5, corresponding to examples in section 14.
LIPED.OUT  Output file resulting from running LIPED.DAT.
To initiate the program, type LIPED. It will assume
that input is furnished in the file, LIPED.DAT.
When LIPED is used
for research, the appropriate literature reference is Ott (1974) or
Ott (1976) or Ott (1999).
Two
kinds of loci are distinguished in the LIPED program: main locus
(internal number zero) and marker loci (numbered from 1 to NMARK).
Lod scores can be computed for any combination main locus vs. marker
locus. With input item 5.16 (see section 5, Input File, below), any one
of the marker loci can be declared to represent a new main locus so
that lod scores may also be computed among marker loci. If more than
one marker locus is present, the program creates two temporary disk
files that will be deleted on program termination. Note the following
restriction: with a single marker locus, any number of pedigrees may
be analyzed in a single run. However, with more than one marker
locus, only a single pedigree may be analyzed in a run. One way to
overcome this restriction is as follows. If several independent
families are presented to the program as one single pedigree, LIPED
will recognize this and carry out the proper calculations, the
resulting lod score being the sum over the individual families;
however, individual lods for the families will not be recognizable by
the user. Analyzing several independent families as a single large
pedigree requires a substantial amount of memory. If an error occurs
(MNP or MLIST too small), you may have to analyze the families in the
usual manner as separate pedigrees.
Generally, to analyze
several pedigrees with phenotypes at more than two loci, one might
proceed as follows. First, one decides on the two loci to be
analyzed. If one of these is the main locus, then the comparison main
locus vs. marker is identified on one line of input item 5.9.
Otherwise, all numbers in input item 5.9 are set equal to 0, and a
comparison among markers is defined on a new line of input item 5.9
that appears after a line containing 5000 (input item 5.16), which
immediately follows the pedigree data. Having decided on the 2 loci to be
compared, one must tell LIPED where on the line (in which
columns) to read the phenotypes of these loci, which is done with
FORTRAN Format expressions (see beginning of section 5) furnished in
input item 5.4.
If you interrupt LIPED while it is still running
and if more than one marker locus has been defined, scratch files
named "ZZ..." or "for..." will remain on the
disk. These files would be deleted when the program terminates
normally. You may simply delete them.
Penetrance is defined as the probability of
occurrence of a particular phenotype given the presence of a certain
genotype. Accordingly, with respect to a disease, penetrance is the
probability of being affected given a certain genotype.
Penetrances
are needed to describe the relation between genotypes and phenotypes.
In the following three simple and common cases, only full penetrance
(values 0 or 1 only) is assumed to occur. Assume a locus with 2
alleles, T and t. When this is a disease locus, let T be the dominant
disease allele and consider the phenotypes AFF for affected and NA
for unaffected.

                          Phenotypes

Genotype   Dominant disease   Recessive disease   Codominant case
           AFF    NA          AFF    NA           TT    Tt    tt

T T        1      0           1      0            1     0     0
T t        1      0           0      1            0     1     0
t t        0      1           0      1            0     0     1

To code for X-linked inheritance in LIPED, tables
such as the one above are used to represent the relation between
genotypes and phenotypes. They apply directly for females. For males,
for example, the genotype T/T is interpreted as T/y (hemizygote), and
all lines corresponding to heterozygote genotypes are disregarded.
Therefore, it is not necessary to distinguish male and female
phenotypes. For example, TT can serve as a phenotype for either
sex.
To analyze loci on the Y chromosome, it is easiest to
code Y-linkage as a special case of autosomal linkage, but
precautions must be taken. For details, see Ott (1986); note that the
methods described in that reference apply only to full penetrance.
For most input quantities, the location (column
numbers) on an input line is fixed and must be strictly adhered to.
For some input quantities, the user can specify, with
so-called Format expressions, where on a line the program will
find that quantity. For each of the Format expressions below, a
recommendation is given that will accommodate most situations
occurring in practice. It provides four spaces (columns) for each
input quantity. A short explanation of FORTRAN Formats is as
follows.
An input quantity is read either with an A-Format
(alphanumeric) or an F-Format (floating point quantities); the
program determines which of the two forms must be used for
each input quantity (described in this section). For example, (A4)
means that an input quantity such as a phenotype symbol should occupy
4 spaces (columns). If several input quantities must be read by the
program, each requires its own Format, for example, (A4, A4, A4) or,
equivalently, (3A4). To skip over a number of spaces, the
X-Format is used. For example, the Format expression (2A4, 8X, A4)
means that the program will read input quantity 1 in columns 1-4,
quantity 2 in columns 5-8, and quantity 3 in columns 17-20. For
alphanumeric quantities (A format), the position within the space
provided is critical; preferably the same number of spaces is always
used for the same quantity and it is right-justified within the
allotted space.
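The column arithmetic behind a Format expression can be checked with a few lines of code. The following is a minimal sketch in Python (not part of LIPED); the function name field_columns is hypothetical, and only the A, F and X items mentioned above are handled.

```python
import re

def field_columns(fmt):
    """Return (kind, first_col, last_col) for each A or F field in a simple
    Fortran Format such as "(2A4, 8X, A4)". Columns are 1-based, inclusive.
    Hypothetical helper: supports only rAw, rFw.d and nX items."""
    fields = []
    col = 1
    for item in fmt.strip().strip("()").split(","):
        item = item.strip().upper()
        # nX skips n columns without reading anything
        m = re.fullmatch(r"(\d*)X", item)
        if m:
            col += int(m.group(1) or 1)
            continue
        # rAw or rFw.d: r repeats of a field that is w columns wide
        m = re.fullmatch(r"(\d*)([AF])(\d+)(?:\.\d+)?", item)
        if not m:
            raise ValueError(f"unsupported format item: {item!r}")
        repeat = int(m.group(1) or 1)
        width = int(m.group(3))
        for _ in range(repeat):
            fields.append((m.group(2), col, col + width - 1))
            col += width
    return fields

print(field_columns("(2A4, 8X, A4)"))
# [('A', 1, 4), ('A', 5, 8), ('A', 17, 20)]
```

Running such a check on a Format before preparing the data makes it easier to see which columns each phenotype must occupy.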
The input file must consist of the following
"lines", here being numbered as input item 5.1, item 5.2,
etc.
5.1.
Col. 1-2    NMARK, number of marker loci in addition to the main locus.
            LIPED stops when NMARK < 1 is encountered.
Col. 4      = 0 (a value of 1 prints internal information not generally
            interpretable)
Col. 5      = 0 usual setting
            = 1 to prevent underflows as much as possible. This should be
            used only when an underflow has occurred since underflows
            are unlikely with double precision calculations.
Col. 6      = 0 for autosomal loci
            = 1 for loci on the X chromosome
Col. 7      = 0 if gene frequencies (rather than haplotype frequencies)
            will be read
            = 1 if haplotype frequencies are to be read (input item
            5.14b). Note that in this case, dummy gene frequencies must
            still be provided (input item 5.11).
Col. 8-20   Mutation rate at main locus (see MUTATION below).
Col. 21-80  Text
5.2. Format for locus symbols, allele symbols and phenotype symbols
Col. 1-80 Format to read input item 5.10 (symbols for loci, alleles and phenotypes). Only A-format is allowed; the maximum length is A4 for locus and allele symbols, and A8 for phenotype symbols. EXAMPLE: (20A4). The format for the phenotype symbols must correspond in length to the one used to read the actual phenotypes of the pedigree data (input item 5.14).
5.3. Format to read input item 5.12 (symbols for a pair of alleles and values of penetrances)
Col. 1-80 Two A-formats and then F-formats. EXAMPLE: (2A4,21F4.0)
5.4. Format for pedigree data
Col. 1-80 Format to read input items 5.5, 5.14 and 5.14a. A-format only. EXAMPLE: (25A4). The first four items must not contain more than 4 characters each whereas phenotypes can be up to 8 characters long, i.e., they may be read with an A8 Format, for example.
5.5. Symbols, to be read with Format as on input item 5.4
At place of    Provide symbol for
item no.

3              no parent (e.g.: blank)
4              male sex (e.g.: m). The first different sex symbol
               encountered in input item 5.14 will be considered
               the symbol for female sex.
5              unknown phenotype at main locus (e.g.: blank)
6, etc.        unknown phenotype at marker locus 1, etc.

5.6. Number of alleles per locus
Col. 1-2  Number of alleles at main locus (locus 0)
Col. 3-4  Number of alleles at marker locus 1, etc.
5.7. Number of phenotypes per locus
Col. 1-2  Number of phenotypes at main locus
Col. 3-4  Number of phenotypes at marker locus 1, etc.
For loci of type KONT = 1, 2, or 3, the number
of phenotypes is predetermined and will be set by the program. Here
you may simply use 1 for number of phenotypes when KONT > 0.
5.8. Locus type, indicated by the variable KONT
Col. 1-2  KONT for main locus
Col. 3-4  KONT for marker locus 1, etc., where the following values for KONT
apply:
-1  for a locus with discrete phenotypes but penetrances possibly other than 0 and 1. This is more general but runs slower than KONT = 0.
 0  for a locus with discrete phenotypes and penetrances of 0 and 1 only.
 1  for a locus with quantitative phenotypes following a conditional normal distribution (see section on quantitative phenotypes).
 2  for a locus with age-dependent penetrances following a lognormal distribution (see section on age-dependent penetrance).
 3  for a locus with straight-line age-dependent penetrances.
5.9. Output options as indicated by the variable IAU(i)
Col. 2 IAU(1) for main locus versus marker locus 1
Col. 4 IAU(2) for main locus versus marker locus 2,
etc., where the values of IAU(i) have the following effect (below, rm
and rf = male and female recombination fractions, respectively):
to do checks only and compute likelihood at rm = rf = 0.5
when no computation is desired for main locus versus marker locus i
if the values of the recombination fractions are to be read by input item 5.15 below. In this case, one set (input line) of values rm and rf after each pedigree, for each likelihood to be computed. Note that after input item 5.16, additional lines of input of this type may be given to allow for comparisons among marker loci.
to compute lods at rm = rf = 0, .001, .05, .10, .20, .30, .40
to compute lods at rm = rf = 0, .001, .05, .10, .15, ...
to compute lods at values of rm and rf shown below.
to compute lods at values of rm and rf shown below
to compute lods at values of rm and rf shown below
to compute lods at values of rm and rf shown below
to compute lods at values of rm and rf shown below
to read recombination fraction values from input item 5.9a once for all pedigrees and to sum the lod scores over pedigrees. Allowed only with NMARK = 1 on input item 5.1, i.e., when no more than one marker locus is specified.
to compute lods at rm = 0, .001, ... (as with option 2) whereas rf = 2 rm (1 – rm) [Ott, 1991, equation (8.7), p. 175], that is, the female map distance (Haldane) is assumed to be twice the male map distance.
to compute lods at rm = 0, ... (as with option 3) whereas rf = 2 rm (1 – rm)
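The relation between rf and rm used by the last two options follows from the Haldane map function: if x = –0.5 LN(1 – 2r) is the map distance for recombination fraction r, then doubling x gives 1 – 2 rf = (1 – 2 rm)^2, which simplifies to rf = 2 rm (1 – rm). The following minimal Python sketch (not LIPED code; all function names are hypothetical) checks this numerically.

```python
import math

def haldane_distance(r):
    """Haldane map distance (in Morgans) for recombination fraction r < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * r)

def haldane_fraction(x):
    """Recombination fraction for Haldane map distance x (in Morgans)."""
    return 0.5 * (1.0 - math.exp(-2.0 * x))

def female_fraction(rm):
    """rf when the female map distance is twice the male map distance."""
    return haldane_fraction(2.0 * haldane_distance(rm))

# The two expressions agree for male recombination fractions below 0.5
for rm in (0.001, 0.05, 0.1, 0.2, 0.3, 0.4):
    assert math.isclose(female_fraction(rm), 2.0 * rm * (1.0 - rm))
```

At rm = 0.5 the Haldane distance is infinite, and the closed form gives rf = 0.5 directly.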
The values of rm and rf at which lods are computed depend on the value of
IAU(i) as follows:
[Figures: for each of IAU(i) = 4 to 8, a plot of the points (rf, rm) at which lods are computed]
IAU(i) = 4 (22 points)  rm, rf taken from 0, .001, .05, .1, .2, .3, .4, .5
IAU(i) = 5 (34 points)  rm, rf taken from 0, .001, .05, .1, .15, .2, .25, .3, .35, .4, .45, .5
IAU(i) = 6 (9 points)   all combinations of rm, rf = .1, .3, .5
IAU(i) = 7 (16 points)  all combinations of rm, rf = .05, .20, .35, .50
IAU(i) = 8 (64 points)  all combinations of rm, rf = 0, .001, .05, .1, .2, .3, .4, .5
Option 8 is useful for approximate factorization of joint male and female
lods into sex-specific lods.
5.9a (optional) Recombination fractions when IAU(1)=9 in input item 5.9
Each pedigree will be analyzed at the rm, rf values
provided here.
Col. 1-5   Value for male recombination fraction
Col. 6-10  Value for female recombination fraction
These two values are read with Format 2F5.4. For example,
an input line for rm = 0.1 and rf = 0.45 may look like this:
b1000b4500, or bbb.1bb.45, where b stands for blank (space). For each
likelihood to be computed, one such line of values must be provided.
To terminate the set of rm, rf values, enter 60000 as the last line.
The maximum number of lines, including the terminating line, is equal to
MT.
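The way an F-Format field is interpreted (explicit decimal point if present, otherwise an implied one) can be illustrated with a short sketch. This is Python written for illustration only, not LIPED code; the names read_f_field and read_theta_line are hypothetical, and the sketch assumes that blanks inside a numeric field are ignored, which may differ between Fortran compilers.

```python
def read_f_field(text, decimals):
    """Read one Fortran Fw.d input field given as a string of width w.
    An explicit decimal point is used as written; otherwise the last
    `decimals` digits are taken as the fractional part."""
    s = text.replace(" ", "")  # assumption: blanks are ignored, not read as zeros
    if s == "":
        return 0.0
    if "." in s:
        return float(s)
    return int(s) / 10 ** decimals

def read_theta_line(line):
    """Split a line read with Format 2F5.4 into (rm, rf)."""
    line = line.ljust(10)
    return read_f_field(line[0:5], 4), read_f_field(line[5:10], 4)

print(read_theta_line(" 1000 4500"))   # (0.1, 0.45)
print(read_theta_line("   .1  .45"))   # (0.1, 0.45)
```

Both spellings of the example line give the same pair of values, which is why either form may be used in the input file.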
The following input items, no. 5.10 through no. 5.12, must
be repeated for each locus in the order main locus, marker locus 1,
marker locus 2, etc.
5.10.
To be read with Format as provided on input item 5.2.
The following items are expected:
- name of locus (at most 4 characters)
- symbol for allele 1 (at most 4 characters)
- symbol for allele 2, etc. (at most 4 characters)
- symbol for phenotype 1 (at most 8 characters)
- symbol for phenotype 2, etc. (at most 8 characters)
5.11. Gene frequencies (dummy values required with a value of 1 in col. 7 of input item 5.1)
Col. 1-8   Population frequency of allele 1
Col. 9-16  Population frequency of allele 2, etc.
These values are read with format 10F8.4, that is, every ten numbers must
be on a single line. Each number occupies at most 8 spaces with an
implied decimal point between the first four and last four spaces.
For example, bbbb9500 is equivalent to bbbbb.95 and represents 0.95.
5.12.
To be read with the Format as provided on input item
5.3. As many lines of input item 5.12 are expected as there are
genotypes at the given locus. In the case of X-linkage, this refers
to the female genotypes. For males, with X-linkage, a genotype A/A is
interpreted as A/y while heterozygote genotypes such as A/a are
disregarded. On each line (for each genotype), the following items
are expected:
- symbol for first allele
- symbol for second allele; these two define a genotype
- probability of observing phenotype 1 under the given genotype
- probability of observing phenotype 2 under the given genotype, etc.
The above applies to loci with KONT = -1 or KONT = 0 on
input item 5.8. For quantitative phenotypes (KONT = 1), four items
are expected for each genotype: two alleles (defining the genotype)
plus a mean and a standard deviation. For age-dependent penetrances
(KONT = 2 or KONT = 3), each line must contain two allele symbols
plus six parameters, i.e., three parameters for females and three
parameters for males (the sex-specific three parameters are defined
in chapter 10). Note that in this microcomputer implementation of
LIPED, all phenotypes must be read with A-Formats even though they
may be quantitative measurements or age values.
The following
input items, no. 5.13 through no. 5.16, are to be repeated for each
pedigree except that input item 5.14b is needed only once, after the
first pedigree:
5.13.
Col. 1-4   Number of individuals in pedigree.
           Count a "doubled" individual as 2 persons (see section on
           complex pedigrees).
Col. 5-8   Number of (pairs of) doubled individuals; = 0 for simple pedigrees
Col. 9-68  optional remarks
5.14.
To be read with Format as given on input item 5.4.
For each individual, the following items must be
given:
Item                                      Max. length

symbol identifying the individual (ID)    4
ID for one of the parents *)              4
ID for the other of the parents *)        4
symbol for individual's sex               4
phenotype at main locus                   8
phenotype at marker locus 1, etc.         8

*) Note that each individual must either have
two parents in the pedigree, or both parents' IDs must be replaced by
the symbol for no parent. If you have information on only one parent,
you must provide an ID for the other parent who will then have
unknown phenotypes.
5.14a (optional) Identification of "doubled" individuals
Applies only to complex pedigrees (number greater
than zero in col. 5-8 of input item 5.13). For simple pedigrees, no
input item 5.14a is expected. To be read with Format as given on
input item 5.4. For each pair of doubled individuals, the following
two items are required:
- ID of first member of pair of doubled individuals
- ID of second member of pair of doubled individuals
5.14b (optional) Haplotype frequencies
This information is needed only once, after the first
pedigree, and only with a value of 1 in col. 7 of input item 5.1. The
haplotype frequencies are read with format 10F8.4 (as are the gene
frequencies). These values must be given in the following order.
Consider a main locus and a marker locus where n is the number
of alleles at the marker locus. Then, the position of the haplotype
corresponding to the i-th allele at the main locus and the
j-th allele at the marker locus is given by n(i – 1) + j.
As an example with 2 alleles at the main locus
and 3 alleles at the marker locus, the haplotypes are numbered as
follows:
        j=1   j=2   j=3

i=1      1     2     3
i=2      4     5     6
Note: there is no check
that the haplotype frequencies sum to 1.
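Because LIPED does not verify the haplotype frequencies, it can be worth checking them before a run. The following Python sketch (not part of LIPED; both function names are hypothetical) reproduces the ordering rule n(i – 1) + j and adds a simple check on the number and sum of the frequencies.

```python
def haplotype_order(i, j, n_marker_alleles):
    """Position (1-based) of the haplotype made of main-locus allele i and
    marker-locus allele j in the list of haplotype frequencies."""
    return n_marker_alleles * (i - 1) + j

def check_haplotype_frequencies(freqs, n_main, n_marker, tol=1e-6):
    """Hypothetical pre-check: right number of values, and they sum to 1."""
    if len(freqs) != n_main * n_marker:
        raise ValueError("expected %d haplotype frequencies" % (n_main * n_marker))
    if abs(sum(freqs) - 1.0) > tol:
        raise ValueError("haplotype frequencies sum to %.6f, not 1" % sum(freqs))

# 2 alleles at the main locus, 3 alleles at the marker locus
for i in (1, 2):
    print([haplotype_order(i, j, 3) for j in (1, 2, 3)])
# [1, 2, 3]
# [4, 5, 6]
```

The tolerance of 1e-6 is an arbitrary choice for the sketch; values read with format 10F8.4 carry at most four decimals.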
5.15. (optional) Recombination fractions
This information is needed only when IAU(i) = 1 on
input item 5.9. Then, as many likelihood calculations will be carried
out as there are lines of input item 5.15. Each line is read with
format 2F5.4 (cf. input item 5.9a):
Col. 1-5   value for male recombination fraction
Col. 6-10  value for female recombination fraction
As the terminating line, enter 60000. For
IAU(i) = 1 and i > 1, multiple sets of recombination
fractions (i of them), separated by 60000, must be entered.
5.16. Direction of further analysis
Col. 1-4 Value to determine what action to take next.
= 5000 if new lines of input item 5.9 are to be
read (allowed only if no more than one pedigree is present in this
problem) thus allowing for linkage analyses between marker loci. The
new lines are expected immediately after input item 5.16. On each
line, there must be as many values as there are marker loci. The
program will scan these values, IAU(i), i = 1, 2, ..., and the first
marker locus with associated value IAU different from zero will be
considered the new main locus. From then on, on that line, the values
of IAU have the same meaning as on input item 5.9.
Multiple
lines of input item 5.9 may follow a single 5000 value on input item
5.16. For example, consider a total of 5 loci, that is, one main
locus (locus no. 0) and 4 marker loci (numbered 1 through
4):
...
_2_2_2_2    <- original input item 5.9
...
5000
_1_2_2_2    \
_0_1_2_2     |  extra lines of input item 5.9
_0_0_1_3    /
9000
Each
of these extra lines of input item 5.9 has one field for each of the
original marker loci. For instance, the following extra line,
_0_1_2_2 would mean: "now take marker locus 2 as the new main
locus, and pair it with marker locus 3 (using option 2), then with
marker locus 4 (using option 2)", which could be extended to all
marker loci. Note that whenever option 1 is specified, a
corresponding set of lines of input item 5.15 is expected immediately
after the line containing option 1. To terminate the set of extra
lines of input item 5.9, enter a line with 8000 or 9000 (same meaning
as below) in col. 1-4.
= 7000 if a new pedigree is to be
read. Then, new lines of input item 5.13 etc. are expected. Note that
this is allowed only when no more than one marker locus is
present.
= 8000 if a new problem is to be analyzed.
Then, new lines of input item 5.1 etc. are expected.
= 9000 to terminate this run.
In the CONSTANT.INC (CONSTANT.FOR in Win version)
file, constants are given which are used for dimensioning arrays.
Example values are as follows.
MLIST = 50    headsibs (nuclear families)
MMARK = 30    marker loci in addition to the main locus
MNAL  = 5     alleles at any locus
MNDI  = 5     pairs of doubled individuals
MNFE  = 21    phenotypes at any locus
MNP   = 25    genotype vectors stored in memory
MNPT  = 200   individuals in a pedigree
MT    = 20    pairs of theta values after item 5.9, including the
              terminating 60000 line.
To change these, simply adjust
the values of the constants in the parameter statements and recompile
the program.
The following information is for programmers only
and is not needed for general program use. With the
abbreviations,
KK  = MNAL*MNAL,
KK1 = MNAL*(MNAL+1)/2,
KK2 = KK*(KK+1)/2,
the array dimensions are given as follows (the
arrays not listed below have fixed dimensions):
FENO1(MNPT,KK1)   IAD(KK,KK)     LIST(MLIST)     PHI(KK2)
FENO2(MNPT,KK1)   IAU(MMARK)     NAL(MMARK)      PHIS(KK)
GEN(MNAL)         ID(MNPT)       NC(MNPT)        PHPROB(MNFE)
GENO(MNP,KK2)     IGENO(MNP)     NF(MMARK)       THV1(MT)
GF1(MNAL)         ISEX(MNPT)     NM(MNPT)        THV2(MT)
GF2(MNAL)         KONT(MMARK)    NS(MNPT)        THVS(MT)
GVX1(MNFE,KK1)    LDI(2,MNDI)    PHE1(MNFE)      UNK(MMARK)
GVX2(MNFE,KK1)    LGC(MNDI)      PHE2(MNFE)
HOLD(MNDI,KK2)    LGENO(MNPT)    PHEPED(MMARK)
Note: MNFE
must have a value of at least 8.
In a so-called simple pedigree, tracing the
inheritance of genes by going backwards through the generations
(upwards in the pedigree) always leads to the same pair of founder
parents. Pedigrees for which this is not the case are called
complex pedigrees. In particular, pedigrees with the following
features are examples of complex pedigrees: (1) both members of
a pair of parents have themselves parents in the pedigree; (2)
consanguinity loop, i.e., parents are related; (3) marriage
loop, e.g., two brothers are married to two sisters, or an individual
has been married twice, the two spouses being related with each other
but not with the individual who married twice. An example of
the last kind is pedigree 1, below, where [] refers to a female and
() refers to a male:
Pedigree 1: marriage loop

        [1]--.--(2)
             |
        .----'----.
        |         |
       (3)-.-[4]-.-(5)
           |     |
          (6)   (7)
Without
special measures, LIPED analyzes only simple pedigrees. The analysis
of complex pedigrees is possible by manipulating the pedigree in a
certain way so that it "appears" to LIPED as a simple
pedigree. This manipulation consists of replacing a particular
individual by two individuals as shown in the example below (pedigree
2), and by identifying the two individuals actually corresponding to
the same individual in the original pedigree (input item 5.14a). Note
that such a "doubling" of individuals is necessary for
breaking up loops, and also whenever more than one of the two parents
has parents in the pedigree.
When an individual has been
"doubled", the number of individuals in the pedigree must
be increased by 1 thus counting a pair of doubled individuals as two
persons. Up to MNDI individuals may be "doubled" so that,
e.g., multiple consanguineous loops can be accommodated. At the end
of a pedigree with doubled individuals, the two individuals
corresponding to the one original person must be identified (input
item 5.14a).
Pedigree 2: Example of a pedigree, manipulated
for processing by LIPED

Original pedigree                 Manipulated pedigree, acceptable to LIPED

[1.1]--.--(1.2)--.--[1.3]         [1.1]--.--(1.2)--.--[1.3]
       |         |                       |         |
     (2.1)--.--[2.2]                  (2.1a)  (2.1b)--.--[2.2]
            |                                         |
          [3.1]                                     [3.1]
Here are some important notes regarding
"doubling" of individuals:
- Individuals can only be doubled, not tripled. For example, an individual who is an offspring and is also married twice with children from each marriage cannot be manipulated as described.
- Computation time generally increases drastically with the number of pairs of doubled individuals. When one has a choice among several candidates to be doubled, it is recommended to take an individual with as much phenotypic information as possible in order to exclude as many genotypes as possible. For example, in pedigree 1, above, any one of individuals 3, 4 or 5 could be chosen for doubling. In the presence of one doubled individual, the QLIK routine for calculating the likelihood is executed for each genotype of that individual, except for those genotypes known to be incompatible with the individual's phenotypes or the phenotypes of his or her offspring. Analogously, for several pairs of doubled individuals, QLIK is called a maximum of m times, where m is calculated as follows. Let n be the number of haplotypes at the two loci jointly, i.e., n is the product of the number of alleles at the two loci under consideration. The number of joint genotypes is then given by g = n(n + 1)/2, so that m = g^NDI, where NDI is the number of pairs of doubled individuals. For example, with 2 and 3 alleles at the respective two loci, one has n = 6 haplotypes and g = 21 genotypes. With NDI = 3 pairs of doubled individuals, QLIK may be called up to m = 9261 times. The present version of PC-LIPED counts these calls and displays them on the screen.
- Whenever the genotype of an individual can be inferred with certainty (including phase), such an individual may be represented as multiple individuals in the pedigree if necessary, and this individual must not be counted as a so-called doubled individual (treat it as separate multiple individuals). The likelihood will then not be correct but the lod score will be unaffected by such a manipulation. For example, if an individual is known to be A/A at locus 1 and B/b at locus 2, the joint genotype is known to be AB/Ab. Note that for doubly heterozygous individuals it will not generally be possible to make use of this feature even though phase may be known, as there is no easy way to identify phases in LIPED on the basis of phenotypes.
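The upper bound on the number of QLIK calls given in the notes above grows quickly with the number of doubled individuals. A minimal Python sketch of the arithmetic (not LIPED code; the function name max_qlik_calls is hypothetical) is:

```python
def max_qlik_calls(alleles_locus1, alleles_locus2, n_doubled_pairs):
    """Upper bound m = g**NDI on the number of QLIK calls."""
    n = alleles_locus1 * alleles_locus2   # haplotypes at the two loci jointly
    g = n * (n + 1) // 2                  # joint genotypes
    return g ** n_doubled_pairs

print(max_qlik_calls(2, 3, 3))  # 9261
```

With the same two loci, one doubled pair gives at most 21 calls and two pairs at most 441, which shows why doubling should be kept to a minimum.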
Mutation is allowed for at the current main locus
only and is assumed to occur with a constant rate from any of the
alleles no. 2, 3,... towards the first allele, with the mutation rate
being specified in col. 8-20 of input item 5.1. Back-mutation is
assumed to be negligible. Also, in the computation of the likelihood,
it is assumed that a mutation occurs only in one or the other of two
parents, but not simultaneously in both parents.
WARNING: when
processing a disease locus with mutation and subsequently, in the
same run, testing marker versus marker, then the mutation rate keeps
applying to the current main locus unless a new run is carried out
with the mutation rate set equal to zero.
Note that only
simple pedigrees can be processed by LIPED unless special steps are
taken to code for a complex pedigree (see that section above).
At any locus, quantitative
rather than qualitative phenotypes can be read. For a locus with
quantitative phenotypes, the following special rules must be
observed.
Input item   Explanation

5.3          With two F-formats, read one mean and one standard deviation for each genotype.
5.7          Set the number of phenotypes equal to 2. The program will correct wrong numbers.
5.8          Set KONT equal to 1.
5.10         Two phenotype symbols will be read by the program but they are not used in any way.
5.12         After the symbol for the second allele, two items are expected, the mean and the standard deviation of the phenotype distribution given the particular genotype specified by the two alleles.
5.14         The phenotype values must not occupy more than 8 spaces each.

Age-dependent penetrance refers to the fact that a
carrier of a disease gene may not exhibit the disease at birth but
only later in life, that is, the penetrance (= probability of showing
a certain phenotype given a genotype) depends on the age of an
individual.
The easiest way of implementing age-dependent
penetrance is by forming age classes and having different penetrances
in these classes. For affecteds, irrespective of their age, only one
class is required (provided that no phenocopies are allowed for), but
unaffecteds must be grouped into age classes. For example, in a given
disease, if all gene carriers beyond 10 years of age have expressed
the disease, a suitable assumption is that penetrance rises linearly
from 0 at age 0 to 100% at age 10, as pictured below:
[Figure: penetrance plotted against age, rising linearly from 0 at age 0 to 1 at age 10 and remaining at 1 up to age 20]
One might then form 6 classes as follows, where AFF
stands for the 'affected' phenotype, and NA1, NA2, etc. stand for
unaffected in age class 1, 2, etc.; NA5 denotes unaffected
individuals older than 10 years who are taken to be known not to
carry the disease gene. The disease is assumed dominant, the disease
allele being T. Note that the probability of being unaffected is 1 minus the
probability of being affected.

                      Phenotypes

Genotype   AFF    NA1    NA2    NA3    NA4    NA5

T T        1      .88    .63    .38    .13    0
T t        1      .88    .63    .38    .13    0
t t        0      1      1      1      1      1
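The values for NA1 to NA4 in the table are consistent with evaluating the straight-line penetrance at the midpoint of each age class. The following Python sketch (not LIPED code) illustrates this; the class boundaries of 2.5 years each are an assumption made for the illustration, since the text does not state them, and the function names are hypothetical.

```python
def penetrance(age, full_age=10.0):
    """Probability that a gene carrier is affected by `age`, rising linearly
    from 0 at age 0 to 1 at `full_age` and staying at 1 afterwards."""
    return min(max(age / full_age, 0.0), 1.0)

# Assumed class boundaries (not stated in the text): four classes of 2.5 years
classes = {"NA1": (0.0, 2.5), "NA2": (2.5, 5.0), "NA3": (5.0, 7.5), "NA4": (7.5, 10.0)}

def prob_unaffected(bounds):
    """1 minus the penetrance at the midpoint of the age class."""
    low, high = bounds
    return 1.0 - penetrance((low + high) / 2.0)

for name, bounds in classes.items():
    print(name, prob_unaffected(bounds))
# NA1 0.875
# NA2 0.625
# NA3 0.375
# NA4 0.125
```

Rounded to two decimals, these are the entries .88, .63, .38 and .13 used for gene carriers in the table.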

Rather than forming age classes, the distribution of
the age at disease onset may be assumed to follow a certain
distribution. In LIPED, two such distributions are implemented, the
lognormal and a straight-line distribution. Below, F denotes the
distribution (cumulative sum) of age at onset whereas f denotes the
corresponding density (histogram).
Whatever the age of onset
distribution used, to represent in a single number the various pieces
of phenotypic information (age at onset, present age, affection
status) at a disease locus, the following conventions must be
observed in LIPED. In principle, the phenotype to be provided in the
input to LIPED is an individual's present age (or age last seen) or
the age at onset, taken with a minus sign for unaffecteds, and taken
to be positive for affecteds. Present age and age at onset are
distinguished as outlined, below.
Unaffected individuals: The phenotype is an individual's present age (or age last seen), taken with a minus sign (the sign distinguishes affecteds from unaffecteds). Example: unaffected, present age is 56, phenotype given in program is -56. If present age is unknown, a guess must be used, for example, based on ages of sibs or parents.
Affected individuals: If actual age at disease onset is unknown, the
phenotype is a person's present age. Example: 56. If present age is
unknown, a guess must be used, based on ages of relatives.
If
age at disease onset is known, it is entered into LIPED by the
following coding scheme: The phenotype to be provided is obtained by
adding 500 to the age at onset. Example: age at disease onset is 23
years; phenotype to be provided is 523.
NOTE: actual age at
disease onset is relevant only when disease can occur under different
genotypes with different penetrances. If this is not so (it usually
is not), then present age may be given for all affecteds.
Unknown phenotype: The phenotype is given as 0. Alternatively, on input item 5.5, one may define any other code for unknown phenotype, for example, blank.
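The coding conventions above can be summarized in a small decoding routine. This is a Python sketch for illustration only, not LIPED code; the function name decode_age_phenotype and the status labels are hypothetical, and it assumes that any value above 500 denotes a known age at onset.

```python
def decode_age_phenotype(value):
    """Interpret an age-dependent phenotype code as (status, age).
    Assumes codes above 500 mean 'age at onset + 500'."""
    if value == 0:
        return ("unknown", None)
    if value < 0:
        return ("unaffected", -value)                 # present age
    if value > 500:
        return ("affected, onset known", value - 500)  # age at onset
    return ("affected, onset unknown", value)          # present age

print(decode_age_phenotype(-56))   # ('unaffected', 56)
print(decode_age_phenotype(56))    # ('affected, onset unknown', 56)
print(decode_age_phenotype(523))   # ('affected, onset known', 23)
print(decode_age_phenotype(0))     # ('unknown', None)
```

A check of this kind on the pedigree file can catch a missing minus sign for an unaffected individual before the data are passed to LIPED.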
It is often meaningful to assume that age of onset is lognormally distributed, that is, that LN(age of onset) follows a normal distribution where LN denotes natural logarithm (a simpler assumption for the age-of-onset distribution is covered in section 10.4, below). Mean and standard deviation for the lognormal and normal distributions are defined and connected with each other as follows:

             Age (orig. values)     LN(age)
             (lognormal distr.)     (normal distr.)
mean         μ                      u
std. dev.    σ                      s

For ease of presentation, define m = exp(u) and w = exp(s^{2}). Then one has:
μ = m √w
σ = m √[w(w – 1)] = μ √(w – 1)
u = 2 LN(μ) – 0.5 LN(μ^{2} + σ^{2})
s = √[LN(μ^{2} + σ^{2}) – 2 LN(μ)] = √[2{LN(μ) – u}],
where LN denotes natural logarithm. Also, with given mean, μ, of the raw data, and standard deviation, s, of the transformed data, one obtains the mean of the transformed data as u = LN(μ) – 0.5 s^{2}.
Some example values are given in the following table:

Original scale       LN scale (normal distr.)
μ       σ            u        s
20      5            2.97     0.25
20      10           2.88     0.47
20      15           2.77     0.67
40      5            3.68     0.12
40      10           3.66     0.25
40      15           3.62     0.36

The
LOGNORM program (included) transforms values u and s into the
corresponding values of μ and σ, and vice versa.
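For readers without access to LOGNORM, the conversion formulas of this section can be sketched in a few lines of Python. This is not the LOGNORM program itself; the function names are assumptions made for illustration.

```python
import math


def to_log_scale(mu, sigma):
    """Mean and std. dev. on the original scale -> (u, s) of LN(age)."""
    s2 = math.log(mu**2 + sigma**2) - 2.0 * math.log(mu)
    return math.log(mu) - 0.5 * s2, math.sqrt(s2)


def to_original_scale(u, s):
    """(u, s) of LN(age) -> mean and std. dev. on the original scale."""
    m, w = math.exp(u), math.exp(s**2)
    return m * math.sqrt(w), m * math.sqrt(w * (w - 1.0))


u, s = to_log_scale(20.0, 5.0)
print(round(u, 2), round(s, 2))  # 2.97 0.25, first row of the table
```

Applying to_original_scale to the unrounded (u, s) returns the original mean and standard deviation, which gives a quick check on hand-computed values.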
If
age at onset for an (affected) individual is known, the corresponding
likelihood is simply f(age at onset), where f is the lognormal
density. If age at onset is unknown, then the likelihood is F(age)
where 'age' denotes current age, or age last seen, and F is the
lognormal distribution function. For unaffecteds, the likelihood is
equal to 1 – F(age). If the final penetrance, t, is less than
100% then f and F above are multiplied by t.
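The three likelihood cases just described (affected with known onset, affected with unknown onset, unaffected) can be sketched as follows. This is an illustrative Python sketch under the formulas stated here, not LIPED's actual code; the function names are assumptions, and s > 0 is assumed.

```python
import math


def lognormal_pdf(x, u, s):
    """Density f of a lognormal variable whose logarithm has mean u, SD s."""
    z = (math.log(x) - u) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2.0 * math.pi))


def lognormal_cdf(x, u, s):
    """Distribution function F of the same lognormal variable."""
    return 0.5 * (1.0 + math.erf((math.log(x) - u) / (s * math.sqrt(2.0))))


def onset_likelihood(age, affected, onset_known, u, s, t):
    """Likelihood contribution for one genotype; f and F are scaled by t."""
    if not affected:
        return 1.0 - t * lognormal_cdf(age, u, s)   # 1 - F(age)
    if onset_known:
        return t * lognormal_pdf(age, u, s)         # f(age at onset)
    return t * lognormal_cdf(age, u, s)             # F(age)


# Unaffected at age 30, with u = 3.35, s = 0.17 and final penetrance 0.6
print(onset_likelihood(30, False, False, u=3.35, s=0.17, t=0.6))
```

Here "age" is the age at onset when onset is known, and the present age (or age last seen) otherwise.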
Lognormal age-dependent penetrance is modeled in analogy to quantitative phenotypes (see previous section) except that here, 6 parameters must be specified (3 for females and 3 for males). For each genotype (input item 5.3), these are
 the mean, u, of LN(age of onset)
 the standard deviation, s, of LN(age of onset)
 the limiting penetrance, t, when age is very high,
for females, followed by the analogous three parameters for males.
Depending on the values of the 6 parameters given (input item 5.3) for each genotype, the following 2 situations can be distinguished. Assume a disease locus with two alleles, a dominant disease allele, D, and a normal allele, d.
1. Age of onset follows a lognormal distribution with parameters u and s, where the final penetrance attained (at high age) is equal to t. For example (parameters taken to be the same for males and females), one may have on input item 5.3:
D D 3.35 0.17 1.0 3.35 0.17 1.0 → u = 3.35, s = 0.17, final penetrance 100%
D d 3.35 0.17 0.6 3.35 0.17 0.6 → susceptible individuals express disease with max. penetrance of 60% when they are very old
d d 3.0 0.1 0.0 3.0 0.1 0.0 → genotype d/d not susceptible to disease; values of u and s are irrelevant (likelihood is zero for affecteds and 1 for unaffecteds)
2. Penetrance does not depend on age but is a fixed value (t for affecteds, 1 – t for unaffecteds). To accommodate this situation, set s = 0.0. The value of u is then irrelevant. For example, one may have
d d 0.0 0.0 0.01 0.0 0.0 0.01 → t = 0.01, that is, d/d genotypes express the disease with probability 1%, irrespective of age (likelihood is 0.01 for affecteds and 0.99 for unaffecteds); the value of u is irrelevant. This case should be used with great care since it does not differentiate between age of onset known or unknown.
In summary, for a locus with age-dependent (lognormal) penetrance, the following special rules must be observed.

Input item   Explanation
3    Provide six F-formats to read on each line (genotype) one mean, one standard deviation and one final penetrance for each sex (note that these Formats will apply to all loci).
7    Set the number of phenotypes equal to 1 (the program will set the correct number of phenotypes).
8    Set KONT equal to 2.
10   Six phenotype symbols will be read by the program, but they are not used in any way.
12   After the symbol for the second allele, six items are expected: mean, standard deviation and final penetrance for females, and the analogous three parameters for males.
14   The value for the phenotype (age) must not occupy more than 8 spaces (4 recommended), the actual number of spaces used being determined by the format statement given in input item 5.4. A positive age value refers to an affected individual, a negative age figure identifies an unaffected individual. Phenotypes are coded following the rules given in section 10.2, above.

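[Figure: straight-line age-of-onset curve. Horizontal axis: age; vertical axis: F. F is 0 up to age A1, rises linearly between ages A1 and A2, and remains at the final penetrance t from age A2 onward.]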
F is the probability of being affected, that is, the penetrance (or likelihood) is equal to F for an affected individual (age at onset unknown) and equal to 1 – F for an unaffected individual. According to the figure, above, the age-of-onset curve is defined as
F = 0                        if a ≤ A1
F = t(a – A1)/(A2 – A1)      if A1 < a < A2
F = t                        if a ≥ A2
where "a" is an individual's present age, or age last seen. If age at onset is known (for an affected individual) then the likelihood (density) is equal to f = t/(A2 – A1) if the age of onset is between A1 and A2, and equal to zero otherwise. If age at onset is considered a random variable, according to the present definition and with t = 1, it follows a uniform distribution with mean (A1 + A2)/2 and standard deviation (A2 – A1)/3.464.
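The straight-line curve and its density can be sketched in Python as follows. This is an illustrative sketch of the definition (F equal to 0 up to A1, rising linearly to t at A2, and constant at t thereafter), not LIPED's own code; the function names are assumptions, and A2 > A1 is assumed.

```python
def straight_line_F(a, a1, a2, t):
    """Probability of being affected by age a (penetrance at age a)."""
    if a <= a1:
        return 0.0
    if a >= a2:
        return t
    return t * (a - a1) / (a2 - a1)


def straight_line_density(onset, a1, a2, t):
    """Likelihood (density) for an affected individual with known onset."""
    return t / (a2 - a1) if a1 < onset < a2 else 0.0


# Genotype with A1 = 10, A2 = 60, t = 0.6
print(straight_line_F(35, 10, 60, 0.6))         # affected, onset unknown
print(1.0 - straight_line_F(35, 10, 60, 0.6))   # unaffected at age 35
print(straight_line_density(23, 10, 60, 0.6))   # affected, onset at age 23
```

At age 35, halfway between A1 and A2, the penetrance is half the final value t.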
For a locus of type 3 (straight-line age of onset), coding is very similar to the conventions used for lognormal age of onset (specific instructions are given below). The phenotypes are the ages of each individual, taken to be positive for affected individuals and taken with a minus sign for unaffected individuals. Zero will be interpreted as unknown, but any other symbol may also be designated to represent unknown phenotype. For affected individuals with known age at onset, enter a number equal to 500 plus age at onset as the phenotype (see section 10.2, above).
As in lognormal age-dependent penetrance, 6 parameters must be specified but here, they have the following meaning. For each genotype (input item 5.3), they are (see graph, above)
 the age, A1, at which penetrance becomes positive
 the age, A2, at which penetrance reaches its final value
 the limiting penetrance, t, when age is very high,
for females, followed by the analogous three quantities for males.
Depending on the values of the 6 parameters given (input item 5.3) for each genotype, the following 2 situations can be distinguished. Assume a disease locus with two alleles, a dominant disease allele, D, and a normal allele, d.
1. Age of onset follows a straight-line distribution with parameters A1 and A2, where the final penetrance attained (at high age) is equal to t. For example (parameter values taken to be the same for females and for males), one may have on input item 5.3:
D D 10 60 1.0 10 60 1.0 → for individuals with D/D genotype, susceptibility to disease starts at age 10 and penetrance reaches its maximum of 100% at age 60.
D d 10 60 0.6 10 60 0.6 → susceptible individuals express disease with max. penetrance of 60% when they are 60 years or older.
d d 10 11 0.0 10 11 0.0 → genotype d/d not susceptible to disease; values of A1 and A2 are irrelevant (given genotype d/d, likelihood is zero for affecteds and 1 for unaffecteds).
2. Penetrance does not depend on age but is a fixed value (t for affecteds, 1 – t for unaffecteds). To accommodate this situation, set A2 = 0.0. The value of A1 is then irrelevant. For example (same parameter values for males and females), one may have
d d 0.0 0.0 .01 0.0 0.0 .01 → t = 0.01, that is, d/d genotypes express the disease with probability 1%, irrespective of age (likelihood is 0.01 for affecteds and 0.99 for unaffecteds); the value of A1 is irrelevant. This case should be used with great care since it does not differentiate between age of onset known or unknown.
In summary, for a locus with straight-line age-dependent penetrance, the following special rules must be observed.

Input item   Explanation
3    On each line (genotype) provide six F-formats to read one starting age (A1), one finishing age (A2) and one final penetrance (t) for each sex. Note that these Formats will apply to all loci.
7    Set the number of phenotypes equal to 1 (the program will set the correct numbers).
8    Set KONT equal to 3.
10   Six phenotype symbols will be read by the program, but they are not used in any way.
12   After the symbol for the second allele, six items are expected: starting age (A1), finishing age (A2) and final penetrance (t) for females, and the analogous three parameters for males.
14   The value for the phenotype (age) must not occupy more than 8 spaces (4 recommended), the actual number of spaces used being determined by the format statement given in input item 5.4. A positive age value refers to an affected individual, a negative age figure identifies an unaffected individual. Phenotypes are coded following the rules given in section 10.2, above.

To calculate conditional genotype probabilities for a specific individual, given all the family data, one must carry out several likelihood computations and combine their results as follows. For example, consider an individual with phenotype 'unaffected' and penetrances as given in the table below, where D is the disease allele and d is the normal allele at the main locus.

             Penetrance for phenotypes
Genotype     affected    unaffected    XDd
D D          0.9         0.1           0
D d          0.6         0.4           0.4
d d          0           1             0

For
this unaffected individual, one wants to compute the risk that he or
she has genotype D/d. To obtain this risk, one runs LIPED twice, each
time with a different phenotype assigned to this individual, that is,
in run 1, the individual has phenotype unaffected, and in run 2, the
individual has phenotype XDd. Denote the resulting likelihoods (not
lod scores) by L(ua) and L(XDd). Then, the risk to this individual of
having genotype D/d is given by L(XDd)/L(ua). Note that other
programs, such as the MLINK program of Dr. Mark Lathrop, can compute
genetic risks directly.
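To see why the ratio L(XDd)/L(ua) is a conditional genotype probability, consider the simplest possible case: a single founder with no relatives in the pedigree. The Python sketch below is a toy calculation under that assumption, not a LIPED run; the disease allele frequency of 0.01 and the function name are hypothetical, and Hardy-Weinberg genotype frequencies are assumed.

```python
def founder_likelihood(gene_freq, penetrance):
    """Likelihood of one isolated founder: sum over genotypes of
    prior genotype probability times penetrance of the phenotype."""
    p = gene_freq
    prior = {"DD": p * p, "Dd": 2.0 * p * (1.0 - p), "dd": (1.0 - p) ** 2}
    return sum(prior[g] * penetrance[g] for g in prior)


# Penetrance columns taken from the table in this section
unaffected = {"DD": 0.1, "Dd": 0.4, "dd": 1.0}
xdd = {"DD": 0.0, "Dd": 0.4, "dd": 0.0}

p = 0.01  # hypothetical frequency of allele D
risk = founder_likelihood(p, xdd) / founder_likelihood(p, unaffected)
print(risk)  # P(genotype D/d | unaffected) for this isolated individual
```

The dummy phenotype XDd keeps only the D/d term of the sum, so the ratio is the D/d term divided by the total, which is Bayes' formula. In a real pedigree the two likelihoods come from two LIPED runs and also reflect the relatives' data.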
With X-linked recessive deleterious traits, for a female founder individual (no parents in pedigree), the prior probability, q, of being a carrier of the disease gene is a multiple of the mutation rate, u. For example, in Duchenne muscular dystrophy (DMD), q = 4u (Murphy and Chase, "Principles of Genetic Counseling"). In the likelihood calculation of pedigree data, on the other hand, the prior probability of a founder's genotype is determined solely by the gene frequency, p. For example, the prior probability that a founder is heterozygous is given by 2p(1 – p). Therefore, to implement the prior probability, q, that a woman is heterozygous for an X-linked recessive deleterious gene, in the likelihood calculation, one must choose the gene frequency of the deleterious gene, p, such that q = 2p(1 – p) or, approximately, p = q/2 (in DMD, thus, p = 2u).
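The quality of the approximation p = q/2 can be checked by solving q = 2p(1 – p) exactly for the smaller root, p = [1 – √(1 – 2q)]/2. The Python sketch below does this; the mutation rate used is a hypothetical value chosen only for illustration, and the function name is an assumption.

```python
import math


def gene_frequency_for_carrier_prior(q):
    """Smaller root p of q = 2p(1 - p), valid for 0 <= q <= 0.5."""
    return (1.0 - math.sqrt(1.0 - 2.0 * q)) / 2.0


mutation_rate = 1e-5        # hypothetical value, for illustration only
q = 4 * mutation_rate       # prior carrier probability, as in the DMD example
p_exact = gene_frequency_for_carrier_prior(q)
print(p_exact, q / 2)       # exact solution next to the approximation
```

For small q the two values differ by roughly q^{2}/4, which is negligible for realistic mutation rates.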
In some applications, only the likelihood at a single (disease) locus is needed. For example, one may want to estimate gene frequencies or age-of-onset parameters at a single locus from family data. In LIPED, single-locus calculations are most easily accommodated by defining a dummy second locus with a single allele of frequency 1.
In a pedigree to be processed by LIPED, any individual must have either both parents in the pedigree, or be a founder individual, that is, have both parents unknown (not in pedigree). Note that siblings cannot be recognized as such unless their parents are also in the pedigree. If such parents are not actually known, they must still be present in the pedigree, possibly with all phenotypes coded as unknown.
When there is at least one known recombination in a pedigree but the value of the recombination fraction is set equal to zero, then the likelihood will be equal to zero, and the log likelihood equal to –∞. On output, –∞ is represented as –99.99.
When the likelihood
is equal to zero, either because recombinants are present while the
recombination fraction (θ, theta) is set to zero or because of
a genetic inconsistency (incompatible genotypes of some individuals),
LIPED will report this with the message, "L(rm, rf = 0 at rm, rf
=...", and will print the male and female recombination
fractions at which the likelihood is zero, and sequential number and
ID code of the individual at which this was first detected. An
incompatibility then exists among the indicated individual and his or
her spouse(s) and descendants. Note that additional incompatibilities
not yet detected may exist in the given pedigree. Running LIPED with θ = 0 thus helps to find recombinations in a pedigree. In families with loops (e.g., inbreeding), this scheme does not work, and LIPED will report a zero likelihood only at θ = 0.5 but cannot pinpoint where this was first detected.
In the locus descriptions, when
there are several alleles, the number of possible phenotypes may
become quite large. For the analysis, however, it is not necessary to
list all phenotypes that might possibly occur. One only needs to
identify those phenotypes that are actually present in at least one
family member.
When you request output on a disk file, the input used will be appended to the output file. If you do not want this, it is easiest to proceed as follows. Pretend to run an additional problem, that is, make your last input line 8000 rather than 9000, and add an additional input line containing 0 (= number of marker loci) in column 2. LIPED will then stop without appending the input file to the output file. This method is used in example 5 (file EX5.DAT), below.
The EX1.DAT file contains input corresponding to a family pedigree with the structure as shown in section 7, "Complex pedigrees" (pedigree 2), above. Two dominant loci are used with two alleles each, where A > a and B > b. The gene frequencies are p = P(A) = 0.4 and q = P(B) = 0.3. Calculation of the pedigree likelihood by first principles yields
L(θ) = (1 – p)^{5} p (1 – q)^{3} q^{2} (1 – θ)[1 + q(1 – θ)^{2} + qθ^{2}]/8,
where θ denotes the recombination fraction. With this, one obtains, for example, log[L(0.5)] = –4.16106927 and log[L(0.2)] = –3.93702064, which agrees with the output given by LIPED. The lod score at θ = 0.2 is thus 0.224.
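The closed-form likelihood for this example is easy to evaluate independently, which gives a quick check on LIPED output. The Python sketch below does so; the function name is an assumption, and base-10 logarithms are used, as is conventional for lod scores.

```python
import math


def ex1_likelihood(theta, p=0.4, q=0.3):
    """Pedigree likelihood L(theta) for example 1, from the formula above."""
    return ((1 - p) ** 5 * p * (1 - q) ** 3 * q ** 2 * (1 - theta)
            * (1 + q * (1 - theta) ** 2 + q * theta ** 2) / 8)


log_l_05 = math.log10(ex1_likelihood(0.5))
log_l_02 = math.log10(ex1_likelihood(0.2))
lod_02 = log_l_02 - log_l_05   # lod score = log10 L(theta) - log10 L(0.5)

print(log_l_05, log_l_02)
print(round(lod_02, 3))        # 0.224
```

The two log likelihoods can be compared with the values quoted in the text, and the lod score at other values of θ follows by changing the argument.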
The EX2.DAT file shows an example with 3 pedigrees and the use of output option 9, ie, summation of lod scores over pedigrees. The second pedigree in this set of 3 pedigrees requires much more computer time per lod score than either pedigree 1 or 3.
Example 3 compares linkage relationships among 6 gene markers, here labelled main locus and marker loci 1 through 5. The first comparison made takes more computer time per lod score than the other comparisons. This example shows the use of various output options in combination with locus comparisons.
The EX4.DAT file shows a published pedigree with Norrie's disease (Xlinked recessive) and 2 marker loci. Analysis is disease versus each marker and marker versus marker.
In example 5, the mode of inheritance of the disease locus in example 1 is changed such that penetrance rises linearly from age 0 to 10. As no individual is in that age range, the output is the same as in example 1.
Ban Y, Davies TF, Greenberg DA, Concepcion ES, Tomer Y (2002) The influence of human leucocyte antigen (HLA) genes on autoimmune thyroid disease (AITD): results of studies in HLA-DR3 positive AITD families. Clin Endocrinol (Oxf) 57:81-88 [a recent example for the use of the LIPED program]
Cheung KH, Nadkarni P, Silverstein S, Kidd JR, Pakstis AJ, Miller P, Kidd KK (1996) PhenoDB: an integrated client/server database for linkage and population genetics. Comput Biomed Res 29:327-337 [an example of the use of the LIPED program]
Elston RC, Stewart J (1971) A general model for the analysis of pedigree data. Hum Hered 21:523-542
Ott J (1974) Estimation of the recombination fraction in human pedigrees: efficient computation of the likelihood for human linkage studies. Am J Hum Genet 26:588-597
Ott J, Schrott HG, Goldstein JL, Hazzard WR, Allen FH Jr, Falk CT, Motulsky AG (1974) Linkage studies in a large kindred with familial hypercholesterolemia. Am J Hum Genet 26:598-603
Ott J (1976) A computer program for linkage analysis of general human pedigrees. Am J Hum Genet 28:528-529
Ott J (1986) Y-linkage and pseudoautosomal linkage. Am J Hum Genet 38:891-897
Ott J (1999) Analysis of Human Genetic Linkage, 3rd edition. Johns Hopkins University Press, Baltimore
Schrott HG, Goldstein JL, Hazzard WR, McGoodwin MM, Motulsky AG (1972) Familial hypercholesterolemia in a large kindred. Evidence for a monogenic mechanism. Annals of Internal Medicine 76:711-720 [description of the Alaska pedigree]
Terwilliger JD, Ott J (1994) Handbook of Human Genetic Linkage. Johns Hopkins University Press, Baltimore
Thompson E (2011) The structure of genetic linkage data: From LIPED to 1 M SNPs. Hum Hered 71:86-96