Jurg Ott / 15 April 2022
Rockefeller University
Email: ott@rockefeller.edu
Much of this section is only of historical interest. For a modern implementation of the LINKAGE programs, see Large numbers of markers below.
This document provides a brief introduction to the LINKAGE programs and describes installation procedures under Windows. Detailed user guides to the Linkage Analysis Programs and Linkage Support Programs are in separate documents. The programs provided here are based on the original Pascal version 5.1 and have been compiled with Free Pascal. The main LINKAGE programs have been made more efficient [5] and are available (see below) as FASTLINK for Linux. Note that the FASTLINK programs can handle loops automatically and efficiently, so everything in this document about breaking loops is correct but obsolete; see chapter 7 in our Handbook of Human Genetic Linkage. Thus, the programs described here are mostly of historical interest, although the package distributed here does work under Windows; it is highly recommended that readers use the Linux versions based on FASTLINK or, ideally, the Linux implementation of pseudomarker.
The LINKAGE programs were conceived in the 1980s as software to analyze marker genotypes in the CEPH families [1] with the aim of creating a human gene map. At the same time, I had been asked to extend the LIPED program [2] from two-point to multipoint analysis. Thus, I convinced Mark Lathrop to include disease loci in the LINKAGE program and make it a general-purpose linkage program for use by the human genetics community. The result was the set of LINKAGE programs [3] still in use today. These programs are able to handle large family pedigrees (execution time scales linearly with family size), but the joint analysis of multiple loci is limited to about 8 loci (execution time and memory requirements scale exponentially with the number of loci).
The three main LINKAGE programs are mlink (calculates lod scores at fixed values of the recombination fraction in one interval of a genetic map), linkmap (calculates location scores for positions of a disease locus along a marker map), and ilink (iteratively estimates parameters such as recombination fractions, allele frequencies, penetrances, and so on). In addition, there are several auxiliary or support programs (unknown, makeped, lcp, and so on) to prepare data for analysis by the main programs. There are also specialized programs (clinkage) for the analysis of marker loci in CEPH families (full penetrance only, no disease loci); they can handle larger numbers of markers jointly but are not included in this package. Another set of specialized versions are the TLINKAGE programs; they are designed to handle digenic inheritance [4], that is, two susceptibility loci underlying a disease trait. Note our most recent publication on digenic trait analysis for case-control data.
To accommodate large numbers of markers, it is best to analyze one marker at a time. With family data, this may be a good strategy anyway. A shell program is available here to run one marker at a time. It is based on fastlink and automatically resolves loops. Input data are expected in plink format. For details, see the enclosed runML.param text file. This shell program runs in Linux and must be invoked by typing runML <runML.param. The parameter file name is arbitrary. A more sophisticated implementation of running one marker at a time is based on the pseudomarker program.
Table of Contents
LINKAGE programs for Windows and Linux
Windows ME (information supplied by Joanne Edington)
B. BRIEF OVERVIEW OF PROGRAM USAGE
B.1 PEDIGREE FILE and MAKEPED PROGRAM
B.2 DATAFILE and PREPLINK PROGRAM
E.1 PENETRANCE/LIABILITY CLASSES IN LIPED AND LINKAGE
E.2 CODING FOR MUSCULAR DYSTROPHY
E.3 IDENTIFYING OBLIGATE CARRIERS
E.4 EVERYBODY UNKNOWN AT ONE LOCUS
The LINKAGE programs are generally run in command windows (text based). Thus, you cannot simply click on program names. In Linux, you run these programs in a terminal window (ctrl-alt-T), and in Windows, it is best to open a command window, sometimes called a DOS box -- click on Start, then Run, and type cmd. This opens a command box. You may then modify its properties (font size, etc.). You need to know basic Windows commands to work with a command box. These commands are very similar to the ones in Unix. It is convenient to create a shortcut of the file cmd.exe on your desktop; the cmd.exe file usually resides in the c:\windows\system32 folder. In addition to making fonts and font sizes more user-friendly, it is also recommended to modify the basic behavior of the command box; see Setting the path, below.
To work with the LINKAGE programs, it is best to reserve a specific directory on your hard disk, for example, C:\linkage. You may want to put all program files into this directory and any data to be analyzed in another directory, for example, C:\linkage\data. Transfer all files received to the linkage directory. The program package contains sample data from our Handbook (Terwilliger and Ott 1994). You may want to run the programs the way they come. To use them with different program constants (eg, a different maximum number of pedigrees), make the necessary changes (see section C.2). For details on compiling, see section C.3.
For LCP and other programs to display correctly, the command window must recognize the ansi.sys driver. Depending on your Windows version, this may be accomplished in different ways as outlined below. For Windows 7 and higher, the updated version of LCP does not use the ansi.sys driver.
Find the config.nt file. It usually resides in the c:\WINNT\system32 folder. In the config.nt file, insert a line that reads: device=c:\WINNT\system32\ansi.sys (modify this line if your ansi.sys file is elsewhere).
If the above procedure does not work, proceed as follows. In the control panel, select System and click on the Environment tab. Define a new environment variable, device, with value: c:\WINNT\system32\ansi.sys.
In the c:\WINDOWS\system32 folder, open the config.nt file with Notepad. Insert a line that reads: device=c:\WINDOWS\system32\ansi.sys
Windows ME does not support DOS nor does it enable you to load the ansi.sys driver. A search in the newsgroups (http://groups.google.com) turns up a few solutions:
1) Install the Real DOS-Mode Patch for Windows ME from http://www.geocities.com/mfd4life_2000/
2) Use PC Magazine's "ANSI.COM" by Michael Mefford instead of ANSI.SYS, and simply load it from the command line before running your old program: ftp://garbo.uwasa.fi/pc/pcmagutl/ansi132.zip or http://www.simtel.net/pub/simtelnet/msdos/pcmag/v8n02.zip
3) Right click on your DOS program's shortcut. Select properties, program, advanced. From here you can create a custom config.sys and autoexec.bat file for your DOS program to use when run. Be sure you direct the config.sys file to the correct path for ansi.sys. Unless you specify a separate config.sys or autoexec.bat file for your DOS program to use it will use the default config.nt and autoexec.nt.
Option 2 appears to be the simplest and has been reported to work fine.
The procedure is the same as described above for Windows XP. However, you need Administrator privileges to change the config.nt file. The simplest approach is as follows. If you have a shortcut to cmd.exe on your desktop, right-click on that shortcut and select Run as administrator. Otherwise click on Start, start typing comm in the Search box at the bottom left of the screen, and right-click on Command Prompt. Then select Run as administrator. The Administrator:cmd box will take you directly to the c:\windows\system32 folder (if not, you need to cd to that folder). Type notepad config.nt, scroll towards the bottom of the file, copy the line device=%SystemRoot%\system32\himem.sys, and in the copied line replace himem by ansi. The modified config.nt file should then contain the following two lines:
device=%SystemRoot%\system32\himem.sys
device=%SystemRoot%\system32\ansi.sys
Save the config.nt file.
Make sure that the LINKAGE directory is accessed by the system by setting the appropriate path (not needed if you are working in the LINKAGE directory). The easiest approach is as follows and works with all Windows versions. In the drive where you generally work, for example, D:, create a directory (folder), for example, D:\bin, and put all programs and batch files into this folder. Using Notepad, create a file containing the following lines:
set dircmd=/p/o
set path=D:\bin;%PATH%
Save it in the bin folder under the name setbin.bat, but as "All Files", not as a "Text File". Assume that you want to put the LINKAGE programs into a folder C:\LI. Create that folder and also prepare a file in Notepad containing the following lines:
@echo off
echo *** Setting path to include LI directory
set PATH=c:\LI;%PATH%
Save this file in the bin folder under the name setli.bat ("All Files"). Save all your Linkage executables in the LI folder. Whenever you want to work with the Linkage programs, open a command box (CMD) and type setbin followed on a new line by setli. The bin and LI folders are now accessed by the system as long as you keep the CMD window open. To make permanent changes to the path proceed as outlined below.
In the Control Panel, click or double click on System / Advanced / Environment Variables. Find the Path variable and edit it so it contains c:\Linkage.
In the Control Panel, click or double click on System / Environment. Click on the Path variable and modify its value as above. Then click on Set and Ok.
Currently the LINKAGE programs for PCs are furnished for general and 3-generation (CEPH) pedigrees. A third category, programs for experimental crosses in the mouse, is not currently supported by me. This documentation is generally oriented towards the general pedigree version; differences between the two versions are pointed out where necessary. A detailed user manual is available (LinkageUser.pdf file, contained in the program package referred to above).
The LINKAGE programs require two input files, a “pedfile” holding the pedigree data, and a “datafile” holding the descriptions of the loci, locus order, etc. (pedfile and datafile are the names of the corresponding files in the program code). Preferably, the first step in the linkage analysis is to create the pedigree file. This must be done using a text editor (word processor) capable of producing ASCII files (see section E.7).
Write one line of input for each individual, giving the following items in this order (more detailed information is found in the program manual). This LINKAGE format, based on the LIPED format, is now a de facto standard for many linkage and association programs, including plink [6]:
Pedigree name (or number)
ID name (or number) of given individual
ID name (or number) of that individual’s father (0 if father is not in pedigree)
ID name (or number) of that individual’s mother (0 if mother is not in pedigree; either both or no parents must be given)
Sex of individual: 1 = male, 2 = female
Phenotype at locus 1
Phenotype at locus 2, etc.
Each item must be separated from the others by at least one space or tab character.
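The record layout listed above can be illustrated with a small Python sketch. It is not part of the LINKAGE package; the names PedRecord and parse_pre_line are hypothetical, and the phenotype items are left uninterpreted because their meaning depends on the locus types declared in the datafile.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PedRecord:
    pedigree: str
    person: str
    father: str
    mother: str
    sex: int
    phenotypes: List[str]  # raw phenotype items, not yet interpreted


def parse_pre_line(line: str) -> PedRecord:
    """Split one line of a pre-MAKEPED pedigree file into its items."""
    tokens = line.split()  # any run of spaces or tabs separates items
    if len(tokens) < 5:
        raise ValueError(f"expected at least 5 items, got {len(tokens)}")
    ped, person, father, mother, sex = tokens[:5]
    if sex not in ("1", "2"):
        raise ValueError(f"sex must be 1 (male) or 2 (female), got {sex!r}")
    # Either both parents or no parents must be given.
    if (father == "0") != (mother == "0"):
        raise ValueError("give both parents or neither")
    return PedRecord(ped, person, father, mother, int(sex), tokens[5:])


# Pedigree 1, individual 3, father 1, mother 2, male, then three phenotype items.
record = parse_pre_line("1 3 1 2 1   2   1 2")
```

Keeping IDs as strings in this sketch reflects the fact that names as well as numbers are allowed before MAKEPED has processed the file.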
Phenotype symbols depend on the locus type used. Each locus must be coded in one of four possible locus type formats (only Allele Numbers and Binary Factors locus types may be used in the programs for 3-generation pedigrees, and they must specify codominant inheritance). The locus types and corresponding phenotypes are as follows:
a) Affection status: 2 = affected, 1 = unaffected, 0 = unknown. If more than one liability class is used, a second number must be added designating the liability class. Usually used for coding disease loci.
b) Allele numbers: two numbers, corresponding to the two alleles present, eg, 2 5 (alleles 2 and 5 present), or 1 2. Also, 0 0 denotes unknown. Homozygotes and hemizygotes (males in X-linked case) must be given two identical numbers. Usually used for RFLPs (co-dominant).
c) Binary factors: a sequence of 0’s and 1’s indicating absence or presence of the i-th factor. Used for dominant marker loci, eg, ABO locus.
d) Quantitative traits: quantitative measurement, eg, CPK level.
For more details on phenotypes, please consult the User’s Guide in the Windows LINKAGE package. In the pedigree file, list the phenotypes of all loci known for the individuals. You will later determine (in the LCP program), with which of the loci you want to do calculations.
One pedigree may be entered after another, each pedigree with its own pedigree id. After the last line is entered, make sure that there are no trailing blank (empty) lines after you exit from the editor. The DOS and other editors append an empty line when you press the <Enter> key at the end of the last input line. So, either you do not press <Enter> at the end of the last line (the cursor then stays at the far right ON the last line), or you insert an end-of-file [EOF] character in column 1 after the last input line. To enter [EOF], press Alt-2-6 (press 2 and then 6 on the NUMERIC KEYPAD while holding down the Alt key); you should then see a small right arrow on the screen.
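If the editor in use cannot be kept from appending empty lines, the file can also be cleaned afterwards. The following Python sketch (the function name is hypothetical, not part of the package) removes trailing blank lines so that the text ends directly after the last input line.

```python
def strip_trailing_blank_lines(text: str) -> str:
    """Return text without trailing empty lines and without a final newline."""
    lines = text.splitlines()
    # Drop empty or whitespace-only lines from the end of the file.
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines)


cleaned = strip_trailing_blank_lines("1 1 0 0 1 2\n1 2 0 0 2 1\n\n  \n")
```

To apply it to a file such as SAMPLE.PRE, one would read the file, pass its contents through the function, and write the result back.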
It is recommended to save the file under a name with the extension PRE, eg, as SAMPLE.PRE. It is convenient, although not required, to use the same file name for the input files of a given problem but distinguish datafile and pedfile by using different extensions, dat and pre.
The sample pedigree file so created, SAMPLE.PRE, must now be processed by the MAKEPED program to make it suitable for input to the analysis programs. Invoke the MAKEPED program (actually, the MAKEPED.BAT file) with the input and output file names on the command line, for example, enter
MAKEPED SAMPLE.PRE SAMPLE.PED N
(upper or lower case) where the last N tells the program that no loops are present and that probands should be selected automatically. If N is omitted, follow directions issued by program. Recommended further responses:
Loops present? → n (unless your pedigree contains loops)
Should probands be selected automatically? → y.
If a pedigree contains a marriage or consanguinity loop, answer Y to the corresponding question from the MAKEPED program and indicate one individual per pedigree at which the loop should be broken. If more than one loop is present in any one pedigree (the maximum number of loops is specified by the constant MAXLOOP), proceed as above and identify as many individuals in each pedigree as necessary at which loops should be broken. For example, if in pedigree 1, loops should be broken at individuals 5 and 9, your interaction with the MAKEPED program would look as follows:
Pedigree → 1
Person → 5
Pedigree → 1
Person → 9
Pedigree → 0
MAKEPED will then duplicate each of these individuals and will assign the same positive number (different for each pair) in the proband field (column) to the resulting two duplicated individuals. After exiting from MAKEPED, read the pedigree file into your text editor and verify that MAKEPED has made the appropriate duplications and entries in the proband field. If a duplicate individual is to be the proband, this individual must correspond to the first loop to be broken, and the proband field for the two duplicates has to contain a 1 and a 2 (this rule also applies to a single loop only).
Note that for a pedigree file to be suitable for use by the analysis programs, each individual within a pedigree must be numbered sequentially from 1 through n, except for duplicate individuals (loops broken) who can be out of order, where n is the total number of individuals (including duplicated individuals) in that pedigree. Pedigree id’s, too, must be numbers, but they need not be sequential and can be in any order. It is the MAKEPED program’s job to bring pedigrees into this form required by the LINKAGE programs.
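The renumbering requirement can be pictured with the Python sketch below. It only illustrates the idea for one pedigree without loops and is not MAKEPED's actual algorithm; the function name and the tuple layout (person, father, mother) are assumptions made for this example.

```python
def renumber(rows):
    """Map arbitrary IDs within one pedigree to 1..n in order of appearance.

    rows: list of (person, father, mother) ID strings; "0" = not in pedigree.
    """
    new_id = {"0": 0}  # the "no parent" code stays 0
    for person, _father, _mother in rows:
        if person in new_id:
            raise ValueError(f"duplicate or reserved ID: {person!r}")
        new_id[person] = len(new_id)  # yields 1, 2, 3, ...
    try:
        return [(new_id[p], new_id[f], new_id[m]) for p, f, m in rows]
    except KeyError as missing:
        raise ValueError(f"parent {missing} has no line of their own") from None


renumbered = renumber([("dad", "0", "0"), ("mom", "0", "0"), ("kid", "dad", "mom")])
```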
Two example input files (already processed by the MAKEPED program) are provided. PEDIN.DAT contains three-generation pedigrees and one non-CEPH pedigree; PEDIN3.DAT contains only two-generation and three-generation pedigrees and is suitable for testing out the 3-generation programs.
As pointed out above, it is recommended to use the same file names for the same problem but distinguish the associated datafile and pedigree files with the extensions DAT, PRE, and PED, respectively, where PRE refers to the preliminary pedigree file and PED to the one processed by the MAKEPED program. For example, in a study of CF families, the three files would be named CF.DAT, CF.PRE, and CF.PED. For families without loops and automatic proband designation, a third parameter, n, may be given on the command line which tells MAKEPED that no loops are present and that all probands should be chosen automatically. Thus, you might enter: MAKEPED SAMPLE.PRE SAMPLE.PED N.
When loops exist in a pedigree and are not declared in MAKEPED, this error may or may not provoke error messages by the analysis programs. Thus, an undetected loop may lead to an apparently normal termination of the programs yet the resulting likelihoods can be completely wrong. To avoid such problems, a program called LOOPS was developed by Xiaoli Xie. It detects marriage and consanguinity loops and is automatically invoked after each run of the MAKEPED program.
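One common way to detect loops, sketched here in Python, is to treat the pedigree as an undirected graph whose nodes are the individuals and the parent pairs (matings), with edges from each parent to the mating and from the mating to each child. A marriage or consanguinity loop then shows up as a cycle in that graph. This is a generic illustration using union-find; it is not the code of the LOOPS program, and the function name and input layout are assumptions.

```python
def has_loop(rows):
    """rows: iterable of (person, father, mother); "0" = parent not in pedigree."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        """Join the components of a and b; False means they were already joined."""
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # this edge would close a cycle
        parent[ra] = rb
        return True

    seen_matings = set()
    for person, father, mother in rows:
        if father == "0" and mother == "0":
            find(("ind", person))  # founder: just register the node
            continue
        mating = ("mating", father, mother)
        if mating not in seen_matings:
            seen_matings.add(mating)
            if not union(("ind", father), mating) or not union(("ind", mother), mating):
                return True
        if not union(mating, ("ind", person)):
            return True
    return False


nuclear = [("1", "0", "0"), ("2", "0", "0"), ("3", "1", "2"), ("4", "1", "2")]
# Individuals 7 and 8 are first cousins; their child 9 closes a consanguinity loop.
cousins = nuclear + [("5", "0", "0"), ("6", "0", "0"),
                     ("7", "3", "5"), ("8", "6", "4"), ("9", "7", "8")]
```

In this example, has_loop(nuclear) is False and has_loop(cousins) is True. Each pedigree should be checked separately.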
The datafile should reflect the loci given for each individual, where the loci are ordered corresponding to the order of the phenotypes in the pedigree file. The datafile is best created using the PREPLINK program. After PREPLINK is invoked, it will present various menus with default assumptions on number of loci, locus types, etc. Proceed in the following manner:
(1) Choose the number of loci as present in your pedigree file. When prompted to furnish information on new loci beyond locus 2, simply accept default information, ie, exit and go to next higher locus. When asked for the locus order, simply enter 0 (for 1 2 3 etc.), since the particular chromosomal order will be given in the analysis program (LCP) anyway.
(2) Select locus types. It is important to do this first, before any more specific locus descriptions are given. Changing a locus type will set most other locus parameters back to their default value.
(3) For each locus, look at its parameters ("see or modify a locus") and adjust where needed. For example, for a disease locus, you may want to adjust gene frequencies to 0.99 and 0.01 so that the disease allele is allele number 2. Generally, choose allele 2 as the disease allele.
(4) If everything is correct, go to the main menu and save the file ("write datafile"), preferably with the extension DAT, for example, under the name of SAMPLE.DAT, corresponding to SAMPLE.PED. Exit from PREPLINK. Should you need to modify a previously created datafile, simply invoke PREPLINK and read in that datafile.
Note that various parameters need not be set in the PREPLINK program as they must be given in the LCP program anyway. These are, for example, locus order, program used, and recombination fractions.
To modify an existing datafile, invoke PREPLINK and read in that file. If parameters other than recombination fractions are to be estimated in the ILINK program, you will need to modify the datafile in your text editor after leaving the PREPLINK program. The last line of the datafile (for an ILINK run) contains a series of 1’s and 0’s indicating whether or not a particular parameter should be estimated, that parameter being defined by the order of appearance of the number 1 or 0 (see manual for full details). For example, with 2 loci, if male recombination and female-to-male map distance are to be estimated, there should be two 1’s on the last line of the datafile.
On the second-to-last line, the number given identifies the locus which may have iterated parameters such as gene frequencies. In this case (only recombination fractions estimated), the value of that number is irrelevant as no locus-specific parameters are estimated. Hint: If no locus-specific parameters are to be estimated, choose a "locus with iterated parameters" with only a small number or no penetrance classes since these may then potentially be estimated, which calls for a large value of the constant MAXN.
Two sample datafiles are provided: DATAIN.DAT may be used in connection with PEDIN.DAT, and DATAIN3.DAT corresponds to PEDIN3.DAT (3-generation families).
The LCP program prepares the data for a series of production runs. You will be able to make various choices, eg, loci to be used, and to set parameter values such as recombination fractions. All these choices will be saved in a batch file (command file) that you can run by typing its name after exiting from the LCP program. The default name of that command file is PEDIN.BAT, so you will have to type PEDIN in your command box to run this file. To start up LCP, simply type LCP in your command box. Alternatively, you may type LCPWIN, which invokes a differently compiled version of this program. However, LCPWIN must use the program-specific keys for text manipulation (see explanations on the screen).
After you have invoked LCP, change the file names presented on the first screen as needed. Usually, you will only have to adjust the names of your pedigree file and datafile (parameter file). When you have chosen these file names, move back and forth among the screens with the PgDn and PgUp keys. However, watch for the screen identified by the title, COMMAND SCREEN, shown in reverse video. Pressing PgDn on such a command screen saves the choices you just made in the batch file; if you do not press PgDn on a command screen, these choices are not saved. Leave the LCP program by pressing Ctrl-Z.
To execute the runs you selected in LCP, enter the name of the batch file (PEDIN by default). If nothing happens, you failed to press the PgDn key on the Command screen in which case you have to invoke LCP again and repeat the selections desired.
Note the following feature of LCP: When choosing ILINK as the analysis program, generally all recombination fractions between loci will be estimated. If you want to keep some of them fixed at their initial value, enter the recombination fraction with an equal sign in front of it.
You may inspect the PEDIN.BAT file with your text editor. It consists of a sequence of commands (DOS commands and calls to programs). Essentially, it extracts loci information from your input files and prepares new input files (called datafile.dat and pedfile.dat) for the Unknown program and then invokes the analysis program. After the runs are completed, all intermediate files are deleted. If you do not want intermediate files deleted, you have to invoke the command file with the command line parameter NODELETE, eg, by entering PEDIN NODELETE. One reason for doing that would be, for example, to retain the files (DATAFILE.DAT, PEDFILE.DAT, IPEDFILE.DAT, SPEEDFIL.DAT) containing the loci extracted from the original files and to modify DATAFILE.DAT so that parameters other than recombination fractions can be estimated by ILINK; currently, this cannot be done through LCP. After the runs invoked by PEDIN have completed you may see a message at the end saying the "speedfile.dat" file was not found. Just disregard this message.
LCP cannot exploit all the features of the analysis programs (MLINK, LINKMAP, ILINK). For example, a female/male distance ratio different from 1 is not allowed for MLINK in LCP although the MLINK program when used directly will accept any such ratio. Also, haplotype frequencies cannot presently be specified through LCP.
The programs may not carry out calculations when only a single locus is used. For such cases, expand the data by adding a dummy marker locus at which everybody is homozygous. Also, if a single individual should be part of your pedigree data, add two parents with unknown phenotypes and have these three individuals form one pedigree.
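Adding the dummy marker to a pedigree file can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes an allele-numbers locus appended as the last locus, with everybody given the homozygous phenotype 1 1; the datafile must be extended with a matching locus description (for example, through PREPLINK).

```python
def add_dummy_marker(lines, genotype="1 1"):
    """Append the same homozygous allele-numbers phenotype to every pedigree line."""
    out = []
    for line in lines:
        if line.strip():  # skip empty lines
            out.append(f"{line.rstrip()}  {genotype}")
    return out


extended = add_dummy_marker(["1 1 0 0 1 2", "1 2 0 0 2 1\n", ""])
```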
Most of the discussion below refers to Free Pascal. Differences to other Pascal versions are noted where necessary.
A number of constants may be set by the user prior to recompiling the programs. These constants define upper limits for number of loci, number of alleles per locus, etc. They reside in the CONST section of the main programs, for example, MLINK.PAS. Change the appropriate number; for example, change MAXLIAB = 20 to MAXLIAB = 30 if this is what you need. Then recompile the programs (see below).
The programs discussed here have been compiled with Free Pascal, which is compatible with Borland/Turbo Pascal 7.0 but is much less restricted than Borland Pascal. In particular, large programs may be compiled with Free Pascal.
In Free Pascal, the ERRTRAP procedure reports errors in plain English rather than only providing error numbers (exception: stack overflow; see below). Some of the less than obvious error messages are explained below.
Range check error. One of the constants is too small for the problem to be analyzed. Check each of these constants. For example, the number of haplotypes, h, may have to be as large as the product of the number of alleles for all loci. This error message may occasionally be quite cryptic and it may be difficult to determine which of the constants must be increased. For example, in ILINK, having a large number of penetrance classes requires a high value of MAXN, the max. number of parameters that can be estimated in ILINK, since penetrances may potentially be estimated in ILINK (if at the end of the datafile, the locus with iterated parameters is the one for which penetrance classes are defined).
Stack overflow (error number 202). The program ran out of stack space. This may occur when the stack segment is too small to hold all local variables in which case one must increase the stack size in the M compiler switch (the first of the three numbers in curly brackets) at the beginning of the main program. However, the stack segment is usually large enough and the most common reason for the occurrence of this error is the presence of an undeclared loop in a pedigree.
Heap overflow. There is not enough free (dynamically allocated) memory to hold all the data. This error should only occur when you compile for DOS real mode. A program running in DOS protected mode or under Windows can address up to 16MB of memory. To reduce memory requirements the following actions may be taken:
1) Reduce program constants to their smallest possible values.
2) Analyze only one pedigree at a time and set the max. number of pedigrees to 1.
3) Set the compiler switch to R–. Note that this may freeze the computer when an array bound is exceeded.
Data segment too large. The variables and arrays occupy too much memory. Reduce some of the program constants to make array sizes smaller, or go from double to single precision. It may happen that for the same programs this error occurs when compiling for Windows but not for DOS.
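For the range check error described above, the required upper limit on the number of haplotypes can be worked out before recompiling. The Python sketch below (function name hypothetical) computes the product of the numbers of alleles over all loci analyzed jointly.

```python
from math import prod


def haplotype_count(alleles_per_locus):
    """Number of possible haplotypes = product of the allele counts of all loci."""
    return prod(alleles_per_locus)


# A 2-allele disease locus analyzed jointly with markers of 4 and 6 alleles.
three_loci = haplotype_count([2, 4, 6])   # 2 * 4 * 6 = 48
# Eight loci with 4 alleles each: the count grows exponentially with the number of loci.
eight_loci = haplotype_count([4] * 8)     # 4**8 = 65536
```

The second case shows why the joint analysis of many multiallelic loci quickly exceeds practical limits.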
This batch file (Pascal program runlink in Linux) allows running any one of the Linkage programs without going through the LCP shell provided that all the loci in the data file are to be analyzed (no possibility of extracting loci). To initiate this batch file, execute the command
RUN DATNAME PEDNAME PROGNAME
where DATNAME is the name of the file holding the locus descriptions (the datafile, as processed by the PREPLINK program), PEDNAME refers to the file holding the pedigree data (as processed by the MAKEPED program), and PROGNAME is the name of the program to be used.
The major reason for using the RUN batch file is to be able to make use of some features not implemented in LCP (see end of section on LCP, above), in particular, haplotype frequencies which may be important in risk calculation.
The LINKLODS program reads output (FINAL.OUT file) from the LINKMAP or MLINK (LINKAGE) program and, for each family, converts log likelihoods to lod scores. Lod scores may also be obtained as an option from the LRP program but using the LINKLODS program may be more straightforward.
In the input file (FINAL.OUT) to LINKLODS, for a collection of families, an initial set of likelihoods with one of the theta values being equal to 0.5 must precede those sets of likelihoods for which lod scores should be calculated. Several such initial ‘baseline’ sets of likelihoods may occur throughout the input file.
Resulting lod scores will be written to the file FINAL.LOD, and an existing file by that name will be overwritten.
Notice that the LINKLODS program makes certain rigid assumptions about the structure of the input file as produced by the LINKAGE programs. For example, the first likelihood must be on the fourth line after the line that lists the theta values. Therefore, if the input file has been manipulated, the LINKLODS program may no longer be able to process it properly and will issue error messages.
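The conversion itself follows from the definition of the lod score, lod(theta) = log10 L(theta) - log10 L(0.5). The Python sketch below illustrates it under the assumption that natural-log likelihoods are already available as numbers; it does not parse FINAL.OUT and is not the LINKLODS code. If the values at hand are -2 ln(likelihood), divide them by -2 first.

```python
import math


def lod_scores(baseline_lnlike, lnlikes):
    """Convert natural-log likelihoods to lod scores.

    baseline_lnlike: ln likelihood at theta = 0.5 (the baseline set).
    lnlikes: ln likelihoods at the theta values of interest.
    """
    return [(ll - baseline_lnlike) / math.log(10) for ll in lnlikes]


# A likelihood 10 times larger than the baseline corresponds to a lod score of 1.
example = lod_scores(-20.0, [-20.0, -20.0 + math.log(10)])
```

Lod scores obtained this way for individual families can be added over families at the same theta value.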
In the LIPED program, each phenotype is associated with an array of penetrances, that is, the conditional probabilities that the phenotype is observed given a genotype. In the Linkage programs, one may code phenotypes in several ways, depending on the type of locus considered (binary factors, affection status or quantitative phenotypes locus). With a binary factors locus, one may code for codominant or dominant phenotypes but not both types mixed. This sometimes poses a problem, for example, in the following situation. Assume a locus with two alleles, A and B, whose individual presence in a person can usually be detected (codominant situation). Sometimes, however, a test is used that detects A only (dominant situation). Using conditional probabilities (penetrances), this situation is represented in LIPED as follows:
         | Dominant | Dominant | Codominant | Codominant | Codominant
Genotype | A+       | A–       | AA         | AB         | BB
A/A      | 1        | 0        | 1          | 0          | 0
A/B      | 1        | 0        | 0          | 1          | 0
B/B      | 0        | 1        | 0          | 0          | 1
In the Linkage programs, it is not possible to allow for all these phenotypes at a binary factor locus. A simple general solution for using tables such as the one above in the Linkage programs is as follows. Define the locus in question as an affection status locus with as many liability classes as there are columns in the table above. In the pedigree file of the Linkage programs, each phenotype is then represented by two numbers, 2 i, where i is the column number in the table above, that is, each individual is defined to be affected, except that the unknown phenotype is coded as 0 1. Each column in that table represents a liability class whose penetrances (the entries in the column) must be furnished in the datafile.
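The general scheme can be spelled out mechanically. In the Python sketch below (names hypothetical), each column of the LIPED table becomes one liability class, its entries become the penetrances of that class, and the phenotype is coded as "2 i" in the pedigree file.

```python
# Columns of the LIPED table: penetrances for genotypes A/A, A/B, B/B.
LIPED_TABLE = {
    "A+": (1, 1, 0),
    "A-": (0, 0, 1),
    "AA": (1, 0, 0),
    "AB": (0, 1, 0),
    "BB": (0, 0, 1),
}


def general_coding(table):
    """Return (pedfile codes, liability-class penetrances) for a LIPED table."""
    codes, classes = {}, {}
    for i, (phenotype, penetrances) in enumerate(table.items(), start=1):
        codes[phenotype] = f"2 {i}"   # affected, liability class i
        classes[i] = penetrances      # to be entered in the datafile
    codes["unknown"] = "0 1"
    return codes, classes


codes, classes = general_coding(LIPED_TABLE)
```

With the table above this yields five liability classes; for instance, phenotype AB is coded as 2 4 with class 4 penetrances 0, 1, 0.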
This coding scheme may be wasteful in the number of liability classes needed. Depending on the particular situation, one may be able to apply a similar coding scheme requiring a smaller number of penetrance classes. In the given example, above, a possible solution is the following. Define an individual as affected when the A allele is detected, and distinguish 3 liability classes, depending on whether the A allele is seen in the dominant or codominant situation. The correspondence between phenotypes in Liped and in Linkage is then as follows:
Liped   | A+  | A–  | AA  | AB  | BB
Linkage | 2 1 | 1 1 | 2 2 | 2 3 | 1 1
In the datafile, the following penetrances must be given for each genotype and each liability class, 1-3:
Genotype | 1 | 2 | 3
A/A      | 1 | 1 | 0
A/B      | 1 | 0 | 1
B/B      | 0 | 0 | 0
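That the reduced scheme with 3 liability classes reproduces the LIPED penetrances can be checked with a short Python sketch (names hypothetical). For an affection status locus, the probability of the observed phenotype given a genotype is the penetrance if the individual is coded as affected (2) and one minus the penetrance if coded as unaffected (1).

```python
# LIPED penetrances per phenotype, for genotypes A/A, A/B, B/B.
LIPED = {
    "A+": (1, 1, 0),
    "A-": (0, 0, 1),
    "AA": (1, 0, 0),
    "AB": (0, 1, 0),
    "BB": (0, 0, 1),
}
# Linkage coding: phenotype -> (affection status, liability class).
LINKAGE_CODE = {"A+": (2, 1), "A-": (1, 1), "AA": (2, 2), "AB": (2, 3), "BB": (1, 1)}
# Datafile penetrances per liability class, for genotypes A/A, A/B, B/B.
PENETRANCE = {1: (1, 1, 0), 2: (1, 0, 0), 3: (0, 1, 0)}


def implied_liped_column(status, liability_class):
    """P(phenotype | genotype) implied by an affection status code."""
    pens = PENETRANCE[liability_class]
    return tuple(p if status == 2 else 1 - p for p in pens)


mismatches = [ph for ph, (status, cls) in LINKAGE_CODE.items()
              if implied_liped_column(status, cls) != LIPED[ph]]
```

The list of mismatches is empty, that is, every Linkage code implies the same conditional probabilities as the corresponding LIPED column.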
Generally, in the LINKAGE programs, one would code phenotypes (CK levels for females, affected or unaffected for males) as a quantitative trait locus. A special case in which simple coding as an affection status locus is possible is the following.
Males: affected or not affected
Females (not affected): CK+ or CK- (CK = creatine kinase level, high or low)
Alleles: D = disease allele, d = normal allele
Possible coding scheme for LIPED:
------------------------------------------
                      Phenotypes
              ----------------------------
                 females          males
              --------------   -----------
Genotype      AF   CK+   CK-   AFF    NA    <- phenotype codes
------------------------------------------
D/D or D/y    1    0     0     1      0
D/d           0    .66   .34   *      *
d/d or d/y    0    .05   .95   0      1
------------------------------------------
Unknown: special phenotype, eg, blank.
AF = affected female
* = value irrelevant (X-linked case)
In LINKAGE, this case may be treated by the general method outlined in section 1, above, leading to 3 penetrance classes. To code for such a situation with a single liability class, one may adopt the following coding scheme in the LINKAGE programs:
Define disease status as having an elevated CK value. This works fine when only unaffected females are observed (usual situation).
--------------------------
Phenotypes in
LIPED   MLINK (in pedfile)
--------------------------
CK+     2   \ unaffected
CK-     1   / females
AFF     2   affected male
NA      1   unaffected male
--------------------------
Unknown phenotype: 0
In the datafile, the penetrances (= probabilities of being affected) are given as follows:
      Females                  Males
Genotype  Penetrance     Allele  Penetrance
--------------------     ------------------
1 / 1     1              1       1
1 / 2     .66            2       0
2 / 2     .05
--------------------     ------------------
Note that "affected" (CK+) females are potentially homozygous for the disease allele (CK– females still cannot be homozygous for the disease allele). If this is undesired, or if truly affected females are present, it is better to use the scheme with 3 penetrance classes corresponding to the LIPED notation.
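The difference between the single-class scheme and the LIPED scheme can be made explicit with a small comparison. The Python sketch below is illustrative only: all names are hypothetical, and it assumes that allele 1 in the datafile corresponds to D and allele 2 to d, as the penetrance values suggest. It compares, for each female genotype, the probability of CK+ and CK- under the two schemes and reports where they disagree.

```python
# LIPED female phenotype probabilities: genotype -> {phenotype: probability}.
LIPED_FEMALE = {
    "D/D": {"AF": 1.0, "CK+": 0.0, "CK-": 0.0},
    "D/d": {"AF": 0.0, "CK+": 0.66, "CK-": 0.34},
    "d/d": {"AF": 0.0, "CK+": 0.05, "CK-": 0.95},
}

# Single-liability-class LINKAGE penetrances for females
# (assumption: allele 1 = D, allele 2 = d).
LINKAGE_FEMALE_PEN = {"D/D": 1.0, "D/d": 0.66, "d/d": 0.05}

# Pedfile coding of the female phenotypes: CK+ -> 2 (affected), CK- -> 1.
PEDFILE_CODE = {"CK+": 2, "CK-": 1}


def linkage_female_prob(genotype, status):
    """P(status | genotype): penetrance if affected (2), else 1 - penetrance."""
    pen = LINKAGE_FEMALE_PEN[genotype]
    return pen if status == 2 else 1.0 - pen


def discrepancies(tolerance=1e-12):
    """(genotype, phenotype) pairs on which the two schemes disagree."""
    found = []
    for genotype, probs in LIPED_FEMALE.items():
        for phenotype, status in PEDFILE_CODE.items():
            diff = linkage_female_prob(genotype, status) - probs[phenotype]
            if abs(diff) > tolerance:
                found.append((genotype, phenotype))
    return found


print(discrepancies())
```

Under these assumptions the only disagreement is for D/D females with phenotype CK+, which the single-class scheme allows and the LIPED scheme excludes.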
To identify an obligate heterozygote in LIPED, one might label such an individual with the phenotype NA2 and define the following penetrances:
             Phenotypes
Genotype   AFF  NA1  NA2   <- phenotype codes
---------------------
D / D       1    0    0
D / d       0    1    1
d / d       0    1    0
---------------------
In LINKAGE, again, this case may be treated as outlined in section 1, above. Using only 2 rather than 3 liability classes, one may define these in the datafile as follows:
           Penetrance class
Genotype     1    2
-------------------------
D / D        1    1
D / d        0    0
d / d        0    1
-------------------------
In the pedfile, the following phenotype codes are used:
Phenotypes in
LIPED   MLINK
-------------
AFF     2 1
NA1     1 1
NA2     1 2
-------------
In multipoint linkage analysis, it sometimes happens that, for a given family pedigree, none of the individuals has been tested at one of the loci, so that all of them have phenotype ‘unknown’ at that locus. In the present implementation of the Linkage programs, the presence of many unknowns slows down execution. There is, however, a simple remedy. If everybody in that pedigree is given the same homozygous phenotype (uniquely identifying the homozygous genotype), this will not change the lod score but will considerably increase computing speed. This feature has now been implemented in the UNKNOWN program except when allele frequencies are to be estimated.
With new data and several marker loci, it is often useful to first find or confirm estimates of interlocus distances, that is, to run the ILINK program for the marker loci only. However, before doing that, it is a good idea to do one run with the MLINK program to verify that the likelihood is nonzero in all pedigrees. If the likelihood is zero in one or more pedigrees, for example, due to genotype inconsistencies, then the ILINK program will still try to maximize the likelihood and will, of course, fail but only after running for a possibly very long time.
With X-linked recessive deleterious traits, for a female founder individual (no parents in pedigree), the prior probability, q, of being a carrier of the disease gene is a multiple of the mutation rate, μ. For example, in Duchenne muscular dystrophy (DMD), q = 4μ (Murphy and Chase, "Principles of Genetic Counseling"). In the likelihood calculation of pedigree data, on the other hand, the prior probability of a founder’s genotype is always determined by the gene frequency, p. The prior probability that a founder woman is heterozygous is given by 2p(1 – p). To implement the prior probability, q, that she is heterozygous for an X-linked recessive deleterious gene, in the likelihood calculation, one must choose the gene frequency of the deleterious gene, p, such that q = 2p(1 – p) or, approximately, p = q/2. For example, in DMD, when the mutation rate is assumed to be equal to μ, the gene frequency of the disease allele must be taken to be equal to p = 2μ.
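The relation q = 2p(1 – p) can also be solved exactly for p, which shows how good the approximation p = q/2 is for small q. Rearranging gives p² – p + q/2 = 0, whose smaller root is p = (1 – √(1 – 2q))/2. The Python sketch below is illustrative only: the function name is hypothetical and the mutation rate used is an arbitrary example value, not an estimate for DMD.

```python
import math


def gene_frequency_from_carrier_prob(q):
    """Solve q = 2p(1 - p) for the smaller root p (requires 0 <= q <= 0.5)."""
    if not 0 <= q <= 0.5:
        raise ValueError("q must lie between 0 and 0.5")
    return (1 - math.sqrt(1 - 2 * q)) / 2


mu = 1e-4          # hypothetical mutation rate, for illustration only
q = 4 * mu         # prior carrier probability for DMD, q = 4*mu as in the text
p_exact = gene_frequency_from_carrier_prob(q)
p_approx = q / 2   # approximation used in the text, p = 2*mu

print(p_exact, p_approx)
```

For mutation rates of this order the exact and approximate values differ only slightly, so p = 2μ is adequate in practice.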
Input files to the LINKAGE programs must be created in ASCII format (text files). Word processors such as WordPerfect or Word write files in their own format but are capable of producing text files when specifically instructed to do so. In Windows, a convenient text editor is Notepad. Also, the freely available Crimson Editor is highly recommended. In Ubuntu Linux, the standard text editor, gedit, is just fine. I also like leafedit, which is analogous to Windows’ Notepad; joe and pico are also useful, although somewhat simple, editors.
In some of the input files to the LINKAGE programs, it is important that no empty (blank) lines follow the last input line. To avoid such trailing blank lines, press Ctrl-End, which will position the cursor at the end of the file. If this position is not in column 1 of the line immediately following the last input line, press the Backspace key repeatedly until the cursor is all the way to the left on the last input line.
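Trailing blank lines can also be removed with a short script rather than by hand. The following Python sketch is one possible approach and is not part of the LINKAGE package; the function names are hypothetical. It reads the advice above as meaning that the file should end with exactly one line break after the last input line, which is an assumption.

```python
from pathlib import Path


def strip_trailing_blank_lines(text):
    """Remove blank lines at the end of text; end with a single newline."""
    lines = text.splitlines()
    # Drop lines that are empty or contain only whitespace, from the end.
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


def clean_file(path):
    """Rewrite the file at path without trailing blank lines."""
    file_path = Path(path)
    file_path.write_text(strip_trailing_blank_lines(file_path.read_text()))
```

For example, calling clean_file on a pedigree file would rewrite it in place; it is prudent to keep a copy of the original file first.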
1 Dausset J, Cann H, Cohen D, Lathrop M, Lalouel JM, White R: Centre d'etude du polymorphisme humain (CEPH): Collaborative genetic mapping of the human genome. Genomics 1990;6:575-577.
2 Ott J: Estimation of the recombination fraction in human pedigrees: Efficient computation of the likelihood for human linkage studies. Am J Hum Genet 1974;26:588-597.
3 Lathrop GM, Lalouel JM, Julier C, Ott J: Strategies for multilocus linkage analysis in humans. Proc Natl Acad Sci U S A 1984;81:3443-3446.
4 Ming JE, Muenke M: Multiple hits during early embryonic development: Digenic diseases and holoprosencephaly. Am J Hum Genet 2002;71:1017-1032.
5 Cottingham RW, Jr., Idury RM, Schaffer AA: Faster sequential genetic linkage computations. Am J Hum Genet 1993;53:252-263.
6 Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC: Plink: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007;81:559-575.