Pseudomarker: Running one marker at a time

Jurg Ott / 2 June 2018

The Pseudomarker program [1-3] can estimate parameters for linkage and/or linkage disequilibrium (association) in family data and case-control data. It is based on the Ilink program, which is part of the Linkage package. I have taken the official Pseudomarker program manual and modified it with updated information.

Current version of runPM program: 26 May 2018

This program estimates parameters in a statistically sophisticated manner. Some of its features are outlined below.

Program features

Pseudomarker is available for download and runs on 64-bit Linux PCs. Its main attraction is that it accommodates both family and case-control data and draws information from genetic linkage as well as genetic association. In addition, it estimates marker allele frequencies both under linkage and under no linkage, treating them as nuisance parameters, so that the analysis becomes virtually independent of marker allele frequencies.

The Pseudomarker program requires error-free data and can check for Mendelian inconsistencies. In larger families, however, it tends to miss inconsistencies and then crashes when it encounters them. To avoid this problem I wrote a wrapper program, runPM, that runs Pseudomarker for one marker at a time and prepares an output file containing the results for all markers. It can run Pseudomarker under dominant (D > +) or recessive (+ > D) modes of inheritance (D = disease allele, + = wild-type allele), and under (1) affected-only or (2) incomplete-penetrance models. Parameters for these models are shown below and are applied automatically by the program.

Dominant

Genotype      ++           +D           DD
Frequency     (1 - p)^2    2p(1 - p)    p^2
Penetrance    f1           f2           f2

Disease allele frequency, p = 0.005
Prevalence, K = 0.02

              Model 1      Model 2
f1 =          0            0.01
f2 =          0.00001      0.96

Recessive

Genotype      ++           +D           DD
Frequency     (1 - p)^2    2p(1 - p)    p^2
Penetrance    f1           f1           f2

Disease allele frequency, p = 0.11
Prevalence, K = 0.02

              Model 1      Model 2
f1 =          0            0.01
f2 =          0.00001      0.83
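The prevalence follows from the genotype frequencies and penetrances listed above: K is the sum, over the three genotypes, of frequency times penetrance. The following is a minimal sketch of that arithmetic, not code from runPM or Pseudomarker. It assumes that the stated prevalence refers to the incomplete-penetrance model (Model 2); the function names are made up for illustration.

```python
def genotype_frequencies(p):
    """Frequencies of genotypes ++, +D, DD for disease allele frequency p."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)


def prevalence(p, penetrances):
    """K = sum over genotypes of (genotype frequency * penetrance)."""
    return sum(f * pen for f, pen in zip(genotype_frequencies(p), penetrances))


# Model 2 values taken from the tables above (assumed to be the model
# to which the stated prevalence applies).
f1, f2 = 0.01, 0.96
k_dominant = prevalence(0.005, (f1, f2, f2))   # dominant: +D and DD share f2

f1, f2 = 0.01, 0.83
k_recessive = prevalence(0.11, (f1, f1, f2))   # recessive: ++ and +D share f1

print(k_dominant, k_recessive)
```

Under these assumptions both values come out close to 0.02, in agreement with the prevalence given for each mode of inheritance.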

Output messages from Pseudomarker can be somewhat cryptic. In particular, “this pedigree is not whole” requires some explanation: it means that relationships within a given family are incorrectly specified. For example, a child may have only one parent listed, or an individual may have no relationship to the other individuals with the same family ID.

The runPM program

Written in Pascal, this program is available as a Linux executable. It has been tested in 64-bit Kubuntu. Here are brief instructions on how to use it.

Start with your family data in plink format so you will have files like data.map and data.ped. Make certain that family members are contiguous although, within a family, individuals can occur in any order. Substitute data with whatever file names you are using. Then run plink2 with the following command line options:

--file data --recode 12 transpose --output-missing-phenotype 0 --out data2

where, again, you may choose any name for data2. This operation will result in two new files, data2.tped and data2.tfam, which are required as input files for the runPM (or runML, see below) program. The program will ask for an input file name, to which you should reply data2 (or whatever name you have been using). Then simply follow the instructions.

You may want to run 10 markers initially, then another 10 or so, to verify that the output file generated (resultsdata2dom.out or resultsdata2rec.out) looks OK. Each new marker analyzed appends a new line to the output file. This approach involves some administrative overhead but provides for a safe analysis. Markers with Mendelian errors and monomorphic markers are skipped, where the latter are defined as having fewer than 10 known alleles and fewer than 2 minor alleles (these two numbers are program constants and may be changed).
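To see which markers are likely to be skipped as monomorphic, the two counts mentioned above can be computed directly from a .tped line. The sketch below is an illustration only, not part of runPM. It assumes the usual .tped layout (four leading columns, then two alleles per individual) with alleles coded 1/2 and missing alleles coded 0. It reports each threshold comparison separately, because how runPM combines them internally is not shown here. The function and constant names are hypothetical.

```python
from collections import Counter

MIN_KNOWN = 10   # threshold for known alleles quoted in the text
MIN_MINOR = 2    # threshold for minor alleles quoted in the text


def allele_counts(tped_line):
    """Return (known, minor) allele counts for one .tped line.

    Assumed layout: chromosome, marker ID, genetic distance, base-pair
    position, then two alleles per individual (1/2 coding, 0 = missing).
    """
    alleles = tped_line.split()[4:]
    counts = Counter(a for a in alleles if a != "0")
    known = sum(counts.values())
    # The minor allele is the rarer of the two; its count is 0 if only one is seen.
    minor = min(counts.get("1", 0), counts.get("2", 0))
    return known, minor


# Hypothetical marker with six individuals, one of them untyped.
line = "1 rs1 0 1000 1 1 1 2 0 0 1 1 1 1 1 1"
known, minor = allele_counts(line)
print(known, minor, known < MIN_KNOWN, minor < MIN_MINOR)
```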

The --output-missing-phenotype 0 option is important; without it, unknown phenotypes will be coded as -9. In addition to the options listed above, you may want to use the --geno option to eliminate variants with low call rates; for example, --geno 0.1 (the plink default) removes variants with call rates below 0.90. Use of the --maf option is discouraged, as linkage analysis has no problem with low marker allele frequencies.
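Before starting a long run, it can be worth checking the .tfam file for the two pitfalls mentioned above: family members that are not contiguous, and phenotypes left coded as -9. The following is a minimal sketch of such a check, not a tool from the program package. It assumes the standard six-column .tfam layout (family ID, individual ID, father, mother, sex, phenotype), and the function name is made up for illustration.

```python
def check_tfam(lines):
    """Return a list of human-readable problems found in .tfam lines."""
    problems = []
    finished = set()   # family IDs whose block of lines has already ended
    previous = None
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue   # ignore blank lines
        if len(fields) != 6:
            problems.append(f"line {number}: expected 6 columns, found {len(fields)}")
            continue
        family, phenotype = fields[0], fields[5]
        if family != previous:
            # A family ID that reappears after its block ended is not contiguous.
            if family in finished:
                problems.append(f"line {number}: family {family} is not contiguous")
            if previous is not None:
                finished.add(previous)
            previous = family
        if phenotype == "-9":
            problems.append(f"line {number}: phenotype coded as -9")
    return problems


# Hypothetical example: family F1 is split by F2, and one phenotype is -9.
example = ["F1 1 0 0 1 1", "F1 2 0 0 2 1", "F2 1 0 0 1 2", "F1 3 1 2 1 -9"]
for problem in check_tfam(example):
    print(problem)
```

For a real file, the lines can be read with open("data2.tfam") and passed to the function in the same way, where data2 stands for whatever name you have been using.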

The runPM program repeatedly calls the Pseudomarker program and expects relevant program files to reside in a specific folder (directory). The following executables must reside in /usr/local/bin:

pseudomarker, unknownpseudo, makepedpseudo, ilinkpseudo, mlinkpseudo.

These files are included in the pseudomarker download package. In some situations, every marker leads to an error, in which case the output file will be empty save for some header lines. It may then be useful to capture screen output, for example, by invoking the program as runPM <input.txt >output.txt. A sample input.txt file is provided in the program package.
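When every marker fails, one thing to rule out is that an executable is missing from /usr/local/bin. The sketch below checks for the five files listed above. It is an illustration only, not part of the program package, and the function name is hypothetical.

```python
import os

# Executables that runPM expects, as listed in the text.
REQUIRED = ["pseudomarker", "unknownpseudo", "makepedpseudo",
            "ilinkpseudo", "mlinkpseudo"]


def missing_executables(names, folder="/usr/local/bin"):
    """Return the names that are absent from folder or not executable."""
    missing = []
    for name in names:
        path = os.path.join(folder, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


print(missing_executables(REQUIRED))
```

An empty list means that all five files were found and are executable. The same function can be used for runML by passing its own list of executables.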

Included in the runPM program package is a small test dataset, Tun.tfam and Tun.tped. It may be run by typing pseudomarker <runPM.param. Be prepared for long running times with pseudomarker; for some of my datasets the program took more than a week to complete the analysis.

The runML program

Pseudomarker can be very slow, particularly when family pedigrees contain loops. For an initial look at the data, and to choose a subset of markers to run with pseudomarker, the runML program works with mlink and is rather fast. Input files have the same structure as those for runPM. For example, you may type runML <runPM.param. The following executables must reside in /usr/local/bin:

unknownpseudo, makepedpseudo, mlinkpas, unknownpas,

where the first two files come with the pseudomarker program and the last two files are included in the runPM program package. Please note that runML lacks important features of pseudomarker (treating marker allele frequencies as nuisance parameters, allowing for LD between marker and disease), so results will not be as efficient as those from pseudomarker.

Output files from runPM and runML are in Linux format. To view them in Windows, the Notepad program is not suitable, as it does not recognize Linux line breaks. Instead, the Notepad++ program, which can accommodate large files, is highly recommended. The Windows WordPad program works similarly when you disable word wrap.
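If a Windows-style copy of an output file is needed anyway, the line breaks can be converted. The sketch below shows one way of doing this on the file contents read as bytes. It is a hypothetical helper, not part of the program package.

```python
def to_windows_line_endings(data: bytes) -> bytes:
    """Convert Linux (LF) line breaks to Windows (CRLF) line breaks.

    Existing CRLF pairs are normalized first so they are not doubled.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


# Two made-up lines standing in for the contents of an output file.
converted = to_windows_line_endings(b"line one\nline two\n")
print(converted)
```

To convert a real file, read it with open(name, "rb"), pass the bytes through the function, and write the result to a new file opened with "wb".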

As an alternative to runML, use the two-stage program, which is also available from the pseudomarker website. Copy all files in the twostage folder to /usr/local/bin, then type twostage.py in your Linux terminal window.

References

1. Gertz EM, Hiekkalinna T, Digabel SL, Audet C, Terwilliger JD, et al. (2014) PSEUDOMARKER 2.0: efficient computation of likelihoods using NOMAD. BMC Bioinformatics 15: 47.

2. Goring HH, Terwilliger JD (2000) Linkage analysis in the presence of errors IV: joint pseudomarker analysis of linkage and/or linkage disequilibrium on a mixture of pedigrees and singletons when the mode of inheritance cannot be accurately specified. Am J Hum Genet 66: 1310-1327.

3. Hiekkalinna T, Schaffer AA, Lambert B, Norrgrann P, Goring HH, et al. (2011) PSEUDOMARKER: a powerful program for joint linkage and/or linkage disequilibrium analysis on mixtures of singletons and related individuals. Hum Hered 71: 256-266.