Rockefeller University New York

18 August 2015

HOMOG program

Additional Programs

Construction of Support Intervals

References

This documentation describes various forms of the homogeneity (admixture) test but only the basic HOMOG program is furnished as a downloadable package. The additional programs are somewhat outdated and presumably no longer of interest. They may be obtained from the author upon request.

All HOMOG programs analyze heterogeneity (two or more disease loci) with respect to single marker loci or known maps of markers. In the first case, the programs expect lod scores between disease phenotype and the marker(s), and in the second case they expect multipoint lod scores for disease versus a known map(s) of markers. HOMOG carries out a homogeneity test (A-test) under the following alternative hypothesis: Two family types, one with linkage between a trait (or any gene locus for that matter) to a marker or map of markers, the other without linkage. For more information see Ott (1999). What follows are instructions on setting up the input file for the HOMOG program.

Line 1: Title line

Line 2: N STEPSIZE LDIFF where

- N = number of θ values at which lod scores are available or should be computed. Omit lod = 0 at θ = 0.5.
- STEPSIZE = step size at which α values are incremented in the search over the likelihood surface (for example, 0.05).
- LDIFF (optional) = difference in log likelihood, used in the construction of support intervals. If LDIFF is omitted or set to 0, no support intervals are computed. In regular situations, the joint support interval for α and θ corresponds to an approximate 95% confidence region when Ldiff = 3.00.

Line 3: OUT ALOW where the OUTput option is set as follows:

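OUT   Table of lnL(alpha,theta)   Lods for families
---------------------------------------------------
 0              no                       no
 1              no                       yes
 2              yes                      no
 3              yes                      yes
---------------------------------------------------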

ALOW = lowest value of α analyzed (e.g., ALOW=0)

Line 4: N recombination fraction (θ) values, e.g., 0.01, 0.05, 0.1, etc. At these points, lod scores will be computed. A large number N of θs (e.g., 10) will yield more accurate results than a small number. Note that these θ values are not used by the program other than to identify the lod scores.

Line 5: NFAM = number of families for which lods are provided

Line 6: Lod scores for family 1. Lods smaller than -99 are taken to represent minus infinity, and a log likelihood of minus infinity will appear as -99 on output.

Repeat line 6 for families 2, 3, etc.

Sample data: The file HOMOG.DAT shows a specific example based on the analysis by Morton (1956) on Elliptocytosis vs. Rh and is reproduced below.

5 0.05 2.0

3 0

0.01 0.05 0.1 0.2 0.3

14

0.292 0.258 0.215 0.134 0.064

-9.548 -4.232 -2.182 -0.540 0.018

5.022 4.682 4.236 3.264 2.165

1.560 2.321 2.317 1.953 1.376

2.489 3.304 2.735 2.109 1.274

0.276 0.260 0.238 0.191 0.138

0.992 0.908 0.805 0.597 0.380

2.657 2.442 2.161 1.554 0.892

-0.110 -0.091 -0.070 -0.038 -0.016

-7.307 -3.840 -2.399 -1.086 -0.455

-1.402 -0.721 -0.444 -0.194 -0.076

-0.300 -0.101 -0.020 0.059 0.046

-4.036 -2.020 -1.220 -0.531 -0.224

-1.118 -0.471 -0.235 -0.064 -0.014
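As a rough illustration of what the homogeneity test evaluates, the sketch below computes an admixture log likelihood, lnL(α, θ) = Σ ln[α × 10^Z_i(θ) + (1 − α)], over the α grid and the tabulated θ values for the sample lod scores above. The formula is the standard admixture form and is assumed here rather than taken from the HOMOG source; the names THETAS, LODS, ln_lik, and grid_search are illustrative only.

```python
import math

# Theta values and lod scores copied from the sample data above (14 families).
THETAS = [0.01, 0.05, 0.1, 0.2, 0.3]
LODS = [
    [0.292, 0.258, 0.215, 0.134, 0.064],
    [-9.548, -4.232, -2.182, -0.540, 0.018],
    [5.022, 4.682, 4.236, 3.264, 2.165],
    [1.560, 2.321, 2.317, 1.953, 1.376],
    [2.489, 3.304, 2.735, 2.109, 1.274],
    [0.276, 0.260, 0.238, 0.191, 0.138],
    [0.992, 0.908, 0.805, 0.597, 0.380],
    [2.657, 2.442, 2.161, 1.554, 0.892],
    [-0.110, -0.091, -0.070, -0.038, -0.016],
    [-7.307, -3.840, -2.399, -1.086, -0.455],
    [-1.402, -0.721, -0.444, -0.194, -0.076],
    [-0.300, -0.101, -0.020, 0.059, 0.046],
    [-4.036, -2.020, -1.220, -0.531, -0.224],
    [-1.118, -0.471, -0.235, -0.064, -0.014],
]


def ln_lik(alpha, j, lods=LODS):
    """ln likelihood at proportion alpha and the j-th theta (assumed admixture model)."""
    total = 0.0
    for fam in lods:
        # A family is linked with probability alpha, unlinked otherwise.
        total += math.log(alpha * 10.0 ** fam[j] + (1.0 - alpha))
    return total


def grid_search(stepsize=0.05, alow=0.0):
    """Return (max lnL, alpha, theta) over the alpha grid and the tabulated thetas."""
    n_steps = round((1.0 - alow) / stepsize)
    best = (0.0, 0.0, 0.5)  # alpha = 0 means no linked families, lnL = 0
    for k in range(n_steps + 1):
        alpha = min(1.0, alow + k * stepsize)
        for j, theta in enumerate(THETAS):
            value = ln_lik(alpha, j)
            if value > best[0]:
                best = (value, alpha, theta)
    return best
```

Calling grid_search() with the STEPSIZE and ALOW values from the sample file returns the grid point with the largest log likelihood; setting α = 1 reduces ln_lik to ln(10) times the summed lod scores, which is the homogeneity case.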

HOMOG1 is an extension of the homogeneity test, with the following alternative hypothesis: Two family types, one linked the other unlinked, plus a possible sex difference in the recombination fraction. This program comes in two versions, depending on whether the lods for the two sexes are independent or not: HOMOG1a reads independent lods, HOMOG1b reads dependent lods. For the same problem, with independent lods, HOMOG1a is more efficient in terms of memory space.

HOMOG2 is another extension of the homogeneity test. There are two postulated trait loci at different map positions (or different map distances from a marker), with only one of the trait loci occurring in any one family. That is, there are two family types, both with linkage. The recombination fraction between trait locus 1 and the marker is θ

HOMOG3 and HOMOG4 are analogous to HOMOG2 but specify 3 or 4 family types (trait loci). They only calculate the maximum log likelihood and the ML estimates but no support intervals.

HOMOG3R is a specialized version of the HOMOG3 program. It calculates log likelihoods under the assumption of two trait loci, where in a proportion α

HOMOGM is an extension of HOMOG3R to any number of trait loci. It was written in C by Ajita Bhat and uses a numerical maximization routine to maximize the likelihood over parameter values.

POINT4 is interactive and calculates, for a mixture of up to 4 family types, the log likelihood at specific parameter values.

Before running one of the programs, an input file must be constructed according to the rules given below. Input and output files have fixed names. For example, for the HOMOG program, the input file is HOMOG.DAT and the output file is HOMOG.OUT. Output of lod scores and log likelihoods is preset to a width of 80 columns unless the input quantity LL is read.

In each of the homogeneity tests, groups or types of families are assumed where any given family cannot unequivocally be assigned to either of these types. The statistical hypotheses referred to in the programs are defined as follows:

- H_{0} is the very basic hypothesis of both homogeneity and absence of linkage.
- H_{1} is the usual null hypothesis of homogeneity, i.e., all families belong to a single family type with linkage between the main locus and the marker locus.
- H_{2} refers to the hypothesis of heterogeneity, with two family types, type 1 and type 2, where α is the proportion of families of type 1 or, equivalently, the probability that a family belongs to type 1. Family type 1 is characterized by a recombination fraction θ (programs HOMOG, HOMOG1a, and HOMOG1b) or θ_{1} (program HOMOG2), while in families of type 2 the recombination fraction is assumed to be equal to ½ (programs HOMOG, HOMOG1a, and HOMOG1b) or θ_{2} (program HOMOG2, θ_{1} < θ_{2} < 0.5).
- H_{3} refers to a particular type of "homogeneity": There is only one family type with recombination fraction θ, but allowance is made for a difference in the recombination fraction between the sexes.
- H_{4} is the heterogeneity alternative to H_{3}, i.e., there are two family types with recombination fractions of θ and ½ and, in addition, there might be a difference in θ between the sexes.

The relationship between the hypotheses 1 through 4 may be displayed as follows:

----------------------------------------------------------
Recombination fraction      α = 1            α < 1
in the two sexes         (Homogeneity)   (Heterogeneity)
----------------------------------------------------------
equal                       H_{1}            H_{2}
unequal                     H_{3}            H_{4}
----------------------------------------------------------

In the programs and on output, genetic distance is labeled in terms of the recombination fraction, θ. However, the programs may also be used when the genetic distances are in centimorgans, x. Free recombination (infinite map distance) is labeled as θ = 99 or -99.

Tests of one hypothesis against another are carried out as likelihood ratio tests, where the likelihood ratio with respect to the two hypotheses is calculated. Asymptotic p-values based on theoretical chi-square distributions are no longer reported because in most of these tests they are not really appropriate.

In the HOMOG programs, support regions/intervals are computed as follows. First, the program determines the highest Ln likelihood, Lmax, under the most general alternative hypothesis, i.e., the one with the largest number of parameters estimated. Then, the program recalculates likelihoods and marks all those parameter values which have an Ln likelihood larger than Lmax – Ldiff (Ln likelihood within Ldiff of the maximum). The marked parameter values then form the support interval. Such a support interval is called an Ldiff-unit support interval. The table below gives examples for the correspondence between Ldiff and the associated likelihood ratio.
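The marking step described above can be sketched as follows. This is a minimal illustration, not the programs' actual code: the function name support_region and the dictionary layout (parameter tuple mapped to Ln likelihood) are assumptions made for the example.

```python
def support_region(grid, ldiff):
    """Return the parameter points whose Ln likelihood lies within ldiff of the maximum.

    grid maps parameter tuples, e.g. (alpha, theta), to Ln likelihoods.
    """
    if ldiff <= 0:
        return []  # LDIFF missing or 0: no support interval is constructed
    lmax = max(grid.values())
    # Mark every point with Ln likelihood larger than Lmax - Ldiff.
    return sorted(point for point, lnl in grid.items() if lnl > lmax - ldiff)


# Hypothetical Ln likelihoods on a tiny (alpha, theta) grid, for illustration only.
example = {(0.4, 0.1): 1.0, (0.5, 0.1): 5.0, (0.6, 0.1): 4.2, (0.7, 0.1): 2.9}
two_unit_region = support_region(example, 2.0)
```

With these made-up values, Lmax is 5.0, so a 2-unit support region keeps only the points whose Ln likelihood exceeds 3.0.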

Under regular conditions, support intervals may be interpreted as approximate confidence intervals. For example, with two-point analysis and two family types (one linked and the other unlinked), 2 × Ldiff approximately follows a chi-square distribution on 1 df when no heterogeneity is present. In multipoint situations, however, the approximation by chi-square is unreliable because the distribution of the test statistic is unknown.

----------------------------------------------
               Difference in units of  Approx.
Likelihood     ----------------------  p-value
ratio (LR)     ln(LR)=Ldiff lod score  (1 df)
----------------------------------------------
    7.39           2.00       0.87      .046
   10              2.30       1         .032
   20              3.00       1.30      .014
   50              3.91       1.70      .005
  100              4.60       2         .002
 1000              6.91       3         .0002
----------------------------------------------
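The entries of this table can be reproduced with a few lines of code. The sketch below assumes, as stated above, that 2 × Ldiff is referred to a chi-square distribution on 1 df; the function name ldiff_row is illustrative.

```python
import math


def ldiff_row(lr):
    """Return (Ldiff, lod score, approximate 1-df p-value) for a likelihood ratio."""
    ldiff = math.log(lr)    # difference in natural-log likelihood
    lod = math.log10(lr)    # the same difference in lod units
    # For a chi-square variable X on 1 df, P(X > x) = erfc(sqrt(x / 2));
    # with x = 2 * Ldiff this simplifies to erfc(sqrt(Ldiff)).
    p_value = math.erfc(math.sqrt(ldiff))
    return ldiff, lod, p_value


for lr in (math.exp(2.0), 10, 20, 50, 100, 1000):
    ldiff, lod, p = ldiff_row(lr)
    print(f"{lr:8.2f} {ldiff:6.2f} {lod:6.2f} {p:8.4f}")
```

As noted in the text, these p-values are only an approximation under regular conditions and should not be relied on in multipoint situations.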

----------------------------------------------------------
Male and female     Homogeneity (one    Heterogeneity (two
rec. fractions      family type)        family types)
----------------------------------------------------------
equal               H_{1}               H_{2}
unequal             H_{3}               H_{4}
----------------------------------------------------------

The test of H

Input to the HOMOG1a program is similar to that for the HOMOG program and is given in the following table; see also the notes below the table. File names are HOMOG1A.DAT for input and HOMOG1A.OUT for output.

Line 1: Title line

Line 2: NM NF STEPSIZE LDIFF where

- NM = no. of male θ values, t_{m}, at which lod scores are available. Do not count θ = 0.5.
- NF = no. of female θ values, t_{f}, at which lod scores are available. Do not count θ = 0.5.
- STEPSIZE = step size at which α values are incremented in the search over the likelihood surface (e.g., 0.05).
- LDIFF (optional) = difference in log likelihood, used in the construction of support intervals (see section 1, above). In regular situations, the joint support interval for α and θ corresponds to an approximate 95% confidence region when Ldiff = 3.91.

Line 3: OUT ALOW LL where the OUTput option is set as follows (Warning: the table of lnL(α, θ) contains [NM × NF – 1]/STEPSIZE lines):

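OUT   Table of lnL(alpha,theta)   Lods for families
---------------------------------------------------
 0              no                       no
 1              no                       yes
 2              yes                      no
 3              yes                      yes
---------------------------------------------------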

ALOW = lowest value of alpha analyzed (e.g., ALOW = 0)

LL (optional) = line length of output (may be missing)

Line 4: NM male θ values, t_{m}

Line 5: NF female θ values, t_{f}

Line 6: NFAM = number of families

Line 7: NM male lod scores for family 1. Lods smaller than -80 are taken to represent minus infinity, and a log likelihood of minus infinity will appear as -99 on output.

Line 8: NF female lod scores for family 1.

Repeat lines 7 and 8 for families 2, 3, etc.

As to the θ values at which lod scores are available in each family, the user is essentially free which theta values to choose. However, he or she should make sure that there is a sufficiently large number of pairs both with t

Sample data: The file HOMOG1A.DAT provides an example of data that may be analyzed for heterogeneity as well as for a sex difference in the recombination fraction. Data quoted in Ott (1986).

Line 1: Title line

Line 2: N STEPSIZE LDIFF where

- N = no. of pairs of θ values, t_{m} and t_{f} (male and female recombination fractions), at which lod scores are available. Do not count θ = 0.5.
- STEPSIZE = step size at which α values are incremented in the search over the likelihood surface (e.g., 0.05).
- LDIFF (optional) = difference in log likelihood, used in the construction of support intervals (see section 1, above). In regular situations, the joint support interval for α and θ corresponds to an approximate 95% confidence region when Ldiff = 3.91.

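Line 3: OUT ALOW LL where the OUTput option is set as follows:

OUT   Table of lnL(alpha,theta)   Lods for families
---------------------------------------------------
 0              no                       no
 1              no                       yes
 2              yes                      no
 3              yes                      yes
---------------------------------------------------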

ALOW = lowest value of alpha analyzed (e.g., ALOW=0)

LL (optional) = line length of output (may be missing)

Line 4: N pairs of θ values, t

Line 5: NFAM = number of families

Line 6: Lod scores (N of them) for family 1. Lods smaller than -80 are taken to represent minus infinity, and a log likelihood of minus infinity will appear as -99 on output.

Repeat line 6 for families 2, 3, etc.

As to the pairs of θ values at which lod scores are available in each family, the user is essentially free which θ values to choose. However, he or she should make sure that there is a sufficiently large number of pairs both with t

tm = 0.5 |
     0.3 |                     x     x
     0.1 |               x     x     x
     0.05|         x     x     x     x
     0   |   x     x     x     x     x
---------+-------------------------------
    tf =     0   0.05   0.1   0.3   0.5

On input, for example, the following θ values would have to be given on line(s) 4:

0 0 0 0.05 0 0.1 0 0.3 0 0.5 0.05 0.05

0.05 0.1 0.05 0.3 0.05 0.5 0.1 0.1 ... 0.3 0.5
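The list of pairs in this example can be generated mechanically. The sketch below assumes that the first value of each pair is t_{m} and the second is t_{f}, which matches the diagram above (the row for tm = 0 has five marks); the function name theta_pairs is illustrative.

```python
def theta_pairs(values):
    """List (tm, tf) pairs with tm <= tf, leaving out the pair (0.5, 0.5)."""
    pairs = []
    for i, tm in enumerate(values):
        for tf in values[i:]:
            if tm == 0.5 and tf == 0.5:
                continue  # theta = 0.5 in both sexes is not counted
            pairs.append((tm, tf))
    return pairs


pairs = theta_pairs([0, 0.05, 0.1, 0.3, 0.5])
# Print the pairs in the order in which they would appear on line 4.
print(" ".join(f"{tm:g} {tf:g}" for tm, tf in pairs))
```

For the five θ values of the diagram, this yields N = 14 pairs, starting with 0 0 and ending with 0.3 0.5.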

Sample data: the file HOMOG1B.DAT contains the same data as the file HOMOG1A.DAT referenced in the previous section except that the joint lod scores, Z(t

Input format is the same as for the HOMOG program and is as given in the following table, but refer to the notes after the table. File names are HOMOG2.DAT for input and HOMOG2.OUT for output.

Line 1: Title line

Line 2: NT STEPSIZE LDIFF where

- NT = no. of θ values (or map distances) at which lod scores are available or should be computed. Omit lod = 0 at θ = 0.5.
- STEPSIZE = step size at which the α values are incremented in the search over the likelihood surface (e.g., 0.05).
- LDIFF (optional) = difference in log likelihood, used in the construction of support intervals. LDIFF is optional. When it is missing, no support intervals will be calculated.

Line 3: OUT ALOW LL where the OUTput option is set as follows:

OUT   Table of lnL(alpha,theta)   Lods for families
---------------------------------------------------
 0              no                       no
 1              no                       yes
 2              yes                      no
 3              yes                      yes
---------------------------------------------------

ALOW = lowest value of α analyzed (e.g., ALOW = 0)

LL = line length of output (optional; if missing: LL = 80)

Line 4: Recombination fraction (θ) values, e.g., 0.01, 0.05, etc. At these points, lod scores will be computed. A rather large number NT of recombination fractions (e.g., 10) will yield more accurate results than a small number.

Line 5: NFAM = number of families for which lod scores are provided.

Line 6: Lod scores for family 1. Lods smaller than -80 are taken to represent minus infinity, and a log likelihood of minus infinity will appear as -99 on output.

Repeat line 6 for each family.

The null hypothesis (H

The HOMOG3 and HOMOG4 programs simply calculate the max. Ln likelihood under the most general hypothesis of heterogeneity. Appropriate significance tests will have to be carried out manually by the user by comparing output from these programs with output from the HOMOG or HOMOG2 programs. Notice that HOMOG3 and HOMOG4 carry out an exhaustive search of the parameter space and may require a large amount of computer time. While they are running, they display the current alpha values so that programs may be interrupted by the user.

Interpreting results of HOMOG3 or HOMOG4 is not straightforward. For example, whenever one of the components (α's) is equal to zero, the associated θ value is irrelevant. Also, there may be more than one parameter constellation with the same maximum likelihood. The HOMOG3 and HOMOG4 programs differ in their output as follows.

In the HOMOG3 program, if the OUTput option (line 3) is set to a value larger than 1, all possible sets of α values will be printed (one set per line), and for each set the maximum likelihood over the θ values will be given along with those θ values at which the maximum occurred.

In the HOMOG4 program, if the OUTput option (line 3) is set to a value larger than 1, a table containing the Ln likelihood for each possible set of parameter values will be written to the output file. WARNING: THIS FILE COULD BE VERY LARGE! For example, when the sample HOMOG.DAT file is analyzed by the HOMOG4 program, the output file will be 1.5MB long. For most practical situations, one should set OUT = 0 on line 3.

Notice that any α component cannot take on the whole range of values from 0 through 1. Also, for computational efficiency, only α

Default file names are HOMOG3R.DAT for input and HOMOG3R.OUT for output.

Line 1: Title line

Line 2: NT1 NT2 STEPSIZE where

- NT1 = no. of θ values (or map locations) at which lod scores are available for trait versus marker 1. Omit lod = 0 at θ = 0.5.
- NT2 analogous for marker 2.
- STEPSIZE = step size at which the α values are incremented in the search over the likelihood surface (e.g., 0.05).

Line 3: OUT ALOW where the OUTput option is set as follows:

OUT   Table of lnL(alpha,theta)   Lods for families
---------------------------------------------------
 0              no                       no
 1              no                       yes
 2              yes                      no
 3              yes                      yes
---------------------------------------------------

The table of lnL(α, θ) will print one line for each pair of α

ALOW = lowest value of α analyzed (e.g., ALOW=0).

Line 4: All NT1 + NT2 θ values, e.g., 0.01, 0.05, etc., that is, the θ values for marker 1 immediately followed by the θ values for marker 2. These values are for identification purposes only and are not used in the calculations. It may thus be useful to distinguish θ values for marker 1 (e.g., -0.10 or 0.11) from those for marker 2 (e.g., 0.10). A large number of recombination fractions will yield more accurate results than a small number.

Line 5: NFAM = number of families for which lod scores are provided.

Line 6: The NT1 + NT2 lod scores for family 1. Lods smaller than -80 are taken to represent minus infinity, and a log likelihood of minus infinity will appear as -99 on output.

Repeat line 6 for each family.

A special situation is given when the two markers near trait loci 1 and 2 are taken to be candidate genes and lod scores are evaluated at θ = 0 only. In this case, the HOMOG3R program will maximize likelihoods over only two θ values, 0 and 0.5. Consider the following input file (another sample data set is provided in the file HOMOG3R.DAT):

Linkage to two candidate genes on different chromosomes

1 1 .05

1 0

-0.01 0.01

4

0.903 -99

2.007 -99

0.601 0.601

-99 1.204

For each of four families, at each of two chromosomes, the file contains lod scores at θ = 0 (identified as -0.01 for marker 1 on chromosome 1 and 0.01 for marker 2 on chromosome 2).

There are three possible hypotheses of homogeneity:

- All families are linked with marker 1 but unlinked with marker 2.
- All families are linked with marker 2 but not with marker 1.
- All families are unlinked with markers 1 and 2.

Program HOMOG3R version 1.70 J. Ott

Heterogeneity -- Three family types, type 1 with linkage to first set of

theta values, type 2 with linkage to second set of theta values (usually

two different chromosomes), type 3 unlinked.

>> Linkage to two candidate genes on different chromosomes <<

Fam. Lod scores

1 0.9030 -99.0000

2 2.0070 -99.0000

3 0.6010 0.6010

4 -99.0000 1.2040

Theta -0.0100 0.0100

Results for different hypotheses (fixed values in parentheses)

Hypothesis a1 a2 a3 t1 t2 lnL

----------------------------------------------------------------------

H1 Heterogeneity 0.65 0.35 0.00 -0.010 0.010 8.9453

H2 Het, a3=0 0.65 0.35 (0) -0.010 0.010 8.9453

H3 Het, a2=0 0.70 (0) 0.30 -0.010 (-99) 5.9688

H4 Het, a1=0 (0) 0.40 0.60 (-99) -0.010 1.7107

H5 Homogeneity, a1=1 (1) (0) (0) -99.000 (-99) 0.0000

H6 Homogeneity, a2=1 (0) (1) (0) (-99) -99.000 0.0000

H7 Homogeneity, a3=1 (0) (0) (1) (-99) (-99) (0)

Evidence for heterogeneity (H1 vs. H5/6/7): Het. versus homogeneity

Difference in Ln(L) = 8.9453

Lik. ratio for heterog. = 7671.7558

Evidence for 2 versus 1 disease locus (H1 vs. H3/4):

Difference in Ln(L) = 2.9765

Lik. ratio for heterog. = 19.6190

Family Conditional prob. of being

no. type 1 type 2 type 3 (under heterogeneity, H1)

1 1.0000 0.0000 0.0000

2 1.0000 0.0000 0.0000

3 0.6500 0.3500 0.0000

4 0.0000 1.0000 0.0000
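The log likelihoods and conditional probabilities in this output can be checked by hand. The sketch below assumes that each family contributes ln[a1 × 10^Z1 + a2 × 10^Z2 + a3] to the log likelihood, which is an assumption about the model rather than a quotation from the program, although it reproduces the printed values for H1, H3, and H4. All names (lik_ratio, family_terms, ln_lik, posterior) are illustrative.

```python
import math

MINUS_INF_CUTOFF = -80.0  # lods below this are treated as minus infinity

# Lod scores at theta = 0 for (marker 1, marker 2), copied from the input file above.
LODS = [
    (0.903, -99.0),
    (2.007, -99.0),
    (0.601, 0.601),
    (-99.0, 1.204),
]


def lik_ratio(lod):
    """Likelihood ratio 10**lod, with very small lods mapped to zero."""
    return 0.0 if lod < MINUS_INF_CUTOFF else 10.0 ** lod


def family_terms(a1, a2, a3, lods):
    """Weighted likelihood contributions of family types 1, 2, and 3 (unlinked)."""
    z1, z2 = lods
    return (a1 * lik_ratio(z1), a2 * lik_ratio(z2), a3)


def ln_lik(a1, a2, a3, families=LODS):
    """Ln likelihood summed over families under the assumed mixture model."""
    return sum(math.log(sum(family_terms(a1, a2, a3, fam))) for fam in families)


def posterior(a1, a2, a3, lods):
    """Conditional probabilities that a family is of type 1, 2, or 3."""
    terms = family_terms(a1, a2, a3, lods)
    total = sum(terms)
    return tuple(t / total for t in terms)


print(round(ln_lik(0.65, 0.35, 0.00), 4))  # compare with the H1 line of the output
print(round(ln_lik(0.70, 0.00, 0.30), 4))  # compare with the H3 line of the output
print(posterior(0.65, 0.35, 0.00, LODS[2]))  # compare with family 3
```

Family 3 has equal lod scores at both markers, so its conditional probabilities simply equal the estimated proportions a1 and a2.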

The program output shows positive likelihoods for hypotheses H

Testing for significance of heterogeneity may be carried out in different ways. Under heterogeneity that allows for 3 components (hypothesis H

The HOMOGM program is written in C. On UNIX platforms, it may be compiled with a C compiler using one of the following commands:

>gcc -O2 homogm.c -o homogm -lm

or

>cc -O homogm.c -o homogm -lm

For Windows, the executable homogmpc.exe is available. The source code, homogmpc.c, is also available and may be compiled using the djgpp compiler (http://www.delorie.com/djgpp/) and the following command:

>gcc -O2 homogmpc.c -o homogmpc -lm

Families with 3 locations ⇐ Title line {Any suitable title of your choice} <Line 1>

6 6 7 ⇐ Number of θ values for each locus <Line 2>

0 0 0 0 0 0 0 ⇐ Fixed parameters (hypothesis generation) <Line 3>

0.001 0.100 0.200 0.300 0.400 0.5 ⇐ Theta values for locus 1 <Line 4>

0.001 0.020 0.050 0.100 0.200 0.5 ⇐ Theta values for locus 2 <Line 5>

0.001 0.010 0.060 0.100 0.200 0.300 0.400 ⇐ Theta values for locus 3 <Line 6>

200 ⇐ Number of families <Line 7>

2.107 1.787 1.429 1.023 0.554 0 ⇐ Lod scores for 1st family and 1st locus <Line 8>

.................

.................

-99 -0.058 0.313 0.357 0.237 0 ⇐ Lod score for the 100th family at 1st locus

................

................

-99 -1.675 -0.781 -0.347 -0.092 0 ⇐ Lod scores for 200th family and 1st locus

-99 -0.212 0.046 0.053 0.011 0 ⇐ Lod scores for 1st family and 2nd locus

................

................

-99 -2.092 -0.562 0.022 0.164 0 ⇐ Lod score for the 100th family and 2nd locus

................

................

-99 -1.172 -0.317 0.013 0.094 0 ⇐ Lod score for the 200th family and 2nd locus

-1.4966 -0.5123 -0.2289 0.1758 0.3221 0.4185 0.3627 ⇐ Lod scores for 1st family and 3rd locus

................

................

-10.4953 -6.4992 -5.2995 -3.4091 -2.5406 -1.3876 -0.7413 ⇐ Lod scores for 100th family and 3rd locus

...............

...............

-4.4962 -2.5079 -1.9191 -1.0192 -0.6321 -0.1835 -0.0053 ⇐ Lod score for the 200th family and 3rd locus

Line 2 indicates the number of θ values considered for the study for loci 1, 2 and 3 respectively.

Line 3 parameters (hypothesis option): The number of zeros and ones on this line must equal twice the number of loci plus one, i.e., 2n + 1, where n is the number of loci. A 0 in a given position means that the user does not wish to fix the corresponding parameter (i.e., that parameter will be estimated), whereas a 1 means that the user wishes to fix it (parameter not estimated).

The first n values denote fixing of the θ values for the loci in serial order. The next n values denote fixing of α values for the corresponding loci in serial order. The (2n + 1)-th (i.e. the last) number denotes whether the α value for the unlinked locus is fixed/not-fixed. Any combination of α and θ can be fixed at a time.

Note: If the θ value for any particular locus is fixed to 0.5 then automatically the α value for that locus is fixed to 0. In addition, if n out of (n + 1) α values are fixed, it means that the (n + 1)th α value would be automatically fixed to (1 - sum of all n α values). Also, if n of the (n + 1) α values sum up to one, then the (n + 1)th alpha value is automatically fixed to zero. As an aid, users are provided with the number of free parameters that were available while running the hypotheses in the output file, homogm.out (details given below). Explanation on hypotheses testing can be obtained from the handbook of human genetic linkage (Terwilliger and Ott 1994).

Hypothesis generation: When the program is run, the user is prompted for the values of the fixed parameters. Fixed θ values must lie between the minimum and maximum θ values given in the file homogm.dat, and the sum of all α values must never exceed 1.

Some of the examples for running hypotheses are (3 loci):

1). No values are fixed. Line 3 looks like:

0 0 0 0 0 0 0

2). Second α value is fixed to 1. Line 3 looks like:

0 0 0 0 1 0 0

The program prompts for the constant that should be entered as 1.

3). Second θ value is fixed to 0.5. Line 3 looks like:

0 1 0 0 0 0 0

The program prompts for the constant and 0.5 should be entered. In this case, the program will automatically set the second α value to 0.

4). First θ value is fixed to 0.1 and second α value fixed to 0.2.

Line 3 looks like:

1 0 0 0 1 0 0

The program will prompt the user for the value of only these 2 fixed parameters. So the first input by the user should be 0.1 and second input must be 0.2.

5). Unlinked locus is fixed to 0 and first θ value to 0.2. Line 3 looks like:

1 0 0 0 0 0 1

The program will prompt the user for the first θ value which should be fixed to 0.2 and then the value of unlinked loci which in this case should be 0.

6). If the α value for the unlinked locus is fixed to 1, the maximum log likelihood is 0 and there is no linkage to any of the loci.
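The layout of line 3 in these examples can be decoded with a short helper. This is only a sketch of the rules stated above (n θ flags, then n α flags, then one flag for the unlinked α); the function name describe_fixed and its return format are assumptions made for illustration.

```python
def describe_fixed(flags):
    """Split a line-3 flag list into fixed thetas, fixed alphas, and the unlinked flag.

    Returns (fixed theta loci, fixed alpha loci, unlinked alpha fixed?),
    with loci numbered from 1 in serial order.
    """
    if len(flags) < 3 or len(flags) % 2 == 0:
        raise ValueError("expected 2n + 1 flags, where n is the number of loci")
    if any(flag not in (0, 1) for flag in flags):
        raise ValueError("each flag must be 0 (estimate) or 1 (fix)")
    n = (len(flags) - 1) // 2
    fixed_theta = [i + 1 for i in range(n) if flags[i] == 1]
    fixed_alpha = [i + 1 for i in range(n) if flags[n + i] == 1]
    unlinked_fixed = flags[2 * n] == 1
    return fixed_theta, fixed_alpha, unlinked_fixed


# Example 4 above: first theta and second alpha are fixed.
print(describe_fixed([1, 0, 0, 0, 1, 0, 0]))
```

The result lists the parameters for which the program will prompt; in example 4 these are the first θ value and the second α value.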

Note: The program does not interpolate between two θ values that are specified in the data file homogm.dat. Thus if the user wishes to fix the value of θ to 0.03 then the corresponding lod score (using MLINK) for that particular value has to be included in the input data file. Thus in this example, if the user fixes the value of the second theta to 0.4 the program automatically sets it back to the closest value in the file (i.e. 0.35 in this case).

Lines 4, 5, 6: Contain the list of all possible θ values.

Each line contains the θ values for one locus. In this particular example, line 4 lists all the possible θ values for locus 1, line 5 those for locus 2, and line 6 those for locus 3. The θ values for all loci must lie between 0 and 0.5. A larger number of θ values will give more accurate log likelihoods. It is not necessary to include θ = 0.5 (for which the lod score is 0 by definition).

Line 7 indicates the number of families; in this case 200.

Lines 8 onward are a list of lod scores for each locus 1, 2, 3 at each θ value. First are the lod scores for all families at locus 1, then those for all families at locus 2, etc. The number of lod scores should match the number of θ values.

Following that is the table of conditional probabilities of each family being linked to a particular locus. In this example, family 1 has a probability of 0.25 of being unlinked to any of the three loci. Similarly, family 2 has the highest probability (0.92) of being linked to locus 1 and essentially no probability of being linked to locus 3, and so on for the other families.

*** Program HOMOGM (Created on Nov 20 1998, at 12:00:52) Dr. Jurg Ott ***

This output file created on Fri Nov 20 12:01:36 1998

Heterogeneity -- Trait vs. 3 loci on different chromosomes

The number of free parameters is 6

The value of Max ln(L) is 174.002816

The value of log10(L) is 75.569423

t[1] t[2] t[3]

===========================================================

0.100 0.200 0.020

a[1] a[2] a[3]

===========================================================

0.221 0.282 0.349

Alpha-unlinked is 0.148

Family linkage probabilities

1 0.08746 0.31311 0.34903 0.25040

2 0.92632 0.04094 0.00000 0.03274

..............

..............

100 0.03388 0.06823 0.02484 0.87306

..............

..............

200 0.00055 0.01771 0.96757 0.01417

>> gcc homogm.c -DSTARTING_VALUES -o homogm -lm -O2

or

>> cc homogm.c -DSTARTING_VALUES -o homogm -lm -O

The program will then prompt the user to input different starting values (for θ only).

(1) Set α

(2) fix all αs other than α

The latter hypothesis differs from the former in that it allows for a proportion of families to be unlinked to locus 2. The log likelihood obtained under such a restricted hypothesis is then compared with the log likelihood under a less restricted or unrestricted hypothesis. See explanations at the end of the section on the HOMOG3R program for further details.

To use the program, you will have to furnish 4 values of α (proportions of family types), e.g., 0.23 0.77 0 0 for two components. Also, you need to specify "theta" values. However, rather than the actual recombination fractions, the program expects the consecutive (integer) numbers corresponding to the θ values given in the input file, for example, 3 for the third θ value. To specify a recombination fraction of 50%, enter a number outside the range of numbers of θ values, e.g., 0. The θ numbers corresponding to α = 0 are irrelevant (any θ will do).

This test statistic is also reported by the standard HOMOG program.

Files used by the program have the following fixed names:

- MTEST.DAT is the input file. It has the same structure as the input file to the HOMOG program. A sample MTEST.DAT file is provided.
- MTEST.OUT is the output file.
- MTEST.GRP is an input file holding the family group definitions. The first line contains the number, NGR, of groups to follow on subsequent lines. On each of the following NGR lines, family numbers are given that form one group, e.g., 3 11 12 15. Contiguous family numbers may be given in abbreviated form, e.g., numbers 7 through 11 may be given as -7 11. The first line and the following NGR lines define one set of groups. As many such sets may be given as desired. An example MTEST.GRP file is provided. Note that after the number NGR of groups, a title may follow on the same line, but there must be at least one space between NGR and the title. If NGR = 0 is given as the number of groups, this is taken to indicate that each family should form one group of its own (original usage of Morton's test). In that case, no family numbers are to be provided. The last line of the file should contain the number -1 to indicate the end of input.
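To make the MTEST.GRP layout concrete, the following sketch reads such a file into a list of group sets. It is a minimal reading of the rules stated above and not the program's own parser; the function names, the use of None for NGR = 0, and the decision to ignore blank lines are assumptions.

```python
def expand_families(tokens):
    """Expand one group line; a negative number starts a range ending at the next number."""
    families = []
    i = 0
    while i < len(tokens):
        value = int(tokens[i])
        if value < 0:
            # "-7 11" stands for families 7 through 11.
            families.extend(range(-value, int(tokens[i + 1]) + 1))
            i += 2
        else:
            families.append(value)
            i += 1
    return families


def parse_groups(text):
    """Return one entry per set of groups; None means each family is its own group."""
    lines = [line for line in text.splitlines() if line.strip()]
    sets = []
    pos = 0
    while pos < len(lines):
        ngr = int(lines[pos].split()[0])  # anything after NGR is a title
        pos += 1
        if ngr == -1:
            break  # end-of-input marker
        if ngr == 0:
            sets.append(None)
            continue
        sets.append([expand_families(line.split()) for line in lines[pos:pos + ngr]])
        pos += ngr
    return sets


# Hypothetical file contents, for illustration only.
sample = "2 two groups\n3 11 12 15\n-7 11\n0 one group per family\n-1\n"
print(parse_groups(sample))
```

The example text defines two sets: one with two explicit groups, and one in which every family forms its own group.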

Morton NE (1956) The detection and estimation of linkage between the genes for elliptocytosis and the Rh blood type. Am J Hum Genet 8:80-96

Ott J (1986) Linkage probability and its approximate confidence interval under possible heterogeneity. Genet Epidemiol Suppl 1:251-257

Ott J (1999) Analysis of Human Genetic Linkage, third edition. Johns Hopkins University Press, Baltimore

Terwilliger JD, Ott J (1994) Handbook of Human Genetic Linkage. Johns Hopkins University Press, Baltimore. Now available as a pdf file.