Jurg Ott
Rockefeller University New York
18 August 2015

Documentation to HOMOG programs

Contents
HOMOG program
Additional Programs
Construction of Support Intervals
References 

INTRODUCTION

Two test situations may be distinguished: a mixture of families in which individual families cannot unequivocally be assigned to one type or group (HOMOG programs), and known groups of families (fixed classifications, MTEST program, see below), for example, groups distinguished by their origin. The HOMOG programs are written in close to standard Pascal (exception: HOMOGM is written in C) and have been compiled with Free Pascal under Windows and Linux. HOMOG has been updated to accommodate multiple sets of input data.

This documentation describes various forms of the homogeneity (admixture) test but only the basic HOMOG program is furnished as a downloadable package. The additional programs are somewhat outdated and presumably no longer of interest. They may be obtained from the author upon request.

All HOMOG programs analyze heterogeneity (two or more disease loci) with respect to single marker loci or known maps of markers. In the first case, the programs expect lod scores between disease phenotype and the marker(s), and in the second case they expect multipoint lod scores for disease versus a known map (or maps) of markers. HOMOG carries out a homogeneity test (A-test) under the following alternative hypothesis: Two family types, one with linkage of a trait (or any gene locus for that matter) to a marker or map of markers, the other without linkage. For more information see Ott (1999). What follows are instructions on setting up the input file for the HOMOG program.
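The alternative hypothesis of the A-test can be written as a mixture likelihood. The following is a minimal Python sketch, not the HOMOG source code: it assumes the usual admixture form in which each family contributes ln[α·10^Z + (1 – α)], where Z is the family's lod score at a given θ. The function name and the example lod scores are hypothetical.

```python
import math

def admixture_lnl(alpha, lods):
    """Ln likelihood of the linked/unlinked mixture at a single theta.

    lods  -- one lod score (log10 likelihood ratio) per family at that theta
    alpha -- assumed proportion of linked families, 0 <= alpha <= 1
    """
    total = 0.0
    for z in lods:
        # Assumption: lods at or below -99 stand for minus infinity,
        # i.e. a likelihood ratio of 0.
        ratio = 0.0 if z <= -99.0 else 10.0 ** z
        term = alpha * ratio + (1.0 - alpha)
        if term <= 0.0:
            return float("-inf")
        total += math.log(term)
    return total

# Three hypothetical families with lods at one theta value
example = [1.2, -0.4, 0.3]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, round(admixture_lnl(alpha, example), 4))
```

With α = 0 every family is unlinked and the ln likelihood is 0; with α = 1 the value is the sum of the lod scores converted to natural logarithms (homogeneity).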

HOMOG program

Input is as described below. Default file names are HOMOG.DAT for input and HOMOG.OUT for output.

Line 1:  Title line

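Line 2:  N  STEPSIZE  LDIFF    where

 N = number of θ values given on line 4
 STEPSIZE = step size (increment) for α
 LDIFF = difference in Ln likelihood, Ldiff, used for support intervals (see Construction of Support Intervals)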

Line 3:  OUT  ALOW       where the OUTput option is set as follows:

  OUT   Table of lnL(alpha,theta)   Lods for families
   0               no                     no
   1               no                    yes
   2              yes                     no
   3              yes                    yes

 ALOW = lowest value of α analyzed (e.g., ALOW=0)

Line 4:  N recombination fraction (θ) values, e.g., 0.01, 0.05, 0.1, etc. At these points, lod scores will be computed. A large number N of θs (e.g., 10) will yield more accurate results than a small number. Note that these θ values are not used by the program other than to identify the lod scores.

Line 5:  NFAM = number of families for which lods are provided

Line 6:  Lod scores for family 1. Lods smaller than -99 are taken to represent minus infinity, and a log likelihood of minus infinity will appear as -99 on output.
Repeat line 6 for families 2, 3, etc.

Sample data: The file HOMOG.DAT shows a specific example based on the analysis by Morton (1956) on Elliptocytosis vs. Rh and is reproduced below.

Morton (1956) Elliptocytosis vs. Rh
5 0.05  2.0
3 0
0.01 0.05 0.1 0.2 0.3
14
   0.292   0.258   0.215   0.134   0.064
  -9.548  -4.232  -2.182  -0.540   0.018
   5.022   4.682   4.236   3.264   2.165
   1.560   2.321   2.317   1.953   1.376
   2.489   3.304   2.735   2.109   1.274
   0.276   0.260   0.238   0.191   0.138
   0.992   0.908   0.805   0.597   0.380
   2.657   2.442   2.161   1.554   0.892
  -0.110  -0.091  -0.070  -0.038  -0.016
  -7.307  -3.840  -2.399  -1.086  -0.455
  -1.402  -0.721  -0.444  -0.194  -0.076
  -0.300  -0.101  -0.020   0.059   0.046
  -4.036  -2.020  -1.220  -0.531  -0.224
  -1.118  -0.471  -0.235  -0.064  -0.014
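
The input layout described above can be read with a few lines of code. The sketch below is an illustration in Python, not part of the HOMOG package: it assumes free-format numbers after the title line, evaluates the mixture ln likelihood only at the listed θ values, and uses hypothetical function names (parse_homog, lnl, grid_search). The actual program may search additional points.

```python
import math

SAMPLE = """Morton (1956) Elliptocytosis vs. Rh
5 0.05  2.0
3 0
0.01 0.05 0.1 0.2 0.3
14
   0.292   0.258   0.215   0.134   0.064
  -9.548  -4.232  -2.182  -0.540   0.018
   5.022   4.682   4.236   3.264   2.165
   1.560   2.321   2.317   1.953   1.376
   2.489   3.304   2.735   2.109   1.274
   0.276   0.260   0.238   0.191   0.138
   0.992   0.908   0.805   0.597   0.380
   2.657   2.442   2.161   1.554   0.892
  -0.110  -0.091  -0.070  -0.038  -0.016
  -7.307  -3.840  -2.399  -1.086  -0.455
  -1.402  -0.721  -0.444  -0.194  -0.076
  -0.300  -0.101  -0.020   0.059   0.046
  -4.036  -2.020  -1.220  -0.531  -0.224
  -1.118  -0.471  -0.235  -0.064  -0.014
"""

def parse_homog(text):
    """Read the HOMOG layout: title line, then free-format numbers."""
    title, _, rest = text.partition("\n")
    tok = rest.replace(",", " ").split()
    n, stepsize, ldiff = int(tok[0]), float(tok[1]), float(tok[2])
    out, alow = int(tok[3]), float(tok[4])
    thetas = [float(t) for t in tok[5:5 + n]]
    nfam = int(tok[5 + n])
    start = 6 + n
    lods = [[float(t) for t in tok[start + i * n:start + (i + 1) * n]]
            for i in range(nfam)]
    return dict(title=title.strip(), n=n, stepsize=stepsize, ldiff=ldiff,
                out=out, alow=alow, thetas=thetas, lods=lods)

def lnl(alpha, column):
    """Mixture ln likelihood for one theta (one lod per family)."""
    total = 0.0
    for z in column:
        # Lods smaller than -99 are taken as minus infinity
        term = alpha * (0.0 if z < -99.0 else 10.0 ** z) + 1.0 - alpha
        if term <= 0.0:
            return float("-inf")
        total += math.log(term)
    return total

def grid_search(data):
    """Return (lnL, alpha, theta) with the highest lnL on the grid."""
    steps = round((1.0 - data["alow"]) / data["stepsize"])
    best = None
    for k in range(steps + 1):
        alpha = min(1.0, data["alow"] + k * data["stepsize"])
        for j, theta in enumerate(data["thetas"]):
            value = lnl(alpha, [fam[j] for fam in data["lods"]])
            if best is None or value > best[0]:
                best = (value, alpha, theta)
    return best

data = parse_homog(SAMPLE)
print(grid_search(data))
```

Because α = ALOW = 0 is part of the grid and gives a ln likelihood of 0, the maximum found this way is never negative.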

Additional Programs

The programs described below are somewhat obsolete. The usage instructions are retained here for the unlikely event that someone still needs to use one of these programs.

HOMOG1 is an extension of the homogeneity test, with the following alternative hypothesis: Two family types, one linked the other unlinked, plus a possible sex difference in the recombination fraction. This program comes in two versions, depending on whether the lods for the two sexes are independent or not:  HOMOG1a reads independent lods, HOMOG1b reads dependent lods. For the same problem, with independent lods, HOMOG1a is more efficient in terms of memory space.

HOMOG2 is another extension of the homogeneity test. There are two postulated trait loci at different map positions (or different map distances from a marker), with only one of the trait loci occurring in any one family. That is, there are two family types, both with linkage. The recombination fraction between trait locus 1 and the marker is θ1, that between trait locus 2 and the marker is θ2, where θ1 < θ2 < 0.5 (male and female recombination fractions the same). Typically, HOMOG2 is used with a marker map rather than a single marker in which case θ1 and θ2 refer to the map positions of the two trait loci. The proportion, α, of families with trait locus 1 is also estimated.

HOMOG3 and HOMOG4 are analogous to HOMOG2 but specify 3 or 4 family types (trait loci).  They only calculate the maximum log likelihood and the ML estimates but no support intervals.

HOMOG3R is a specialized version of the HOMOG3 program. It calculates log likelihoods under the assumption of two trait loci, where in a proportion α1 of families the trait is linked to marker 1, and in a proportion α2 of families is linked to marker 2. A third family type with proportion α3 has its trait locus elsewhere. The two trait genes are taken to be unlinked with each other.

HOMOGM is an extension of HOMOG3R to any number of trait loci. It was written in C by Ajita Bhat and uses a numerical maximization routine to maximize the likelihood over parameter values.

POINT4 is interactive and calculates, for a mixture of up to 4 family types, the log likelihood at specific parameter values.

Before running one of the programs, an input file must be constructed according to the rules given below. Input and output files have fixed names. For example, for the HOMOG program, the input file is HOMOG.DAT and the output file is HOMOG.OUT. Output of lod scores and log likelihoods is preset to a width of 80 columns unless the input quantity LL is read.

In each of the homogeneity tests, groups or types of families are assumed where any given family cannot unequivocally be assigned to either of these types. The statistical hypotheses referred to in the programs are defined as follows:

The relationship between the hypotheses 1 through 4 may be displayed as follows:

----------------------------------------------------------
Recombination fraction           α = 1          α < 1
  in the two sexes          (Homogeneity)  (Heterogeneity)
----------------------------------------------------------
           equal                  H1              H2
         unequal                  H3              H4
----------------------------------------------------------

In the programs and on output, genetic distance is labeled in terms of the recombination fraction, θ. However, the programs may also be used when the genetic distances are in centimorgans, x. Free recombination (infinite map distance) is labeled as θ = 99 or -99.

Tests of one hypothesis against another are carried out as likelihood ratio tests, where the likelihood ratio with respect to the two hypotheses is calculated. Asymptotic p-values based on theoretical chi-square distributions are no longer reported because in most of these tests they are not really appropriate.

CONSTRUCTION OF SUPPORT INTERVALS

Some of the programs described below calculate support intervals for the estimated parameters (support "regions" for more than two parameters) and, for each family, the conditional probability of belonging to each of the family types considered. These calculations are carried out only when a value for LDIFF is specified on input; otherwise, no support intervals are calculated, which results in faster program execution. Such approximate confidence intervals will be very crude and practically useless if lod scores are present at only a few θ values that are far apart from each other, or if the step size for α is too large, say, larger than 0.10. It is important for good support intervals that lod scores are available at many θ values. When only a few lod scores are available, a possible solution is to approximate lod scores by interpolating between calculated lod scores before inputting them to the HOMOG programs.
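
The interpolation suggested above could be done in many ways. The following Python sketch assumes simple linear interpolation between neighboring θ values; the choice of linear interpolation and the function name interpolate_lods are assumptions, and lods standing for minus infinity (e.g., -99) should not be interpolated this way.

```python
def interpolate_lods(thetas, lods, new_thetas):
    """Linearly interpolate one family's lod scores onto new theta values.

    thetas     -- ascending theta values at which lods were calculated
    lods       -- lod scores at those theta values (finite values only)
    new_thetas -- theta values inside [thetas[0], thetas[-1]]
    """
    result = []
    for t in new_thetas:
        if not thetas[0] <= t <= thetas[-1]:
            raise ValueError("theta %g lies outside the calculated range" % t)
        for k in range(len(thetas) - 1):
            t0, t1 = thetas[k], thetas[k + 1]
            if t0 <= t <= t1:
                weight = (t - t0) / (t1 - t0)
                result.append(lods[k] + weight * (lods[k + 1] - lods[k]))
                break
    return result

# Family 3 of the HOMOG.DAT sample, refined to a finer theta grid
thetas = [0.01, 0.05, 0.1, 0.2, 0.3]
lods = [5.022, 4.682, 4.236, 3.264, 2.165]
fine = [0.01, 0.03, 0.05, 0.075, 0.1, 0.15, 0.2, 0.25, 0.3]
print([round(z, 3) for z in interpolate_lods(thetas, lods, fine)])
```

The interpolated values and the finer θ grid would then replace lines 4 and 6 of the input file, with N on line 2 adjusted accordingly.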

In the HOMOG programs, support regions/intervals are computed as follows. First, the program determines the highest Ln likelihood, Lmax, under the most general alternative hypothesis, i.e., the one with the largest number of parameters estimated. Then, the program recalculates likelihoods and marks all those parameter values which have an Ln likelihood larger than Lmax – Ldiff (Ln likelihood within Ldiff of the maximum). The marked parameter values then form the support interval. Such a support interval is called an Ldiff-unit support interval. The table below gives examples for the correspondence between Ldiff and the associated likelihood ratio.
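
The marking step just described can be sketched as follows. This is a Python illustration under stated assumptions, not the program's code: the grid of Ln likelihoods is held in a dictionary keyed by parameter values, and both the function name support_set and the numbers in the example are hypothetical.

```python
def support_set(lnl_grid, ldiff):
    """Return Lmax and the parameter points with lnL > Lmax - Ldiff.

    lnl_grid -- dict mapping a tuple of parameter values to its Ln likelihood
    ldiff    -- Ldiff, the allowed drop in Ln likelihood from the maximum
    """
    lmax = max(lnl_grid.values())
    marked = sorted(p for p, value in lnl_grid.items() if value > lmax - ldiff)
    return lmax, marked

# Hypothetical Ln likelihoods on a small (alpha, theta) grid
grid = {
    (0.8, 0.05): 10.1, (0.9, 0.05): 11.4, (1.0, 0.05): 10.9,
    (0.8, 0.10): 9.0,  (0.9, 0.10): 9.8,  (1.0, 0.10): 8.7,
}
lmax, marked = support_set(grid, ldiff=2.0)
print("Lmax =", lmax)
print("2-unit support region:", marked)
```

With a coarse grid the marked set changes in large jumps, which is why many θ values and a small step size for α are needed for useful intervals.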

Under regular conditions, support intervals may be interpreted as approximate confidence intervals. For example, with two-point analysis and two family types (one linked and the other unlinked), 2 Ldiff approximately follows a chi-square distribution on 1 df when no heterogeneity is present. In multipoint situations, however, the approximation by chi-square is unreliable because the distribution of the test statistic is unknown.

 ----------------------------------------------
                Difference in units of  Approx.
 Likelihood    -----------------------  p-value
 ratio (LR)    ln(LR)=Ldiff  lod score  (1 df)
 ----------------------------------------------
      7.39         2.00         0.87     .046
     10            2.30         1        .032
     20            3.00        1.30      .014
     50            3.91        1.70      .005
    100            4.60        2         .002
   1000            6.91        3         .0002
 ----------------------------------------------
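
The entries of this table can be recomputed from the likelihood ratio alone. The short Python sketch below assumes that the approximate p-value is the upper tail of a chi-square distribution with 1 df evaluated at 2 Ldiff, which for 1 df equals erfc(sqrt(Ldiff)); the function name table_row is hypothetical.

```python
import math

def table_row(lr):
    """Return (Ldiff, lod score, approximate 1-df p-value) for a likelihood ratio."""
    ldiff = math.log(lr)            # natural log of the likelihood ratio
    lod = math.log10(lr)            # same difference in lod units
    # P(chi-square with 1 df > 2 * Ldiff) = erfc(sqrt(Ldiff))
    p_value = math.erfc(math.sqrt(ldiff))
    return ldiff, lod, p_value

for lr in (math.exp(2.0), 10, 20, 50, 100, 1000):
    ldiff, lod, p = table_row(lr)
    print("%8.2f  %6.2f  %5.2f  %.4f" % (lr, ldiff, lod, p))
```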

HOMOG1a program

The possible hypotheses under which likelihoods are calculated by the HOMOG1 program may be displayed as follows, where df stands for degrees of freedom (see also introduction):

----------------------------------------------------------
Male and female    Homogeneity (one     Heterogeneity (two
rec. fractions     family type)         family types)
----------------------------------------------------------
        equal          H1 (1 df)            H2 (2 df)
      unequal          H3 (2 df)            H4 (3 df)
----------------------------------------------------------

The test of H1 against H4 leads to a chi-square value with 2 df that may be partitioned into two components according to the manner in which H4 is reached from H1. Note that there are two possible paths leading from H1 to H4.

Input to the HOMOG1a program is similar to that for the HOMOG program and is as given in the following table. But refer to the notes below this table. File names are HOMOG1A.DAT for input and HOMOG1A.OUT for output.

Line 1:   Title line

Line 2:   NM  NF  STEPSIZE  LDIFF     where

Line 3:  OUT  ALOW  LL    where the OUTput option is set as follows (Warning: the table of lnL(α, θ) contains [NM NF – 1]/STEPSIZE lines):

  OUT   Table of lnL(alpha,theta)   Lods for families
   0               no                     no
   1               no                    yes
   2              yes                     no
   3              yes                    yes

 ALOW = lowest value of alpha analyzed (e.g., ALOW = 0)
 LL (optional) = line length of output (may be missing)

Line 4:  NM male θ values, tm. They may be entered on a single line, or distributed over several lines. The order is irrelevant.

Line 5:  NF female θ values, tf.

Line 6:  NFAM = number of families

Line 7:  NM male lod scores for family 1. Lods smaller than -80 are taken to represent minus infinity, and a log likelihood of minus infinity will appear as -99 on output.

Line 8:  NF female lod scores for family 1.

Repeat lines 7 and 8 for families 2, 3, etc.

The user is essentially free to choose the θ values at which lod scores are provided for each family, but should make sure that there is a sufficiently large number of pairs both with tm = tf and with tm ≠ tf.

Sample data: The file HOMOG1A.DAT provides an example of data that may be analyzed for heterogeneity as well as for a sex difference in the recombination fraction. Data quoted in Ott (1986).

HOMOG1b program

The calculations performed by the HOMOG1b program are basically the same as those by the HOMOG1a program. HOMOG1b allows input of nonindependent lod scores for the two sexes and, everything else being equal, requires more memory to run. The input is slightly different from that of the HOMOG1a program and is as follows. File names are HOMOG1B.DAT for input and HOMOG1B.OUT for output.

Line 1:  Title line

Line 2:  N  STEPSIZE  LDIFF     where
Line 3:  OUT  ALOW  LL    where the OUTput option is set as follows:

   OUT   Table of lnL(alpha,theta)   Lods for families
   ---------------------------------------------------
    0               no                     no
    1               no                    yes
    2              yes                     no
    3              yes                    yes

 ALOW = lowest value of alpha analyzed (e.g., ALOW=0)
 LL (optional) = line length of output (may be missing)

Line 4:  N pairs of θ values, tm and tf, where in each pair the first value is the male and the second value is the female recombination fraction. Each pair may be entered on a single line, or several pairs may be entered on one line, e.g., 0.01, 0.01, 0.05, 0.05, 0.01, 0.05,...  There must be exactly as many pairs (N of them) as there are lod scores for each family as provided on line 6, below. Omit lod = 0 at tm = tf = 0.5. The order in which these pairs are provided is irrelevant.

Line 5:  NFAM = number of families

Line 6:  Lod scores (N of them) for family 1. Lods smaller than -80 are taken to represent minus infinity, and a log likelihood of minus infinity will appear as -99 on output.
Repeat line 6 for families 2, 3, etc.

The user is essentially free to choose the pairs of θ values at which lod scores are provided for each family, but should make sure that there is a sufficiently large number of pairs both with tm = tf and with tm ≠ tf.  A minimum set of pairs would be all possible combinations of tm and tf with tm = 0, 0.05, ..., 0.5, and with tf = 0, 0.05, ..., 0.5. When the highest lod score occurs at tf > tm, it might be sufficient to provide lods in only one triangle of the plane of (tm, tf)-values, e.g., at tm = 0, 0.05, ..., 0.5, and tm ≤ tf ≤ 0.5.  Such a set of θ values may be represented graphically as follows (marked with crosses):

             tm = 0.5 |
                  0.3 |                x    x
                  0.1 |           x    x    x
                  0.05|      x    x    x    x
                   0  | x    x    x    x    x
             ---------+-----------------------
             tf =       0  0.05  0.1  0.3  0.5

On input, for example, the following θ values would have to be given on line(s) 4:

0 0   0 0.05   0 0.1   0 0.3   0 0.5   0.05 0.05
0.05 0.1  0.05 0.3   0.05 0.5   0.1 0.1 ... 0.3 0.5
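
A list of pairs like the one above is tedious to type by hand. The following Python sketch generates the triangle tm ≤ tf from a list of θ values and leaves out the pair tm = tf = 0.5, as required for line 4; the function name triangle_pairs is hypothetical.

```python
def triangle_pairs(values):
    """All (tm, tf) pairs with tm <= tf, omitting tm = tf = 0.5."""
    pairs = []
    for i, tm in enumerate(values):
        for tf in values[i:]:
            if tm == 0.5 and tf == 0.5:
                continue  # lod = 0 at tm = tf = 0.5 is not entered
            pairs.append((tm, tf))
    return pairs

pairs = triangle_pairs([0, 0.05, 0.1, 0.3, 0.5])
print(len(pairs), "pairs")
print("  ".join("%g %g" % p for p in pairs))
```

The number of pairs printed is the value N to be entered on line 2.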

Sample data: the file HOMOG1B.DAT contains the same data as the file HOMOG1A.DAT referenced in the previous section except that the joint lod scores, Z(tm,tf), have been reconstructed from the independent sex-specific lod scores, Z(tm) and Z(tf), as Z(tm, tf) = Z(tm) + Z(tf).
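
The reconstruction Z(tm, tf) = Z(tm) + Z(tf) can be sketched in Python as follows. The lod scores are hypothetical, and keeping a joint lod at -99 whenever either sex-specific lod stands for minus infinity is a design choice of this sketch, not a documented feature of the programs.

```python
MINUS_INF = -99.0  # value written for a lod of minus infinity

def joint_lods(pairs, male, female):
    """Joint lods Z(tm, tf) = Z(tm) + Z(tf) for independent sex-specific lods.

    male, female -- dicts mapping a theta value to that family's lod score
    """
    joint = []
    for tm, tf in pairs:
        zm, zf = male[tm], female[tf]
        if zm < -80.0 or zf < -80.0:
            joint.append(MINUS_INF)  # minus infinity plus anything stays minus infinity
        else:
            joint.append(zm + zf)
    return joint

# Hypothetical lods for one family
male = {0.0: -99.0, 0.1: 0.42, 0.5: 0.0}
female = {0.0: 0.81, 0.1: 0.65, 0.5: 0.0}
pairs = [(0.0, 0.0), (0.1, 0.1), (0.1, 0.5)]
print(joint_lods(pairs, male, female))
```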

HOMOG2 program

As mentioned in the introduction, in this extension to the A-test, the alternative hypothesis H2 of heterogeneity specifies two family types, both with linkage, one with recombination fraction θ1 between trait locus 1 and a marker (or θ1 is the map position of trait 1), the other with recombination fraction θ2 between trait locus 2 and the marker (or θ2 is the map position of trait 2) (θ1 < θ2 < 0.5). α is the probability of belonging to type 1 (with θ1). The two trait loci are on the same chromosome.

Input format is the same as for the HOMOG program and is as given in the following table, but refer to the notes after the table. File names are HOMOG2.DAT for input and HOMOG2.OUT for output.

Line 1:  Title line

Line 2:  NT  STEPSIZE  LDIFF     where
Line 3:  OUT  ALOW  LL     where the OUTput option is set as follows (Warning: table of lnL(α, θ) will contain 0.5 NT(NT + 3)/STEPSIZE lines):

  OUT   Table of lnL(alpha,theta)   Lods for families
  ---------------------------------------------------
   0               no                     no
   1               no                    yes
   2              yes                     no
   3              yes                    yes
  ---------------------------------------------------
 ALOW = lowest value of α analyzed (e.g., ALOW = 0)
 LL = line length of output (optional; if missing: LL = 80)

Line 4:  Recombination fraction (θ) values, e.g., 0.01, 0.05, etc.  At these points, lod scores will be computed. A rather large number NT of recombination fractions (e.g., 10) will yield more accurate results than a small number.

Line 5:  NFAM = number of families for which lod scores are provided.

Line 6:  Lod scores for family 1. Lods smaller than -80 are taken to represent minus infinity, and a log likelihood of minus infinity will appear as -99 on output.
Repeat line 6 for each family.

The null hypothesis (H1) is specified by α = 1 or, equivalently, by θ1 = θ2, and has one degree of freedom (df). The alternative hypothesis (H2) is characterized by three df, i.e., α, θ1, and θ2 are estimated. However, setting α = 1 forces θ2 to be equal to θ1 so that the asymptotic chi-square distribution may not apply. Thus, instead of p-values, the current version of the HOMOG2 program outputs likelihood ratios (odds ratios) for the hypotheses considered.
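
Why α = 1 makes θ2 irrelevant can be seen from the mixture likelihood itself. The Python sketch below assumes the two-type form in which each family contributes ln[α·10^Z(θ1) + (1 – α)·10^Z(θ2)]; it is an illustration with hypothetical lod scores and function names, not the HOMOG2 source.

```python
import math

def ratio(z):
    # Lods smaller than -80 stand for minus infinity (likelihood ratio 0)
    return 0.0 if z < -80.0 else 10.0 ** z

def homog2_lnl(alpha, lods_theta1, lods_theta2):
    """Ln likelihood for two linked family types.

    lods_theta1 -- per-family lods at theta1 (trait locus 1)
    lods_theta2 -- per-family lods at theta2 (trait locus 2)
    """
    total = 0.0
    for z1, z2 in zip(lods_theta1, lods_theta2):
        term = alpha * ratio(z1) + (1.0 - alpha) * ratio(z2)
        if term <= 0.0:
            return float("-inf")
        total += math.log(term)
    return total

# Hypothetical lods for three families at theta1 and at two candidate theta2 values
at_theta1 = [1.1, -0.6, 0.4]
at_theta2_a = [0.2, 0.9, -0.1]
at_theta2_b = [-0.5, 0.3, 0.7]

print(homog2_lnl(0.6, at_theta1, at_theta2_a))
print(homog2_lnl(1.0, at_theta1, at_theta2_a))
print(homog2_lnl(1.0, at_theta1, at_theta2_b))  # same as the line above
```

At α = 1 the second term drops out of every family's contribution, so the likelihood no longer depends on θ2, which is the nonstandard feature mentioned above.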

HOMOG3 and HOMOG4 programs

These programs are straightforward extensions of the HOMOG2 program to 3 and 4 family types (linked trait loci). They use the same input format as the HOMOG and HOMOG2 programs, and the input files are HOMOG3.DAT and HOMOG4.DAT, respectively. Output files will be called HOMOG3.OUT and HOMOG4.OUT.

The HOMOG3 and HOMOG4 programs simply calculate the max. Ln likelihood under the most general hypothesis of heterogeneity. Appropriate significance tests will have to be carried out manually by the user by comparing output from these programs with output from the HOMOG or HOMOG2 programs. Notice that HOMOG3 and HOMOG4 carry out an exhaustive search of the parameter space and may require a large amount of computer time. While they are running, they display the current alpha values so that programs may be interrupted by the user.

Interpreting results of HOMOG3 or HOMOG4 is not straightforward. For example, whenever one of the components (α's) is equal to zero, the associated θ value is irrelevant. Also, there may be more than one parameter constellation with the same maximum likelihood. The HOMOG3 and HOMOG4 programs differ in their output as follows.

In the HOMOG3 program, if the OUTput option (line 3) is set to a value larger than 1, all possible sets of α values will be printed (one set per line), and for each set the maximum likelihood over the θ values will be given along with those θ values at which the maximum occurred.

In the HOMOG4 program, if the OUTput option (line 3) is set to a value larger than 1, a table containing the Ln likelihood for each possible set of parameter values will be written to the output file. WARNING:  THIS FILE COULD BE VERY LARGE! For example, when the sample HOMOG.DAT file is analyzed by the HOMOG4 program, the output file will be 1.5MB long. For most practical situations, one should set OUT = 0 on line 3.

Notice that not every α component can take on the whole range of values from 0 through 1.  For computational efficiency, only α3 is allowed to have a value of 1; if in the course of the calculations α2 or α1 were also allowed to be equal to 1, the resulting likelihoods and θ estimates would be exactly the same as with α3 = 1. Similarly, α3 cannot be equal to zero. In this program, if a single α is zero, it must be α1; if two α's are zero, these must be α1 and α2.

HOMOG3R program

This is a specialized version of the HOMOG3 program. It calculates log likelihoods under the assumption that in a proportion α1 of families trait locus 1 is linked to marker 1 and in a proportion α2 of families trait locus 2 is linked to marker 2, where the two markers (or maps of markers) are located in different regions of the genome such that the trait loci are unlinked with each other. There may be a third proportion, α3 = 1 – α1 – α2, of families whose trait locus is unlinked to either of the two map regions harboring trait loci 1 and 2. The two regions of the genome usually correspond to different chromosomes and are identified in the following table by the respective sets of θ values, NT1 and NT2.

Default file names are HOMOG3R.DAT for input and HOMOG3R.OUT for output.

Line 1:  Title line

Line 2:  NT1  NT2  STEPSIZE    where
Line 3:  OUT  ALOW    where the OUTput option is set as follows:

  ---------------------------------------------------
  OUT   Table of lnL(alpha,theta)   Lods for families
  ---------------------------------------------------
   0               no                     no
   1               no                    yes
   2              yes                     no
   3              yes                    yes
  ---------------------------------------------------

The table of lnL(α, θ) will print one line for each pair of α1 and α2. On each line, the log likelihood, maximized over the θs, is printed.

 ALOW = lowest value of α analyzed (e.g., ALOW=0).

Line 4:  All NT1 + NT2 θ values, e.g., 0.01, 0.05, etc., that is, the θ values for marker 1 immediately followed by the θ values for marker 2. These values are for identification purposes only and are not used in the calculations. It may thus be useful to distinguish θ values for marker 1 (e.g., -0.10 or 0.11) from those for marker 2 (e.g., 0.10). A large number of recombination fractions will yield more accurate results than a small number.

Line 5:  NFAM = number of families for which lod scores are provided.

Line 6:  The NT1 + NT2 lod scores for family 1. Lods smaller than -80 are taken to represent minus infinity, and a log likelihood of minus infinity will appear as -99 on output.
Repeat line 6 for each family.

A special situation is given when the two markers near trait loci 1 and 2 are taken to be candidate genes and lod scores are evaluated at θ = 0 only. In this case, the HOMOG3R program will maximize likelihoods over only two θ values, 0 and 0.5. Consider the following input file (another sample data set is provided in the file HOMOG3R.DAT):

Linkage to two candidate genes on different chromosomes
1 1 .05
1 0
-0.01 0.01
4
 0.903  -99
 2.007  -99
 0.601  0.601
 -99    1.204

For each of four families, at each of two chromosomes, the file contains lod scores at θ = 0 (identified as -0.01 for marker 1 on chromosome 1 and 0.01 for marker 2 on chromosome 2).

There are three possible hypotheses of homogeneity:
  1. All families are linked with marker 1 but unlinked with marker 2.
  2. All families are linked with marker 2 but not with marker 1.
  3. All families are unlinked with markers 1 and 2.
For the given data, hypotheses 1 and 2 have zero likelihood because there is always at least one family with one or more known recombinations (but note the comments below). The HOMOG3R program furnishes the following output:

Program  HOMOG3R  version 1.70   J. Ott

Heterogeneity -- Three family types, type 1 with linkage to first set of
theta values, type 2 with linkage to second set of theta values (usually
two different chromosomes), type 3 unlinked.

>> Linkage to two candidate genes on different chromosomes <<

      Fam.  Lod scores
         1    0.9030  -99.0000
         2    2.0070  -99.0000
         3    0.6010    0.6010
         4  -99.0000    1.2040
     Theta   -0.0100    0.0100

Results for different hypotheses (fixed values in parentheses)

            Hypothesis    a1    a2    a3        t1      t2         lnL
----------------------------------------------------------------------
H1       Heterogeneity  0.65  0.35  0.00    -0.010   0.010      8.9453
H2           Het, a3=0  0.65  0.35   (0)    -0.010   0.010      8.9453
H3           Het, a2=0  0.70   (0)  0.30    -0.010   (-99)      5.9688
H4           Het, a1=0   (0)  0.40  0.60     (-99)  -0.010      1.7107
H5   Homogeneity, a1=1   (1)   (0)   (0)   -99.000   (-99)      0.0000
H6   Homogeneity, a2=1   (0)   (1)   (0)     (-99) -99.000      0.0000
H7   Homogeneity, a3=1   (0)   (0)   (1)     (-99)   (-99)         (0)

  Evidence for heterogeneity (H1 vs. H5/6/7): Het. versus homogeneity
    Difference in Ln(L)     =     8.9453
    Lik. ratio for heterog. =  7671.7558

  Evidence for 2 versus 1 disease locus (H1 vs. H3/4):
    Difference in Ln(L)     =     2.9765
    Lik. ratio for heterog. =    19.6190

Family   Conditional prob. of being
  no.     type 1   type 2   type 3  (under heterogeneity, H1)
   1      1.0000   0.0000   0.0000
   2      1.0000   0.0000   0.0000
   3      0.6500   0.3500   0.0000
   4      0.0000   1.0000   0.0000

The program output shows positive likelihoods for hypotheses H1 through H4. It may come as a surprise that formally the program also indicates non-null likelihoods (log likelihood greater than -99 = -∞) for hypotheses H5 and H6, that is, the data are compatible with homogeneity (locus in all families on chromosome 1 or chromosome 2). This is so because the likelihood is maximized over θ = 0 and θ = 0.5. Then the recombination fraction is estimated as 0.5 (indicated by -99.000 in the output) whenever θ = 0 is incompatible with the data. Of course, if θ were kept fixed at 0, the data would be incompatible with hypotheses α1 = 1 and α2 = 1.
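
The H1 row, the likelihood ratio, and the conditional probabilities in the output above can be checked independently. The Python sketch below assumes the three-component mixture in which each family contributes ln[α1·10^Z1 + α2·10^Z2 + α3], with both θ values held at 0, and searches the α grid in steps of 0.05; the function names are hypothetical and this is not the HOMOG3R source. A hand calculation of the same formula at α1 = 0.65, α2 = 0.35 gives lnL = 8.9453, in agreement with the output.

```python
import math

# Lod scores at theta = 0 for (marker 1, marker 2), copied from the input file
LODS = [(0.903, -99.0), (2.007, -99.0), (0.601, 0.601), (-99.0, 1.204)]
STEPS = 20  # STEPSIZE = 0.05, so alpha = k / 20

def ratio(z):
    # Lods smaller than -80 stand for minus infinity (likelihood ratio 0)
    return 0.0 if z < -80.0 else 10.0 ** z

def terms(i, j):
    """Per-family mixture terms for alpha1 = i/20, alpha2 = j/20."""
    a1, a2, a3 = i / STEPS, j / STEPS, (STEPS - i - j) / STEPS
    return [(a1 * ratio(z1), a2 * ratio(z2), a3) for z1, z2 in LODS]

def lnl(i, j):
    """Ln likelihood of the three-type mixture at grid point (i, j)."""
    total = 0.0
    for t in terms(i, j):
        if sum(t) <= 0.0:
            return float("-inf")
        total += math.log(sum(t))
    return total

def conditional_probs(i, j):
    """Per-family conditional probabilities of being type 1, 2, or 3."""
    return [tuple(x / sum(t) for x in t) for t in terms(i, j)]

# Exhaustive search over all (alpha1, alpha2) with alpha1 + alpha2 <= 1
best = max(((i, j) for i in range(STEPS + 1) for j in range(STEPS + 1 - i)),
           key=lambda ij: lnl(*ij))
print("alpha1, alpha2 =", best[0] / STEPS, best[1] / STEPS)
print("lnL =", round(lnl(*best), 4), " LR =", round(math.exp(lnl(*best)), 1))
for fam, probs in enumerate(conditional_probs(*best), start=1):
    print(fam, [round(p, 4) for p in probs])
```

Restricting the search to α2 = 0 (grid points with j = 0) corresponds to hypothesis H3 with θ1 held at 0.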

Testing for significance of heterogeneity may be carried out in different ways. Under heterogeneity that allows for 3 components (hypothesis H1, above), 4 parameters are specified: α1, α2, θ1, and θ2. Setting α1 = 1 leads to H5 (all families have their gene at position 1; it so happens that with the above data the estimated θ1 = 0.50, i.e., map position = ∞). The number of estimated parameters is then equal to 1 (setting α1 = 1 makes θ2 irrelevant). Thus the number of degrees of freedom is 4 – 1 = 3 (this is probably somewhat conservative because statistically this is a nonstandard situation). The corresponding test statistic might then be constructed as χ2 = 2 × (difference in ln(L) between H1 and H5) = 2 × 8.9453 = 17.8906. One may proceed analogously for hypotheses H6 and H7.

HOMOGM program

This is an extension of the HOMOG3R program to any number of loci. A given family can have its disease locus at any one of n loci or at some other unknown locus. For the i-th family type, a proportion αi and disease location xi (or recombination fraction θi in the case of a single marker linked with the i-th gene) are estimated. For the (n + 1)st family type (unlinked), only the proportion αn+1 is estimated. Because of the large number of parameters, the likelihood is no longer computed for a complete grid over the whole parameter space as it is done in the other HOMOG programs. Rather, the simplex algorithm was implemented in a computer program written by Dr. Ajita Bhat for numerical maximization of the likelihood. A detailed program description is given below.

INTRODUCTION

HOMOGM analyzes heterogeneity for two or more disease loci that are unlinked with each other. This program also allows for a proportion of families to be unlinked to any of the loci. Different hypotheses can be generated using multiple fixed values of θ (theta, recombination fractions) and α (alpha, proportion of families with linkage to different locations). For detailed explanation please see Bhat et al. (1999). The example used in that paper is contained in the homogm.dat file.

The HOMOGM program is written in C. On UNIX platforms it can be compiled with a C compiler using one of the following commands:

>gcc -O2 homogm.c -o homogm -lm

or

>cc -O homogm.c -o homogm -lm

For Windows, the executable homogmpc.exe is available. The source code, homogmpc.c, is also available and may be compiled using the djgpp compiler (http://www.delorie.com/djgpp/) and the following command:

>gcc -O2 homogmpc.c -o homogmpc -lm

FILES NEEDED

The default input file name is homogm.dat and the program creates an output file named homogm.out. The input file needs to be supplied by the user.

Sample input file

A template of the homogm.dat file is given below. This template file is created assuming there are 3 loci and 200 families in the data set. A brief description of each line is given and a detailed explanation of the file is given further below. The format of the input file homogm.dat is as follows:

Families with 3 locations  Title line {Any suitable title of your choice} <Line 1>
6  6  7  ⇐ Number of θ values for each locus <Line 2>
0 0 0 0 0 0 0  Fixed parameters (hypothesis generation) <Line 3>
0.001  0.100  0.200  0.300  0.400  0.5 Theta values for locus 1 <Line 4>
0.001  0.020  0.050  0.100  0.200  0.5 Theta values for locus 2 <Line 5>
0.001  0.010  0.060  0.100  0.200  0.300  0.400  Theta values for locus 3 <Line 6>
200         Number of families <Line 7>
2.107 1.787 1.429  1.023  0.554 0  Lod scores for 1st family and 1st locus <Line 8>
.................
.................
-99 -0.058 0.313  0.357  0.237 0  Lod score for the 100th family at 1st locus
................
................
-99 -1.675 -0.781 -0.347 -0.092 0  Lod scores for 200th family and 1st locus
-99  -0.212  0.046  0.053  0.011 0  Lod scores for 1st family and 2nd locus
................
................
-99  -2.092 -0.562  0.022  0.164 0  Lod score for the 100th family and 2nd locus
................
................
-99  -1.172 -0.317  0.013  0.094 0  Lod score for the 200th family and 2nd locus
-1.4966  -0.5123  -0.2289  0.1758  0.3221  0.4185  0.3627  Lod scores for 1st family and 3rd locus
................
................
 -10.4953  -6.4992  -5.2995  -3.4091  -2.5406  -1.3876  -0.7413  Lod scores for 100th family and 3rd locus
...............
...............
-4.4962  -2.5079  -1.9191 -1.0192  -0.6321  -0.1835   -0.0053  Lod score for the 200th family and 3rd locus

Detailed description of the input file, homogm.dat

Line 1 is a title line of user's choice.

Line 2 indicates the number of θ values considered for the study for loci 1, 2 and 3 respectively.

Line 3 parameters (hypothesis option): The number of zeros and ones on this line must equal twice the number of loci plus one, i.e., 2n + 1, where n is the number of loci. A 0 in a given position indicates that the user does not wish to fix the corresponding parameter (i.e., that parameter is estimated), whereas a 1 indicates that the user wishes to fix it (parameter not estimated).

The first n values denote fixing of the θ values for the loci in serial order. The next n values denote fixing of α values for the corresponding loci in serial order. The (2n + 1)-th (i.e. the last) number denotes whether the α value for the unlinked locus is fixed/not-fixed. Any combination of α and θ can be fixed at a time.

Note: If the θ value for any particular locus is fixed to 0.5, the α value for that locus is automatically fixed to 0. In addition, if n of the (n + 1) α values are fixed, the (n + 1)th α value is automatically fixed to (1 - sum of all n α values). Also, if n of the (n + 1) α values sum to one, the (n + 1)th α value is automatically fixed to zero. As an aid, the output file, homogm.out, reports the number of free parameters that were available when each hypothesis was run (details given below). An explanation of hypothesis testing can be found in the Handbook of Human Genetic Linkage (Terwilliger and Ott 1994).
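
The ordering of the 2n + 1 flags on line 3 can be made explicit with a small helper. The Python sketch below is only an illustration: the function name describe_flags and the parameter labels (theta1, alpha1, alpha_unlinked, ...) are hypothetical, and it does not apply the automatic fixing rules of the note above.

```python
def describe_flags(flags, n_loci):
    """Split the line-3 flags into fixed and estimated parameter names.

    flags  -- list of 0/1 values in the order theta 1..n, alpha 1..n, alpha unlinked
    n_loci -- number of loci, n
    """
    if len(flags) != 2 * n_loci + 1:
        raise ValueError("line 3 must contain 2n + 1 flags")
    if any(f not in (0, 1) for f in flags):
        raise ValueError("flags must be 0 (estimate) or 1 (fix)")
    names = (["theta%d" % i for i in range(1, n_loci + 1)]
             + ["alpha%d" % i for i in range(1, n_loci + 1)]
             + ["alpha_unlinked"])
    fixed = [name for name, f in zip(names, flags) if f == 1]
    estimated = [name for name, f in zip(names, flags) if f == 0]
    return fixed, estimated

fixed, estimated = describe_flags([1, 0, 0, 0, 1, 0, 0], n_loci=3)
print("fixed:    ", fixed)
print("estimated:", estimated)
```

The program prompts for the constants in the same order in which the fixed parameters appear on line 3.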

Hypothesis generation: When the program is run, the user is prompted for the values of the fixed parameters. Each fixed θ value must lie between the minimum and maximum θ values given in the file homogm.dat, and the sum of all α values must never exceed 1.

Some examples of hypotheses (3 loci):

1). No values are fixed. Line 3 looks like:
0 0 0 0 0 0 0

2). Second α value is fixed to 1. Line 3 looks like:
0 0 0 0 1 0 0
The program prompts for the constant, which should be entered as 1.

3). Second θ value is fixed to 0.5. Line 3 looks like:
0 1 0 0 0 0 0
The program prompts for the constant and 0.5 should be entered. In this case, the program will automatically set the second α value to 0.

4). First θ value is fixed to 0.1 and second α value fixed to 0.2.
Line 3 looks like:
1 0 0 0 1 0 0
The program will prompt the user for the value of only these 2 fixed parameters. So the first input by the user should be 0.1 and second input must be 0.2.

5). The α value for the unlinked locus is fixed to 0 and the first θ value to 0.2. Line 3 looks like:
1 0 0 0 0 0 1
The program will prompt the user for the first θ value, which should be entered as 0.2, and then for the α value of the unlinked locus, which in this case should be 0.

6). If the α value for the unlinked locus is fixed to 1, the maximum log likelihood is 0 and no linkage to any of the loci exists.

Note: The program does not interpolate between two θ values that are specified in the data file homogm.dat. If the user wishes to fix the value of θ at 0.03, the corresponding lod scores (computed using MLINK) for that particular value have to be included in the input data file. In this example, if the user fixes the value of the second θ at 0.4, the program automatically resets it to the closest value in the file (i.e., 0.35 in this case).
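The replacement of a requested θ by the closest tabulated value can be sketched as below. The grid shown is hypothetical (the actual values come from homogm.dat), and how HOMOGM resolves an exact tie between two neighbouring grid values is not documented here; this sketch simply takes the first one.

```python
def snap_to_grid(theta, grid):
    """Return the tabulated theta closest to the requested one (no interpolation)."""
    return min(grid, key=lambda t: abs(t - theta))


# Hypothetical theta values for one locus, as they might appear in homogm.dat.
grid = [0.0, 0.05, 0.1, 0.2, 0.35]
print(snap_to_grid(0.4, grid))   # prints 0.35
```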

Lines 4, 5, 6: Contain the list of all possible θ values.
Each line contains the θ values for one locus. In this particular example, line 4 lists all the possible θ values for locus 1, line 5 those for locus 2, and line 6 those for locus 3. All θ values must lie between 0 and 0.5. A larger number of θ values gives a more accurate log likelihood. It is not necessary to include θ = 0.5 (for which the lod score is 0 by definition).

Line 7 indicates the number of families; in this case 200.

Lines 8 onward list the lod scores, one line per family and locus, with one lod score for each θ value of that locus. First come the lod scores for all families at locus 1, then those for all families at locus 2, etc. The number of lod scores on each line should match the number of θ values.
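The file layout described above can be summarized in a short Python reader. This is a sketch under stated assumptions, not the parsing code of HOMOGM: it assumes blank lines may be skipped, that lines 2 and 3 contain numbers only, and that any text following the expected numbers on a lod score line is an annotation to be ignored. The function name parse_homogm and the toy data are hypothetical.

```python
def parse_homogm(text):
    """Read the homogm.dat layout from a string (illustrative sketch only)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    title = lines[0].strip()                               # line 1: title
    n_thetas = [int(tok) for tok in lines[1].split()]      # line 2: counts per locus
    n_loci = len(n_thetas)
    flags = [int(tok) for tok in lines[2].split()]         # line 3: 2n + 1 fix flags
    if len(flags) != 2 * n_loci + 1:
        raise ValueError("line 3 must contain 2n + 1 flags")
    thetas = [[float(tok) for tok in lines[3 + j].split()[:n_thetas[j]]]
              for j in range(n_loci)]                      # one theta line per locus
    n_fam = int(lines[3 + n_loci].split()[0])              # number of families
    pos = 4 + n_loci
    lods = []                                              # lods[locus][family][k]
    for j in range(n_loci):
        block = []
        for _ in range(n_fam):
            toks = lines[pos].split()[:n_thetas[j]]        # drop trailing annotation
            if len(toks) != n_thetas[j]:
                raise ValueError(f"too few lod scores on data line {pos + 1}")
            block.append([float(t) for t in toks])
            pos += 1
        lods.append(block)
    return {"title": title, "flags": flags, "thetas": thetas,
            "n_fam": n_fam, "lods": lods}


# Toy input: 2 loci with 3 and 2 theta values, 2 families (made-up numbers).
sample = """\
Toy data, 2 loci
3 2
0 0 0 0 0
0.0 0.1 0.3
0.05 0.2
2
-1.2 0.4 0.2   family 1, locus 1
0.8 0.6 0.1    family 2, locus 1
0.3 0.1        family 1, locus 2
-0.5 -0.2      family 2, locus 2
"""
data = parse_homogm(sample)
print(data["n_fam"], data["thetas"], data["lods"][1][1])
```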

Output of the program

The program prints the values of the fixed parameters and the maximum log likelihood for those values on the screen. It also creates the output file homogm.out, which begins with the time and date it was created. Following that is the number of free parameters (in this example it is 6 since line 3 in homogm.dat has all values set to 0: three θ values and four α values are estimated, and the requirement that the α values sum to 1 removes one free parameter). The maximum log likelihood is given next, followed by the table of θ and α values for the loci in serial order. In this example, the maximum log likelihood is 174.0028. The α value of the unlinked locus is 0.148, meaning that a proportion of 0.148 (about 1/7) of the families is unlinked to any of the three loci. Fixed parameters are shown in parentheses in the output file (as shown in the manuscript and in the output file of homog3r).

Following that is the table of conditional probabilities, for each family, of being linked to each locus in serial order; the last column is the probability of being unlinked to any of the three loci. In this example, family 1 has a probability of 0.25 of being unlinked to any of the three loci, while its highest probability (0.349) is for linkage to locus 3. Similarly, family 2 has the highest probability (0.926) of being linked to locus 1 and no probability of being linked to locus 3, and so on for the other families.

*** Program HOMOGM (Created on Nov 20 1998, at 12:00:52) Dr. Jurg Ott ***

This output file created on Fri Nov 20 12:01:36 1998

Heterogeneity -- Trait vs. 3 loci on different chromosomes

The number of free parameters is 6
The value of Max ln(L) is 174.002816
The value of log10(L) is 75.569423
 

  t[1]    t[2]    t[3]
===========================================================
 0.100  0.200  0.020

  a[1]    a[2]    a[3]
===========================================================
 0.221  0.282  0.349

Alpha-unlinked is 0.148

Family linkage probabilities

1 0.08746 0.31311 0.34903 0.25040
2 0.92632 0.04094 0.00000 0.03274
..............
..............
100 0.03388 0.06823 0.02484 0.87306
..............
..............
200 0.00055 0.01771 0.96757 0.01417
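The following Python sketch shows how conditional probabilities of this kind and the log likelihood can be computed, assuming the standard admixture model in which each family contributes the sum over classes of α times the likelihood ratio 10^lod, with a likelihood ratio of 1 for the unlinked class. That HOMOGM uses exactly this form is an assumption here; the function names and the lod scores in the example are hypothetical, while the α values are those of the sample output above.

```python
import math


def mixture_terms(family_lods, alphas):
    """alpha_j * 10**lod_j for each linked class, plus the unlinked term.

    family_lods: lod of one family at each locus, taken at that locus's theta.
    alphas: n linked proportions followed by the unlinked proportion.
    """
    n = len(family_lods)
    terms = [alphas[j] * 10.0 ** family_lods[j] for j in range(n)]
    terms.append(alphas[n])        # unlinked: lod = 0, likelihood ratio = 1
    return terms


def family_posteriors(family_lods, alphas):
    """Conditional probability of each class (loci in order, then unlinked)."""
    terms = mixture_terms(family_lods, alphas)
    total = sum(terms)
    return [t / total for t in terms]


def log_likelihood(all_family_lods, alphas):
    """Natural log likelihood summed over families under the assumed model."""
    return sum(math.log(sum(mixture_terms(z, alphas))) for z in all_family_lods)


alphas = [0.221, 0.282, 0.349, 0.148]    # estimates from the sample output
made_up_lods = [1.5, -0.3, -2.0]         # hypothetical lods for one family
print(family_posteriors(made_up_lods, alphas))
```

A family whose lod scores are all 0 carries no information, and its conditional probabilities equal the α values.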

Starting values for parameter estimation

The program does not depend on starting values. However, users interested in exploring various starting points can supply their own. In that case the program needs to be compiled using the following command:

>> gcc homogm.c -DSTARTING_VALUES -o homogm -lm -O2

or

>> cc homogm.c -DSTARTING_VALUES -o homogm -lm -O

The program will then prompt the user to input different starting values (for θ only).

Hypothesis testing

The program provides only one maximum log likelihood value (reported in the form of the natural and the decimal log), i.e., the likelihood associated with the parameter estimates in the output. The program could have been designed to test various hypotheses. However, because of the essentially unlimited number of parameters and hypotheses, it is left to the user to specify which hypotheses to test. This may be accomplished by setting any of the parameters to "fixed" or "variable" (input line 3). For example, suppose one wants to test whether all families belong to class 2 (have the disease locus linked with marker 2). Then there are two possibilities:

(1) Set α2 = 1 and all other αs = 0 and keep these values fixed, or
(2) fix all αs other than α2 and αn+1 at the value 0.

The latter hypothesis differs from the former in that it allows for a proportion of families to be unlinked to locus 2. The log likelihood obtained under such a restricted hypothesis is then compared with the log likelihood under a less restricted or unrestricted hypothesis. See explanations at the end of the section on the HOMOG3R program for further details.
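The comparison of two runs can be sketched as a likelihood ratio statistic, twice the difference in natural log likelihoods, together with the difference in the numbers of free parameters reported in homogm.out. In the example below, only the unrestricted values (174.002816 with 6 free parameters) come from the sample output; the restricted values are hypothetical, and the function name is not part of any HOMOG program. Because α values may lie on the boundary of their range, the choice of reference distribution is left to the references cited in this documentation.

```python
def lr_statistic(lnl_general, lnl_restricted, free_general, free_restricted):
    """Likelihood ratio statistic and difference in free parameters.

    lnl_*: 'Max ln(L)' values from two homogm.out files (natural logs).
    free_*: the 'number of free parameters' reported in the same files.
    """
    stat = 2.0 * (lnl_general - lnl_restricted)
    df = free_general - free_restricted
    if df <= 0:
        raise ValueError("the restricted hypothesis must have fewer free parameters")
    if stat < -1e-8:
        raise ValueError("the restricted log likelihood exceeds the general one")
    return max(stat, 0.0), df


# Unrestricted run from the sample output versus a hypothetical restricted run.
stat, df = lr_statistic(174.002816, 170.5, 6, 2)
print(f"statistic = {stat:.3f}, difference in free parameters = {df}")
```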

POINT4 program

The POINT4 program is interactive and calculates the log likelihood at specific parameter values for a mixture of up to 4 family types. It reads input files in the regular format for the HOMOG program. When files in the format for HOMOG3R are to be used by POINT4, the two numbers on the second line indicating numbers of recombination fractions must be replaced by a single number, which is the sum of the previous two numbers.

To use the program, you will have to furnish 4 values of α (proportions of family types), e.g., 0.23  0.77  0  0  for two components. You also need to specify "theta" values. However, rather than the actual recombination fractions, the program expects the consecutive (integer) numbers corresponding to the θ values given in the input file, for example, 3 for the third θ value. To specify a recombination fraction of 50%, enter a number outside the range of numbers of θ values, e.g., 0. The θ numbers corresponding to α = 0 are irrelevant (any θ will do).
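The numbering convention for θ can be illustrated with a small Python sketch. The function name and the grid are hypothetical; only the convention itself (consecutive numbers starting at 1, any other number meaning θ = 0.5) is taken from the description above.

```python
def theta_from_number(number, grid):
    """Translate a POINT4-style theta number into a recombination fraction."""
    if 1 <= number <= len(grid):
        return grid[number - 1]    # numbers count the theta values from 1
    return 0.5                     # out of range, e.g. 0, means theta = 0.5


# Hypothetical theta values as listed in the input file.
grid = [0.0, 0.05, 0.1, 0.2]
print(theta_from_number(3, grid))   # prints 0.1
print(theta_from_number(0, grid))   # prints 0.5
```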

MTEST program

The MTEST program implements Morton's likelihood ratio test for heterogeneity of the recombination fraction among different groups of families (Morton 1956). Each group consists of a certain number of families, e.g., groups may correspond to investigators, or to countries of origin, etc. Also, each family may be regarded as forming a group of its own (Morton's original usage of the test). The test assumes homogeneity within each group (same θ).  The null hypothesis specifies overall homogeneity while under the alternative hypothesis of heterogeneity, a potentially different θ value exists for each group.

This test statistic is also reported by the standard HOMOG program.
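For orientation, Morton's statistic in its usual textbook form can be sketched as follows: 2 ln(10) times the difference between the sum of the group-wise maximum lod scores and the maximum of the summed lod scores, referred to a chi-square distribution with (number of groups - 1) degrees of freedom. The sketch assumes all groups are tabulated on the same θ values and takes maxima over those values only (plus θ = 0.5, where the lod score is 0); whether MTEST computes the maxima in this way is not stated here, and the function name and data are hypothetical.

```python
import math


def morton_statistic(group_lods):
    """Morton's heterogeneity statistic from lod scores on a shared theta grid.

    group_lods[g][k] is the total lod score of group g at the k-th theta value.
    Returns the statistic and its degrees of freedom (number of groups - 1).
    """
    n_theta = len(group_lods[0])
    if any(len(g) != n_theta for g in group_lods):
        raise ValueError("all groups must use the same theta values")
    pooled = [sum(g[k] for g in group_lods) for k in range(n_theta)]
    # Include 0.0 because the lod score at theta = 0.5 is 0 by definition.
    sum_of_maxima = sum(max(max(g), 0.0) for g in group_lods)
    pooled_maximum = max(max(pooled), 0.0)
    stat = 2.0 * math.log(10.0) * (sum_of_maxima - pooled_maximum)
    return stat, len(group_lods) - 1


# Two hypothetical groups tabulated at two theta values.
stat, df = morton_statistic([[1.0, 0.5], [-2.0, 0.3]])
print(f"statistic = {stat:.4f} with {df} degree(s) of freedom")
```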

Files used by the program have the following fixed names:

REFERENCES

Bhat A, Heath SC, Ott J (1999) Heterogeneity for multiple disease loci in linkage analysis. Hum Hered 49:229-231

Morton NE (1956) The detection and estimation of linkage between the genes for elliptocytosis and the Rh blood type. Am J Hum Genet 8:80-96

Ott J (1986) Linkage probability and its approximate confidence interval under possible heterogeneity. Genet Epidemiol Suppl 1:251-257

Ott J (1999) Analysis of Human Genetic Linkage, third edition. Johns Hopkins University Press, Baltimore

Terwilliger JD, Ott J (1994) Handbook of Human Genetic Linkage. Johns Hopkins University Press, Baltimore. Now available as a pdf file.