Jurg Ott, 10 Dec 2020

This website refers to a computer program, statdel, described in Imai-Okazaki *et al*. (2017). Only the basic instructions for its use are provided here. More details may be found in the sample parameter file.

The program must be invoked in a command window (Windows) or terminal window (Linux) by typing on the command line, for example: statdel Pt369.param, where the second item is the required parameter file. The program package includes a sample parameter file that documents the relevant parameters. It looks as follows:

Statdel: Pt369 with known pathogenic deletions

-9 17 14005439 15217437 17p12 code for missing, chrom, start and end bp of dis.del, 0 0 0 if unknown

-12 1 1 code for test statistic, exponent, include obs

1 3 0.5 0.8 no. of "n" values, min #variants for HDR, min median(HDR), min. overlap

hdr_scores_Pt369.txt

StatdelPt369.out

EXPLANATIONS

Line 1: Any text, truncated to 255 characters.

Line 2:

- Value for "missing", should be left equal to -9
- Chromosome number for disease deletion if known (=0 otherwise)
- Chromosomal start position of disease deletion if known (=0 otherwise)
- Chromosomal end position of disease deletion if known (=0 otherwise). Items 2 - 4 are only used in a research setting when the true disease deletion is known. Any text following this item is not used by the program.

Line 3:

- Code for primary test statistic (see below)
- Exponent for power transformation of HDR values (see Note 2 below)
- 1 if observed data are counted among null data, 0 otherwise (see Note 4)

Line 4:

- Number, N, of different boundary values ("n") flanking an ROH, currently fixed at 1
- Min. average number of basepairs (variants) for a valid HDR (preferably 3 or more)
- Min. median HDR for an ROH to have a valid result.
- Min. proportion of disease deletion to be covered by an ROH so that this ROH is taken to identify the disease deletion. Such an ROH is flagged * on output. Two examples, assuming min. overlap = 0.8, are as follows:

Candidate ROH: + + + + +

The ROH overlaps a proportion of 2/8 = 0.25 of the deletion, so does not qualify to represent the deletion.

Candidate ROH: + + + + + + + + + + + +

The ROH overlaps a proportion of 9/10 = 0.90 of the deletion, so it qualifies to represent the deletion.

Exception: If an ROH is shorter than the known deletion but completely inside the deletion boundaries, then the overlap is declared to be 1.0 (100%).
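The overlap rule and its exception can be sketched in a few lines of Python. This is only an illustration, not statdel's own code: the function names, the use of inclusive basepair coordinates, and the "greater than or equal to" comparison against the minimum overlap are assumptions.

```python
def overlap_proportion(roh_start, roh_end, del_start, del_end):
    """Proportion of the known deletion that is covered by an ROH.

    Assumption: all positions are inclusive basepair coordinates.
    """
    if roh_start > roh_end or del_start > del_end:
        raise ValueError("start position must not exceed end position")
    # Exception from the text: an ROH lying completely inside the
    # deletion boundaries is declared to have overlap 1.0.
    if del_start <= roh_start and roh_end <= del_end:
        return 1.0
    shared = min(roh_end, del_end) - max(roh_start, del_start) + 1
    if shared <= 0:
        return 0.0  # no common basepairs
    return shared / (del_end - del_start + 1)


def identifies_deletion(roh_start, roh_end, del_start, del_end, min_overlap=0.8):
    """True if the ROH would be flagged as identifying the deletion.

    Assumption: the threshold comparison is "at least min_overlap".
    """
    return overlap_proportion(roh_start, roh_end, del_start, del_end) >= min_overlap


# Hypothetical coordinates: a deletion spanning 100 basepairs.
print(overlap_proportion(50, 124, 100, 199))
print(identifies_deletion(110, 250, 100, 199))
```

With a minimum overlap of 0.8, an ROH covering a quarter of the deletion is rejected, while one covering nine tenths of it is accepted.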

Line 5: Name of the file holding HDR values

Line 6: Name of results (output) file. If this line is blank the output file will be called "statdel.out".
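For readers who want to check a parameter file before running the program, the six-line layout described above could be read with a small Python helper. This is a hedged sketch, not part of the statdel package: the class and function names are hypothetical, and it assumes that blank lines between entries can be skipped and that a missing sixth line means the default output name.

```python
from dataclasses import dataclass


@dataclass
class StatdelParams:
    title: str
    missing: float
    chrom: int
    del_start: int
    del_end: int
    stat_code: int
    exponent: float
    include_obs: bool
    n_values: int
    min_variants: float
    min_median_hdr: float
    min_overlap: float
    hdr_file: str
    out_file: str


def parse_params(text):
    """Parse the text of a parameter file (assumption: blank lines are ignored)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 5:
        raise ValueError("expected at least 5 non-blank lines")
    # Trailing explanatory text on lines 2-4 is ignored, as in the sample file.
    l2, l3, l4 = lines[1].split(), lines[2].split(), lines[3].split()
    return StatdelParams(
        title=lines[0][:255],  # line 1 is truncated to 255 characters
        missing=float(l2[0]),
        chrom=int(l2[1]),
        del_start=int(l2[2]),
        del_end=int(l2[3]),
        stat_code=int(l3[0]),
        exponent=float(l3[1]),
        include_obs=(l3[2] == "1"),
        n_values=int(l4[0]),
        min_variants=float(l4[1]),
        min_median_hdr=float(l4[2]),
        min_overlap=float(l4[3]),
        hdr_file=lines[4],
        # A blank or missing line 6 falls back to the default output name.
        out_file=lines[5] if len(lines) > 5 else "statdel.out",
    )


SAMPLE = """Statdel: Pt369 with known pathogenic deletions
-9 17 14005439 15217437 17p12 code for missing, chrom, start and end bp
-12 1 1 code for test statistic, exponent, include obs
1 3 0.5 0.8 no. of "n" values, min #variants, min median(HDR), min. overlap
hdr_scores_Pt369.txt
StatdelPt369.out
"""

params = parse_params(SAMPLE)
print(params.stat_code, params.chrom, params.min_overlap, params.out_file)
```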

----------------------------------------

Codes on line 3:

1 = mean difference

3 = Kolmogorov-Smirnov 2-sample test statistic

11 = 2-sample t statistic, equal variances

12 = 2-sample t statistic, unequal variances

NOTE 1: A positive value of the code indicates a 2-sided test; a negative value specifies a 1-sided test. For example, -3 tests whether values in group 2 tend to be larger than those in group 1 (that is, whether the distribution function in group 1 tends to be larger than that in group 2).
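As an illustration of how the codes and their sign might be interpreted, the following Python sketch maps a code to a statistic and a number of sides, using textbook definitions of the four statistics. The function names, the orientation "group 2 minus group 1", and the exact formulas are assumptions; statdel's own implementation may differ in detail.

```python
import math
from bisect import bisect_right
from statistics import mean, variance


def mean_difference(g1, g2):
    # Assumed orientation: group 2 minus group 1.
    return mean(g2) - mean(g1)


def t_equal_variances(g1, g2):
    # Classical 2-sample t statistic with a pooled variance estimate.
    n1, n2 = len(g1), len(g2)
    pooled = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return mean_difference(g1, g2) / math.sqrt(pooled * (1 / n1 + 1 / n2))


def t_unequal_variances(g1, g2):
    # Welch-type t statistic: each group keeps its own variance.
    se2 = variance(g1) / len(g1) + variance(g2) / len(g2)
    return mean_difference(g1, g2) / math.sqrt(se2)


def ks_statistic(g1, g2):
    # Largest absolute difference between the two empirical
    # distribution functions (the 2-sided form of the statistic).
    s1, s2 = sorted(g1), sorted(g2)
    return max(
        abs(bisect_right(s1, x) / len(s1) - bisect_right(s2, x) / len(s2))
        for x in s1 + s2
    )


STATISTICS = {
    1: ("mean difference", mean_difference),
    3: ("Kolmogorov-Smirnov 2-sample test statistic", ks_statistic),
    11: ("2-sample t statistic, equal variances", t_equal_variances),
    12: ("2-sample t statistic, unequal variances", t_unequal_variances),
}


def decode_stat_code(code):
    """Return (name, function, sides) for a test statistic code."""
    if abs(code) not in STATISTICS:
        raise ValueError(f"unknown test statistic code: {code}")
    name, func = STATISTICS[abs(code)]
    sides = 2 if code > 0 else 1  # positive = 2-sided, negative = 1-sided
    return name, func, sides


# Made-up example data for two small groups.
name, func, sides = decode_stat_code(-12)
value = func([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0])
print(name, sides, round(value, 3))
```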

NOTE 2: The exponent serves to transform HDR data with a skewed distribution so that they become (nearly) normal. Assume that x is a quantitative trait like body height or cholesterol level. Such measurements often have a long upper tail and a lower bound like 0. To make their distributions more symmetric (and thus more normal) one might transform x, for example, into y = √x = x^{λ}, with λ = ½. More generally, such power transformations are given by y = (x^{λ} - 1)/λ + λ, with y = ln(x) for λ = 0 and y = x for λ = 1 (no transformation). In practice, however, there may not be a need for power transformations. See Ott J (1979) Detection of rare major genes in lipid levels. Hum Genet 51:79-91.
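The general power transformation in Note 2 is easy to check numerically. The sketch below implements the formula as written, with the logarithm as the limiting case λ = 0; the function name and the input checks are assumptions added for illustration.

```python
import math


def power_transform(x, lam):
    """y = (x**lam - 1)/lam + lam, with y = ln(x) for lam = 0."""
    if x < 0 or (x == 0 and lam <= 0):
        raise ValueError("x must be positive (or zero when lam > 0)")
    if lam == 0:
        return math.log(x)  # limiting case of the formula as lam -> 0
    return (x ** lam - 1) / lam + lam


# lam = 1 leaves the value unchanged; smaller exponents pull in the upper tail.
for lam in (1, 0.5, 0):
    print(lam, [round(power_transform(x, lam), 3) for x in (1.0, 4.0, 100.0)])
```

Note that for λ = ½ the general formula gives 2√x - 1.5, a linear function of √x, so it has the same effect on the shape of the distribution as the plain square root.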

NOTE 3: Structure of the data input file. Each line provides the following information.

Line 1:

- Number of candidate variants. Number of case individuals = 1, fixed.
- Number of control individuals in addition to the case individual
- (obsolete: Number of "regions", may be set to 1, no longer needed)

Line 2: Any text

Line 3:

- "chr": these 3 (or any other) characters in columns 1-3 are not used by the program
- Chromosome number following "chr"
- Basepair start position of deletion
- Basepair end position of deletion
- Number 1 or 2, with 1=HDR value, 2=HDR2 value (number of obs leading to HDR)
- HDR values of case-control pairs and control-control pairs

Line 4: Analogous input for variant 2, etc.

The total number of lines (after lines 1 and 2) is twice the number of candidate variants.
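A reader for this layout might look as follows in Python. This is a sketch under stated assumptions: the names are hypothetical, fields are taken to be separated by whitespace, the chromosome is kept as text, and the example data are made up for illustration (they are not taken from hdr_scores_Pt369.txt).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HdrRecord:
    chrom: str
    start: int
    end: int
    kind: int  # 1 = HDR value, 2 = HDR2 value
    values: List[float]


def parse_hdr_text(text):
    """Return (n_variants, n_controls, title, records) from the file text."""
    lines = text.splitlines()
    header = lines[0].split()
    n_variants, n_controls = int(header[0]), int(header[1])
    title = lines[1]  # line 2: any text
    records = []
    for ln in lines[2:]:
        if not ln.strip():
            continue
        # Columns 1-3 ("chr" or any other characters) are ignored.
        fields = ln[3:].split()
        records.append(
            HdrRecord(
                chrom=fields[0],
                start=int(fields[1]),
                end=int(fields[2]),
                kind=int(fields[3]),
                values=[float(v) for v in fields[4:]],
            )
        )
    # Two lines (HDR and HDR2) are expected per candidate variant.
    if len(records) != 2 * n_variants:
        raise ValueError("number of data lines is not twice the number of variants")
    return n_variants, n_controls, title, records


# Hypothetical input: 2 candidate variants, values invented for illustration.
SAMPLE_DATA = """2 3 1
Made-up HDR data
chr17 14005439 15217437 1 0.91 0.12 0.08 0.15
chr17 14005439 15217437 2 12 11 13 12
chr3 500000 650000 1 0.20 0.18 0.22 0.25
chr3 500000 650000 2 7 8 7 9
"""

n_variants, n_controls, title, records = parse_hdr_text(SAMPLE_DATA)
print(n_variants, n_controls, len(records), records[0])
```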

NOTE 4: If item 3 on line 3 is equal to 1 then the observed data are included among the null (pseudo) data. This is standard practice in statistics. If the item is equal to 0 then observed data are not counted as null data.
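The effect of this switch can be illustrated with a generic empirical p-value. The Python sketch below is not statdel's code; it assumes a 1-sided test in which large values of the statistic are extreme, and the function name is hypothetical.

```python
def empirical_p(observed, null_stats, include_obs=True):
    """Proportion of null statistics at least as large as the observed one.

    With include_obs=True the observed value is counted as one of the
    null values, so the p-value can never be exactly zero.
    """
    exceed = sum(1 for s in null_stats if s >= observed)
    if include_obs:
        return (exceed + 1) / (len(null_stats) + 1)
    return exceed / len(null_stats)


# Made-up null statistics for illustration.
null_stats = [1.0, 2.0, 3.0, 6.0]
print(empirical_p(5.0, null_stats, include_obs=True))
print(empirical_p(5.0, null_stats, include_obs=False))
```

When no null statistic reaches the observed value, counting the observed data among the null data gives 1/(m + 1) for m null values rather than 0, which is the usual reason for this convention.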

The following files are included in the program package. Program parameters are slightly different in the Windows and Linux versions.

| File | Description |
| --- | --- |
| hdr_scores_Pt369.txt | sample data file |
| statdel.linux | Linux executable |
| statdel.exe | Windows executable |
| statdel.pas, statdel.linux.pas | source program |
| StatdelPt369.out, StatdelPt369.out.log | sample output files |
| Pt369.param | sample parameter file |

Imai-Okazaki *et al*. (2017) HDR-del: A tool based on Hamming distance for prioritizing pathogenic chromosomal deletions in exome sequencing. *Hum Mutat* 38:1796-1800 (PMID: 28722338)