Jurg Ott, Josephine Hoh / 10 Oct 2012
ott@rockefeller.edu
User's guide to the p53MH program
For each site in the DNA sequence of a gene, this program computes a
score between 0 and 100. A high score indicates an increased
probability that the n nucleotides at that and subsequent sites
represent a p53 binding site, where
n = 2 × 10 + (selected length of spacer region).
The algorithm is based on our published description [1]. The program is
written in Free Pascal (an extension of Turbo Pascal) and is available
for Windows and Linux. A slightly modified version of our algorithm has
more recently been developed for p63 binding sites [2].
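As a small worked example of the site length formula above, the
following Python sketch computes n for a few spacer lengths. It is an
illustration only and not part of the p53MH program; the names
HALF_LENGTH and site_length are made up for this example.

```python
# Illustration only: n = 2 x 10 + (selected length of spacer region).
HALF_LENGTH = 10  # the "10" in the formula; the name is hypothetical


def site_length(spacer_length: int) -> int:
    """Return the number of nucleotides n scored at each site."""
    if spacer_length < 0:
        raise ValueError("spacer length cannot be negative")
    return 2 * HALF_LENGTH + spacer_length


for spacer in (0, 3, 14):
    print(f"spacer length {spacer}: n = {site_length(spacer)}")
```

With no spacer the window covers 20 nucleotides, and each additional
spacer position lengthens it by one.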
Input files
1. Gene names
This file, p53names.dat, contains a list of file names that should be
processed. Each file (gene or other sequence) will be analyzed by the
algorithm. The program will read names and process the corresponding
files one after the other until it encounters a blank line (or end of
file) in the p53names.dat file. Text below such an empty line will be
ignored.
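The reading rule just described (stop at the first blank line or at end
of file) can be sketched in Python as follows. This is not the
program's own code; the function name read_gene_file_names is
hypothetical, and the sketch assumes one file name per line.

```python
def read_gene_file_names(path="p53names.dat"):
    """Collect file names up to the first blank line or end of file.

    Assumes one file name per line (an assumption of this sketch).
    """
    names = []
    with open(path) as handle:
        for line in handle:
            name = line.strip()
            if not name:
                break  # everything below the blank line is ignored
            names.append(name)
    return names
```

Because text below the first blank line is skipped, that part of the
file can hold notes or names of files to be left out of a given run.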
2. Random seed
For the random number generator, the user must prepare a file called
seed.txt containing a negative integer, i.e., the random number seed.
Each run of the p53MH program overwrites this file with its ending
seed, so that the next run starts from that value.
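A starting seed file can be created in any text editor. For scripted
runs, a minimal Python sketch might look like the following. The
function name write_seed is hypothetical, and the sketch assumes that
the file holds nothing but the integer on a single line.

```python
def write_seed(seed: int, path="seed.txt"):
    """Write a negative integer seed to the seed file.

    Assumption of this sketch: the file contains only the integer.
    """
    if seed >= 0:
        raise ValueError("the seed must be a negative integer")
    with open(path, "w") as handle:
        handle.write(f"{seed}\n")
```

For example, write_seed(-12345) creates seed.txt with the single line
-12345.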
3. Gene file(s)
As many such files must be present as names are listed in the
p53names.dat file. Each file may contain any text on the first few
lines but must have a blank line before the actual sequence starts.
Examples: mdm2.dat, waf1.txt.
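To illustrate the layout of a gene file, the Python sketch below skips
the free-text header up to the first blank line and then collects the
sequence. It is not the program's own reader. The function name
read_gene_sequence is hypothetical, and keeping only the letters A, C,
G, T and N (so that position numbers and spaces are dropped) is an
assumption of this sketch.

```python
def read_gene_sequence(path):
    """Return the sequence that follows the first blank line."""
    with open(path) as handle:
        lines = handle.read().splitlines()
    for index, line in enumerate(lines):
        if not line.strip():
            body = lines[index + 1:]  # sequence starts after the blank line
            break
    else:
        raise ValueError("no blank line found before the sequence")
    # Assumption: anything other than A, C, G, T, N is not sequence.
    return "".join(
        ch for line in body for ch in line.upper() if ch in "ACGTN"
    )
```

A file without a blank line raises an error here, which mirrors the
requirement stated above.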
4. Parameter file
This file, p53param.dat, contains various parameters that control
analysis features. After the actual parameter values, brief
explanations are listed in the sample file. More detailed explanations
are as follows (each input line may contain one or more parameters):
- Line 1: Number of order statistics to evaluate (may be 0)
- Line 2: Number of bootstrap replicates to carry out. Set to 0 for
none.
- Line 3: Threshold limits (%max) for evaluating significance levels of
scores at or above these thresholds. List any number of limits, no
trailing text. Whole numbers only. Leave blank for no limits. Relevant
only when the value on line 2 is greater than 0.
- Line 4: Gap weights (1) or no gap weights (0). With gap weights on,
scores will be weighted by the relative frequency of gaps in the genome
(up to a gap (spacer) length of 14).
- Line 5: Smoothing factor for gap weights (0 for no smoothing; no
effect when gap weights are off)
- Line 6: Multiplier for score contributions of core
sites, #4 - #7, #14 - #17
- Line 7: Use filtering (1) or not (0)
- Line 8: Smoothing constant s in the log likelihood ratio score,
ln(LR + s). A value of -1 indicates that s will be chosen so that the
highest and lowest scores are equal in absolute value.
- Line 9: Scoring method, likelihood ratio (1) or
conditional probability (0)
- Line 10 (two numbers):
  - n1 = Number of bp to skip before reading sequence (may be 0)
  - n2 = Number of bp to read (after skipping) in given gene (0 for
  whole gene). For example, if n1 = 1000 and n2 = 12000, then reading
  the gene sequence would start at position 1001 and would continue for
  12000 nucleotides. On output the actual sequence positions in the
  gene are given.
- Line 11: Print 15 bp of sequence flanking the
putative binding site (1) or not (0)
- Line 12: Gap sizes to test. Two options (choose one or the other, no
trailing text):
  - -1 n indicates that scores should be maximized over gap sizes from
  0 through n (n may be at most 20), or
  - n1 n2 ... provides a list of gap sizes to test. There must be at
  least one number.
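Two of the parameter lines above involve a little arithmetic, which the
following Python sketch spells out: the positions covered by n1 and n2
on line 10, and the gap sizes implied by the two forms of line 12. It
is an illustration, not the program's code. The function names are
hypothetical, and clipping the window at the end of the gene is an
assumption of this sketch.

```python
MAX_GAP = 20  # largest n allowed with the "-1 n" form of line 12


def read_window(n1, n2, gene_length):
    """Return (first, last) 1-based positions read for line 10."""
    first = n1 + 1
    # n2 = 0 means the whole gene; clipping at the gene end is assumed.
    last = gene_length if n2 == 0 else min(gene_length, n1 + n2)
    return first, last


def gap_sizes(line12):
    """Expand the text of parameter line 12 into a list of gap sizes."""
    values = [int(token) for token in line12.split()]
    if not values:
        raise ValueError("line 12 needs at least one number")
    if values[0] == -1:
        if len(values) != 2 or not 0 <= values[1] <= MAX_GAP:
            raise ValueError("expected '-1 n' with n between 0 and 20")
        return list(range(values[1] + 1))  # gap sizes 0 through n
    return values  # explicit list of gap sizes


print(read_window(1000, 12000, 50000))
print(gap_sizes("-1 3"))
print(gap_sizes("0 5 10"))
```

With n1 = 1000 and n2 = 12000 the window runs from position 1001 to
position 13000, in line with the example given for line 10.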
Running the p53MH program
Windows
There are two options:
- Traditional approach. Open a command window (“DOS box”) and
change directories until you are in the directory (folder) where the
p53MH program and its files reside. Then enter the command, p53MH.
- Double click on the p53MH (p53MH.exe) file. The program will
then execute and show progress on screen. Once it finishes the window
closes automatically. Alternatively, to execute the program and save
screen output to a file, double click on the run (run.bat) file. After
the program finishes, screen output may be viewed by inspecting the
screen.out file.
The program currently checks array bounds and may, thus, be somewhat
slow. Once all bugs have been eliminated the array bound checking
feature will be disabled.
Linux
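Running the program under Linux is analogous to option 1 above: open a
terminal, change to the directory where the p53MH program and its files
reside, and start the program from there.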
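For users who prefer to script their runs on either platform, the
following Python sketch starts a command and saves its screen output to
a file, in the spirit of the screen.out file mentioned above. It is not
distributed with p53MH. The function name run_and_save is hypothetical,
and the exact command needed to start the program on a given system is
an assumption.

```python
import subprocess


def run_and_save(command, output_path="screen.out"):
    """Run a command and write its screen output to output_path."""
    with open(output_path, "w") as out:
        result = subprocess.run(
            command,
            stdout=out,
            stderr=subprocess.STDOUT,  # keep error messages in the same file
            check=False,
        )
    return result.returncode


# Hypothetical usage, run from the directory holding the program files:
#     run_and_save(["p53MH"])
# On Linux the command may need to be given as ["./p53MH"].
```

The function returns the exit code of the command, so a calling script
can check whether the run ended normally.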
Output files
The following output files will be written by the program (some of them
only for certain parameter settings):
- p53res.out: Presents detailed output.
- p53ord.out: Provides essentially the same output as above but in a
format that is easy to import into a spreadsheet.
- p53psum.out: If computer simulation is requested (input line 2) then
this file is written and contains p-values for sums of order statistics
for the scores.
- p53pmin.out: Analogous output for the minimum p-value for sums of
order statistics (one result per gene).
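To give a rough idea of what a simulation-based p-value for a sum of
order statistics means, the Python sketch below sums the k highest
scores and compares that sum with sums obtained from replicates. This
is a generic illustration only; the procedure actually used by the
program is the one described in [1] and may differ in its details. The
function names and the example numbers are made up.

```python
def sum_of_top_scores(scores, k):
    """Sum of the k highest scores (the k largest order statistics)."""
    return sum(sorted(scores, reverse=True)[:k])


def empirical_p_value(observed, replicate_values):
    """Fraction of replicate values at least as large as the observed one."""
    if not replicate_values:
        raise ValueError("at least one replicate is required")
    hits = sum(1 for value in replicate_values if value >= observed)
    return hits / len(replicate_values)


observed = sum_of_top_scores([10, 90, 50, 70], k=2)  # 90 + 70
print(observed, empirical_p_value(observed, [100, 160, 170, 120]))
```

In this made-up example two of the four replicate sums reach the
observed sum of 160, giving an empirical p-value of 0.5.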
References
[1] Hoh J, Jin S, Parrado T, Edington J, Levine AJ, Ott J (2002) The
p53MH algorithm and its application in detecting p53-responsive genes.
Proc Natl Acad Sci USA 99, 8467-8472
[2] Perez CA, Ott J, Mays DJ, Pietenpol JA (2007) p63 consensus
DNA-binding site: identification, analysis and application into a p63MH
algorithm. Oncogene 26, 7363-7370