Converting files from plink to sumstat format
Jurg Ott 26 Feb 2015
The plink program uses the LINKAGE input format, which is widely used in
genetic linkage and association analysis. The sumstat suite of
programs has been developed using a different input format.
Specifically, it lists each SNP on a separate input line (row) rather
than in an input column. I have written a small utility program to convert
files from plink to sumstat format and vice versa. Another small program,
copylines, is discussed below. The program package may be downloaded as a
zip file.
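The difference between the two layouts (one row per individual versus one row per SNP) can be illustrated with a minimal Python sketch. It only shows the idea of transposition. It is not the p2s program, it does not reproduce the exact sumstat file layout, and the function name transpose_genotypes and the sample genotypes are made up for illustration.

```python
def transpose_genotypes(rows):
    """Turn per-individual rows into per-SNP rows.

    rows[i][j] is the genotype of individual i at SNP j.
    The result holds, at position j, the genotypes of all individuals at SNP j.
    """
    if not rows:
        return []
    n_snps = len(rows[0])
    if any(len(row) != n_snps for row in rows):
        raise ValueError("all individuals must have the same number of SNPs")
    # zip(*rows) pairs up the j-th entry of every row, i.e. one SNP at a time
    return [list(column) for column in zip(*rows)]


# Hypothetical data: 2 individuals (rows) typed at 3 SNPs (columns)
by_individual = [
    ["1 1", "1 2", "2 2"],
    ["1 2", "2 2", "0 0"],
]
by_snp = transpose_genotypes(by_individual)
for snp_index, genotypes in enumerate(by_snp, start=1):
    print(f"SNP{snp_index}", *genotypes)
```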
Using the p2s program
First, you need to ask plink to make a file in its transposed format,
that is, with rows rather than columns representing SNPs. Assume that
your plink files are data.ped and data.map. Then you type one of the
following commands:
plink --file data --transpose --recode12 (in plink version 1.07)
plink --file data --recode 12 transpose (in plink version 1.9)
If your data are in binary format, use --bfile data instead of --file data.
The plink program will produce two files, plink.tped and plink.tfam.
Run the conversion program, p2s, and follow the instructions. The
program requires certain parameters before it can run. Below is a list
of the questions the program will ask, along with suitable responses.
Default responses, in parentheses, may be chosen by simply pressing the
Enter key.
Name of tped file? (press Enter for plink.*)
In response to this question you type, for example,
data.tped. If your files are plink.tped and plink.tfam then both are
read after you simply press the Enter key. Otherwise, the p2s program
will ask you for the name of the tfam file.
Should SNPs be ordered by chromosomal position? (y/n)
If you type y then the p2s program will order the resulting dataset by
chromosomal position. It attempts to do this with the Windows sort
command. Note: The sort command has a maximum record (line) length of
65535 characters, which the p2s program uses. If the lines in your data
are longer than that limit then the sort command will issue a message
saying "Input record exceeds maximum length". In this case, repeat the
conversion process but specify n in response to the last question. SNPs
will then not be ordered by the p2s program, but they might already
have been in chromosomal order in the original plink format.
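Before deciding between y and n, it can help to check whether the tped file is already in chromosomal order and how long its lines are. The following Python sketch is one possible way to do this and is not part of the program package. It assumes the usual tped layout, with chromosome in column 1 and base-pair position in column 4. The function names and the rule that non-numeric chromosome labels (such as X) sort after numeric ones are assumptions made for illustration. The line length in the converted file may differ from that in the tped file, so the reported length is only a rough guide.

```python
def chromosome_key(label):
    """Sort key: numeric chromosome labels first, then other labels alphabetically."""
    if label.isdigit():
        return (0, int(label), "")
    return (1, 0, label)


def inspect_tped(lines):
    """Return (in_order, longest) for an iterable of tped lines.

    in_order is True when no SNP precedes an earlier SNP by (chromosome, position);
    longest is the length in characters of the longest line, without line ending.
    """
    longest = 0
    previous = None
    in_order = True
    for line in lines:
        text = line.rstrip("\r\n")
        fields = text.split(None, 4)  # only the first four columns are needed
        if not fields:
            continue  # skip empty lines
        if len(fields) < 4:
            raise ValueError("expected at least 4 columns: " + text[:40])
        longest = max(longest, len(text))
        key = (chromosome_key(fields[0]), int(fields[3]))
        if previous is not None and key < previous:
            in_order = False
        previous = key
    return in_order, longest


# Hypothetical tped lines: chromosome, SNP name, genetic distance, position, alleles
sample = [
    "1 rs1 0 1000 1 1 1 2",
    "1 rs2 0 2000 2 2 1 2",
    "2 rs3 0 500 1 1 2 2",
]
print(inspect_tped(sample))
```

An open file can be passed in place of the list, for example inspect_tped(open("plink.tped")), since the function reads one line at a time.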
You may want to run the p2s program with a parameter file, for example,
p2s.param. An example of such a file is included in the program
package. It must contain 3 lines as follows:
1) Name of tped file
2) Name of tfam file
3) y or n (for yes or no for sorting SNPs by chromosomal position).
With this file, you run the program by typing p2s <p2s.param
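If many datasets need to be converted, the parameter file can also be written by a script. The Python sketch below writes the three lines described above. The function names are hypothetical, and run_p2s assumes that the p2s program can be started from the current directory or is on the search path. It does the same as typing p2s with the parameter file redirected to its input.

```python
import os
import subprocess
import tempfile


def write_p2s_param(path, tped_name, tfam_name, sort_snps):
    """Write the 3-line parameter file: tped name, tfam name, y or n."""
    lines = [tped_name, tfam_name, "y" if sort_snps else "n"]
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines


def run_p2s(param_path, program="p2s"):
    """Run p2s with the parameter file as its input (assumes p2s can be found)."""
    with open(param_path) as handle:
        return subprocess.run([program], stdin=handle, check=True)


# Demonstration in a temporary folder; p2s itself is not started here
with tempfile.TemporaryDirectory() as folder:
    param_path = os.path.join(folder, "p2s.param")
    write_p2s_param(param_path, "data.tped", "data.tfam", True)
    with open(param_path) as handle:
        written = handle.read().splitlines()
print(written)
```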
Converting from sumstat to plink format
An analogous program, s2p, allows for the reverse conversion. This
program may still require some modification. If you encounter problems
using it, please send me an email.
Working with very large files
In Windows, there is generally no text editor that can handle very large
files and display line and column numbers. The most suitable program,
although it does not show line numbers, is probably wordpad. To see only
parts of a large file, a small utility program, copylines, is included
that allows you to copy parts of a large file (a certain range of lines
and columns) to a new file, which should be much smaller and can be
viewed with a regular editor like the Crimson editor. Its usage is
self-explanatory. Also, notepad++ is very nice and useful.
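The idea behind copying a range of lines and columns can be sketched in a few lines of Python. This is not the copylines program and makes no claim about how it works internally. The function name copy_lines and its parameters are hypothetical, and columns are counted here as character positions. The file is read one line at a time, so the whole file never has to fit into memory.

```python
import os
import tempfile
from itertools import islice


def copy_lines(src, dst, first_line, last_line, first_col=1, last_col=None):
    """Copy lines first_line..last_line and characters first_col..last_col.

    All numbers are 1-based and inclusive; last_col=None means "to end of line".
    Returns the number of lines written.
    """
    if first_line < 1 or last_line < first_line or first_col < 1:
        raise ValueError("line and column numbers must be positive and ordered")
    copied = 0
    with open(src, "r", errors="replace") as fin, open(dst, "w") as fout:
        # islice skips the leading lines without keeping them in memory
        for line in islice(fin, first_line - 1, last_line):
            text = line.rstrip("\r\n")
            fout.write(text[first_col - 1:last_col] + "\n")
            copied += 1
    return copied


# Demonstration on a small temporary file
with tempfile.TemporaryDirectory() as folder:
    big = os.path.join(folder, "big.txt")
    small = os.path.join(folder, "small.txt")
    with open(big, "w") as handle:
        for i in range(1, 6):
            handle.write(f"line{i} abcdefghij\n")
    count = copy_lines(big, small, 2, 4, first_col=7, last_col=10)
    with open(small) as handle:
        excerpt = handle.read().splitlines()
print(count, excerpt)
```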