region-plot is a bioinformatics pipeline that plots significant regions found by genome-wide association studies (GWAS). It works with both Python 2 and Python 3.
The following figure (low resolution) is an example from Tardif et al. (2015) (doi:10.1161/CIRCGENETICS.114.000663). It differs from the original figure only in that it also includes the annotation from the HAVANA project.
The tool requires a standard Python installation with the following packages:
- numpy version 1.9.1 or later
- pandas version 0.17.0 or later
- six version 1.9.0 or later
- matplotlib version 1.4.3 or later
- gepyto version 0.9.2 or later
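To check an existing installation against these minimum versions, a small script like the one below can be run with the same interpreter that will run region-plot. It is a minimal sketch and not part of region-plot: the REQUIREMENTS mapping simply restates the list above, it assumes each package exposes a `__version__` attribute (packages that do not are reported as "version unknown"), and its simple version parser ignores pre-release suffixes such as "rc1".

```python
import importlib

# Minimum versions from the list above ("or later" requirements).
REQUIREMENTS = {
    "numpy": "1.9.1",
    "pandas": "0.17.0",
    "six": "1.9.0",
    "matplotlib": "1.4.3",
    "gepyto": "0.9.2",
}


def parse_version(version):
    """Turn '1.9.1' (or '1.10.0rc1') into a tuple of ints such as (1, 9, 1).

    Only the leading digits of each dot-separated field are kept, and parsing
    stops at the first field without digits, so suffixes are ignored.
    """
    parts = []
    for field in version.split("."):
        digits = ""
        for char in field:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, required):
    """True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


def check_requirements(requirements=REQUIREMENTS):
    """Return a list of (package, problem) tuples; empty if all is well."""
    problems = []
    for name, minimum in sorted(requirements.items()):
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append((name, "not installed"))
            continue
        version = getattr(module, "__version__", None)
        if version is None:
            problems.append((name, "version unknown"))
        elif not meets_minimum(version, minimum):
            problems.append(
                (name, "found {}, need {} or later".format(version, minimum)))
    return problems


if __name__ == "__main__":
    for name, problem in check_requirements():
        print("{}: {}".format(name, problem))
```

Comparing tuples of integers (rather than strings) matters here: as strings, "1.10.0" would sort before "1.9.1".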
The tool has been tested only on Linux, but it should also work on Mac OS X and Windows.
On Linux, make sure that the script is executable (using the chmod command).
```
$ launch-region-plot --help
usage: launch-region-plot [-h] [-v] [--log-level {INFO,DEBUG}]
                          [--log-file LOGFILE] --assoc FILE --bfile PREFIX
                          [--imputed-sites FILE] [--significant FLOAT]
                          [--plot-p-lower FLOAT] [--snp-col COL]
                          [--chr-col COL] [--pos-col COL] [--p-col COL]
                          --genetic-map FILE [--genetic-chr-col COL]
                          [--genetic-pos-col COL] [--genetic-rate-col COL]
                          [--plot-format {png,pdf}] [--build {GRCh37,GRCh38}]
                          [--region-padding FLOAT] [--whole-dataset]

Plots significant regions of GWAS.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --log-level {INFO,DEBUG}
                        The logging level [INFO]
  --log-file LOGFILE    The log file [region-plot.log]

Input Files:
  --assoc FILE          The association file containing the hits
  --bfile PREFIX        The prefix of the binary PEDFILE to compute LD with
                        best hit
  --imputed-sites FILE  The file containing the imputed sites (if absent, all
                        points will have the same darkness)

Association Options:
  --significant FLOAT   The significant association threshold [<5.000000e-08]
  --plot-p-lower FLOAT  Plot markers with p lower than value [<5.000000e-08]
  --snp-col COL         The name of the SNP column [snp]
  --chr-col COL         The name of the chromosome column [chr]
  --pos-col COL         The name of the pos column [pos]
  --p-col COL           The name of the p-value column [p]

Genetic Map Options:
  --genetic-map FILE    The file containing the genetic map
  --genetic-chr-col COL
                        The name of chromosome column for the genetic map
                        [chromosome]
  --genetic-pos-col COL
                        The name of the position column for the genetic map
                        [position]
  --genetic-rate-col COL
                        The name of the recombination rate column for the
                        genetic map [rate]

Plot Options:
  --plot-format {png,pdf}
                        The format of the output file containing the plot
                        (might be 'png' or 'pdf') [png]
  --build {GRCh37,GRCh38}
                        The build to search the overlapping genes [GRCh37]
  --region-padding FLOAT
                        The amount of base pairs to pad the region (on each
                        side of the best hit [500000.0]
  --whole-dataset       Plot all markers (no padding) (WARNING this might take
                        a lot of memory)
```
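The help output states that a hit is significant when its p-value is lower than `--significant` (5e-08 by default) and that each region is padded by `--region-padding` base pairs (500,000 by default) on each side of the best hit. How region-plot groups several nearby hits into regions is not described here, so the sketch below is a hypothetical illustration in plain Python, not the tool's own code. It assumes a greedy approach (take the strongest remaining hit, pad it, discard the other hits inside that window, repeat), that the association data are already parsed into dicts keyed by the default column names (snp, chr, pos, p), and that window starts are clamped at 0. The function name find_regions and the rs1 to rs4 markers are made up for the example.

```python
# Hypothetical sketch of padded-region selection; NOT region-plot's own code.
SIGNIFICANT = 5e-8  # default of --significant (p must be strictly lower)
PADDING = 500000    # default of --region-padding, in base pairs on each side


def find_regions(markers, significant=SIGNIFICANT, padding=PADDING):
    """Return one (best_snp, chrom, start, end) tuple per region.

    `markers` is a list of dicts keyed by the default column names of
    --snp-col, --chr-col, --pos-col and --p-col: snp, chr, pos and p.
    """
    # Keep the significant hits only, strongest (smallest p) first.
    hits = sorted((m for m in markers if m["p"] < significant),
                  key=lambda m: m["p"])
    regions = []
    while hits:
        best = hits[0]
        start = max(0, best["pos"] - padding)
        end = best["pos"] + padding
        regions.append((best["snp"], best["chr"], start, end))
        # Drop every remaining hit inside this region (including `best`),
        # so the next iteration starts from the next independent best hit.
        hits = [m for m in hits
                if not (m["chr"] == best["chr"] and start <= m["pos"] <= end)]
    return regions


if __name__ == "__main__":
    markers = [
        {"snp": "rs1", "chr": 1, "pos": 1000000, "p": 1e-10},
        {"snp": "rs2", "chr": 1, "pos": 1200000, "p": 1e-9},
        {"snp": "rs3", "chr": 1, "pos": 3000000, "p": 2e-8},
        {"snp": "rs4", "chr": 2, "pos": 1000000, "p": 0.01},
    ]
    print(find_regions(markers))
    # [('rs1', 1, 500000, 1500000), ('rs3', 1, 2500000, 3500000)]
```

In this toy input, rs2 falls within 500 kb of the stronger rs1 and is absorbed into its region, while rs4 is never considered because its p-value is above the threshold.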