Installing ITEP on your machine
ITEP only runs on Linux (several of its dependencies, such as MCL, are Linux-only). There is a virtual machine available at (TODO - LINK) which includes all of the dependencies - see Using the ITEP virtual machine for details. The VM also includes a copy of ITEP that contains the genomes and a pre-built ITEP database used to write this tutorial. The VM can be run on any operating system (it has been tested on VirtualBox but can likely also be used with other virtualization software).
ITEP is hosted on GitHub, so you will need to install git on your machine first. On Ubuntu this is simply:
$ sudo apt-get install git
(You will need administrator privileges to do this.) You will also need to create a GitHub account and upload your SSH public key to GitHub (see GitHub's help for details on how to set this up).
Once you have done this, create a folder for ITEP and navigate to it. Then run
$ git clone git@github.com:mattb112885/clusterDbAnalysis
Type your passphrase if you have one attached to your SSH key.
Note the directory that contains the SourceMe.sh file (I'll call this $SOURCEDIR). Using your favorite editor (I use emacs), open up your .bashrc file (~/.bashrc) and add the following line to the bottom, substituting the actual path for $SOURCEDIR:
source $SOURCEDIR/SourceMe.sh
Save the file, then back in the shell enter the following command:
$ source ~/.bashrc
That's it! Now you can access all of the scripts in ITEP and the libraries are also accessible to Python commands.
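If you would rather script the .bashrc edit than do it by hand, something like the following sketch may help. The function name add_itep_source_line and the example path are made up for illustration and are not part of ITEP. It only appends the line when it is not already present, so running it twice should be harmless.

```shell
# Append the ITEP "source" line to a startup file, but only if it is missing.
# Arguments: $1 = startup file (e.g. ~/.bashrc)
#            $2 = directory that contains SourceMe.sh
# NOTE: add_itep_source_line is a hypothetical helper, not an ITEP script.
add_itep_source_line() {
    rc_file=$1
    source_dir=$2
    line="source $source_dir/SourceMe.sh"
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Example (the path is a placeholder for wherever you cloned ITEP):
# add_itep_source_line "$HOME/.bashrc" "$HOME/itep/clusterDbAnalysis"
```

You still need to run source ~/.bashrc (or open a new shell) afterwards for the change to take effect.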
The doc/INSTALL file has a complete list of what each of the dependencies below is for. Which ones you need to install depends on which parts of the code you want to run (easy_install requires you to install setuptools first). You will need Python (2.6 or 2.7) and some form of Bash, both of which typically come with Linux systems.
NCBI BLAST+: Download from the NCBI website at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
MCL: $ sudo apt-get install mcl
Sqlite: $ sudo apt-get install sqlite3
Python: $ sudo apt-get install python2.6 (or python2.7)
(Python packages)
Biopython: $ sudo apt-get install python-biopython
ETE: $ sudo easy_install -U ete2
NumPy/SciPy: $ sudo apt-get install python-numpy python-scipy (note many distributions already have these)
Ruffus: $ sudo easy_install -U ruffus
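To see at a glance which of the command-line programs are still missing, a small check along these lines may be useful. It is only a sketch: missing_commands is a made-up helper name, and it only tests whether a command is on your PATH, not whether its version is recent enough.

```shell
# Print the name of every command in the argument list that is not on PATH.
# NOTE: missing_commands is a hypothetical helper, not an ITEP script.
missing_commands() {
    for cmd in "$@"; do
        # command -v succeeds only if the shell can find the command
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example: list which of the core programs still need to be installed
# missing_commands git blastp rpsblast mcl sqlite3 python
```

If the example prints nothing, all of the listed programs were found.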
NOTE - Depending on your version of Ubuntu, the SQLite that comes with it might be too old. Older versions fail on data of the size you will get with ITEP when using significant numbers of genomes, and one of them also has a string-handling bug that makes comparisons incorrect. If this is the case, download and compile the latest version from the SQLite website instead.
NOTE - The latest version of Biopython (1.61 as of the time of this writing) fixes a bug in reading GenBank files from some sources like JGI. If you get errors from Biopython related to reading GenBank files, try upgrading your Biopython (and make sure to remove the old version too).
NOTE - The latest versions of NCBI's CDD will ONLY compile with the newest RPSBLAST (so make sure you get the latest version of BLAST+). Unfortunately, NCBI did not change the name of the RPSBLAST program when changing the syntax and input formats. If you have both versions installed, type "rpsblast -help" before attempting to run main4.sh and make sure you are pointing at the new version of RPSBLAST and not an old one (only the new one will actually show help with this command - the old one requires you to use --help).
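The RPSBLAST check can also be done from a script. The sketch below assumes that the old rpsblast exits with a non-zero status when given -help; the note above only says it does not show help, so treat that as an assumption and confirm by running the command by hand. The function name accepts_single_dash_help is made up for illustration.

```shell
# Succeed if "<command> -help" exits cleanly, which is how the new BLAST+
# rpsblast is described as behaving.
# ASSUMPTION: the old rpsblast returns a non-zero exit status for -help.
# NOTE: accepts_single_dash_help is a hypothetical helper, not an ITEP script.
accepts_single_dash_help() {
    "$1" -help >/dev/null 2>&1
}

# Example:
# if accepts_single_dash_help rpsblast; then
#     echo "rpsblast looks like the new BLAST+ version"
# else
#     echo "rpsblast is old or not installed"
# fi
```

Running "command -v rpsblast" as well shows which copy on your PATH is actually being picked up.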
The following packages are used only in a small number of scripts and can often be substituted with other programs that input and output the same file formats (e.g. FASTA / Newick) or that interface with other databases.
FastTreeMP - Download from http://www.microbesonline.org/fasttree/ and follow directions to compile.
FastTree comparison tools: Download from http://www.microbesonline.org/fasttree/treecmp.html and add the Perl scripts to your PERL5LIB (you will also need Phylip to use them).
MAFFT: Download from http://mafft.cbrc.jp/alignment/software/source.html and follow directions to compile.
MyRast: Download from http://blog.theseed.org/servers/installation/distribution-of-the-seed-server-packages.html and follow directions to compile.
OrthoMCL: Download it from http://orthomcl.org/common/downloads/software/ (we support v2.0). However, see the note below.
RAxML: Clone it from Github at https://github.com/stamatak/standard-RAxML and follow the directions to compile it.
NOTE: If you want to run OrthoMCL you need to install it and its dependencies, which include MySQL, Perl 5, and the DBI and DBD::mysql packages, both of which can be installed (as root) by using CPAN if you don't have them:
$ sudo cpan DBI
$ sudo cpan DBD::mysql
OrthoMCL support is still somewhat beta.
These are each only used by a few of the scripts. If you don't need those scripts, you don't need to download these.
matplotlib: $ sudo apt-get install python-matplotlib
networkx: $ sudo apt-get install python-networkx
PyQt4: Download it and its dependencies from http://www.riverbankcomputing.com/software/pyqt/download and follow directions to install.
ReportLab: Download from https://pypi.python.org/pypi/reportlab