This repository has been archived by the owner on Feb 16, 2019. It is now read-only.

Installing itep on your machine

mattb112885 edited this page May 4, 2013 · 25 revisions

Operating systems

ITEP only runs on Linux (several of its dependencies, such as MCL, are Linux-only). There is a virtual machine available at (TODO - LINK) that includes all of the dependencies; see Using the ITEP virtual machine for details. The VM also includes a copy of ITEP with the genomes and the pre-built ITEP database used to write this tutorial. The VM can be run on any operating system; it has been tested with VirtualBox and will likely also work with other virtualization software.

Downloading ITEP

ITEP is hosted on GitHub, so you will first need to install git on your machine. On Ubuntu this is simply:

$ sudo apt-get install git

(You need administrator rights to do this.) You will also need to create a GitHub account and upload your SSH public key to GitHub; see GitHub's help pages for how to set this up.

Once you have done this, create a folder for ITEP and navigate to it. Then run

$ git clone git@github.com:mattb112885/clusterDbAnalysis

Type your passphrase if you have one attached to your SSH key.

Setting up paths and Python directories

Note the location of the SourceMe.sh file in your ITEP checkout (I'll call the directory that contains it $SOURCEDIR). Using your favorite editor (I use emacs), open your ~/.bashrc file and add the following line to the bottom, replacing $SOURCEDIR with the actual full path:

source $SOURCEDIR/SourceMe.sh

Save the file, then back in the shell enter the following command:

$ source ~/.bashrc

That's it! You can now run all of the ITEP scripts, and ITEP's libraries can be imported from Python.
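If you would rather not edit ~/.bashrc by hand, the step above can be scripted. The function below is a minimal sketch, not part of ITEP: its name add_itep_to_bashrc is hypothetical, and you pass it the directory that contains SourceMe.sh (and, optionally, a different startup file). It appends the source line only if that exact line is not already present, so running it twice does not add a duplicate.

```shell
# Hypothetical helper (not part of ITEP): append "source DIR/SourceMe.sh" to a
# shell startup file (default: ~/.bashrc) unless that exact line is already there.
add_itep_to_bashrc() {
    local dir="$1"
    local rcfile="${2:-$HOME/.bashrc}"
    local line="source $dir/SourceMe.sh"

    # Refuse to add a line that would point at a missing file
    if [ ! -f "$dir/SourceMe.sh" ]; then
        echo "No SourceMe.sh in $dir" >&2
        return 1
    fi

    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$line" "$rcfile" 2>/dev/null; then
        # Avoid gluing the new line onto a last line that lacks a newline
        if [ -s "$rcfile" ] && [ -n "$(tail -c 1 "$rcfile")" ]; then
            echo >> "$rcfile"
        fi
        printf '%s\n' "$line" >> "$rcfile"
    fi
}
```

For example, with a hypothetical checkout in /home/me/ITEP/clusterDbAnalysis (substitute the directory where your SourceMe.sh actually is):

$ add_itep_to_bashrc /home/me/ITEP/clusterDbAnalysis
$ source ~/.bashrc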

Installing dependencies

The doc/INSTALL file explains what each dependency is used for. Which dependencies you need depends on which parts of ITEP you want to run. You will need Python (2.6 or 2.7) and bash, both of which usually come with Linux distributions. The packages installed with easy_install below require setuptools (which provides the easy_install command) to be installed first.

REQUIRED packages:

NCBI BLAST+: Download from the NCBI website at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
MCL: $ sudo apt-get install mcl
Sqlite: $ sudo apt-get install sqlite3
Python: $ sudo apt-get install python2.7 (or python2.6)
(Python packages)
Biopython: $ sudo apt-get install python-biopython
ETE: $ sudo easy_install -U ete2
NumPy/SciPy: $ sudo apt-get install python-numpy python-scipy (many distributions already include these)
Ruffus: $ sudo easy_install -U ruffus
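Once the required packages are installed, a quick check like the one below can confirm that they are visible from your shell. This is a sketch rather than an ITEP script: the function names are hypothetical, and the program and module names passed to them are assumptions based on the list above (BLAST+ installs blastp, makeblastdb and rpsblast among other programs; Biopython is imported as Bio and ETE 2 as ete2). Set PYTHON if your Python 2 interpreter is not called python.

```shell
# Hypothetical helpers (not part of ITEP) for checking the REQUIRED packages.

# Print each program that is not on PATH; fail if any is missing.
missing_programs() {
    local status=0 prog
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "missing program: $prog"
            status=1
        fi
    done
    return "$status"
}

# Fail unless ${PYTHON:-python} is Python 2.6 or 2.7.
python_is_26_or_27() {
    "${PYTHON:-python}" -c 'import sys; sys.exit(not ((2, 6) <= sys.version_info[:2] <= (2, 7)))' 2>/dev/null
}

# Print each module that ${PYTHON:-python} cannot import; fail if any is missing.
missing_python_modules() {
    local status=0 mod
    for mod in "$@"; do
        if ! "${PYTHON:-python}" -c "import $mod" >/dev/null 2>&1; then
            echo "missing Python module: $mod"
            status=1
        fi
    done
    return "$status"
}
```

After pasting these functions into your shell, run for example:

$ missing_programs blastp makeblastdb rpsblast mcl sqlite3
$ python_is_26_or_27 || echo "python is not 2.6 or 2.7"
$ missing_python_modules Bio ete2 numpy scipy ruffus

Each function prints one line per missing item and returns a non-zero status if anything is missing.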

NOTE - Depending on your version of Ubuntu, the packaged SQLite may be too old. Older versions can fail on databases of the size ITEP produces when many genomes are loaded, and one older version also had a string-handling bug that makes comparisons incorrect. If this applies to you, download and compile the latest version from the SQLite website instead.
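To see which SQLite you have and compare it with a version you consider recent enough, something like the following can help. This page does not state a minimum version, so the threshold is yours to supply. The function names are hypothetical, and version_ge relies on the -V (version sort) option of GNU sort.

```shell
# Hypothetical helpers (not part of ITEP).

# version_ge A B: succeed if version string A is >= B, using GNU sort -V.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# sqlite_at_least MIN: print the installed sqlite3 version and compare it to MIN.
sqlite_at_least() {
    local have
    # "sqlite3 --version" prints the version number as its first field
    have=$(sqlite3 --version 2>/dev/null | awk '{print $1}')
    if [ -z "$have" ]; then
        echo "sqlite3 not found on PATH" >&2
        return 1
    fi
    echo "installed SQLite version: $have"
    version_ge "$have" "$1"
}
```

For example:

$ sqlite_at_least 3.7.17 || echo "consider building SQLite from source"

Here 3.7.17 is only an example value, not a minimum tested with ITEP.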

NOTE - The latest version of Biopython (1.61 at the time of writing) fixes a bug in reading GenBank files from some sources, such as JGI. If you get errors from Biopython when reading GenBank files, try upgrading Biopython (and make sure to remove the old version too).
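Because an old copy of Biopython can remain importable after an upgrade, it helps to check which version Python actually loads and from where. A minimal sketch: the function name is hypothetical, it reads the standard Bio.__version__ and Bio.__file__ module attributes, and it uses $PYTHON if set, otherwise python.

```shell
# Hypothetical helper (not part of ITEP): print the Biopython version that
# ${PYTHON:-python} imports, followed by the file it was loaded from.
biopython_info() {
    "${PYTHON:-python}" -c 'import Bio; print(Bio.__version__); print(Bio.__file__)'
}
```

$ biopython_info

If the first line shows a version older than 1.61, or the second line points at a copy you thought you had removed, Python is still loading the old installation.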

NOTE - The latest versions of NCBI's CDD will ONLY compile with the newest RPSBLAST, so make sure you get the latest version of BLAST+. Unfortunately, NCBI did not change the name of the RPSBLAST program when it changed the syntax and input formats. If you have both versions installed, type "rpsblast -help" before attempting to run main4.sh and make sure you are pointing at the new RPSBLAST rather than the old one: only the new version shows help with this command, while the old one requires --help.
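When more than one rpsblast may be installed, it helps to see all of them and which one comes first on your PATH, since that is the one a plain rpsblast command runs. The function below is a hypothetical sketch, not part of ITEP or BLAST+: it walks the PATH directories in order and runs each rpsblast with -help. Treating a zero exit status as "shows help" is an assumption based on the note above.

```shell
# Hypothetical helper (not part of ITEP): list every rpsblast on PATH, in PATH
# order, and say whether each one exits successfully when given "-help".
# The first line printed is the rpsblast that runs when you type "rpsblast".
check_rpsblast() {
    local found=0 dir old_ifs="$IFS"
    IFS=:
    for dir in $PATH; do
        IFS="$old_ifs"
        [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
        if [ -f "$dir/rpsblast" ] && [ -x "$dir/rpsblast" ]; then
            found=1
            if "$dir/rpsblast" -help >/dev/null 2>&1; then
                echo "$dir/rpsblast: accepts -help (likely the new version)"
            else
                echo "$dir/rpsblast: rejects -help (possibly an old version)"
            fi
        fi
    done
    IFS="$old_ifs"
    if [ "$found" -eq 0 ]; then
        echo "no rpsblast found on PATH" >&2
        return 1
    fi
}
```

$ check_rpsblast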

Useful external tools

The following packages are used by only a few scripts and can often be substituted with other programs that read and write the same file formats (e.g. FASTA or Newick) or that interface with other databases.

FastTreeMP: Download from http://www.microbesonline.org/fasttree/ and follow directions to compile.
FastTree comparison tools: Download from http://www.microbesonline.org/fasttree/treecmp.html and add the directory containing the Perl scripts to your PERL5LIB (you will also need Phylip to use them).
MAFFT: Download from http://mafft.cbrc.jp/alignment/software/source.html and follow directions to compile.
MyRast: Download from http://blog.theseed.org/servers/installation/distribution-of-the-seed-server-packages.html and follow directions to compile.
OrthoMCL: Download from http://orthomcl.org/common/downloads/software/ (we support v2.0). However, see the note below.
RAxML: Clone it from Github at https://github.com/stamatak/standard-RAxML and follow the directions to compile it.
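PERL5LIB is a colon-separated list of directories that Perl searches for modules. A minimal sketch of lines you could add to ~/.bashrc for the FastTree comparison tools, assuming the scripts were unpacked into the hypothetical directory $HOME/software/treecmp (TREECMP_DIR is likewise just an illustrative variable name):

```shell
# TREECMP_DIR is a hypothetical location; point it at the directory that holds
# the FastTree comparison Perl scripts.
TREECMP_DIR="$HOME/software/treecmp"
# Prepend it to PERL5LIB, adding a ":" separator only if PERL5LIB is already set
export PERL5LIB="$TREECMP_DIR${PERL5LIB:+:$PERL5LIB}"
```

The ${PERL5LIB:+...} form avoids leaving a stray trailing colon when PERL5LIB was previously empty.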

NOTE: To run OrthoMCL, you need to install it and its dependencies: MySQL, Perl 5, and the Perl modules DBI and DBD::mysql. If you don't have the two Perl modules, you can install them (as root) with CPAN:

$ sudo cpan DBI
$ sudo cpan DBD::mysql

OrthoMCL support is still somewhat beta.
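To confirm that DBI and DBD::mysql load correctly before trying OrthoMCL, you can ask Perl to load each module with perl -MModule -e 1, which fails if the module cannot be found. The wrapper below is a hypothetical sketch, not part of ITEP or OrthoMCL:

```shell
# Hypothetical helper (not part of ITEP): report Perl modules that fail to load.
# "perl -MModule -e 1" loads the module and does nothing else; it fails if the
# module is missing or cannot be loaded.
missing_perl_modules() {
    local status=0 mod
    for mod in "$@"; do
        if ! perl "-M$mod" -e 1 >/dev/null 2>&1; then
            echo "missing Perl module: $mod"
            status=1
        fi
    done
    return "$status"
}
```

$ missing_perl_modules DBI DBD::mysql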

Other useful Python packages:

These are each only used by a few of the scripts. If you don't need those scripts, you don't need to download these.

matplotlib: $ sudo apt-get install python-matplotlib
networkx: $ sudo apt-get install python-networkx
PyQt4: Download it and its dependencies from http://www.riverbankcomputing.com/software/pyqt/download and follow directions to install.
ReportLab: Download from https://pypi.python.org/pypi/reportlab and follow directions to install.