This repository has been archived by the owner on Feb 16, 2019. It is now read-only.

Installing ITEP on your machine

mattb112885 edited this page Jun 7, 2013 · 25 revisions

Operating systems

ITEP only runs on Linux (several of its dependencies, such as MCL, are Linux-only). A virtual machine that includes all of the dependencies is available at (TODO - LINK); see Using the ITEP virtual machine for details. The VM also includes a copy of ITEP containing the genomes and the pre-built ITEP database used to write this tutorial. The VM can be run on any host operating system; it has been tested on VirtualBox and will likely work with other virtualization software as well.

Downloading ITEP

ITEP is hosted on GitHub, so you will need to install git on your machine first. On Ubuntu this is simply:

$ sudo apt-get install git

(You will need administrator privileges to do this.) You will also need to create a GitHub account and upload your SSH public key to GitHub (see GitHub's help for details on how to set this up).

Once you have done this, create a folder for ITEP and navigate to it. Then run

$ git clone git@github.com:mattb112885/clusterDbAnalysis

Type your passphrase if you have one attached to your SSH key.

Verifying permissions

You will need to set the execute ("x") bit on all of the Python and sh scripts in the repo (EXCEPT for SourceMe.sh) if it isn't already set.

Make sure all the .py and .sh files in the src/ directory are executable:

$ chmod u+x src/*.py

$ chmod u+x src/*.sh

Do the same with the src/internal and src/utilities directories:

$ chmod u+x src/internal/*.py

$ chmod u+x src/internal/*.sh

$ chmod u+x src/utilities/*.py

$ chmod u+x src/utilities/*.sh
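
The six chmod commands above can also be collapsed into one step. The following is a minimal sketch, not part of ITEP: the function name make_scripts_executable is hypothetical, and it assumes you want every .py and .sh file anywhere under the given directory to be executable (a slightly wider net than the three directories listed above), while skipping SourceMe.sh by name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with ITEP): set the owner execute bit on
# every .py and .sh file under a directory, skipping SourceMe.sh, which is
# meant to be sourced rather than executed.
make_scripts_executable() {
    local root="$1"
    find "$root" -type f \( -name '*.py' -o -name '*.sh' \) \
        ! -name 'SourceMe.sh' -exec chmod u+x {} +
}

# Example (run from the top of the cloned repository):
#   make_scripts_executable src
```

Because find recurses, this also covers any subdirectories of src/ beyond internal and utilities; if that is not what you want, stick with the explicit commands above.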

Setting up paths and python directories

Note the directory that contains the SourceMe.sh file (I'll call this $SOURCEDIR). Using your favorite editor (I use emacs), open your .bashrc file (~/.bashrc) and add the following line to the bottom, replacing $SOURCEDIR with the actual path:

source $SOURCEDIR/SourceMe.sh

Save the file, then back in the shell enter the following command:

$ source ~/.bashrc

That's it! Now you can access all of the scripts in ITEP and the libraries are also accessible to Python commands.
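
If you would rather not edit .bashrc by hand, the same change can be scripted. This is only a sketch under stated assumptions: add_source_line is a hypothetical helper name, the path in the example is a placeholder for wherever your SourceMe.sh actually lives, and the function appends the line only when an identical line is not already present, so running it twice is harmless.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with ITEP): append a "source" line for
# SourceMe.sh to a shell startup file, unless that exact line is already there.
add_source_line() {
    local sourceme="$1"                 # full path to SourceMe.sh
    local rcfile="${2:-$HOME/.bashrc}"  # startup file to modify
    local line="source \"$sourceme\""
    # -x: match the whole line, -F: fixed string (no regex), -q: no output
    if ! grep -qxF "$line" "$rcfile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}

# Example (placeholder path; substitute your own $SOURCEDIR):
#   add_source_line /path/to/SOURCEDIR/SourceMe.sh
#   source ~/.bashrc
```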

Installing dependencies

The following is a complete list of direct dependencies for ITEP. Which dependencies you need to install depends on which parts of the code you want to run. You will need Python (2.6 or 2.7) and some form of Bash, both of which typically come with Linux systems. For easy installation you'll also want to install setuptools (using $ sudo apt-get install python-setuptools).

REQUIRED packages:

  • NCBI BLAST+: Download from the NCBI website at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ . Note that the version of BLAST in the current Ubuntu repositories is NOT BLAST+.
  • MCL: Clustering tool that is the workhorse of the toolkit. Download from the MCL website (http://micans.org/mcl/) and follow their directions to install.
  • SQLite: Download from the SQLite website, https://www.sqlite.org/ (see the note below on why the version packaged with your distribution may not work), and follow their directions to install.
  • Python: If this doesn't come with your distro, run $ sudo apt-get install python2.6 (or python2.7).
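
Once these are installed, it can help to confirm that they are actually on your PATH. The snippet below is a small sketch: missing_commands is a hypothetical helper, and the executable names in the example (blastp and rpsblast for BLAST+, mcl, sqlite3, python) are the names these packages normally install, which may differ on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with ITEP): print the name of each
# command given as an argument that cannot be found on the PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example: prints nothing if everything is installed.
#   missing_commands blastp rpsblast mcl sqlite3 python
```

This only shows that a command with that name exists; it does not tell you whether it is a recent enough version.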

(Python packages)

  • Biopython: Used for writing and reading files and some visualizations. $ sudo easy_install -f http://biopython.org/DIST/ biopython
  • ETE: Used for visualizing and manipulating phylogenetic trees. $ sudo easy_install -U ete2
  • NumPy/SciPy: Used for many miscellaneous computations. $ sudo apt-get install python-numpy python-scipy (note that many distributions already have these)
  • Ruffus: Used to parallelize BLAST and RPSBLAST computations. $ sudo easy_install -U ruffus
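
A quick way to confirm that the Python packages above are importable is to try importing each one. This is a sketch rather than anything ITEP provides: missing_python_modules is a hypothetical helper, and the module names in the example (Bio for Biopython, ete2, numpy, scipy, ruffus) are the import names these packages normally use.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with ITEP): given a Python interpreter
# and a list of module names, print each module that fails to import.
missing_python_modules() {
    local py="$1"
    shift
    local mod
    for mod in "$@"; do
        "$py" -c "import $mod" >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}

# Example: prints nothing if every module imports cleanly.
#   missing_python_modules python Bio ete2 numpy scipy ruffus
```

Passing the interpreter explicitly lets you check a specific one (for example python2.7) when several are installed.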

NOTE: Depending on your version of Ubuntu, the SQLite that comes with it might be too old. Older versions fail on data of the size ITEP produces with significant numbers of genomes, and one of them also had a string-handling bug that makes comparisons incorrect. If this is the case, download and compile the latest version from the SQLite website instead.

NOTE: The latest version of Biopython (1.61 as of the time of this writing) fixes a bug in reading GenBank files from some sources like JGI. If you get errors from Biopython related to reading GenBank files, try upgrading your Biopython (and make sure to remove the old version too). Note that the version in the repos as of the time of this writing was 1.60, which has the aforementioned bug.

NOTE: The latest versions of NCBI's CDD will ONLY compile with the newest RPSBLAST (so make sure you get the latest version of BLAST+). Unfortunately, NCBI did not change the name of the RPSBLAST program when changing the syntax and input formats. If you have both versions installed, type "rpsblast -help" before attempting to run main4.sh and make sure that the name "rpsblast" refers to the new version of RPSBLAST and not an old one (only the new one will actually show help with this command; the old one requires you to use --help).

Useful external tools

The following packages are used only in small numbers of scripts and can often be substituted with other programs that input and output the same file formats (e.g. FASTA / Newick) or that interface with other databases.

Note that MyRast requires you to have perl5 on your machine (often this is included in your distribution).

NOTE: If you want to run OrthoMCL you need to install it and its dependencies, which include MySQL, Perl 5, and the Perl modules DBI and DBD::mysql. Both modules can be installed (as root) using CPAN if you don't have them:

$ sudo cpan DBI

$ sudo cpan DBD::mysql

OrthoMCL support is still somewhat beta. In particular, it will hit memory limitations sooner than MCL does by itself. Consider yourself warned.

  • Python packages for visualization
      • matplotlib: Used for some plotting functions. $ sudo apt-get install python-matplotlib
      • networkx: Used to make GML files for network visualization. $ sudo apt-get install python-networkx
      • PyQt4: Required to visualize trees with ETE. Download it and its dependencies from http://www.riverbankcomputing.com/software/pyqt/download and follow their directions to install.
      • ReportLab: Needed for Biopython visualizations. Available from https://pypi.python.org/pypi/reportlab
