This repository has been archived by the owner on Feb 16, 2019. It is now read-only.

Using the ITEP virtual machine

mattb112885 edited this page Jan 26, 2016 · 14 revisions

The ITEP virtual machine is intended to be a convenient way for people to build and analyze ITEP databases from any host operating system (Linux, Mac, or Windows). It includes a pre-built tutorial database, space for users to build their own database, and complete installations of all dependencies.

WARNING: The default size of the virtual hard drive is 50 GB. It is dynamically allocated, so as long as the host has sufficient hard drive space it should expand to fit your actual data. However, because of potential issues with this (e.g. reallocation takes time) and the significant computational resources required to build large databases, we recommend NOT using the VM to analyze large numbers of genomes (more than 50). For such analyses we instead recommend installing ITEP and its dependencies on a Linux server using the directions found at "Installing ITEP on your machine".

WARNING: The 32-bit VM is unable to run RPSBLAST on the complete CDD because it is too big to fit in memory. If you need to run RPSBLAST we recommend installing the dependencies on a Linux server instead or (if your host is capable) using the 64-bit version.

Obtaining the ITEP virtual machine

The ITEP virtual machine is currently hosted on Google Drive (it will very soon be migrated to a location on the Price Lab servers).

The current 32-bit VM is here: http://goo.gl/gqh6BG

The current 64-bit VM is here: http://goo.gl/nM4DWr

The code for version 1.0 is the same as what is included in the manuscript's supplemental material. Note that the VM code will likely be older than the latest code in this GitHub repository. The VM will be updated to reflect any major changes. Minor changes can always be incorporated by running "git pull origin master" (see below for details).

Loading the VM in VirtualBox

The virtual machine (VM) is a single .ova file. To open it in VirtualBox:

  1. Start up the VirtualBox program.
  2. Go to file -> import appliance...
  3. Click on the "open appliance" button and select the .ova file you downloaded.
  4. Click "next" then "import".

It will take about 10 minutes to import the appliance (this only needs to be done once). Once it's imported, you can select it and click "start" to boot it up.
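The same import can also be done from the command line with VirtualBox's VBoxManage tool. The following is a minimal sketch rather than part of the original instructions: the .ova path is a placeholder (the actual file name depends on which download you chose), and it assumes VBoxManage is on your PATH.

```shell
# Placeholder path (an assumption); point this at the .ova file you downloaded.
OVA_FILE="${OVA_FILE:-$HOME/Downloads/itep.ova}"

import_itep_vm() {
    # Dry run: print what the appliance contains without importing anything.
    VBoxManage import "$1" --dry-run
    # Real import; as with the GUI, this only needs to be done once.
    VBoxManage import "$1"
}

# Only attempt the import when both VBoxManage and the .ova file are present.
if command -v VBoxManage >/dev/null 2>&1 && [ -f "$OVA_FILE" ]; then
    import_itep_vm "$OVA_FILE"
else
    echo "VBoxManage or $OVA_FILE not found; skipping import"
fi
```

After a successful import, "VBoxManage list vms" shows the name under which the appliance was registered.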

By default the VM will only use one core. To improve the speed of database building and some of the analyses, you should allow it to use more cores. Once the machine has been imported into VirtualBox, right-click on it and click "settings...". Then go to "system" in the left-side menu, click on the "processors" tab, and increase the number of processors. Note that you will only be able to do this if your host machine has hardware virtualization capabilities.
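The core count can also be set from the command line while the VM is powered off. This is a sketch under stated assumptions: the VM name "ITEP" is hypothetical (check "VBoxManage list vms" for the real one), and giving the guest half of the host's cores is an arbitrary starting point, not a recommendation from the ITEP authors.

```shell
# Hypothetical VM name (an assumption); use the name VirtualBox actually shows.
VM_NAME="${VM_NAME:-ITEP}"

# Number of cores on the host; fall back to 1 if it cannot be determined.
HOST_CORES=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Arbitrary choice: give the guest half of the host's cores, but at least one.
CPUS=$(( HOST_CORES / 2 ))
if [ "$CPUS" -lt 1 ]; then
    CPUS=1
fi

if command -v VBoxManage >/dev/null 2>&1; then
    # List registered VMs so the name can be checked.
    VBoxManage list vms
    # Apply the new core count; the VM must not be running when this is called.
    VBoxManage modifyvm "$VM_NAME" --cpus "$CPUS"
else
    echo "VBoxManage not found; would have set $VM_NAME to $CPUS core(s)"
fi
```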

The VM's operating system is Ubuntu 13, with a single user "itep" (password is ITEP). To get started, type that password if you are asked for it, then click the "ITEP" icon that appears on the desktop. You will start in /home/itep.

What is in the VM

The VM includes two copies of ITEP - one pre-built database containing the three genomes we will focus on in this tutorial, and one empty copy. The empty one is intended to be used with a user's own genomes. The VM also includes up-to-date versions (as of May 2013) of all of the dependencies for the ITEP scripts.

The tutorial copy is located at /home/itep/LATEST_ITEP/TUTORIAL_EXAMPLE and includes pre-computed BLAST results and sample clustering results that you can query right away after sourcing the SourceMe.sh file. The empty ITEP database (which users can populate with their own genomes using the directions in this tutorial) is located at /home/itep/LATEST_ITEP/master.

To choose which ITEP distribution to use and have the VM point to it automatically at login, edit the .bashrc file and comment out one of the clearly labeled lines at the bottom of the file.

emacs -nw ~/.bashrc

Alternatively, just source the SourceMe.sh file for the copy that you want to use each time you log into the virtual machine (this is likely a better idea if you will be switching between them a lot).
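Sourcing by hand might look like the sketch below. It assumes that SourceMe.sh sits at the top level of each copy and that it is safest to source it from inside that directory; neither detail is stated above, so adjust the path if your VM differs.

```shell
# Choose the copy to work with: the tutorial database (default here) or
# /home/itep/LATEST_ITEP/master for your own genomes.
ITEP_COPY="${ITEP_COPY:-/home/itep/LATEST_ITEP/TUTORIAL_EXAMPLE}"

if [ -f "$ITEP_COPY/SourceMe.sh" ]; then
    # Assumption: SourceMe.sh lives at the top level of the chosen copy.
    cd "$ITEP_COPY" && . ./SourceMe.sh
else
    echo "No SourceMe.sh found under $ITEP_COPY"
fi
```

Setting ITEP_COPY before running these lines switches between the two copies without editing .bashrc.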

The ITEP VM is not updated as often as the GitHub repository, so we recommend pulling the latest code with git before going much further.

$ cd LATEST_ITEP/master
$ git pull origin master

It may also be worth making sure that your version of Biopython is the latest, since updates to that package often include improvements that make GenBank file parsing more robust.
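One way to check the installed Biopython version is sketched below. It assumes the interpreter is available as "python"; the commented-out upgrade line additionally assumes pip is installed on the VM, which is not confirmed above.

```shell
# Ask Python for the Biopython version; report "not installed" if the import
# fails or no "python" command is available.
BIOPYTHON_VERSION=$(python -c "import Bio; print(Bio.__version__)" 2>/dev/null || echo "not installed")
echo "Biopython version: $BIOPYTHON_VERSION"

# To upgrade (assumes pip is available; may need sudo on the VM):
# pip install --upgrade biopython
```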
