ROAGUE is a tool for reconstructing the ancestors of gene blocks in prokaryotic genomes. Gene blocks are groups of genes co-located on the chromosome. Gene blocks are often conserved across bacterial species, sometimes as operons (genes that are co-transcribed). The conservation is rarely absolute, however: gene loss, gain, duplication, block splitting and block fusion are frequently observed.
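To make the event types above concrete, here is a minimal, illustrative Python sketch (not ROAGUE's own algorithm) that compares a reference gene block with an orthologous block whose genes have already been grouped into co-located runs. The function name `describe_events` and the input layout are hypothetical. Fusion is not modelled, because it involves more than one reference block.

```python
from collections import Counter


def describe_events(reference_genes, blocks):
    """Summarise differences between a reference block and an orthologous one.

    reference_genes: gene names in the reference block, e.g. ["a", "b", "c"].
    blocks: genes found in another genome, grouped into co-located runs,
        e.g. [["a", "b"], ["c"]] (two runs means the block was split).
    """
    # How many copies of each gene appear across all runs.
    counts = Counter(gene for block in blocks for gene in block)
    lost = [g for g in reference_genes if counts[g] == 0]
    gained = sorted(g for g in counts if g not in reference_genes)
    duplicated = sorted(g for g, n in counts.items() if n > 1)
    # k co-located runs need at least k - 1 split events to arise from one block.
    splits = max(len(blocks) - 1, 0)
    return {"lost": lost, "gained": gained,
            "duplicated": duplicated, "splits": splits}


events = describe_events(["a", "b", "c", "d"], [["a", "b"], ["b", "c", "x"]])
print(events)
```

In this example gene `d` is reported as lost, `b` as duplicated, `x` as gained, and the two runs count as one split.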
ROAGUE takes as input a set of species and a gene block in a reference species. It then finds all gene blocks orthologous to the reference gene block and reconstructs their ancestral states.
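Finding orthologous gene blocks involves deciding which homologs of the reference genes lie close together on a chromosome. The sketch below shows one simple way to do that grouping, assuming the hits are already available as hypothetical `(gene, start, end)` tuples and using an arbitrary 500 bp gap threshold. ROAGUE's actual criteria may differ.

```python
def group_into_blocks(hits, max_gap=500):
    """Group gene hits on one chromosome into co-located blocks.

    hits: list of (gene_name, start, end) tuples (hypothetical input format).
    max_gap: largest distance in base pairs between the end of the current
        block and the start of the next gene that still counts as co-located
        (an illustrative value, not taken from ROAGUE).
    """
    blocks = []
    block_end = None
    # Walk the hits in chromosomal order.
    for name, start, end in sorted(hits, key=lambda h: h[1]):
        if blocks and start - block_end <= max_gap:
            # Close enough: extend the current block.
            blocks[-1].append(name)
            block_end = max(block_end, end)
        else:
            # Too far away (or first hit): start a new block.
            blocks.append([name])
            block_end = end
    return blocks


print(group_into_blocks([("b", 1200, 2000), ("a", 100, 1000), ("c", 9000, 9800)]))
```

Here `a` and `b` are 200 bp apart and form one block, while `c` starts 7000 bp after `b` ends and forms a separate block.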
- Python 2.7.6+
- Biopython 1.63+ (python-biopython)
- MUSCLE multiple sequence aligner (muscle)
- ncbi-tools (ncbi-tools-bin)
- BLAST2 (blast2)
- BLAST+ (ncbi-blast+)
- ete3 (Python framework for trees)
Users can either download the repository through the GitHub interface or clone it from the command line:
git clone
All requirements except ete3 can be installed with the following command (on Debian/Ubuntu):
sudo apt-get install python-biopython ncbi-tools-bin blast2 ncbi-blast+ muscle
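After installing, a quick way to confirm that the dependencies are visible is a small check like the following. It is a hypothetical helper, not part of ROAGUE, written to run under both Python 2.7 and Python 3. The program names checked (`muscle`, and the BLAST+ tools `blastp` and `makeblastdb`) are an assumption and can be extended.

```python
import importlib
import os


def find_executable(name):
    """Return the full path of `name` if it is an executable on PATH, else None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def check_requirements(modules=("Bio", "ete3"),
                       programs=("muscle", "blastp", "makeblastdb")):
    """Map each Python module and program name to True if it is available."""
    report = {}
    for module in modules:
        try:
            importlib.import_module(module)
            report[module] = True
        except ImportError:
            report[module] = False
    for program in programs:
        report[program] = find_executable(program) is not None
    return report


if __name__ == "__main__":
    for name, ok in sorted(check_requirements().items()):
        print("%-12s %s" % (name, "found" if ok else "MISSING"))
```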
For ete3, see the installation instructions on its website:
The easiest way to run the project is to execute the roague script. It can be run on the two data sets provided in the directories E.Coli and B.Sub; the following two commands do exactly that. The final results (a PDF file of the ancestral reconstruction) are stored in E.Coli/visualization and B.Sub/visualization.
./ -g E.Coli/genomes/ -b E.Coli/gene_block_names_and_genes.txt -r NC_000913 -f E.Coli/phylo_order.txt -m global
./ -g B.Sub/genomes/ -b B.Sub/gene_block_names_and_genes.txt -r NC_000964 -f B.Sub/phylo_order.txt -m global
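The two commands above can also be driven from Python, for example to loop over both data sets. This is a sketch assuming the roague script is executable. Because its file name is not spelled out here, its path is passed in as the `script` parameter, and `build_command`/`run_all` are hypothetical helper names.

```python
import subprocess

# Data sets shipped with the repository and their reference accessions,
# as given in the example commands above.
DATASETS = {
    "E.Coli": "NC_000913",
    "B.Sub": "NC_000964",
}


def build_command(script, dataset, reference, method="global"):
    """Build the argument list for one roague run on a bundled data set."""
    return [
        script,
        "-g", dataset + "/genomes/",
        "-b", dataset + "/gene_block_names_and_genes.txt",
        "-r", reference,
        "-f", dataset + "/phylo_order.txt",
        "-m", method,
    ]


def run_all(script, method="global"):
    """Run roague on every bundled data set, stopping at the first failure."""
    for dataset, reference in sorted(DATASETS.items()):
        subprocess.check_call(build_command(script, dataset, reference, method))
```

For instance, `run_all(path_to_roague)` runs the global method on both data sets in turn; `subprocess.check_call` raises `CalledProcessError` if a run exits with a non-zero status.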
Each accompanying script can also be run on its own; help for any script is shown with the -h or --help option.
./ -h
usage: [-h] [--genomes_directory GENOMES_DIRECTORY] [--gene_blocks GENE_BLOCKS] [--reference REFERENCE] [--filter FILTER] [--method METHOD]
optional arguments:
-h, --help show this help message and exit
--genomes_directory GENOMES_DIRECTORY, -g GENOMES_DIRECTORY The directory that stores all the genome files in GenBank format.
--gene_blocks GENE_BLOCKS, -b GENE_BLOCKS The gene_block_names_and_genes.txt file, which stores each operon name and its set of genes.
--reference REFERENCE, -r REFERENCE The NCBI accession number of the reference genome (NC_000913 for E.Coli and NC_000964 for B.Sub).
--filter FILTER, -f FILTER The filter file for creating the tree (… for E.Coli or … for B.Sub).
--method METHOD, -m METHOD The method used to reconstruct ancestral gene blocks: either the global or the local approach.
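For readers who want to call the pipeline from their own code, the interface described by the help text above can be mirrored with `argparse`. This is a reconstruction from the documented options only, not ROAGUE's source: the help strings are paraphrased, `build_parser` is a hypothetical name, and restricting `--method` to `global`/`local` via `choices` is an assumption based on the description.

```python
import argparse


def build_parser():
    """Mirror the documented roague command-line options (a sketch)."""
    parser = argparse.ArgumentParser(
        description="Reconstruct ancestral gene blocks (interface sketch).")
    parser.add_argument("--genomes_directory", "-g",
                        help="Directory containing the genome files in GenBank format.")
    parser.add_argument("--gene_blocks", "-b",
                        help="gene_block_names_and_genes.txt: operon names and their genes.")
    parser.add_argument("--reference", "-r",
                        help="NCBI accession of the reference genome, e.g. NC_000913.")
    parser.add_argument("--filter", "-f",
                        help="Filter file used to build the tree.")
    # Only the two documented approaches are accepted.
    parser.add_argument("--method", "-m", choices=["global", "local"],
                        help="Ancestral reconstruction approach.")
    return parser


args = build_parser().parse_args(
    ["-g", "E.Coli/genomes/", "-r", "NC_000913", "-m", "local"])
print(args.genomes_directory, args.reference, args.method)
```

Options that are not given default to `None`, and an unsupported `--method` value makes `argparse` print an error and exit.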