Analyzing gene neighborhoods
We have done a lot of analysis of the C. beijerinckii 6-phosphofructokinase, but how do we know whether it has the same function as its homologs in the other organisms? One of the most powerful ways to tell is to look at the gene context (the neighboring genes) of this gene and its homologs and check whether that context is conserved.
ITEP includes two main ways to look at gene neighborhoods: one is to generate a tab-delimited list of a gene's neighbors and the other is to add gene neighborhood information to a protein tree.
The gene neighborhoods for a list of genes can be obtained in a convenient tab-delimited format with the db_getGeneNeighborhoods.py script, which reads from a pre-computed table of neighborhoods for every gene in your genomes (up to a maximum of 10 genes in each direction).
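To make the idea of a pre-computed neighborhood table concrete, here is a minimal sketch of how such a table could be built. It is not ITEP's implementation; the `Gene` record and `compute_neighborhoods` function are hypothetical. It orders genes along each contig and records up to `max_distance` neighbors on each side, counting distance in genes. Consistent with the example output below, negative offsets are genes at lower contig coordinates, and the window stops at contig ends.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; start > stop for genes on the '-' strand."""
    gene_id: str
    contig: str
    start: int
    stop: int
    strand: str


def compute_neighborhoods(genes, max_distance=10):
    """Return {center_id: [(neighbor_id, offset), ...]} for every gene.

    Offsets count genes (not base pairs) along the contig, the center gene
    itself is included with offset 0, and windows never cross a contig end.
    """
    by_contig = defaultdict(list)
    for gene in genes:
        by_contig[gene.contig].append(gene)

    neighborhoods = {}
    for contig_genes in by_contig.values():
        # Order genes by their leftmost coordinate, whatever their strand.
        contig_genes.sort(key=lambda g: min(g.start, g.stop))
        for i, center in enumerate(contig_genes):
            lo = max(0, i - max_distance)
            hi = min(len(contig_genes), i + max_distance + 1)
            neighborhoods[center.gene_id] = [
                (contig_genes[j].gene_id, j - i) for j in range(lo, hi)
            ]
    return neighborhoods
```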
We determined earlier that fig|290402.1.peg.4768 is the ITEP ID of one of the three annotated 6-phosphofructokinases (and, as we'll see later, the most conserved of the three). Its neighborhood is obtained as follows:
```
$ echo "fig|290402.1.peg.4768" | db_getGeneNeighborhoods.py
fig|290402.1.peg.4768 fig|290402.1.peg.4765 -3 290402.1.NC_009617.1 5673448 5674506 + galactoside ABC transporter periplasmic D-galactose/D-glucose-binding protein_YP_001311911.1_Cbei_4849
fig|290402.1.peg.4768 fig|290402.1.peg.4766 -2 290402.1.NC_009617.1 5676687 5675308 - RNA methyltransferase_YP_001311912.1_Cbei_4850
fig|290402.1.peg.4768 fig|290402.1.peg.4767 -1 290402.1.NC_009617.1 5679010 5677589 - pyruvate kinase_YP_001311913.1_Cbei_4851
fig|290402.1.peg.4768 fig|290402.1.peg.4768 0 290402.1.NC_009617.1 5680111 5679155 - 6-phosphofructokinase_YP_001311914.1_Cbei_4852
fig|290402.1.peg.4768 fig|290402.1.peg.4769 1 290402.1.NC_009617.1 5684143 5680568 - DNA polymerase III DnaE_YP_001311915.1_Cbei_4853_dnaE
fig|290402.1.peg.4768 fig|290402.1.peg.4770 2 290402.1.NC_009617.1 5684531 5684247 - stress responsive alpha-beta barrel domain-containing protein_YP_001311916.1_Cbei_4854
fig|290402.1.peg.4768 fig|290402.1.peg.4771 3 290402.1.NC_009617.1 5685502 5684555 - hypothetical protein_YP_001311917.1_Cbei_4855
```
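In this table, the first column is the center gene's ID, the second is the neighboring gene's ID, the third is the neighbor's distance from the center gene in number of genes (negative and positive values lie on opposite sides of it, and the center gene itself appears with 0), the fourth is the contig ID, the fifth and sixth are the start and stop coordinates of the neighboring gene (for genes on the - strand the start is larger than the stop), the seventh is the neighbor's strand, and the last is its annotation.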
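If you want to post-process this output, a small parser helps. The sketch below assumes exactly the eight tab-separated columns described above; the `NeighborRow` type and `parse_neighborhoods` function are hypothetical names, not part of ITEP.

```python
import sys
from typing import NamedTuple


class NeighborRow(NamedTuple):
    """One line of db_getGeneNeighborhoods.py output (assumed 8 columns)."""
    center_id: str
    neighbor_id: str
    offset: int
    contig: str
    start: int
    stop: int
    strand: str
    annotation: str


def parse_neighborhoods(text):
    """Parse tab-delimited neighborhood output into NeighborRow tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # maxsplit=7 keeps the annotation intact even if it contains a tab.
        center, neighbor, offset, contig, start, stop, strand, annotation = (
            line.split("\t", 7)
        )
        rows.append(NeighborRow(center, neighbor, int(offset), contig,
                                int(start), int(stop), strand, annotation))
    return rows


if __name__ == "__main__":
    for row in parse_neighborhoods(sys.stdin.read()):
        print(row.offset, row.strand, row.annotation, sep="\t")
```

Saved under a name of your choice (for example the hypothetical `parse_neighborhoods.py`), it can sit at the end of the pipeline: `echo "fig|290402.1.peg.4768" | db_getGeneNeighborhoods.py | python parse_neighborhoods.py`.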
If a genome is incomplete, genes often fall near the ends of contigs. In that case, db_getGeneNeighborhoods.py only reports neighbors up to the end of the contig, so such a gene can have fewer neighbors on one side than on the other.
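When comparing neighborhoods across draft genomes, it can help to flag which neighborhoods were cut short by a contig end. A minimal sketch, assuming you have `(center_id, offset)` pairs from the output and know how many neighbors per side you expected (`expected` is a hypothetical parameter, not an ITEP option):

```python
def neighborhood_extent(pairs):
    """Map each center gene to the (min_offset, max_offset) seen in the output."""
    extent = {}
    for center, offset in pairs:
        lo, hi = extent.get(center, (0, 0))
        extent[center] = (min(lo, offset), max(hi, offset))
    return extent


def truncated_sides(pairs, expected):
    """List the sides ("negative"/"positive") with fewer than `expected` neighbors.

    A short side usually means the center gene lies near a contig end.
    """
    result = {}
    for center, (lo, hi) in neighborhood_extent(pairs).items():
        sides = []
        if -lo < expected:
            sides.append("negative")
        if hi < expected:
            sides.append("positive")
        result[center] = sides
    return result
```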
In another tutorial, we created the following Newick tree for one of the 6-phosphofructokinase clusters:
(fig_290402_1_peg_4768:0.15942,fig_931626_1_peg_1249:0.69610,fig_386415_1_peg_406:0.19652);
Create a file called "pfk_tree" containing this string. To visualize the neighborhoods of these genes, you need to pick a cluster run to use as a basis for coloring (we used all_I_2.0_c_0.4_m_maxbit):
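The tree's leaf labels are ITEP IDs with "|" and "." replaced by "_" (compare fig_290402_1_peg_4768 with fig|290402.1.peg.4768). Assuming that pattern holds for every leaf, the sketch below writes the pfk_tree file and recovers the ITEP IDs, so you can also look at the same genes' neighborhoods in tabular form. The function name is hypothetical.

```python
import re

PFK_TREE = ("(fig_290402_1_peg_4768:0.15942,fig_931626_1_peg_1249:0.69610,"
            "fig_386415_1_peg_406:0.19652);")

# Assumption: leaves look like fig_<genome>_<version>_peg_<number>.
LEAF_RE = re.compile(r"fig_(\d+)_(\d+)_peg_(\d+)")


def itep_ids_in_newick(newick):
    """Convert each leaf label in a Newick string back to an ITEP gene ID."""
    return [f"fig|{genome}.{version}.peg.{peg}"
            for genome, version, peg in LEAF_RE.findall(newick)]


if __name__ == "__main__":
    with open("pfk_tree", "w") as fh:
        fh.write(PFK_TREE + "\n")
    print("\n".join(itep_ids_in_newick(PFK_TREE)))
```

Since db_getGeneNeighborhoods.py accepts a list of genes, the printed IDs can be piped straight into it.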
```
$ cat pfk_tree | db_makeNeighborhoodTree.py -r all_I_2.0_c_0.4_m_maxbit -p pfk_tree -d
```
(You can omit the -d flag if you only want to save the results to a file without displaying them.) The result should look like the figure below.
The script automatically replaces the gene IDs in the tree with human-readable labels (including the organism name and annotation). The arrows for the genes on the tree itself have red borders to distinguish them from their neighbors, and the legend above the tree shows the cluster ID corresponding to each color.
A protein tree with tBLASTn IDs (the last column of a results table from the db_TBlastN_wrapper.py script) instead of gene IDs can also be used as input. In that case, db_makeNeighborhoodTree.py automatically searches for genes neighboring the location of each tBLASTn hit and adds them to the tree.
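Finding genes around a tBLASTn hit is essentially an interval lookup on the hit's contig. The sketch below illustrates the idea only; it is not how db_makeNeighborhoodTree.py is implemented, and the `Gene` record and `genes_near_hit` function are hypothetical. Hit coordinates may be given in either order, as with - strand hits.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record on the same contig as the hit."""
    gene_id: str
    left: int   # smaller coordinate
    right: int  # larger coordinate


def genes_near_hit(genes, hit_start, hit_stop, k=3):
    """Return (genes before the hit, genes overlapping it, genes after it).

    Up to k genes are returned on each side; fewer if the contig ends first.
    """
    lo, hi = sorted((hit_start, hit_stop))
    ordered = sorted(genes, key=lambda g: g.left)
    before = [g for g in ordered if g.right < lo]
    after = [g for g in ordered if g.left > hi]
    overlapping = [g for g in ordered if g.left <= hi and g.right >= lo]
    # Slice explicitly so that k == 0 yields an empty list on both sides.
    return before[max(0, len(before) - k):], overlapping, after[:k]
```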