-
Notifications
You must be signed in to change notification settings - Fork 2
plot_expression_tutorial
In this tutorial, we will visualize gene expression resulting from basic EGRIN 2.0
queries
You can download a blank slate version of this tutorial for editing on your local machine here
As described in the System Requirements, there are several dependencies that need to be satisfied to complete this tutorial, including:
- pymongo
- numpy
- pandas
- joblib
- scipy
- statsmodels
- itertools
To query the ensemble, we must first load all of the query functions.
import query.egrin2_query as e2q
import query.egrin2_plot as e2p
import pymongo
We will also define the host and the database that we'd like to use. Host
is the name of the machine hosting the EGRIN 2.0
MongoDB while db
is the organism-specific EGRIN 2.0
database name to query.
client = pymongo.MongoClient(host='baligadev')
db = client['eco_db']
Here we will retrieve genes and experiments in which these genes are co-regulated from a specific corem.
This type of information can be retrieved using the coremFinder
function. To call this function we need to specify:
-
x
: our query -
x_type
: our query type. This could be corems, genes, conditions, GREs, or specific-coregulatory edges. In this case we will usecorem
. -
y_type
: our target type. This is the type of information we would like to retrieve. The type can be any type described byx_type
In addition we include the host
and db
variables defined above.
corem = 1
corem_genes = e2q.find_corem_info(db, corem, x_type="corem_id", y_type="genes")
corem_genes.sort()
print "\nThere are %d genes in corem %d, including:\n" % (len(corem_genes), corem)
for i in corem_genes:
print i, " is also called ", e2q.row2id_batch(db, [i], return_field="name" )[0]
ERROR:root:Cannot identify row name: genes
There are 9 genes in corem 1, including:
genes is also called None
This query will also use the coremFinder
function. To find conditions
associated with our corem rather than genes, we simply change the y_type
argument.
corem_conditions = e2q.find_corem_info(db, corem, x_type="corem", y_type="conditions")
corem_conditions.sort()
print "\nThere are %d conditions in which these genes are co-regulated, including:\n" % len(corem_conditions)
for i in corem_conditions[0:10]:
print i
There are 418 conditions in which these genes are co-regulated, including:
conditions
To retrieve gene expression values for these genes and experiments we use the expressionFinder
function. To call this function we specific the rows
(genes) and columns
(conditions), as well as the host
and db
as before.
gene_expression = e2q.find_gene_expression(db, rows=corem_genes, cols=corem_conditions)
/usr/lib/python2.7/dist-packages/pymongo/cursor.py:848: RuntimeWarning:
couldn't encode - reloading python modules and trying again. if you see this without getting an InvalidDocument exception please see http://api.mongodb.org/python/current/faq.html#does-pymongo-work-with-mod-wsgi
---------------------------------------------------------------------------
InvalidDocument Traceback (most recent call last)
<ipython-input-14-b1189e2cd1db> in <module>()
----> 1 gene_expression = e2q.find_gene_expression(db, rows=corem_genes, cols=corem_conditions)
/users/wwu/egrin2-tools/query/egrin2_query.pyc in find_gene_expression(db, rows, cols, standardized)
781 # translate rows/cols
782
--> 783 rows = row2id_batch(db, rows, return_field="row_id", input_type=input_type_rows)
784 cols = col2id_batch(db, cols, return_field="col_id", input_type=input_type_cols)
785
/users/wwu/egrin2-tools/query/egrin2_query.pyc in row2id_batch(db, rows, return_field, input_type)
69 "egrin2_row_name": 1,
70 "GI": 1, "accession": 1,
---> 71 "name": 1})))
72
73 if input_type in ["row_id", "egrin2_row_name", "GI", "accession", "name", "sysName"]:
/usr/lib/python2.7/dist-packages/pymongo/cursor.pyc in next(self)
902 raise StopIteration
903 db = self.__collection.database
--> 904 if len(self.__data) or self._refresh():
905 if self.__manipulate:
906 return db._fix_outgoing(self.__data.popleft(),
/usr/lib/python2.7/dist-packages/pymongo/cursor.pyc in _refresh(self)
846 self.__skip, ntoreturn,
847 self.__query_spec(), self.__fields,
--> 848 self.__uuid_subtype))
849 if not self.__id:
850 self.__killed = True
InvalidDocument: Cannot encode object: genes
0 b3317
1 b3320
2 b3319
3 b3313
4 b3315
5 b3318
6 b3314
7 b3321
8 b3316
[9 rows x 1 columns]
We have several options for plotting these gene expression values. We could plot the expression values as lines, or in a heatmap, or even as a boxplot for all genes in each condition.
Each of these visualizations is available by calling a single function, plotExpression
. To call this function we must provide:
-
data
: this is the gene expression values, a Pandas data frame -
plot_type
: this is the type of plot to draw. Can beboxplot' (default),
line, or
heatmap` -
ipynb
: logical indicating whether the plot will be drawn in an iPython notebook -
sort
: optionally sort the data (default:FALSE
)
It's important to note that this function only requires a Pandas data frame, meaning you can use it to plot any kind of data (e.g. loaded from a text file).
Additionally, if we are producing the plot in an iPython notebook, we need to set the argument ipynb
= TRUE
and call an additional function py.iplot
on the value returned from plotExpression
.
Below are three examples, calling the plotExpression
function with three different values for the plot_type
argument: line
, heatmap
, and boxplot
.
line_plot = e2p.plotExpression(data=gene_expression, plot_type = "line", ipynb=True, sort=False)
py.iplot(line_plot)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-15-ce054f5158bf> in <module>()
----> 1 line_plot = e2p.plotExpression(data=gene_expression, plot_type = "line", ipynb=True, sort=False)
2 py.iplot(line_plot)
NameError: name 'gene_expression' is not defined
heatmap = plotExpression( data = gene_expression, plot_type = "heatmap", ipynb = True )
py.iplot( heatmap )
boxplot = plotExpression( gene_expression, plot_type = "boxplot", ipynb = True, sort = True )
py.iplot( boxplot )
Here is the code. You can copy this to your own notebook or download a blank slate notebook here.
# PLOT GENE EXPRESSION
# prelims
from query.egrin2_query import *
host = ""
db = ""
corem = 1
# find corem genes
corem_genes = coremFinder(x = corem,x_type = "corem", y_type="genes",host=host,db=db)
# find corem conditions
corem_conditions = coremFinder(x = corem,x_type = "corem", y_type="conditions", host=host, db=db)
# get gene expression
gene_expression = expressionFinder(rows=corem_genes,cols=corem_conditions,host=host,db=db)
# plot
plot = plotExpression( data = gene_expression, plot_type = "line", ipynb = True, sort = False )
py.iplot( plot )