From 464ac32327549747cb3b4025728d6dea9a267c18 Mon Sep 17 00:00:00 2001 From: Mehrtash Babadi Date: Thu, 1 Aug 2024 12:56:24 -0400 Subject: [PATCH] CAS-63 Visualization tool usage tip in the notebook (#76) * visualization tool usage tip in the notebook * some more text * Update notebooks/quickstart_tutorial.ipynb Co-authored-by: Kevin Lydon * Update some typos and add note about downsampling * Update notebooks/quickstart_tutorial.ipynb Co-authored-by: Kevin Lydon * Update notebooks/quickstart_tutorial.ipynb * Update notebooks/quickstart_tutorial.ipynb Co-authored-by: Kevin Lydon --------- Co-authored-by: Nick Malfroy-Camine Co-authored-by: Kevin Lydon --- notebooks/quickstart_tutorial.ipynb | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/notebooks/quickstart_tutorial.ipynb b/notebooks/quickstart_tutorial.ipynb index 142ace0..aba3eb3 100644 --- a/notebooks/quickstart_tutorial.ipynb +++ b/notebooks/quickstart_tutorial.ipynb @@ -177,9 +177,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We notice that Gene Symbols (names) serve as the index of the ``.var`` DataFrame, and ENSEMBLE Gene IDs are provided under ``gene_ids`` column. We take note of both for the next steps.\n", + "We notice that Gene Symbols (names) serve as the index of the ``.var`` DataFrame, and Ensembl Gene IDs are provided under the ``gene_ids`` column. We take note of both for the next steps.\n", "\n", - ">**Note:** CAS requires both Gene Symbols and ENSEMBLE Gene IDs. If you do not have either available in your AnnData file, please update your AnnData file before proceeding to the next steps. We recommend using [BioMart](http://www.ensembl.org/info/data/biomart/index.html) for converting Gene Symbols to ENSEMBLE Gene IDs or vice versa." + ">**Note:** CAS requires both Gene Symbols and Ensembl Gene IDs. If you do not have either available in your AnnData file, please update your AnnData file before proceeding to the next steps. 
We recommend using [BioMart](http://www.ensembl.org/info/data/biomart/index.html) for converting Gene Symbols to Ensembl Gene IDs or vice versa." ] }, { @@ -241,7 +241,7 @@ "source": [ "At this point, we are ready to submit our AnnData file to CAS for annotation.\n", "\n", - ">**Note:** Before you proceed, you may need to modify the next cell as necessary for your dataset. CAS must be pointed to the appropriate columns in the ``.var`` DataFrame for fetching Gene Symbols and ENSEMBLE Gene IDs. This is done by setting ``feature_names_column_name`` and ``feature_ids_column_name`` arguments accordingly. If either appears as the index of the ``.var`` DataFrame, use `index` as argument. Otherwise, use the appropriate column name. " + ">**Note:** Before you proceed, you may need to modify the next cell as necessary for your dataset. CAS must be pointed to the appropriate columns in the ``.var`` DataFrame for fetching Gene Symbols and Ensembl Gene IDs. This is done by setting ``feature_names_column_name`` and ``feature_ids_column_name`` arguments accordingly. If either appears as the index of the ``.var`` DataFrame, use `index` as argument. Otherwise, use the appropriate column name. " ] }, { @@ -313,9 +313,17 @@ "source": [ "### Exploring the Cellarium CAS response\n", "\n", - "We recommending exploring the CAS response using our provided ``CASCircularTreePlotUMAPDashApp`` Dash App for a more streamlined and holistic visualization of the CAS response. The visualization is self-explanatory.\n", + "We recommend exploring the CAS response using our provided ``CASCircularTreePlotUMAPDashApp`` Dash App for a more streamlined and holistic visualization of the CAS response.\n", "\n", - ">**Note:** In a nutshell, the visualization shows various cell type ontology terms as colored circles. The size of the circle signifies the occurence of the term in the entire dataset (or over the chosen group of cells). 
The color of the circle signifies the relevance score of the term in cells over which the term was found to have any degree of relevance. You can highlight the ontology term relevance scores over the UMAP scatter plot by clicking on the circles. You can also show the terms relevant to individual cells and their scores by clicking on a cell over the UMAP scatter plot. " + ">**Tooltip:** The visualization displays various cell type ontology terms as colored circles in a circular dendrogram. The relationships underlying this dendrogram correspond to \"_is_a_\" relationships from [Cell Ontology](https://obofoundry.org/ontology/cl.html) (CL). Since these relationships are not mutually exclusive, a term can have multiple parent terms, meaning the same term can appear along different branches of the tree representation. The radius of each circle (whether it is a clade or a leaf node) signifies the occurrence of the term in the entire dataset, regardless of its relevance score. The color of the circle indicates the relevance score of the term in cells where it was found to have non-vanishing relevance.\n", + ">\n", + "> Here are some of the interactive capabilities of the visualization app:\n", + "> - **Cell selection:** By default, all cells are selected, and the cell type ontology dendrogram shows an aggregated summary over all cells. You can restrict the aggregation to a subset of cells by selecting your desired subset over the UMAP scatter plot, either by clicking a single cell or by using the rectangular select or lasso select tool. The dendrogram will react to your custom cell selection. If your input AnnData file includes clustering, you can restrict score aggregation to each cluster by selecting your cluster in the Settings panel (accessible via the gear icon in the upper right of the app). 
\n", + "> - **Highlighting ontology term relevance scores:** You can highlight cell type ontology term relevance scores over the UMAP scatter plot by clicking on the circles in the dendrogram. Only the selected cells will be scored, and the rest will be grayed out. You can revert to selecting all cells from the settings panel or by using the rectangular select tool to select all cells.\n", + "> - **Studying the ontology term relevance scores for a single cell:** You can display the term relevance scores for individual cells by clicking on a single cell in the UMAP scatter plot.\n", + "> - **Advanced settings:** By default, only terms above a specified relevance threshold with occurrence above another threshold over the selected cells are shown. You can modify these thresholds in the Settings panel (accessible via the gear icon in the upper right of the app).\n", + ">\n", + ">**Note**: The number of cells displayed should be limited to roughly 50K. Beyond that, performance may suffer. If you need to visualize more cells, please attempt to downsample your cells." ] }, { @@ -365,7 +373,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Assing cell type calls to individual cells" + "#### Assign cell type calls to individual cells" ] }, {