Fixes to tidyverse and scIntro notebooks #760

Merged: 4 commits, merged on Apr 18, 2024
Changes from 2 commits
2 changes: 1 addition & 1 deletion intro-to-R-tidyverse/03-intro_to_tidyverse.Rmd
@@ -462,7 +462,7 @@ stats_df |>
#### Save to TSV files

Let's write some of the data frames we created to a file.
To do this, we can use the `write_()` family of functions from the `readr` package.
The first argument of `write_tsv()` is the data we want to write, and the second argument is a character string that describes the path to the new file we would like to create.
Remember that we created a `results` directory for our output. Because that directory is not our working directory itself, we need to include it in the path we give to `write_tsv()`.
This is what we will use the `file.path()` function for.
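Here is a minimal sketch of this pattern. It assumes a small hypothetical data frame, `example_df`, and a hypothetical file name, `example_results.tsv`; neither comes from the notebook. The sketch also creates `results` if it is missing, so it can run on its own:

```r
library(readr)

# Hypothetical data frame standing in for one of the notebook's results
example_df <- data.frame(
  gene = c("A", "B", "C"),
  mean_expression = c(1.5, 0.2, 3.8)
)

# Create the output directory if it does not already exist
dir.create("results", showWarnings = FALSE)

# file.path() joins the directory and file name with the correct separator
output_file <- file.path("results", "example_results.tsv")

# First argument: the data to write; second argument: the path of the new file
write_tsv(example_df, output_file)
```

Using `file.path()` instead of pasting strings together keeps the path correct regardless of how the directory and file name are combined.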
40 changes: 34 additions & 6 deletions scRNA-seq/03-normalizing_scRNA.Rmd
@@ -146,6 +150,10 @@ One way to determine whether our normalization yields biologically relevant resu
Because plotting expression for thousands of genes together isn't practical, we will reduce the dimensions of our data using Principal Component Analysis (PCA).

We will also make the same plot with our *unnormalized* data, to visualize the effect of normalization on our sample.
We'll do this comparison twice:

- Once coloring the points by their total UMI count
- Once coloring the points based on their cell labels

Before plotting the unnormalized data, we will log transform the raw counts to make their scaling more comparable to the normalized data.
To do this, we will use the `log1p()` function, which computes `log(1 + x)`. It is designed for exactly this case, where we add 1 to every value before taking the log so that zero counts do not become `-Inf`.
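The following quick check of what `log1p()` computes uses a few made-up count values. It shows that `log1p()` returns the same values as `log(x + 1)`, and that a count of 0 maps to 0 rather than `-Inf`:

```r
# A few made-up raw counts, including a zero
raw_counts <- c(0, 1, 9, 99)

log1p_values <- log1p(raw_counts)      # log(1 + x), computed directly
manual_values <- log(raw_counts + 1)   # the same transform, written out by hand

# Both approaches agree (up to floating-point tolerance)
all.equal(log1p_values, manual_values)  # TRUE

# Taking the log without adding 1 breaks down for zero counts
log(0)    # -Inf
log1p(0)  # 0
```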
@@ -166,27 +170,28 @@ log_pca <- counts(bladder_sce) |> # get the raw counts
Note that we are using `scater::calculatePCA()` two different ways here: once on the full `bladder_sce` object, and once on just the `counts` matrix.
When we use `calculatePCA()` on the object, it automatically uses the log-normalized matrix (the `logcounts` assay) stored inside the object.
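The computation behind the unnormalized PCA can be sketched with base R's `prcomp()`, which here stands in for `scater::calculatePCA()`. This toy example rests on several assumptions. It uses a small simulated gene-by-cell matrix (`toy_counts`, hypothetical). It skips the selection of the most variable genes that `calculatePCA()` performs by default. It also transposes the matrix, because `prcomp()` expects observations (cells) in rows:

```r
set.seed(2024)

# Hypothetical toy counts: 50 genes (rows) by 20 cells (columns)
toy_counts <- matrix(rpois(50 * 20, lambda = 5), nrow = 50,
                     dimnames = list(paste0("gene", 1:50), paste0("cell", 1:20)))

# Log transform with a pseudocount of 1, as in the notebook
toy_log <- log1p(toy_counts)

# prcomp() treats rows as observations, so transpose to cells x genes;
# centering is on by default and scaling is off
toy_pca <- prcomp(t(toy_log))

# PCA scores: one row per cell, one column per principal component
toy_scores <- toy_pca$x[, 1:2]
head(toy_scores)
```

The resulting scores have one row per cell, which is the shape the notebook combines with cell metadata before plotting.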

Next, we will arrange the PCA scores for plotting, adding columns for the total UMI counts and the cell type labels so we can color each point of the plot.

```{r pca_df}
# Set up the PCA scores for plotting
norm_pca_scores <- data.frame(norm_pca,
                              geo_accession = rownames(norm_pca),
                              total_umi = bladder_sce$sum,
                              cell_type = bladder_sce$cell_ontology_class)
log_pca_scores <- data.frame(log_pca,
                             geo_accession = rownames(log_pca),
                             total_umi = bladder_sce$sum,
                             cell_type = bladder_sce$cell_ontology_class)
```

First, we will plot the unnormalized PCA scores, colored by their total UMI counts:

```{r pca_plot}
# Plot the PCA scores from the log-transformed raw counts, colored by total UMI count
ggplot(log_pca_scores, aes(x = PC1, y = PC2, color = total_umi)) +
  geom_point() +
  labs(title = "Log counts (unnormalized) PCA scores",
       color = "Total UMI count") +
  theme_bw()
```

@@ -197,15 +202,38 @@ Feel free to customize the plot with a different theme or color scheme!
Let's plot the `norm_pca_scores` data:

```{r norm_pca_plot, live = TRUE}
ggplot(norm_pca_scores, aes(x = PC1, y = PC2, color = total_umi)) +
  geom_point() +
  labs(title = "Normalized log counts PCA scores",
       color = "Total UMI count") +
  theme_bw()
```

Do you see an effect from the normalization when comparing these plots?



Now, let's plot these two sets of PCA scores again, but colored by cell type.

```{r celltype_pca_plots}
# First, plot the normalized PCA scores
ggplot(norm_pca_scores, aes(x = PC1, y = PC2, color = cell_type)) +
  geom_point() +
  labs(title = "Normalized log counts PCA scores",
       color = "Cell Type") +
  scale_color_brewer(palette = "Dark2", na.value = "grey70") + # add a visually distinct color palette
  theme_bw()

# Next, plot the log counts (unnormalized) PCA scores
ggplot(log_pca_scores, aes(x = PC1, y = PC2, color = cell_type)) +
  geom_point() +
  labs(title = "Log counts (unnormalized) PCA scores",
       color = "Cell Type") +
  scale_color_brewer(palette = "Dark2", na.value = "grey70") + # add a visually distinct color palette
  theme_bw()
```

Do you see an effect from the normalization in the comparison between these plots?


## Save the normalized data to a TSV file