Live and rendered notebooks
jashapiro authored and github-actions[bot] committed May 29, 2024
1 parent c83a006 commit 610ff91
Showing 24 changed files with 1,703 additions and 1,820 deletions.
30 changes: 12 additions & 18 deletions RNA-seq/01-qc_trim_quant.nb.html

Large diffs are not rendered by default.

184 changes: 80 additions & 104 deletions RNA-seq/02-gastric_cancer_tximeta.nb.html

Large diffs are not rendered by default.

119 changes: 63 additions & 56 deletions RNA-seq/03-gastric_cancer_exploratory.nb.html

Large diffs are not rendered by default.

18 changes: 6 additions & 12 deletions RNA-seq/04-nb_cell_line_tximeta.nb.html

Large diffs are not rendered by default.

163 changes: 62 additions & 101 deletions RNA-seq/05-nb_cell_line_DESeq2.nb.html

Large diffs are not rendered by default.

163 changes: 87 additions & 76 deletions RNA-seq/06-openpbta_heatmap.nb.html

Large diffs are not rendered by default.

67 changes: 31 additions & 36 deletions intro-to-R-tidyverse/01-intro_to_base_R.nb.html

Large diffs are not rendered by default.

104 changes: 48 additions & 56 deletions intro-to-R-tidyverse/02-intro_to_ggplot2.nb.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion intro-to-R-tidyverse/03-intro_to_tidyverse-live.Rmd
@@ -444,7 +444,7 @@ stats_df |>
#### Save to TSV files

Let's write some of the data frames we created to a file.
-To do this, we can use the `readr` library of `_write()` functions.
+To do this, we can use the `readr` library of `write_()` functions.
The first argument of `write_tsv()` is the data we want to write, and the second argument is a character string that describes the path to the new file we would like to create.
Remember that we created a `results` directory to put our output in, but if we want to save our data to a directory other than our working directory, we need to specify this.
This is what we will use the `file.path()` function for.
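
As a rough illustration of the paragraph above, here is a minimal sketch of building a path with `file.path()` and passing it to `write_tsv()`. The data frame `example_df`, the temporary `results` directory, and the file name `example_stats.tsv` are hypothetical stand-ins, not objects from the notebook.

```r
library(readr)

# Hypothetical data frame, used only for illustration
example_df <- data.frame(
  gene = c("geneA", "geneB"),
  mean_expression = c(1.5, 2.25)
)

# Hypothetical output directory inside a temporary location
results_dir <- file.path(tempdir(), "results")
dir.create(results_dir, showWarnings = FALSE, recursive = TRUE)

# file.path() joins the pieces with the file separator
output_file <- file.path(results_dir, "example_stats.tsv")

# First argument: the data to write; second argument: the path to the new file
write_tsv(example_df, output_file)
```

Building the path from its components, rather than pasting strings together by hand, keeps the directory and file name easy to change independently.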
98 changes: 37 additions & 61 deletions intro-to-R-tidyverse/03-intro_to_tidyverse.nb.html

Large diffs are not rendered by default.

215 changes: 108 additions & 107 deletions scRNA-seq-advanced/01-read_filter_normalize_scRNA.nb.html

Large diffs are not rendered by default.

317 changes: 123 additions & 194 deletions scRNA-seq-advanced/02-dataset_integration.nb.html

Large diffs are not rendered by default.

192 changes: 95 additions & 97 deletions scRNA-seq-advanced/03-differential_expression.nb.html

Large diffs are not rendered by default.

155 changes: 66 additions & 89 deletions scRNA-seq-advanced/04-overrepresentation_analysis.nb.html

Large diffs are not rendered by default.

130 changes: 65 additions & 65 deletions scRNA-seq-advanced/05-gene_set_enrichment_analysis.nb.html

Large diffs are not rendered by default.

20 changes: 7 additions & 13 deletions scRNA-seq/00-scRNA_introduction.html

Large diffs are not rendered by default.

53 changes: 25 additions & 28 deletions scRNA-seq/01-scRNA_quant_qc.nb.html

Large diffs are not rendered by default.

215 changes: 110 additions & 105 deletions scRNA-seq/02-filtering_scRNA.nb.html

Large diffs are not rendered by default.

42 changes: 36 additions & 6 deletions scRNA-seq/03-normalizing_scRNA-live.Rmd
@@ -146,6 +146,10 @@ One way to determine whether our normalization yields biologically relevant resu
Because plotting expression for thousands of genes together isn't practical, we will reduce the dimensions of our data using Principal Components Analysis (PCA).

We will also make the same plot with our *unnormalized* data, to visualize the effect of normalization on our sample.
We'll do this comparison twice:

- Once coloring the points by their total UMI count
- Once coloring the points based on their cell labels

Before plotting the unnormalized data, we will log transform the raw counts to make their scaling more comparable to the normalized data.
To do this we will use the `log1p()` function, which is specifically designed for the case where we want to add 1 to all of our values before taking the log, as we do here.
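
To make the `log1p()` description concrete, the following small sketch compares it with adding 1 and taking the log by hand. The vector `raw_counts` is a made-up toy example; the notebook applies the same function to the full counts matrix.

```r
# Toy counts, made up for illustration (note the zero)
raw_counts <- c(0, 1, 9, 99)

# log1p(x) computes log(1 + x), so zero counts map to zero instead of -Inf
log1p_values <- log1p(raw_counts)
manual_values <- log(raw_counts + 1)

# The two approaches agree up to floating point tolerance
all.equal(log1p_values, manual_values)
```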
@@ -166,27 +170,29 @@ log_pca <- counts(bladder_sce) |> # get the raw counts
Note that we are using `scater::calculatePCA()` two different ways here: once on the full `bladder_sce` object, and once on just the `counts` matrix.
When we use `calculatePCA()` on the object, it automatically uses the log normalized matrix from inside the object.

-Next we will arrange the PCA scores for plotting, adding a column with the cell-type data so we can color each point of the plot.
+Next we will arrange the PCA scores for plotting, adding a column for each of the total UMI counts and the cell type labels so we can color each point of the plot.

```{r pca_df}
# Set up the PCA scores for plotting
norm_pca_scores <- data.frame(norm_pca,
                              geo_accession = rownames(norm_pca),
                              total_umi = bladder_sce$sum,
                              cell_type = bladder_sce$cell_ontology_class)
log_pca_scores <- data.frame(log_pca,
                             geo_accession = rownames(log_pca),
                             total_umi = bladder_sce$sum,
                             cell_type = bladder_sce$cell_ontology_class)
```

-Now we will plot the unnormalized PCA scores with their cell labels:
+First, we will plot the unnormalized PCA scores with their total UMI counts:

```{r pca_plot}
# Now plot counts pca
-ggplot(log_pca_scores, aes(x = PC1, y = PC2, color = cell_type)) +
+ggplot(log_pca_scores, aes(x = PC1, y = PC2, color = total_umi)) +
   geom_point() +
   labs(title = "Log counts (unnormalized) PCA scores",
-       color = "Cell Type") +
-  scale_color_brewer(palette = "Dark2", na.value = "grey70") + # add a visually distinct color palette
+       color = "Total UMI count") +
+  scale_color_viridis_c() +
   theme_bw()
```

@@ -200,7 +206,31 @@ Let's plot the `norm_pca_scores` data:
```

-Do you see an effect from the normalization in the comparison between these plots?
+Do you see an effect from the normalization when comparing these plots?



Now, let's plot these two sets of PCA scores again, but colored by cell type.
Do you see an effect from the normalization when comparing these plots?

```{r celltype_pca_plots}
# First, plot the normalized pca
ggplot(norm_pca_scores, aes(x = PC1, y = PC2, color = cell_type)) +
  geom_point() +
  labs(title = "Normalized log counts PCA scores",
       color = "Cell Type") +
  scale_color_brewer(palette = "Dark2", na.value = "grey70") + # add a visually distinct color palette
  theme_bw()
# Next, plot log count pca
ggplot(log_pca_scores, aes(x = PC1, y = PC2, color = cell_type)) +
  geom_point() +
  labs(title = "Log counts (unnormalized) PCA scores",
       color = "Cell Type") +
  scale_color_brewer(palette = "Dark2", na.value = "grey70") + # add a visually distinct color palette
  theme_bw()
```



## Save the normalized data to tsv file
230 changes: 138 additions & 92 deletions scRNA-seq/03-normalizing_scRNA.nb.html

Large diffs are not rendered by default.

42 changes: 41 additions & 1 deletion scRNA-seq/04-dimension_reduction_scRNA-live.Rmd
@@ -206,13 +206,53 @@ dim(filtered_sce)
Now we will perform the same normalization steps we did in a previous dataset, using `scran::computeSumFactors()` and `scater::logNormCounts()`.
You might recall that there is a bit of randomness in some of these calculations, so we should be sure to have used `set.seed()` earlier in the notebook for reproducibility.

```{r normalize}
```{r sumfactors}
# Cluster similar cells
qclust <- scran::quickCluster(filtered_sce)
# Compute sum factors for each cell cluster grouping.
filtered_sce <- scran::computeSumFactors(filtered_sce, clusters = qclust, positive = FALSE)
```

It turns out in this case we end up with some negative size factors.
This is usually an indication that our filtering was not stringent enough, and there remain a number of cells or genes with nearly zero counts.
This probably happened when we removed the infrequently-expressed genes; cells which had high counts from those particular genes (and few others) could have had their total counts dramatically reduced.
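
The effect described above can be sketched with a toy example in base R. The matrix values and the "detected in at least 2 cells" gene filter below are hypothetical and chosen only to show how removing a rarely expressed gene can collapse one cell's total count.

```r
# Toy counts matrix (genes x cells); all values are made up for illustration
toy_counts <- matrix(
  c(600,   0,   0,   # geneA: high counts, but only in cell1
      5, 300, 250,   # geneB
      3, 280, 310),  # geneC
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("geneA", "geneB", "geneC"),
    c("cell1", "cell2", "cell3")
  )
)

# Per-cell totals before any gene filtering
totals_before <- colSums(toy_counts)

# Hypothetical gene filter: keep genes detected in at least 2 cells
cells_detected <- rowSums(toy_counts > 0)
toy_filtered <- toy_counts[cells_detected >= 2, , drop = FALSE]

# Per-cell totals after gene filtering: cell1 loses almost all of its counts
totals_after <- colSums(toy_filtered)

rbind(totals_before, totals_after)
```

Here `cell1` looked well covered before the gene filter, but nearly all of its counts came from a gene that was removed, which is the kind of cell that can lead to trouble when computing size factors.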

To account for this, we will recalculate the per-cell stats and filter out low counts.
Unfortunately, to do this, we need to first remove the previously calculated statistics, which we will do by setting them to `NULL`.

```{r reQC}
# remove previous calculations
filtered_sce$sum <- NULL
filtered_sce$detected <- NULL
filtered_sce$total <- NULL
filtered_sce$subsets_mito_sum <- NULL
filtered_sce$subsets_mito_detected <- NULL
filtered_sce$subsets_mito_percent <- NULL
# recalculate cell stats
filtered_sce <- scater::addPerCellQC(filtered_sce, subsets = list(mito = mito_genes))
# print the number of cells with fewer than 500 UMIs
sum(filtered_sce$sum < 500)
```

Now we can filter again.
In this case, we will keep cells with at least 500 UMIs after removing the lowly expressed genes.
Then we will redo the size factor calculation, hopefully with no more warnings.


```{r refilter}
filtered_sce <- filtered_sce[, filtered_sce$sum >= 500]
qclust <- scran::quickCluster(filtered_sce)
filtered_sce <- scran::computeSumFactors(filtered_sce, clusters = qclust)
```

Looks good! Now we'll do the normalization.

```{r normalize}
# Normalize and log transform.
normalized_sce <- scater::logNormCounts(filtered_sce)
```
296 changes: 177 additions & 119 deletions scRNA-seq/04-dimension_reduction_scRNA.nb.html

Large diffs are not rendered by default.

425 changes: 173 additions & 252 deletions scRNA-seq/05-clustering_markers_scRNA.nb.html

Large diffs are not rendered by default.

243 changes: 112 additions & 131 deletions scRNA-seq/06-celltype_annotation.nb.html

Large diffs are not rendered by default.
