-
Notifications
You must be signed in to change notification settings - Fork 14
/
Copy pathhome_content.Rmd
150 lines (95 loc) · 7.31 KB
/
home_content.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
---
title: "Content"
output:
bookdown::html_document2:
highlight: textmate
toc: false
toc_float:
collapsed: true
smooth_scroll: true
print: false
toc_depth: 4
number_sections: false
df_print: default
code_folding: none
self_contained: false
keep_md: false
encoding: 'UTF-8'
css: "assets/lab.css"
include:
after_body: assets/footer-lab.html
---
```{r,child="assets/header-lab.Rmd"}
```
This page contains links to different lectures (slides) and practical exercises (labs) that are part of this workshop. The links below are similar to that under [Schedule](home_schedule.html) but organised here by topic.
<div class="boxy boxy-lightbulb">
Input code blocks are displayed like shown below. The code language is displayed above the block. Shell scripts (**SH**) are to be executed in the linux terminal. **R** scripts are to be run in R either through the terminal, RGui or RStudio.
```{sh,eval=FALSE,chunk.title=TRUE}
command
```
<i class="fas fa-exclamation-circle"></i> Note <i class="fas fa-lightbulb"></i> Tip <i class="fas fa-comments"></i> Discuss <i class="fas fa-clipboard-list"></i> Task
</div>
<br>
# Introduction
A brief introduction to the world of RNA and RNA-Seq technology.
* [Introduction to RNA (Slide)](slide_rna_intro.pdf)
* [Introduction to RNA-Seq (Slide)](slide_rnaseq_intro.pdf)
A primer to working on the unix/linux command line and working on a remote computing cluster Uppmax.
* [Introduction to Linux & Uppmax (Slide)](slide_uppmax_intro.pdf)
* [Working on Uppmax (Lab)](lab_uppmax_intro.html)
RNA-Seq analyses is usually carried out in R programming language and it will be useful to learn some basic R.
* [Introduction to R (Lab)](lab_r.html)
This topic covers retrieving supporting data needed for RNA-seq analyses. These include gene annotation IDs such as mapping between Ensembl IDs and Gene IDs, GO terms and transcript IDs. We also cover retrieving genomic data from Ensembl.
* [Downloading data (Lab)](lab_download.html)
# Main lab
## Data
In most of the exercises, we will use RNA-seq data (Illumina short reads) from the dataset [GSE131032](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE131032) [Czarnewski et al (2019) *Nat Comm*](https://www.nature.com/articles/s41467-019-10769-x).
*To elucidated through an unbiased manner which genes and pathways are differentially regulated during mouse colonic inflammation followed by a tissue regeneration phase. In particular, we took advantage of the widely used dextran sodium sulfate (DSS)-induced model of colitis. This model is one of the few characterized by a phase of damage followed by a phase of regeneration. Therefore, this model gave the possibility to identify also sets of genes essential in the regeneration phase, a key step towards the resolution of the inflammation. In short, mice were exposed to DSS in the drinking water for 7 days, then allowed to recover for the following 7 days. During this period, we collected colonic tissue samples every second day to then be analyzed by RNA sequencing (RNA-seq). Next, we performed a RNA-seq analysis from colonic samples throughout the experiment and computed differentially expressed genes (DEGs) taking the complete kinetics of expression into consideration for p-value estimation using EdgeR.*
For this course, downsampled FASTQ files from this dataset from 2 experimental groups (day00 and day07, 3 samples each) will be used in the labs on read mapping, transcript assembly, visualization, quality control and differential expression. There are many relevant questions that could be asked based on these measurements, from several quality checks to understanding biological insignts.
## Quality control
Before doing any other analysis on mapped RNA-seq reads it is always important to do quality control of your mapped reads and that you do not have any obvious errors in your RNA-seq data.
* [QC of raw reads (Slide)](slide_after_sequencing_qc.pdf)
* [Short read quality control (Lab)](lab_qc_beforemapping.html)
## Mapping
This section contains information on how to map reads to a reference genome using splice-aware aligner STAR and HISAT2.
* [RNA-Seq aligners (Slide)](slide_rnaseq_aligners.pdf)
* [Mapping reads using STAR (Lab)](lab_mapping.html)
## Post-alignment QC
After alignment, the BAM files are inspected for various alignment metrics. Some of these include the number of reads mapped, number of unmapped reads, regions in the reference that reads map to, gene body coverage, signs of DNA contamination etc.
* [QC of alignments (Slide)](slide_after_mapping_qc.pdf)
* [Post-alignment quality control (Lab)](lab_qc_aftermapping.html)
BAM files are optionally visualised using integrated genome viewer.
* [Using IGV (Lab)](lab_igv.html)
## Quantification
Gene counts are quantified from BAM files using featureCounts.
* [Quantification (Slide)](slide_quantification.pdf)
* [Quantification (Lab)](lab_quantification.html)
## Filtering & Normalisation
* [Filtering & Normalisation (Slide)](slide_preprocessing.html)
* [Filtering & Normalisation (Lab)](lab_preprocessing.html)
## Exploratory data analyses
Before commencing any quantitative analyses, it is important to run some exploratory analyses to access similarity between samples. This is a vital step to identify mislabelled samples, poor-quality samples and/or replicates that differ considerably. This section dives deeper into exploratory analyses PCA and hierarchical clustering.
* [Principal component analysis (Slide)](slide_pca.pdf)
* [Cluster analysis (Slide)](slide_clustering.pdf)
* [Exploratory data analyses (Lab)](lab_eda.html)
## Differential gene expression
We find genes that are differentially expressed between our time points.
* [Differential gene expression (Slide)](slide_dge.html)
* [DGE using DESeq2 (Lab)](lab_dge.html)
## Functional analysis
We will perform functional analysis on the differentially expressed genes to place them into a function context and possibly explain the biological consequences of DE. Methods covered are GSA (Gene set analysis) and GSEA (Gene set enrichment analysis).
* [Functional analyses (Slide)](slide_functional.pdf)
* [Functional analysis (Lab)](lab_functional.html)
# Bonus labs
## Pseudoaligners
This is an alternative step INSTEAD of mapping, PA-QC and quantification. Kallisto uses FastQ reads and a reference transcriptome (cDNA+ncRNA) to quantify transcripts using rapid pseudo-alignment along with bootstrap replicates to assess quantification inaccuracy. Kallisto is significantly faster than STAR or HISAT2 and has a small memory footprint. Kallisto generates transcript counts. Differential transcript expression is carried out using Sleuth which utilises bootstrap replicates.
* [Pseudoaligners (Slide)](slide_pseudoaligners.pdf)
* [Mapping and quantification using Kallisto (Lab)](lab_kallisto.html)
## small RNA analyses
RNA-seq differential analyses workflow on microRNAs from Fruit fly.
* [Small RNA-seq analyses (Lab)](lab_smallrna.html)
## Assembly & annotation
Raw sequencing short reads are assembled into transcripts using two approaches. Genome-guided assembly using HiSat2 and StringTie. De-novo transcriptome assembly using Trinity. Assembled transcriptomes are functionally annotated to identify genes.
* [Transcriptome assembly (Lab)](lab_assembly.html)
* [Transcriptome annotation (Lab)](lab_annotation.html)
***