added more vignettes
ghar1821 committed Jan 19, 2024
1 parent d0c8458 commit 2b7a18d
Showing 10 changed files with 2,270 additions and 92 deletions.
2 changes: 2 additions & 0 deletions vignettes/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
*.html
*.R
Binary file added vignettes/data/B2.fcs
Binary file not shown.
1,001 changes: 1,001 additions & 0 deletions vignettes/data/Levine_32dim_H1.csv

Large diffs are not rendered by default.

1,001 changes: 1,001 additions & 0 deletions vignettes/data/Levine_32dim_H2.csv

Large diffs are not rendered by default.

Binary file added vignettes/data/Run3_B2.fcs
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -35,25 +35,23 @@ In that case, please reach out through the github repository by creating a Githu

# Installation

-To install SuprCellCyto, we need to use the `devtools` package from CRAN.
-You can install `devtools` by using the `install.packages("devtools")` command.
+To install SuperCellCyto, we need to use the `remotes` package from CRAN.
+You can install `remotes` by using the `install.packages("remotes")` command.

Thereafter, you can install SuperCellCyto using
-`devtools::install_github("phipsonlab/SuperCellCyto")`.
+`remotes::install_github("phipsonlab/SuperCellCyto")`.

SuperCellCyto requires the [SuperCell R package](https://github.com/GfellerLab/SuperCell)
installed to run properly.
-If you use the `devtools::install_github` command above to install SuperCellCyto,
+If you use the `remotes::install_github` command above to install SuperCellCyto,
it should be, in theory, automatically installed.
But in the case it doesn't, you can manually install it by using
-`devtools::install_github("GfellerLab/SuperCell")`.
+`remotes::install_github("GfellerLab/SuperCell")`.

# Preparing your dataset

The function which creates supercells is called `runSuperCellCyto`, and it
operates on a `data.table` object, an enhanced version of R native `data.frame`.
-We may add some support for `SummarizedExperiment` or `flowFrame` object
-in the future if there are enough demands for it.

If the raw data is stored in a csv file, we can import it into a `data.table`
object using their `fread` function.
261 changes: 261 additions & 0 deletions vignettes/how_to_prepare_data.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,261 @@
---
title: "Preparing Data for SuperCellCyto"
author: "Givanna Putri"
output:
  rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{how_to_prepare_data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Performing Quality Control

Prior to creating supercells, it's crucial to ensure that your dataset has
undergone thorough quality control (QC).
We want to retain only single, live cells and remove any debris, doublets, and dead cells.
It is also important to perform compensation to correct for fluorescence spillover
(for Flow data) or for signal spillover between different metal isotopes (for Cytof data).
A well-prepared dataset is key to obtaining reliable supercells from SuperCellCyto.

Several R packages are available for performing QC on cytometry data.
Notable among these are [PeacoQC](https://onlinelibrary.wiley.com/doi/10.1002/cyto.a.24501),
[CATALYST](https://bioconductor.org/packages/release/bioc/html/CATALYST.html),
and [CytoExploreR](https://dillonhammill.github.io/CytoExploreR/).
These packages are well maintained and continuously updated.
To make sure that the information we provide does not quickly go out of date, we highly
recommend consulting the packages' respective vignettes for detailed guidance on
how to use them to QC your data.

If you prefer using manual gating to do QC, you can also use FlowJo.
For a comprehensive guide on using FlowJo to prepare your data, please read this
[vignette](https://wiki.centenary.org.au/display/SPECTRE/Exporting+data+from+FlowJo+for+analysis+in+Spectre).
The QC steps taken in that vignette are perfectly adequate for SuperCellCyto.

In our manuscript, we used `CytoExploreR` to QC the `Oetjen_bcell` flow cytometry data
and `CATALYST` to QC the `Trussart_cytofruv` Cytof data.

The specific scripts used can be found in our [Github repository](https://github.com/phipsonlab/SuperCellCyto-analysis/tree/master/code):

1. `b_cell_identification/gate_flow_data.R` for `Oetjen_bcell` data.
2. `batch_correction/prepare_data.R` for `Trussart_cytofruv` data.
These scripts were adapted from those used in the [CytofRUV manuscript](https://elifesciences.org/articles/59630).

For Oetjen_bcell data, we used the following gating strategy post compensation:

1. FSC-H and FSC-A to isolate only the single events. (Also check SSC-H vs SSC-A).
2. FSC-A and SSC-A to remove debris.
3. Live/Dead and SSC-A to isolate live cells.

The figure below shows the single live cells obtained by manually gating the `Oetjen_bcell` data.

```{r}
knitr::include_graphics("figures/oetjen_bcell_single_live_cells.png", error = FALSE)
```

After completing the QC process, you will have clean data in either CSV or FCS file formats.
The next section will guide you on how to load these files and proceed with preparing your data for SuperCellCyto.

## Preparing FCS/CSV files for SuperCellCyto

To use SuperCellCyto, your input data must be formatted as a [data.table](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html) object.
Briefly, `data.table` is an enhanced version of R's native `data.frame` object.
It is provided by the `data.table` package, which offers fast processing of large `data.frame` objects.

Additionally, each cell in your `data.table` must also have a unique identifier
and be associated with a sample, typically the biological sample it came from.
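
The two requirements above can be checked before going any further. The following is a minimal sketch, not part of SuperCellCyto: `check_input` is a hypothetical helper, the toy values are made up, and the column names `cell_id` and `sample` are assumed to match the ones used in the rest of this vignette.

```r
library(data.table)

# Toy table standing in for real cytometry data (values are made up).
dat_check <- data.table(
  CD3 = c(1.2, 0.4, 0.9),
  sample = c("H1", "H1", "H2"),
  cell_id = c("Cell_1", "Cell_2", "Cell_3")
)

# Hypothetical helper: stops with an error if a requirement is not met.
check_input <- function(dt, cell_id_col = "cell_id", sample_col = "sample") {
  stopifnot(
    is.data.table(dt),                      # input must be a data.table
    cell_id_col %in% names(dt),             # cell id column exists
    sample_col %in% names(dt),              # sample column exists
    !anyNA(dt[[sample_col]]),               # every cell has a sample
    anyDuplicated(dt[[cell_id_col]]) == 0   # cell ids are unique
  )
  invisible(TRUE)
}

check_input(dat_check)
```

Running a check like this right after loading the data makes it easier to catch duplicated ids or missing sample labels early.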

### Preparing CSV files

Loading CSV files into a `data.table` object is straightforward.
We can use the `fread` function from the `data.table` package.

Here's how to install it:

```{r eval=FALSE}
install.packages("data.table")
```

For this example, let's load two CSV files containing subsampled data from the
`Levine_32dim` dataset we used in the SuperCellCyto manuscript.
Each file represents a sample (H1 and H2), with the sample name appended to the file name:

```{r}
library(data.table)
csv_files <- c("data/Levine_32dim_H1.csv", "data/Levine_32dim_H2.csv")
samples <- c("H1", "H2")
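# Read each CSV file and tag its cells with the matching sample name.
dat <- lapply(seq_along(samples), function(i) {
  csv_file <- csv_files[i]
  sample <- samples[i]
  dat_a_sample <- fread(csv_file)
  dat_a_sample$sample <- sample
  return(dat_a_sample)
})
# Combine the per-sample tables into one data.table.
dat <- rbindlist(dat)
# Give every cell a unique id.
dat[, cell_id := paste0("Cell_", seq_len(nrow(dat)))]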
head(dat)
```

Let's break down what we have done:

1. We specify the locations of the csv files in the `csv_files` vector
   and their corresponding sample names in the `samples` vector.
   `data/Levine_32dim_H1.csv` belongs to sample H1 while `data/Levine_32dim_H2.csv`
   belongs to sample H2.
2. We use `lapply` to iterate over the `csv_files` and `samples` vectors in parallel, using a shared index.
   For each csv file and its corresponding sample, we read the csv file into the variable
   `dat_a_sample` using the `fread` function.
   We then store the sample id in a new column called `sample`.
   As a result, we get a list `dat` containing 2 `data.table` objects, 1 object per csv file.
3. We use the `rbindlist` function from the `data.table` package to merge the list into one `data.table` object.
4. We create a new column `cell_id` which gives each cell a unique id such as `Cell_1`,
   `Cell_2`, etc. up until `Cell_2000` (we have 2,000 cells in total).
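
As an alternative to the loop described above, the same result can be sketched more compactly by naming the file vector and letting `rbindlist` fill in the sample column through its `idcol` argument. The file names and values below are made up for illustration and are written to a temporary directory so the example is self-contained.

```r
library(data.table)

# Write two tiny CSV files to a temporary directory (made-up values).
tmp_dir <- tempfile("csv_demo_")
dir.create(tmp_dir)
fwrite(data.table(CD3 = c(1.2, 0.4), CD19 = c(0.1, 2.3)),
       file.path(tmp_dir, "demo_H1.csv"))
fwrite(data.table(CD3 = c(0.9, 1.1, 0.2), CD19 = c(0.3, 0.2, 1.8)),
       file.path(tmp_dir, "demo_H2.csv"))

# Name each file path by the sample it belongs to.
demo_files <- c(
  H1 = file.path(tmp_dir, "demo_H1.csv"),
  H2 = file.path(tmp_dir, "demo_H2.csv")
)

# lapply keeps the names, and idcol turns them into a `sample` column.
dat_demo <- rbindlist(lapply(demo_files, fread), idcol = "sample")

# Give every cell a unique id.
dat_demo[, cell_id := paste0("Cell_", seq_len(.N))]
dat_demo
```

Keeping the sample names and file paths in one named vector avoids the two vectors drifting out of sync when files are added or removed.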

### Preparing FCS files

FCS files, commonly used in cytometry, require specific handling.
The [Spectre](https://github.com/immuneDynamics/Spectre/) package is an excellent tool for this purpose.

You can install Spectre using the remotes package:

```{r eval=FALSE}
install.packages("remotes")
remotes::install_github("immunedynamics/Spectre")
```

Let's load two FCS files from the `Trussart_cytofruv` dataset we used
in our manuscript.

```{r}
library(Spectre)
dat_list <- read.files(file.loc = "data", file.type = ".fcs")
class(dat_list)
names(dat_list)
class(dat_list$B2)
head(dat_list$B2)
```

Spectre's `read.files` reads FCS files into a list of `data.table` objects,
one for each file.
For each `data.table` object, it will also add a column `FileName` denoting the name of the file
the cell came from, which we can then use to add sample information for the cells.

For this dataset, both FCS files contain data belonging to a patient (let's call him B2).
The file with name "B2" contains the patient's sample quantified in batch 1, and
the file with name "Run3_B2" contains the patient's sample quantified in batch 2.
Let's name these samples "B2_batch1" and "B2_batch2".

Now, let's merge the `data.table` objects and add sample information.
We will use `rbindlist` to merge the list into one `data.table` object and then add
the sample information as a column.
The latter is done by first creating a new `data.table` object that maps each
`FileName` to a sample name, and then using `merge.data.table` to add the mapping to our `data.table` object.

```{r}
dat_cytof <- rbindlist(dat_list)
sample_info <- data.table(
  sample = c("B2_batch1", "B2_batch2"),
  filename = c("B2", "Run3_B2")
)
dat_with_sample_info <- merge.data.table(
  x = dat_cytof,
  y = sample_info,
  by.x = "FileName",
  by.y = "filename"
)
head(dat_with_sample_info)
```

We now should have the sample id for each cell.
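
Note that `merge.data.table` performs an inner join by default (`all = FALSE`), so cells from any file missing in the mapping table are dropped without warning. The following is a small sketch of one way to detect this, using made-up data in which the file `Run9_B2` is deliberately left out of the mapping; all object names here are illustrative.

```r
library(data.table)

# Toy stand-in for the merged FCS data; "Run9_B2" has no entry in the mapping.
dat_toy <- data.table(
  FileName = c("B2", "B2", "Run3_B2", "Run9_B2"),
  CD3 = c(0.5, 1.7, 0.2, 3.1)
)
sample_map <- data.table(
  sample = c("B2_batch1", "B2_batch2"),
  filename = c("B2", "Run3_B2")
)

# all.x = TRUE keeps every cell; unmatched files get NA in `sample`.
merged <- merge(
  dat_toy, sample_map,
  by.x = "FileName", by.y = "filename",
  all.x = TRUE
)

# Files that failed to match the mapping.
unmatched_files <- unique(merged[is.na(sample), FileName])
unmatched_files

# Number of cells per sample, including the unmatched group.
merged[, .N, by = sample]
```

If `unmatched_files` is not empty, the mapping table most likely contains a typo or is missing a file.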

Now we need to create a new column `cell_id` which gives each cell a unique id such as `Cell_1`,
`Cell_2`, etc.

```{r}
dat_with_sample_info[, cell_id := paste0("Cell_", seq_len(nrow(dat_with_sample_info)))]
head(dat_with_sample_info)
```

With CSV and FCS files loaded as data.table objects, the next step is to transform
the data appropriately for SuperCellCyto.

## Data Transformation

Before using SuperCellCyto, it's essential to apply appropriate data transformations.
These transformations are crucial for accurate analysis, as explained in this
[article on data transformation](https://wiki.centenary.org.au/display/SPECTRE/Data+transformation).

**Note**: If you have completed the QC process as outlined
[here](https://wiki.centenary.org.au/display/SPECTRE/Exporting+data+from+FlowJo+for+analysis+in+Spectre)
and have CSV files exported from FlowJo, you can proceed directly to the next vignette on
how to create supercells.
For more details on different file types (FCS, CSV scale, and CSV channel value), refer to this
[guide](https://wiki.centenary.org.au/display/SPECTRE/Exporting+data+from+FlowJo+for+analysis+in+Spectre).

A common method for data transformation in cytometry is the arcsinh transformation,
which is based on the [inverse hyperbolic sine function](https://mathworld.wolfram.com/InverseHyperbolicSine.html).
The transformation requires specifying a cofactor, which affects the representation of the low-end data.
Typically, a cofactor of 5 is used for Cytof data and 150 for Flow data.
This vignette will focus on the transformation process rather than cofactor selection.
For more in-depth information on choosing a cofactor, read this detailed
[article](https://wiki.centenary.org.au/display/SPECTRE/Data+transformation).
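
To give a feel for what the cofactor does, here is a small base R sketch. It assumes the transformation takes the usual form `asinh(x / cofactor)`; the helper `asinh_transform` and the raw intensities are made up for illustration.

```r
# Made-up raw intensities spanning several orders of magnitude.
raw <- c(0, 1, 5, 50, 500, 5000)

# Assumed form of the transformation: asinh(x / cofactor).
asinh_transform <- function(x, cofactor) asinh(x / cofactor)

cofactor_5 <- asinh_transform(raw, cofactor = 5)
cofactor_150 <- asinh_transform(raw, cofactor = 150)

# Side-by-side comparison of the two cofactors.
data.frame(raw, cofactor_5, cofactor_150)

# asinh(z) equals log(z + sqrt(z^2 + 1)): roughly linear near zero,
# roughly logarithmic for large values.
z <- raw / 5
all.equal(asinh(z), log(z + sqrt(z^2 + 1)))
```

A larger cofactor widens the near-linear region around zero, which is why it changes how the low end of the data is represented.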

We'll use the `Levine_32dim` dataset loaded earlier from CSV files:

```{r}
head(dat)
```

First, we need to select the markers to be transformed.
Usually, all markers should be transformed for SuperCellCyto.
However, you can choose to exclude specific markers if needed:

```{r}
markers_to_transform <- c("CD45RA","CD133","CD19","CD22","CD11b","CD4",
"CD8","CD34","Flt3","CD20","CXCR4","CD235ab",
"CD45","CD123","CD321","CD14","CD33","CD47","CD11c",
"CD7","CD15","CD16","CD44","CD38","CD13","CD3","CD61",
"CD117","CD49d","HLA-DR","CD64","CD41")
```

For transformation, we'll use a cofactor of 5 and apply the arcsinh transformation
using the Spectre package.
If Spectre isn't installed, use:

```{r eval=FALSE}
install.packages("remotes")
remotes::install_github("immunedynamics/Spectre")
```

Perform the transformation:

```{r}
dat <- do.asinh(dat, markers_to_transform, cofactor = 5)
head(dat)
```

After transformation, the transformed values are stored in new columns named after each marker with "_asinh" appended.
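
For readers who would rather not depend on Spectre for this step, a similar result can be sketched with `data.table` alone. This assumes the transformation is `asinh(x / cofactor)` and follows the "_asinh" naming convention described above; the table and marker values are made up, and this is not a reproduction of Spectre's implementation.

```r
library(data.table)

# Toy table with two made-up markers.
dat_asinh_demo <- data.table(CD3 = c(0, 5, 50), CD19 = c(10, 0, 100))
demo_markers <- c("CD3", "CD19")
demo_cofactor <- 5

# Add one "<marker>_asinh" column per marker, keeping the raw columns.
asinh_cols <- paste0(demo_markers, "_asinh")
dat_asinh_demo[
  ,
  (asinh_cols) := lapply(.SD, function(x) asinh(x / demo_cofactor)),
  .SDcols = demo_markers
]

names(dat_asinh_demo)
```

Keeping the raw columns alongside the transformed ones makes it easy to re-run the transformation with a different cofactor later.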

With your data now transformed, you're ready to create supercells using SuperCellCyto.
Please refer to our dedicated vignette for detailed instructions.





85 changes: 0 additions & 85 deletions vignettes/how_to_supercell.R

This file was deleted.

Binary file removed vignettes/supercellcyto.png
Binary file not shown.
