Compiled: September 15, 2020
For this tutorial, we will be using scMLnet to construct the multi-layer signaling network between B cells and Secretory cells from scRNA-Seq data of BALF in COVID-19 patients. The expression matrix and annotation of clstuers can be found in the /example
folder (or be downloaded from Zenodo) and the prior information about interactions in the /database
folder.
We start by loading all required packages. The Seurat
package is used for normalizing the raw scRNA-Seq data, the Matrix
package is used for transformation between matrix and sparse matrix and the parallel
package is used for parallel computation of t.test for screening potentially highly expressed genes.
library(Seurat)
library(Matrix)
library(parallel)
library(scMLnet)
We then read a raw scRNA-Seq data with rows as genes (gene symbols) and columns as cells and the gene expression matrix is required as a sparse matrix. Annotation of cell type consists of two columns: the first column is barcode and the second is cell type. The column number of the gene expression matrix should be consistent with the row number of the annotation table. Numbers can also be used to replace specific cell types in BarCluFile, but with less biological explanation of cell-cell communication between groups.
# import sample data
GCMat <- readRDS("./example/data.Rdata")
GCMat<- as(GCMat,"dgCMatrix")
# import sample annotation
BarCluFile <- "./example/barcodetype.txt"
BarCluTable <- read.table(BarCluFile,sep = "\t",header = TRUE,stringsAsFactors = FALSE)
We next define the receiver cell and sender cell that we want to explorer the cell-cell communication between them. In this example, we focus on the inter-/intracellular signaling network between B cells as senders and Secretory cells as receivers (NOTE: make sure the LigClu
and RecClu
parameters are the values in the annotation table).
types <- unique(BarCluTable$Cluster)
LigClu <- "B cells" #types[4]
RecClu <- "Secretory" #types[8]
Default parameter settings are as follows. Users can provide their own databases which must including three colums: molecule A, molecule B and key (connecting A with B by underlined). Molecules A and B need to have clear identities (i.e., Ligand, Receptor, TF and Target gene), and there should be interactions between them (i.e., Ligand_Receptor, Receptor_TF and TF_Gene interactions).
According to the difference (subjected to pct
parameter) and ratio (subjected to logfc
parameter) of gene expression percentage between the sender cells and receiver cells, we firstly define specific highly expressed genes in each of the two types of cells. The highly expressed genes in sender cells are considered as potential ligands and the highly expressed genes in receiver cells are considered as potential receptors and target genes. We screen the potential Ligand-Receptor interactions by searching the Ligands-Receptor pairs in database. We then screen the potential Receptor-TF, TF-Target gene interactions by Fisher's Exact Test (subjected to pval
parameter).
pval <- 0.05
logfc <- 0.15
LigRecLib <- "./database/LigRec.txt"
TFTarLib <- "./database/TFTargetGene.txt"
RecTFLib <- "./database/RecTF.txt"
Running time of getting highly expressed genes depends on the t-test of gene expression in two types of cells. Doing t-test in parallel can improve the performance of scMLnet (Precondition: the parallel package has been installed). The default cores in parallel is set to the number of available logical cores minus one.
netList <- RunMLnet(GCMat, BarCluFile, RecClu, LigClu,
pval, logfc,
LigRecLib, TFTarLib, RecTFLib)
The output netList
is a list consisiting of gene pairs connecting each upstream layer and downstream layer (i.e., Ligand_Receptor, Receptor_TF and TF_Gene subnetworks). The signaling subnetwork is returned as a dataframe object as the same structure as sample databases.
workdir <- "sample"
DrawMLnet(netList,LigClu,RecClu,workdir,plotMLnet = F)
If you want to visualize the tabular results of multi-layer signaling network, make sure the correct installation of following three python packages: networkx, matplotlib and pymnet and the correct setting of PyHome parameter in DrawMLnet
funtion. The tabular and graphic output of the constructed multi-layer network will be saved in the /output/sample/B_Hepatocytes
folder.
workdir <- "sample"
PyHome <- "D:/Miniconda3/envs/R36/python.exe" #for Window
DrawMLnet(netList,LigClu,RecClu,workdir,PyHome,plotMLnet = T)
In the above section, we only focus one pairs of cell type every time, however we can always remain the same receiver cells and only change sender cells so as to construct the multi-cellular multi-layer signaling networks of receiver cells (the central cells) affected by sender cells (neighbor cells).
# import data
GCMat <- readRDS("./example/data.Rdata")
GCMat<- as(GCMat,"dgCMatrix")
# import annotation
BarCluFile <- "./example/barcodetype.txt"
BarCluTable <- read.table(BarCluFile,sep = "\t",header = TRUE,stringsAsFactors = FALSE)
## get LigClu
LigClus <- unique(BarCluTable$Cluster)
LigClus <- LigClus[-grep("Secretory|Doublets",LigClus)]
## creat MLnet
netList <- list()
for(ligclu in LigClus){
#sender cell and receiver cell
LigClu <- ligclu
RecClu <- "Secretory"
name <- paste(strsplit(LigClu,split = "\\W")[[1]][1],RecClu,sep = "_")
#main
netList[[name]] <- RunMLnet(GCMat,BarCluFile,RecClu,LigClu)
}
## save output and plot MLnet
workdir <- "multi-cellular"
for (name in names(netList)) {
#scMLnet output
MLnetList <- netList[[name]]
print(paste0(name,":"))
#sender cell and receiver cell
LigClu <- strsplit(name,"_")[[1]][1]
RecClu <- strsplit(name,"_")[[1]][2]
#main
PyHome <- "D:/Miniconda3/envs/R36/python.exe" #for Window
DrawMLnet(MLnetList,LigClu,RecClu,workdir,PyHome,plotMLnet = T)
}
The demo of multi-cellular-mediated ACE2 regulation based on the scRNA-seq data of bronchoalveolar lavage fluid (BALF) in COVID-19 patients: