This repo contains code for single-cell RNA-seq analysis.

DamarisDeng/My_SingleCell_Workflow

Single-Cell Analysis Script

1. Quality control

Running instruction

Run the script by executing the following command:

python s1_qc.py -p <prefix> [-o <output_directory>] [-m <mitochondrial_genes_percent>] [-s]
  • <prefix>: Prefix of the input file (required).
  • <output_directory>: Output directory for the results (default: "qc").
  • <mitochondrial_genes_percent>: Mitochondrial gene percentage threshold used to flag outliers (default: 15).
  • -s: Flag indicating whether to scale the data (default: False). adata.raw is created in s1*.py after filtering outliers and normalization.
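The options above could be declared with argparse roughly as follows. This is a minimal sketch, not the script's actual parser: the short flags and defaults come from the list above, while the long option names (--prefix, --output, --mito, --scale) are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented s1_qc.py options.

    The long option names are hypothetical; only the short flags and
    the defaults are taken from the README.
    """
    parser = argparse.ArgumentParser(prog="s1_qc.py")
    parser.add_argument("-p", "--prefix", required=True,
                        help="prefix of the input file")
    parser.add_argument("-o", "--output", default="qc",
                        help="output directory for the results")
    parser.add_argument("-m", "--mito", type=float, default=15.0,
                        help="mitochondrial gene percentage threshold")
    parser.add_argument("-s", "--scale", action="store_true",
                        help="scale the data")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-p", "sample1"])
print(args.prefix, args.output, args.mito, args.scale)
```

Omitting -s leaves scaling off, which matches the documented default of False.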

Output

The script produces the following outputs:

  • QC Result:
    • File: d1_<prefix>_qced.h5ad
    • Description: Filtered and quality-controlled dataset after outlier removal.
  • Doublet Detection:
    • Powered by the DoubletDetection package.
  • Normalized Result:
    • File: d2_<prefix>_normlog.h5ad
    • Description: Normalized and log-transformed dataset after QC.
    • Note: Contains adata.raw (includes all genes after normalization).
    • Adds doublet prediction result and doublet score columns to adata.obs.
  • Figures:
    • Location: <output_directory>
    • Description:
      • QC related: total counts histogram, mitochondrial gene percentage violin plot, scatter plot of total counts vs. gene counts, and violin plot of QC metrics.
      • Doublet detection: Doublet heatmap
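To illustrate what a mitochondrial percentage threshold does, here is a small plain-Python sketch. It assumes human-style gene symbols with an "MT-" prefix and treats the threshold as inclusive; both are assumptions, and the real script may compute these metrics differently (for example with scanpy's QC utilities). The function names are hypothetical.

```python
def mito_percent(counts, mito_prefix="MT-"):
    """Return the percentage of counts from mitochondrial genes.

    counts maps gene symbol -> count for one cell. The "MT-" prefix
    is an assumption (human gene symbols).
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in counts.items()
               if gene.upper().startswith(mito_prefix))
    return 100.0 * mito / total


def filter_cells(cells, max_mito_percent=15.0):
    """Keep cells whose mitochondrial percentage is at or below the threshold."""
    return {cell_id: counts for cell_id, counts in cells.items()
            if mito_percent(counts) <= max_mito_percent}


# Toy data: cell_a has 5% mitochondrial counts, cell_b has 30%.
cells = {
    "cell_a": {"MT-CO1": 5, "ACTB": 95},
    "cell_b": {"MT-CO1": 30, "ACTB": 70},
}
kept = filter_cells(cells)
print(sorted(kept))
```

With the default threshold of 15, only cell_a survives in this toy example.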

2. Select highly variable genes (HVGs)

Running Instruction

python s3_bbknn.py -n <number> -p <prefix> [-b <batch_key>]
  • <number>: The number of highly variable genes to be selected (default: 2000).
  • <prefix>: Prefix of the input file (required).
  • <batch_key>: Batch key of the file (default: 'patient').

Transformation

  • sc.pp.pca
  • sc.pp.neighbors
  • sc.tl.umap
  • sc.tl.leiden
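The transformations listed above might be chained as in the sketch below. It assumes the input AnnData is already normalized and log-transformed, that scanpy and its Leiden dependencies are installed, and that HVG selection uses sc.pp.highly_variable_genes with default settings; the function name is hypothetical and the actual script may pass different parameters.

```python
def select_hvg_and_cluster(adata, n_top_genes=2000):
    """Select HVGs, then run PCA, neighbors, UMAP and Leiden clustering.

    A sketch only: every call uses scanpy defaults apart from
    n_top_genes, which follows the documented default of 2000.
    """
    # Imported lazily so this module loads even without scanpy installed.
    import scanpy as sc

    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes)
    sc.pp.pca(adata)        # principal components (uses the flagged HVGs)
    sc.pp.neighbors(adata)  # k-nearest-neighbor graph on the PCA space
    sc.tl.umap(adata)       # 2-D embedding stored in adata.obsm["X_umap"]
    sc.tl.leiden(adata)     # cluster labels stored in adata.obs["leiden"]
    return adata
```

The resulting object could then be plotted with sc.pl.umap, colored by 'leiden' and the batch key, to obtain a figure like before_bbknn.png.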

Output

The script produces the following outputs:

  • Highly Variable Genes (HVGs) Result:
    • File: d3_<prefix>.h5ad
    • Description: Dataset containing the selected highly variable genes.
  • PCA Result:
    • Description: Principal Component Analysis (PCA) performed on the selected HVGs.
  • UMAP Result:
    • Description: Uniform Manifold Approximation and Projection (UMAP) performed on the PCA results.
    • Figure: UMAP plot showing the clustering results colored by 'leiden' and 'batch_key'.
      • File: before_bbknn.png
      • Location: cluster/
  • UMAP Result (Saved Dataset):
    • File: d4_<suffix>_umap.h5ad
    • Description: Dataset with UMAP coordinates and clustering information.

3. Deal with batch effects (BBKNN)

Running Instruction

python s3_bbknn.py -p <prefix> -b <batch_key>
  • <prefix>: Prefix of the input file (required).
  • <batch_key>: Batch key of the file (required).

Output

The script produces the following outputs:

  • BBKNN Result:
    • File: d5_<prefix>.h5ad
    • Description: Dataset after performing the BBKNN integration.
  • UMAP Result:
    • Figure: UMAP plot showing the clustering results after BBKNN integration, colored by 'leiden_r2' and 'batch_key'.
      • File: after_bbknn.png
      • Location: cluster/
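A BBKNN integration step followed by Leiden clustering at resolution 2 could look like the sketch below. It assumes the bbknn and scanpy packages are installed and that PCA has already been computed on the input; the function name is hypothetical, and the script's real parameters are not documented here.

```python
def integrate_with_bbknn(adata, batch_key, resolution=2.0,
                         key_added="leiden_r2"):
    """Rebuild the neighbor graph with BBKNN, then re-embed and re-cluster.

    A sketch only: assumes adata.obsm["X_pca"] already exists.
    """
    # Imported lazily so this module loads even without these packages.
    import bbknn
    import scanpy as sc

    # Batch-balanced neighbor graph; replaces the sc.pp.neighbors result.
    bbknn.bbknn(adata, batch_key=batch_key)
    sc.tl.umap(adata)
    # Store the clustering under a separate key so earlier labels survive.
    sc.tl.leiden(adata, resolution=resolution, key_added=key_added)
    return adata
```

Writing the labels to leiden_r2 keeps the pre-integration 'leiden' column available for comparison.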

Note: Leiden clustering is run with resolution=2, and the result after BBKNN is stored under the key leiden_r2.

adata.layers["counts"] is also created in s1.py. Data in the counts layer is un-normalized and not log-transformed.

Marker file: must contain at least two columns, cell_name and Symbol.

9. Correlation

Produces a boxplot and a heatmap, and conducts a statistical test.

--heatmap:

  1. all: draw one heatmap with all conditions (levels)
  2. sep: draw a separate heatmap for each condition (level)
  3. no: do not draw heatmap
  4. heatmap.csv: provide a custom file for grouping

Format of heatmap.csv (| represents a comma in the CSV file):

Tumor | I II III
[condition] | [states (separated by spaces)]
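A grouping file in this format could be read as in the following sketch, which assumes each row holds a condition in the first column and space-separated states in the second. The function name is hypothetical and the script's own parser may differ.

```python
import csv
import io


def read_heatmap_groups(handle):
    """Map each condition to its list of states.

    Expects rows of the form: condition,"state1 state2 ...".
    Rows with fewer than two columns or an empty condition are skipped.
    """
    groups = {}
    for row in csv.reader(handle):
        if len(row) < 2 or not row[0].strip():
            continue
        groups[row[0].strip()] = row[1].split()
    return groups


# In-memory stand-in for a heatmap.csv file with a single row.
example = io.StringIO("Tumor,I II III\n")
groups = read_heatmap_groups(example)
print(groups)
```

For a file on disk, open it with newline="" as the csv module recommends and pass the file handle instead of the StringIO object.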

--filter_sample:

If a number is specified, samples whose total number of cells (of the same cell type) is below this threshold are filtered out. We used 15 as the threshold.
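The filtering rule can be sketched in plain Python as below. The input layout (one (sample, cell_type) pair per cell) and the function name are assumptions made for illustration; the script presumably works on adata.obs instead.

```python
from collections import Counter


def filter_samples(cells, cell_type, min_cells=15):
    """Return the samples that have at least min_cells cells of cell_type.

    cells is an iterable of (sample, cell_type) pairs, one per cell.
    Samples below the threshold are dropped.
    """
    counts = Counter(sample for sample, ctype in cells if ctype == cell_type)
    return sorted(s for s, n in counts.items() if n >= min_cells)


# Toy data: S1 has 20 T cells, S2 has 10 T cells and 30 B cells.
cells = ([("S1", "T cell")] * 20
         + [("S2", "T cell")] * 10
         + [("S2", "B cell")] * 30)
kept_samples = filter_samples(cells, "T cell")
print(kept_samples)
```

Here S2 is dropped for T cells because it has only 10, although the same sample would be kept when filtering on B cells.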

--test_type:

Choices are 1 (one-sided test) or 2 (two-sided test). The default is 2.
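The difference between the two choices can be illustrated with a z statistic and the standard normal distribution. This is only a sketch: the source does not say which statistical test the script runs, and the one-sided direction used here ("greater") is an assumption, as is the function name.

```python
from statistics import NormalDist


def p_value_from_z(z, test_type=2):
    """Convert a z statistic into a p-value.

    test_type=1: one-sided test (alternative "greater", an assumption).
    test_type=2: two-sided test.
    """
    standard_normal = NormalDist()
    if test_type == 2:
        return 2.0 * (1.0 - standard_normal.cdf(abs(z)))
    if test_type == 1:
        return 1.0 - standard_normal.cdf(z)
    raise ValueError("test_type must be 1 or 2")


one_sided = p_value_from_z(1.96, test_type=1)
two_sided = p_value_from_z(1.96, test_type=2)
print(one_sided, two_sided)
```

For a positive statistic the two-sided p-value is twice the one-sided one, so the same data can be significant under option 1 but not under option 2.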
