This is forked from markgene/chipseq, which is itself a fork of nf-core/chipseq configured for running on the SJ HPC. This repo focuses on CUT&RUN rather than conventional ChIP-seq.
markgene/cutnrun is a bioinformatics analysis pipeline used for CUT&RUN data.
The pipeline is built using Nextflow, a workflow tool for running tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
- Raw read QC (FastQC)
- Adapter trimming (Trim Galore!)
- Alignment (Bowtie2)
- Mark duplicates (picard)
- Merge alignments from multiple libraries of the same sample (picard)
  - Re-mark duplicates (picard)
  - Filtering to remove:
    - reads mapping to blacklisted regions (SAMtools, BEDTools)
    - reads that are marked as duplicates (SAMtools)
    - reads that aren't marked as primary alignments (SAMtools)
    - reads that are unmapped (SAMtools)
    - reads that map to multiple locations (SAMtools)
    - reads containing > 4 mismatches (BAMTools)
    - reads that have an insert size > 2kb (BAMTools; paired-end only)
    - reads that map to different chromosomes (Pysam; paired-end only)
    - reads that aren't in FR orientation (Pysam; paired-end only)
    - reads where only one read of the pair fails the above criteria (Pysam; paired-end only)
  - Alignment-level QC and estimation of library complexity (picard, Preseq)
  - Create normalised bigWig files scaled to 1 million mapped reads (BEDTools, bedGraphToBigWig)
  - Generate gene-body meta-profile from bigWig files (deepTools)
  - Calculate genome-wide IP enrichment relative to control (deepTools)
  - Calculate strand cross-correlation peak and ChIP-seq quality measures including NSC and RSC (phantompeakqualtools)
  - Call broad/narrow peaks (MACS2)
  - Annotate peaks relative to gene features (HOMER)
  - Create consensus peakset across all samples and create tabular file to aid in the filtering of the data (BEDTools)
  - Count reads in consensus peaks (featureCounts)
  - Differential binding analysis, PCA and clustering (R, DESeq2)
- Create IGV session file containing bigWig tracks, peaks and differential sites for data visualisation (IGV)
- Present QC for raw read, alignment, peak-calling and differential binding results (MultiQC, R)
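The paired-end criteria in the filtering list can be read as a predicate applied to each mate, with a pair kept only when both mates pass. The sketch below is a simplified illustration under stated assumptions, not the pipeline's actual Pysam/BAMTools code: `AlignedRead`, its fields, `passes_filters` and `filter_pairs` are hypothetical names, the mismatch count is assumed to be precomputed for each mate, and the SAMtools/BEDTools filters (duplicates, non-primary, unmapped, multi-mapping, blacklist) are left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Thresholds taken from the filtering list above.
MAX_MISMATCHES = 4      # drop reads with > 4 mismatches
MAX_INSERT_SIZE = 2000  # drop pairs with an insert size > 2 kb


@dataclass
class AlignedRead:
    """Hypothetical, simplified stand-in for one mate of a read pair (not a pysam object)."""
    chrom: str
    mate_chrom: str
    pos: int              # 0-based leftmost mapping position
    mate_pos: int
    is_reverse: bool
    mate_is_reverse: bool
    mismatches: int       # assumed precomputed mismatch count
    template_length: int  # signed insert size, as in the SAM TLEN field


def is_fr_oriented(read: AlignedRead) -> bool:
    """True if the forward-strand mate lies upstream of the reverse-strand mate."""
    if read.is_reverse == read.mate_is_reverse:
        return False  # both mates on the same strand (FF or RR)
    if read.is_reverse:
        return read.mate_pos <= read.pos
    return read.pos <= read.mate_pos


def passes_filters(read: AlignedRead) -> bool:
    """Apply the per-mate paired-end criteria to a single mate."""
    return (
        read.mismatches <= MAX_MISMATCHES
        and abs(read.template_length) <= MAX_INSERT_SIZE
        and read.chrom == read.mate_chrom
        and is_fr_oriented(read)
    )


def filter_pairs(
    pairs: Iterable[Tuple[AlignedRead, AlignedRead]]
) -> List[Tuple[AlignedRead, AlignedRead]]:
    """Keep a pair only if both mates pass; one failing mate removes the whole pair."""
    return [(r1, r2) for r1, r2 in pairs if passes_filters(r1) and passes_filters(r2)]
```

For example, a forward mate at position 100 paired with a reverse mate at position 300 on the same chromosome is kept, whereas the same pair is dropped if either mate carries more than four mismatches. Checking both mates before keeping the pair is what implements the "only one read of the pair fails" rule.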
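The bigWig normalisation step scales coverage to 1 million mapped reads, which amounts to multiplying every coverage value by 1,000,000 divided by the number of mapped reads. The sketch below shows that arithmetic on bedGraph lines; `cpm_scale_factor` and `scale_bedgraph` are hypothetical helpers, and how the pipeline counts mapped reads (for instance, reads versus fragments for paired-end data) is an assumption not settled here.

```python
from typing import Iterable, List


def cpm_scale_factor(mapped_reads: int) -> float:
    """Factor that rescales raw coverage to 'per million mapped reads'."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return 1_000_000 / mapped_reads


def scale_bedgraph(lines: Iterable[str], factor: float) -> List[str]:
    """Multiply the 4th (value) column of tab-separated bedGraph lines by factor."""
    scaled = []
    for line in lines:
        chrom, start, end, value = line.rstrip("\n").split("\t")
        scaled.append(f"{chrom}\t{start}\t{end}\t{float(value) * factor:g}")
    return scaled
```

With 20,000,000 mapped reads the factor is 0.05, so a bin with raw coverage 40 becomes 2. In practice a factor like this can be handed to `bedtools genomecov` through its `-scale` option before converting the bedGraph with bedGraphToBigWig.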
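NSC and RSC summarise a strand cross-correlation profile. Following the ENCODE definitions, NSC is the cross-correlation at the fragment-length shift divided by the minimum cross-correlation, and RSC is the fragment-length peak above that minimum divided by the read-length ("phantom") peak above the same minimum. The sketch below computes both from a precomputed profile; `nsc_rsc` is a hypothetical helper, and the fragment shift is passed in rather than estimated the way phantompeakqualtools does internally.

```python
from typing import Dict, Tuple


def nsc_rsc(cc: Dict[int, float], fragment_shift: int, read_length: int) -> Tuple[float, float]:
    """Compute NSC and RSC from a strand cross-correlation profile.

    cc maps strand shift (bp) to cross-correlation value and must contain
    entries for fragment_shift and read_length.
    """
    cc_min = min(cc.values())       # background cross-correlation
    cc_frag = cc[fragment_shift]    # peak at the estimated fragment length
    cc_phantom = cc[read_length]    # "phantom" peak at the read length
    nsc = cc_frag / cc_min
    rsc = (cc_frag - cc_min) / (cc_phantom - cc_min)
    return nsc, rsc
```

On a toy profile `{0: 0.10, 36: 0.30, 150: 0.50, 300: 0.20, 500: 0.10}` with a 150 bp fragment shift and 36 bp reads, this gives NSC = 0.50 / 0.10 = 5 and RSC = (0.50 - 0.10) / (0.30 - 0.10) = 2. These numbers are illustrative only, not real data.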
i. Install nextflow
ii. Install one of docker, singularity or conda
iii. Download the pipeline and test it on a minimal dataset with a single command:
nextflow run markgene/cutnrun -profile test,<docker/singularity/conda/institute>
Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your institute. If so, you can simply use -profile institute in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
iv. Start running your own analysis!
nextflow run markgene/cutnrun -profile <docker/singularity/conda/institute> --input design.csv --genome GRCh37
See usage docs for all of the available options when running the pipeline.
The markgene/cutnrun pipeline comes with documentation about the pipeline, found in the docs/ directory:
- Installation
- Pipeline configuration
- Running the pipeline
- Output and how to interpret the results
- Troubleshooting
The workflow was originally forked from nf-core/chipseq. I modified the code to better fit CUT&RUN data.