KoppesEA/PRJEB6698_dnmt1tetoff_datamining

PRJEB6698_reanalysis

Summary:

This project is a re-analysis of a reduced-representation bisulfite sequencing (RRBS) dataset from DNMT1-TET-OFF mESCs, generated by my PhD thesis mentor J.R. Chaillet (U. Pitt) and initially analyzed by S. McGraw and J. Trasler (McGill University).

  1. Paper (PubMed): https://pubmed.ncbi.nlm.nih.gov/25578964/
  2. ENA record: https://www.ebi.ac.uk/ena/browser/view/PRJEB6698

Data Mining Acquisition:

The McGraw et al. RRBS data are available from the European Nucleotide Archive (ENA) under record PRJEB6698.

  1. Download the read file TSV report directly from ENA
  2. Use an awk one-liner to extract the R1 and R2 FTP locations for the 10 samples
cat filereport_read_run_PRJEB6698.tsv | awk -F"\t" '{print $7}' | awk -F";" -v OFS="\n" 'NR>1 {print $1, $2}' > PRJEB6698_ACC_List.txt
  3. Use the wget bash script PRJEB6698_ERR_wgetdownload.bash to download the corresponding _R1 and _R2 fastq records, then check the log to make sure there were no errors (TODO: add a checksum feature)
  4. Download the read experiment XML
  5. Use '.bash' or the command below to extract sample metadata
cat ena_PRJEB6698_read_experiment.xml | \
grep -o -E ">SAMEA[0-9]{7}<|>ERR[0-9]{6}<|refname=\"[dN].*>" | \
tr -d '>|<' | tr '\n' ' ' | \
awk -v OFS="\t" ' {print $3, $2, $1, "\n", $6, $5, $4, "\n", $9, $8, $7, "\n", $12, $11, $10, "\n", $15, $14, $13, "\n", $18, $17, $16, "\n", $21, $20, $19, "\n", $24, $23, $22, "\n", $27, $26, $25, "\n", $30, $29, $28}' | \
sed 's/^\t//' | sed 's/refname=\"//' | sed 's/\"//' > PRJEB6698_metadata.tsv
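
Step 3 above notes that a checksum feature should be added to the download. A minimal sketch of one way to do that follows. It assumes the ENA file report was downloaded with the `fastq_ftp` and `fastq_md5` columns selected, each holding semicolon-separated values in matching order; the function name `make_md5_checklist` is hypothetical and not part of this repository.

```shell
#!/usr/bin/env bash
# Build an md5sum-compatible checklist from an ENA file report.
# Assumption: the report has header columns named fastq_ftp and fastq_md5,
# each with semicolon-separated entries in the same order (R1;R2).
make_md5_checklist() {
    local report="$1"
    awk -F'\t' '
        NR == 1 {
            # Locate the columns by header name instead of by position.
            for (i = 1; i <= NF; i++) {
                if ($i == "fastq_ftp") ftp = i
                if ($i == "fastq_md5") md5 = i
            }
            if (!ftp || !md5) {
                print "missing fastq_ftp or fastq_md5 column" > "/dev/stderr"
                exit 1
            }
            next
        }
        {
            n = split($ftp, urls, ";")
            split($md5, sums, ";")
            for (j = 1; j <= n; j++) {
                # Keep only the file name, which is what wget saves locally.
                k = split(urls[j], parts, "/")
                printf "%s  %s\n", sums[j], parts[k]
            }
        }
    ' "$report"
}
```

From the download directory, something like `make_md5_checklist filereport_read_run_PRJEB6698.tsv > PRJEB6698.md5 && md5sum -c PRJEB6698.md5` would then flag any fastq file whose checksum does not match the archive's record.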

Bismark Genome Preparation:

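  1. Download the mouse reference genome from Ensembl (release 109) to a Reference folder using the command below. Note that this command fetches the CAST_EiJ_v1 assembly, whereas the alignment and annotation steps later in this document refer to GRCm39: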
wget -O CAST_EiJ_v1.dna.toplevel.fa.gz https://ftp.ensembl.org/pub/release-109/fasta/mus_musculus_casteij/dna/Mus_musculus_casteij.CAST_EiJ_v1.dna.toplevel.fa.gz
  2. Prepare the bisulfite-converted reference genome (Bismark writes both a C-to-T and a G-to-A converted version of the genome):
module load bismark/0.20.0
bismark_genome_preparation ./
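
Genome preparation can take a long time, so it may be worth confirming that it completed before starting alignments. The sketch below assumes Bismark's usual output layout, a `Bisulfite_Genome` folder containing `CT_conversion` and `GA_conversion` subfolders inside the genome folder; the function name `check_bisulfite_genome` is hypothetical.

```shell
#!/usr/bin/env bash
# Check that a genome folder holds Bismark's bisulfite-converted output.
# Assumption: bismark_genome_preparation wrote Bisulfite_Genome/ inside the
# folder it was given, with CT_conversion and GA_conversion subfolders.
check_bisulfite_genome() {
    local genome_dir="${1:-.}"
    local sub
    for sub in CT_conversion GA_conversion; do
        if [ ! -d "$genome_dir/Bisulfite_Genome/$sub" ]; then
            echo "missing: $genome_dir/Bisulfite_Genome/$sub" >&2
            return 1
        fi
    done
    echo "ok: $genome_dir"
}
```

Calling `check_bisulfite_genome ./` from the Reference folder returns a non-zero status if either converted genome is absent, so it can gate the alignment step in a batch script.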

Fastq Trimming and QC

  1. Run PRJEB6698_pretrimqc.bash to perform pre-trim QC
  2. Run PRJEB6698_trimRRBS.bash to perform RRBS trimming and post-trim QC with TrimGalore
  3. Run multiqc in both the pretrim and trim QC directories (the pretrim command is shown):
multiqc --filename "PRJEB6698_multiqc_pretrim_report.html" . &
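
Since the same report is needed for both QC directories, the call above can be wrapped in a small helper. This is a sketch only: the function name `run_multiqc` and the `MULTIQC_BIN` override (handy for a dry run) are hypothetical, and the directory names in the usage note are placeholders because the actual folder names are set by the QC scripts.

```shell
#!/usr/bin/env bash
# Run MultiQC on one QC directory and write a labelled report into it.
# Assumption: MULTIQC_BIN is a hypothetical override for the multiqc
# executable; it defaults to "multiqc" on the PATH.
run_multiqc() {
    local label="$1"   # e.g. pretrim or trim
    local dir="$2"     # directory holding the FastQC output
    "${MULTIQC_BIN:-multiqc}" \
        --filename "PRJEB6698_multiqc_${label}_report.html" \
        --outdir "$dir" \
        "$dir"
}
```

For example, `run_multiqc pretrim ./pretrim_qc` followed by `run_multiqc trim ./trim_qc` would produce one report per stage, where both directory names are placeholders.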

Bismark BT2 Alignment and Methylation Calls

Note: Deduplication is not recommended for RRBS data.

  1. Run PRJEB6698_Bismark_Align_GRCm39_BT2.bash to align reads to the bisulfite-converted genomes
  2. [Optional] Check the BAM files; Bismark BAM output is unsorted and unindexed:
samtools view -H *ERR560529_1_val_1_bismark_bt2_pe.bam | grep SO
@HD	VN:1.0	SO:unsorted
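
The `SO:unsorted` header above means the Bismark BAM cannot be indexed as-is or passed to tools that expect coordinate-sorted input. A minimal sketch of making a sorted, indexed copy follows. The function names are hypothetical, and the `_sorted.bam` suffix is an assumption based on the file name used in the strandedness check later in this section.

```shell
#!/usr/bin/env bash
# Derive the output name used for a coordinate-sorted copy of a BAM file.
sorted_bam_name() {
    printf '%s\n' "${1%.bam}_sorted.bam"
}

# Sort by coordinate and index, leaving the original unsorted BAM untouched.
# Assumption: samtools is already on the PATH (for example via module load).
sort_and_index_bam() {
    local in_bam="$1"
    local out_bam
    out_bam=$(sorted_bam_name "$in_bam")
    samtools sort -o "$out_bam" "$in_bam" && samtools index "$out_bam"
}
```

Writing to a new file rather than sorting in place is deliberate: as far as I am aware, Bismark's documentation advises against position-sorting paired-end BAM files before methylation extraction, so the original unsorted file should be kept for that step.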

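Strandedness check: the Bismark alignment is directional (MspI/HpaII CCGG cut) and paired-end (PE); because the library is genomic, RSeQC's infer_experiment.py should report it as unstranded: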

wget https://ftp.ensembl.org/pub/release-109/gtf/mus_musculus/Mus_musculus.GRCm39.109.gtf.gz
gunzip -c Mus_musculus.GRCm39.109.gtf.gz > Mus_musculus.GRCm39.109.gtf
module load rseqc/2.6.6
module load bedops/2.4.35
awk '{ if ($0 ~ "transcript_id") print $0; else print $0" transcript_id \"\";"; }' Mus_musculus.GRCm39.109.gtf| gtf2bed - > Mus_musculus.GRCm39.109.bed
infer_experiment.py -r /ix1/mmann/KoppesEA/REF_Sequences/Mus_musculus/GRCm39_ref/Mus_musculus.GRCm39.109.bed -i /ix1/mmann/KoppesEA/PRJEB6698/Bismark/ERR560527_1_val_1_bismark_bt2_pe_sorted.bam
  3. Run PRJEB6698_Bismark_MethExtractor.bash to tabulate methylation fractions for each cytosine, with a focus on CpG methylation
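
To illustrate what a per-cytosine methylation fraction looks like, here is a sketch that reads a Bismark coverage file and keeps only well-covered positions. It assumes the extractor script produces coverage output in Bismark's tab-separated layout (chromosome, start, end, percent methylated, methylated count, unmethylated count) and that the file has been decompressed. The function name `filter_cov_by_depth` and the default depth of 10 are illustrative choices, not settings taken from this repository.

```shell
#!/usr/bin/env bash
# Print chromosome, position, read depth, and methylated fraction for every
# position in a Bismark coverage file that meets a minimum depth.
# Assumption: columns are chr, start, end, percent, count_meth, count_unmeth.
filter_cov_by_depth() {
    local cov="$1"
    local min_depth="${2:-10}"
    awk -F'\t' -v OFS='\t' -v min="$min_depth" '
        { d = $5 + $6 }                      # total reads covering the site
        d >= min && d > 0 { print $1, $2, d, $5 / d }
    ' "$cov"
}
```

A call such as `filter_cov_by_depth sample.bismark.cov 10` (the file name is a placeholder) gives a quick sanity check on coverage before loading the data into methylKit.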

Differential methylation analysis using methylKit

  1. Run PRJEB6698_methylKit_script.R (work in progress) to perform differential methylation analysis and annotate CpG islands (CGIs)

References and Links:

  1. DNMT1-TETOFF paper: Serge McGraw, Jacques X Zhang, Mena Farag, Donovan Chan, Maxime Caron, Carolin Konermann, Christopher C Oakes, K Naga Mohan, Christoph Plass, Tomi Pastinen, Guillaume Bourque, J Richard Chaillet, Jacquetta M Trasler. Transient DNMT1 suppression reveals hidden heritable marks in the genome. Nucleic Acids Research (NAR), 2015.
  2. FastQC: https://github.com/s-andrews/FastQC
  3. TrimGalore: https://github.com/FelixKrueger/TrimGalore
  4. Bismark: https://github.com/FelixKrueger/Bismark
  5. methylKit: https://bioconductor.org/packages/release/bioc/html/methylKit.html
