June-Koo Lee, Seongyeol Park, Hansol Park, Kijong Yi, Yohan An, Jeonghwan Youk, Junehawk Lee, Tae Min Kim, In Kyu Park, Chang Hyun Kang, Peter J Park, Young Seok Ju, and Young Tae Kim
Detailed instructions for each step of the process are given in the Supplemental Information.
We prepared paired tumor and normal BAM files (e.g., A.tumor.bam and A.normal.bam). Reads were aligned to the human reference genome (GRCh37, without the "chr" prefix in contig names) with BWA-MEM.
We ran Delly v0.7.6, a structural variation caller, with the commands below.
delly call -t DEL -q 15 -n -o <Output file name> -g <Reference fasta> <Tumor BAM> <Normal BAM>
delly call -t <DUP, INV, or TRA> -q 15 -o <Output file name> -g <Reference fasta> <Tumor BAM> <Normal BAM>
Example of command line:
delly call -t DEL -q 15 -n -o test.DEL.bcf -g ref.fa test.tumor.bam test.normal.bam
delly call -t DUP -q 15 -o test.DUP.bcf -g ref.fa test.tumor.bam test.normal.bam
delly call -t INV -q 15 -o test.INV.bcf -g ref.fa test.tumor.bam test.normal.bam
delly call -t TRA -q 15 -o test.TRA.bcf -g ref.fa test.tumor.bam test.normal.bam
Example of output file name:
test.DEL.bcf, test.DUP.bcf, test.INV.bcf, test.TRA.bcf
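As an illustration only, the four Delly calls for one sample could be generated programmatically. The sketch below builds the argument lists without running them; the function name delly_commands and its parameters are hypothetical and are not part of the in-house scripts.

```python
import shlex


def delly_commands(sample, reference, tumor_bam, normal_bam, min_mapq=15):
    """Build one `delly call` argument list per SV type (hypothetical helper)."""
    commands = []
    for sv_type in ("DEL", "DUP", "INV", "TRA"):
        cmd = ["delly", "call", "-t", sv_type, "-q", str(min_mapq)]
        if sv_type == "DEL":
            # In the commands of this step, -n is passed for deletions only.
            cmd.append("-n")
        cmd += ["-o", f"{sample}.{sv_type}.bcf", "-g", reference,
                tumor_bam, normal_bam]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in delly_commands("test", "ref.fa",
                              "test.tumor.bam", "test.normal.bam"):
        # Print shell-quoted commands; nothing is executed here.
        print(shlex.join(cmd))
```

Each returned list can be passed to subprocess.run if Delly is installed; printing the commands first makes it easy to check them against the examples in this step.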
To simplify the subsequent steps, we merged the per-type BCF output files into a single VCF file for each sample.
bcftools concat -a -O v -o <Output.vcf> <Delly BCFs (output of step 2)>
Example of command line:
bcftools concat -a -O v -o test.delly.vcf test.DEL.bcf test.DUP.bcf test.INV.bcf test.TRA.bcf
Example of output file name:
We built a panel of normals (PON) file by merging multiple Delly VCFs.
python Making_PON_Delly.py <Input text>
Format of Input text:
Tab-delimited list of Delly VCFs (output of step 3), one sample ID and VCF path per line
ID1 /path/to/A.vcf
ID2 /path/to/B.vcf
ID3 /path/to/C.vcf
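A minimal sketch of how such a list file could be written and checked before it is passed to the PON script. It assumes exactly two tab-separated fields per line and no header row, as in the format shown above; the helper names write_vcf_list and read_vcf_list are hypothetical, and the internals of Making_PON_Delly.py are not reproduced here.

```python
import csv


def write_vcf_list(id_to_vcf, list_path):
    """Write one 'ID<TAB>path' line per sample, without a header row."""
    with open(list_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for sample_id, vcf_path in id_to_vcf.items():
            writer.writerow([sample_id, str(vcf_path)])


def read_vcf_list(list_path):
    """Read the list back; reject lines without exactly two fields."""
    entries = {}
    with open(list_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        for line_no, row in enumerate(reader, start=1):
            if not row:
                continue  # skip blank lines
            if len(row) != 2:
                raise ValueError(
                    f"line {line_no}: expected 2 tab-separated fields, "
                    f"got {len(row)}"
                )
            entries[row[0]] = row[1]
    return entries


if __name__ == "__main__":
    write_vcf_list(
        {"ID1": "/path/to/A.vcf", "ID2": "/path/to/B.vcf",
         "ID3": "/path/to/C.vcf"},
        "Delly_VCFs_list.txt",
    )
    print(read_vcf_list("Delly_VCFs_list.txt"))
```

Reading the file back catches a common formatting problem, namely spaces used in place of tabs, before the list is used.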
Example of command line:
python Making_PON_Delly.py Delly_VCFs_list.txt
Example of output file name:
To distinguish true positives from false positives in the next filtering step, we used in-house scripts to add multiple annotation columns to the Delly VCFs. These annotations were generated by a shell script that runs a series of in-house Python scripts. The individual steps of the process are described below.
sh Delly_annotation.sh <Input Delly VCF (output of step 3)> <column number of tumor in Input Delly VCF (10 or 11)> <Tumor BAM> <Normal BAM> <PON file> <DIR of SV_annot_scripts>
Example of command line:
sh Delly_annotation.sh test.delly.vcf 10 test.tumor.bam test.normal.bam PON.delly.txt /path/to/Delly_annotation_scripts
Example of output file name:
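The second argument of Delly_annotation.sh is the column (10 or 11) that holds the tumor sample in the Delly VCF. As a sketch, this column can be looked up from the VCF header line rather than assumed. The function name tumor_column is hypothetical, and the sample names used in the usage example are placeholders; the code assumes an uncompressed VCF whose sample names are known to the user.

```python
def tumor_column(vcf_path, tumor_sample):
    """Return the 1-based column number of `tumor_sample` in a VCF.

    Raises ValueError if the header line or the sample name is missing.
    """
    with open(vcf_path) as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                header = line.rstrip("\n").split("\t")
                # Columns 1-9 are the fixed VCF fields (CHROM ... FORMAT);
                # sample columns start at column 10.
                return header.index(tumor_sample) + 1
    raise ValueError("no #CHROM header line found")


if __name__ == "__main__":
    # "tumor_sample" is a placeholder for the real sample name.
    print(tumor_column("test.delly.vcf", "tumor_sample"))
```

With one tumor and one normal sample in the VCF, the returned value is 10 or 11, which matches the values expected by the annotation script.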
python annotated_SV_filter.py <Input SV file (output of step 5)>
Example of command line:
python annotated_SV_filter.py test.delly.vcf.somatic.annotated
Example of output file name:
python annotated_SV_BPadd_edit.py <Input SV file (output of step 6)> <Normal BAM>
Example of command line:
python annotated_SV_BPadd_edit.py test.delly.vcf.somatic.annotated.fi test.normal.bam
Example of output file name:
sh SV_clustering.sh <Input SV file (output of step 7 or 8)> <DIR of SV_cluster_scripts>
Example of command line:
sh SV_clustering.sh test.delly.vcf.somatic.annotated.fi.BPedit /path/to/SV_clustering_scripts
Example of output file name:
sh Get_100kb_absCN.sh <Tumor pileup file> <Normal pileup file> <Cellularity> <Ploidy> <gender (XX or XY)> <DIR of Calc_absCN_scripts>
Example of command line:
sh Get_100kb_absCN.sh test.tumor.pileup.gz test.normal.pileup.gz 0.62 3.6 XY /path/to/Calc_absCN_scripts
Example of output file:
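To show how cellularity and ploidy enter an absolute copy number estimate, the sketch below inverts a commonly used depth-ratio model for autosomal regions. This is an assumed model given for illustration, not necessarily the calculation performed by Get_100kb_absCN.sh; the function names are hypothetical, the normal copy number is fixed at 2, and sex chromosomes are not handled.

```python
def expected_depth_ratio(tumor_cn, cellularity, ploidy):
    """Tumor/normal depth ratio expected for a given tumor copy number.

    Assumed model: reads come from a mixture of tumor cells (fraction
    `cellularity`) and diploid normal cells, normalized by the average
    tumor ploidy.
    """
    normal_part = (1.0 - cellularity) * 2.0
    return (cellularity * tumor_cn + normal_part) / (
        cellularity * ploidy + normal_part
    )


def absolute_copy_number(depth_ratio, cellularity, ploidy):
    """Invert the model above to estimate tumor copy number in one bin."""
    if not 0.0 < cellularity <= 1.0:
        raise ValueError("cellularity must be in (0, 1]")
    normal_part = (1.0 - cellularity) * 2.0
    scaled = depth_ratio * (cellularity * ploidy + normal_part)
    return (scaled - normal_part) / cellularity


if __name__ == "__main__":
    # Cellularity and ploidy taken from the example command line.
    ratio = expected_depth_ratio(2, cellularity=0.62, ploidy=3.6)
    print(absolute_copy_number(ratio, cellularity=0.62, ploidy=3.6))
```

Under this model, a bin whose depth ratio equals 1 has a copy number equal to the sample ploidy, which is a convenient sanity check on the inputs.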
sh Classify_complexSV.sh <input SV file (output of step 9)> <100kb bin AbsCN file (output of step 10-1)> <chrCN file (output of step 10-1)> <DIR of scripts> <reference fasta index file>
Example of command line:
sh Classify_complexSV.sh test.delly.vcf.somatic.annotated.fi.BPedit.clustered test.tumor.pileup.100kbcov.absCN.gen_fi test.tumor.pileup.100kbcov.absCN.gen_fi.chrCN /path/to/Classify_complexSV_scripts ref.fa.fai
Example of output file name:
sh Merge_SV_CNV.sh <CNV segments.txt (output of Sequenza)> <Input SV file (output of step 8)> <DIR of scripts>
Example of command line:
sh Merge_SV_CNV.sh test.segments.txt test.delly.vcf.somatic.annotated.fi.BPedit
Example of output file name:
The input SNV file should contain the following columns.
1st: chromosome without “chr” prefix
2nd: position
9th: reference read count
10th: alternate (variant) read count
sh Calc_early_SNV.sh <CNV segments.txt (output of Sequenza)> <Input SNV file> <Input SV file (output of step 9)> <purity> <DIR of scripts> <sample id> <DIR of output>
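Before running Calc_early_SNV.sh, the input SNV file can be checked against the column layout listed above. The sketch below is a hypothetical pre-check, not part of the in-house scripts. It assumes a tab-delimited file in which header or comment lines start with "#", and it also reports the variant allele fraction computed from the two read counts.

```python
def read_snv_counts(path):
    """Yield (chrom, pos, ref_count, alt_count, vaf) for each SNV line.

    Column positions follow the layout listed above: 1st chromosome,
    2nd position, 9th reference read count, 10th alternate read count.
    """
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip() or line.startswith("#"):
                continue  # assumed: comment/header lines start with '#'
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 10:
                raise ValueError(f"line {line_no}: fewer than 10 columns")
            chrom, pos = fields[0], int(fields[1])
            if chrom.startswith("chr"):
                raise ValueError(f"line {line_no}: 'chr' prefix not allowed")
            ref_count, alt_count = int(fields[8]), int(fields[9])
            depth = ref_count + alt_count
            # Variant allele fraction; undefined when there are no reads.
            vaf = alt_count / depth if depth else float("nan")
            yield chrom, pos, ref_count, alt_count, vaf


if __name__ == "__main__":
    import sys

    for record in read_snv_counts(sys.argv[1]):
        print(*record, sep="\t")
```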