Merge pull request #4 from icgc-argo-workflows/update_stage_input_fix
Update stage input fix
lindaxiang authored Apr 23, 2024
2 parents 86ac48b + ff75df8 commit 0aa5596
Showing 80 changed files with 2,610 additions and 311 deletions.
30 changes: 21 additions & 9 deletions README.md
@@ -15,7 +15,7 @@ The workflow has adopted [nf-core](https://nf-co.re/) framework and best practic

2. Install [`Docker`](https://docs.docker.com/engine/installation/).

-3. Stage the required [reference files](#references)
+3. Stage the required [reference files](#references)

## Quick start
1. Test the workflow running in `Local` mode on a minimal dataset with a single command:
@@ -55,27 +55,39 @@ Depending on where the input data are coming from and output data are sending to
- Reference genome:
- GRCh38 reference genome fasta file. The file can be downloaded by:
```bash
-wget https://object.cancercollaboratory.org:9080/swift/v1/genomics-public-data/reference-genome/GRCh38_hla_decoy_ebv/GRCh38_hla_decoy_ebv.fa
+wget https://swengbioinfo.blob.core.windows.net/genomics-public-data/reference-genome/GRCh38_hla_decoy_ebv/GRCh38_hla_decoy_ebv.fa
```

- GRCh38 reference genome fasta index file. The file can be downloaded by:
```bash
-wget https://object.cancercollaboratory.org:9080/swift/v1/genomics-public-data/reference-genome/GRCh38_hla_decoy_ebv/GRCh38_hla_decoy_ebv.fa.fai
+wget https://swengbioinfo.blob.core.windows.net/genomics-public-data/reference-genome/GRCh38_hla_decoy_ebv/GRCh38_hla_decoy_ebv.fa.fai
```

- GRCh38 reference genome sequence dictionary file. The file can be downloaded by:
```bash
-wget https://object.cancercollaboratory.org:9080/swift/v1/genomics-public-data/reference-genome/GRCh38_hla_decoy_ebv/GRCh38_hla_decoy_ebv.dict
+wget https://swengbioinfo.blob.core.windows.net/genomics-public-data/reference-genome/GRCh38_hla_decoy_ebv/GRCh38_hla_decoy_ebv.dict
```
- GATK resources:
- `germline_resource` and index files. The files can be downloaded by:
```bash
-wget https://object.cancercollaboratory.org:9080/swift/v1/genomics-public-data/gatk-resources/af-only-gnomad.pass-only.biallelic.snp.hg38.vcf.gz
-wget https://object.cancercollaboratory.org:9080/swift/v1/genomics-public-data/gatk-resources/af-only-gnomad.pass-only.biallelic.snp.hg38.vcf.gz.tbi
+wget https://swengbioinfo.blob.core.windows.net/genomics-public-data/gatk-resources/af-only-gnomad.pass-only.biallelic.snp.hg38.vcf.gz
+wget https://swengbioinfo.blob.core.windows.net/genomics-public-data/gatk-resources/af-only-gnomad.pass-only.biallelic.snp.hg38.vcf.gz.tbi
```
- Autosome non-gap regions
- The `autosome_non_gap` BED file was downloaded from [NPM-sample-qc](https://raw.githubusercontent.com/c-BIG/NPM-sample-qc/master/resources/autosomes_non_gap_regions.bed) and is staged under the project folder [assets](https://github.com/icgc-argo-workflows/dnaalnqc/tree/main/assets).
+
+> **NOTE**
+> Please stage the reference files into the reference directory <REFERENCE_BASE> with the following folder structure
+```bash
+<REFERENCE_BASE>
+├── GRCh38_hla_decoy_ebv.dict
+├── GRCh38_hla_decoy_ebv.fa
+├── GRCh38_hla_decoy_ebv.fa.fai
+├── gatk_resource
+│   ├── af-only-gnomad.pass-only.biallelic.snp.hg38.vcf.gz
+│   └── af-only-gnomad.pass-only.biallelic.snp.hg38.vcf.gz.tbi
+```
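
The staging layout in the README note can be checked before a run. The following is a minimal sketch, not part of the commit: the function name `missing_reference_files` and the constant `EXPECTED_REFERENCE_FILES` are hypothetical, and the file list is copied from the folder structure in the note.

```python
from pathlib import Path

# Relative paths copied from the folder structure shown in the README note.
EXPECTED_REFERENCE_FILES = [
    "GRCh38_hla_decoy_ebv.dict",
    "GRCh38_hla_decoy_ebv.fa",
    "GRCh38_hla_decoy_ebv.fa.fai",
    "gatk_resource/af-only-gnomad.pass-only.biallelic.snp.hg38.vcf.gz",
    "gatk_resource/af-only-gnomad.pass-only.biallelic.snp.hg38.vcf.gz.tbi",
]


def missing_reference_files(reference_base):
    """Return the expected relative paths that are not regular files under reference_base."""
    base = Path(reference_base)
    return [rel for rel in EXPECTED_REFERENCE_FILES if not (base / rel).is_file()]
```

An empty list means every file from the note is in place; any returned path points at a file that still needs to be downloaded or moved.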


### Inputs
#### Local mode
@@ -84,9 +96,9 @@ First, prepare a sample sheet with your input data that looks as following examp
`sample_sheet.csv`:

```csv
-sample,bam_cram,patient(optional),status(optional),sex(optional)
-CONTROL_REP1_SAMPLE0,CONTROL_REP_0.bam,CONTROL_REP1_DONOR,0,XX
-CONTROL_REP1_SAMPLE1,CONTROL_REP_1.bam,CONTROL_REP1_DONOR,1,XX
+sample,bam_cram,bai_crai(optional),patient(optional),status(optional),sex(optional)
+CONTROL_REP1_SAMPLE0,CONTROL_REP_0.bam,CONTROL_REP_0.bam.bai,CONTROL_REP1_DONOR,0,XX
+CONTROL_REP1_SAMPLE1,CONTROL_REP_1.bam,CONTROL_REP_1.bam.bai,CONTROL_REP1_DONOR,1,XX
```

Each row represents an aligned BAM or CRAM from a sample.
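
A sample sheet in this shape can be sanity-checked before it is handed to the workflow. The sketch below is an illustration only and is not the validator shipped with the pipeline: `check_sample_sheet` and `REQUIRED_COLUMNS` are hypothetical names, and treating `sample` and `bam_cram` as the required columns is inferred from the header, where every other column is marked `(optional)`.

```python
import csv
import io

# Inferred from the README header: columns without "(optional)" are treated as required.
REQUIRED_COLUMNS = ["sample", "bam_cram"]


def check_sample_sheet(text):
    """Return a list of human-readable problems found in sample-sheet text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["sample sheet is empty"]
    # Drop the "(optional)" suffix that the README uses in its example header.
    header = [name.replace("(optional)", "").strip() for name in rows[0]]
    problems = [f"missing required column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, found {len(row)}")
            continue
        record = dict(zip(header, row))
        for c in REQUIRED_COLUMNS:
            if c in record and not record[c].strip():
                problems.append(f"line {line_no}: empty value for {c}")
    return problems
```

The field-count check is the useful part in practice: a stray comma inside a file name adds a field and shifts every later column by one.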
6 changes: 3 additions & 3 deletions assets/tests/sample_sheet.csv
@@ -1,3 +1,3 @@
-patient,status,sample,bam_cram
-test,0,cfa409d0-3236-5f07-8634-a2c0de74c8f2,https://raw.githubusercontent.com/icgc-argo-workflows/gatk-mutect2-variant-calling/main/tests/data/HCC1143_BL-mini-N/cfa409d0-3236-5f07-8634-a2c0de74c8f2.5.20190927.wgs.grch38.bam
-test,1,8f879c15-14da-593d-bb76-db866f81ab3a,https://raw.githubusercontent.com/icgc-argo-workflows/gatk-mutect2-variant-calling/main/tests/data/HCC1143-mini-T/8f879c15-14da-593d-bb76-db866f81ab3a.6.20190927.wgs.grch38.bam
+patient,status,sample,bam_cram,bai_crai
+test,0,cfa409d0-3236-5f07-8634-a2c0de74c8f2,https://raw.githubusercontent.com/icgc-argo-workflows/gatk-mutect2-variant-calling/main/tests/data/HCC1143_BL-mini-N/cfa409d0-3236-5f07-8634-a2c0de74c8f2.5.20190927.wgs.grch38.bam,https://raw.githubusercontent.com/icgc-argo-workflows/gatk-mutect2-variant-calling/main/tests/data/HCC1143_BL-mini-N/cfa409d0-3236-5f07-8634-a2c0de74c8f2.5.20190927.wgs.grch38.bam.bai
+test,1,8f879c15-14da-593d-bb76-db866f81ab3a,https://raw.githubusercontent.com/icgc-argo-workflows/gatk-mutect2-variant-calling/main/tests/data/HCC1143-mini-T/8f879c15-14da-593d-bb76-db866f81ab3a.6.20190927.wgs.grch38.bam,https://raw.githubusercontent.com/icgc-argo-workflows/gatk-mutect2-variant-calling/main/tests/data/HCC1143-mini-T/8f879c15-14da-593d-bb76-db866f81ab3a.6.20190927.wgs.grch38.bam.bai
6 changes: 6 additions & 0 deletions conf/modules.config
@@ -19,6 +19,12 @@ process {
enabled: params.outdir ? true : false
]

+withName: 'SONG.*|SCORE.*' {
+ext.prefix = ""
+ext.api_download_token = params.api_download_token ?: params.api_token
+ext.api_upload_token = params.api_upload_token ?: params.api_token
+}
+
withName: CUSTOM_DUMPSOFTWAREVERSIONS {
publishDir = [
path: { "${params.outdir}/pipeline_info" },
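
The `?:` expressions added for the `SONG.*|SCORE.*` processes fall back to the general `params.api_token` when a direction-specific token is not set. As a rough illustration of that fallback, here is a Python sketch; `resolve_tokens` is a hypothetical helper, and Python's `or` only approximates Groovy's Elvis operator (both treat `None`/`null` and the empty string as "not set").

```python
def resolve_tokens(params):
    """Pick the specific token when it is set, otherwise fall back to the general one."""
    general = params.get("api_token")
    return {
        # Mirrors: params.api_download_token ?: params.api_token
        "api_download_token": params.get("api_download_token") or general,
        # Mirrors: params.api_upload_token ?: params.api_token
        "api_upload_token": params.get("api_upload_token") or general,
    }
```

With this pattern a single `api_token` is enough for both download and upload, while separate tokens can still be supplied when the two operations need different credentials.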
14 changes: 7 additions & 7 deletions conf/test.config
@@ -15,14 +15,14 @@ params {
config_profile_description = 'Minimal test dataset to check workflow function'

// Input data
-input = "assets/tests/sample_sheet.csv"
+input = "${projectDir}/assets/tests/sample_sheet.csv"
local_mode = true
-fasta = "assets/tests/reference/tiny-grch38-chr11-530001-537000.fa"
-fasta_fai = "assets/tests/reference/tiny-grch38-chr11-530001-537000.fa.fai"
-fasta_dict = "assets/tests/reference/tiny-grch38-chr11-530001-537000.dict"
-germline_resource = "assets/tests/reference/tiny-chr11-exac_common_3.hg38.vcf.gz"
-germline_resource_tbi = "assets/tests/reference/tiny-chr11-exac_common_3.hg38.vcf.gz.tbi"
-autosome_non_gap = "assets/tests/reference/tiny-intervals.bed"
+fasta = "${projectDir}/assets/tests/reference/tiny-grch38-chr11-530001-537000.fa"
+fasta_fai = "${projectDir}/assets/tests/reference/tiny-grch38-chr11-530001-537000.fa.fai"
+fasta_dict = "${projectDir}/assets/tests/reference/tiny-grch38-chr11-530001-537000.dict"
+germline_resource = "${projectDir}/assets/tests/reference/tiny-chr11-exac_common_3.hg38.vcf.gz"
+germline_resource_tbi = "${projectDir}/assets/tests/reference/tiny-chr11-exac_common_3.hg38.vcf.gz.tbi"
+autosome_non_gap = "${projectDir}/assets/tests/reference/tiny-intervals.bed"

}
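
Prefixing the test paths with `${projectDir}` makes them independent of where the run is started. The sketch below illustrates the difference under one stated assumption: that a bare relative path is resolved against the directory the run is launched from. `resolve` is a hypothetical helper and `PROJECT_DIR` is a made-up install location.

```python
from pathlib import PurePosixPath

PROJECT_DIR = "/opt/pipelines/dnaalnqc"  # hypothetical location of the pipeline checkout
RELATIVE = "assets/tests/sample_sheet.csv"


def resolve(path, launch_dir):
    """Resolve path against launch_dir; an absolute path is returned unchanged."""
    return str(PurePosixPath(launch_dir) / path)


# A relative path follows the launch directory around.
from_run1 = resolve(RELATIVE, "/home/user/run1")
from_tmp = resolve(RELATIVE, "/tmp")

# A path anchored at the project directory is the same from anywhere.
anchored = f"{PROJECT_DIR}/{RELATIVE}"
anchored_from_run1 = resolve(anchored, "/home/user/run1")
anchored_from_tmp = resolve(anchored, "/tmp")
```

Here `from_run1` and `from_tmp` differ, while both anchored results point at the file inside the checkout, which is what the test profile needs when it is run from outside the repository.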

87 changes: 56 additions & 31 deletions modules.json
@@ -5,6 +5,16 @@
"https://github.com/icgc-argo-workflows/argo-modules.git": {
"modules": {
"icgc-argo-workflows": {
+"checkinput": {
+"branch": "impromptu_index",
+"git_sha": "e1f2b946b457eac191c0fa97ae1d159a15874c6b",
+"installed_by": ["modules", "stage_input"]
+},
+"cleanup": {
+"branch": "main",
+"git_sha": "8d014598ef81d65bece3684bd67aef7afae2cda9",
+"installed_by": ["modules"]
+},
"payload/qcmetrics": {
"branch": "main",
"git_sha": "681bcc1563700f8322be1ba65305a85e169f0909",
Expand All @@ -16,46 +26,56 @@
"installed_by": ["modules"]
},
"prep/sample": {
-"branch": "main",
-"git_sha": "06a2ac189b58d53ee8120ffc12aff485849c9a7e",
+"branch": "impromptu_index",
+"git_sha": "f253d1e6d4dc5f6ac0e6440041ee7e55b8203e35",
"installed_by": ["stage_input"]
},
+"samtools/index": {
+"branch": "impromptu_index",
+"git_sha": "e1f2b946b457eac191c0fa97ae1d159a15874c6b",
+"installed_by": ["stage_input"]
+},
"score/download": {
-"branch": "main",
-"git_sha": "f5e2d027a4f886a8702f5c4be825801513b578d0",
+"branch": "impromptu_index",
+"git_sha": "19ee48fdf1672ef9723e3093531be7ddea3e27ec",
"installed_by": ["song_score_download"]
},
"score/upload": {
"branch": "main",
-"git_sha": "f5e2d027a4f886a8702f5c4be825801513b578d0",
+"git_sha": "19ee48fdf1672ef9723e3093531be7ddea3e27ec",
"installed_by": ["song_score_upload"]
},
"song/get": {
-"branch": "main",
-"git_sha": "f5e2d027a4f886a8702f5c4be825801513b578d0",
+"branch": "impromptu_index",
+"git_sha": "19ee48fdf1672ef9723e3093531be7ddea3e27ec",
"installed_by": ["song_score_download"]
},
"song/manifest": {
"branch": "main",
-"git_sha": "f5e2d027a4f886a8702f5c4be825801513b578d0",
+"git_sha": "19ee48fdf1672ef9723e3093531be7ddea3e27ec",
"installed_by": ["song_score_upload"]
},
"song/publish": {
"branch": "main",
-"git_sha": "f5e2d027a4f886a8702f5c4be825801513b578d0",
+"git_sha": "19ee48fdf1672ef9723e3093531be7ddea3e27ec",
"installed_by": ["song_score_upload"]
},
"song/submit": {
"branch": "main",
-"git_sha": "f5e2d027a4f886a8702f5c4be825801513b578d0",
+"git_sha": "19ee48fdf1672ef9723e3093531be7ddea3e27ec",
"installed_by": ["song_score_upload"]
+},
+"tabix/tabix": {
+"branch": "impromptu_index",
+"git_sha": "e1f2b946b457eac191c0fa97ae1d159a15874c6b",
+"installed_by": ["stage_input"]
}
}
},
"subworkflows": {
"icgc-argo-workflows": {
"song_score_download": {
-"branch": "main",
+"branch": "impromptu_index",
"git_sha": "92aa620385099e94401c22b8633cc55ed34ca10e",
"installed_by": ["stage_input"]
},
@@ -65,8 +85,8 @@
"installed_by": ["subworkflows"]
},
"stage_input": {
-"branch": "main",
-"git_sha": "fd02dbcc7dc4e922dc60004b297ac11833154c2e",
+"branch": "impromptu_index",
+"git_sha": "e1f2b946b457eac191c0fa97ae1d159a15874c6b",
"installed_by": ["subworkflows"]
}
}
@@ -77,82 +97,87 @@
"nf-core": {
"custom/dumpsoftwareversions": {
"branch": "master",
-"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
+"git_sha": "de45447d060b8c8b98575bc637a4a575fd0638e1",
"installed_by": ["modules"]
},
"gatk4/calculatecontamination": {
"branch": "master",
-"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
+"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["modules"]
},
"gatk4/gatherpileupsummaries": {
"branch": "master",
-"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
+"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["modules"]
},
"gatk4/getpileupsummaries": {
"branch": "master",
-"git_sha": "2df2a11d5b12f2a73bca74f103691bc35d83c5fd",
+"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["modules"]
},
"gatk4/intervallisttobed": {
"branch": "master",
-"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
+"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["modules"]
},
"mosdepth": {
"branch": "master",
-"git_sha": "ebb27711cd5f4de921244bfa81c676504072d31c",
+"git_sha": "69e3eb17fb31b772b18f134d6e8f8b93ee980e65",
"installed_by": ["modules"]
},
"multiqc": {
"branch": "master",
-"git_sha": "a6e11ac655e744f7ebc724be669dd568ffdc0e80",
+"git_sha": "ccacf6f5de6df3bc6d73b665c1fd2933d8bbc290",
"installed_by": ["modules"]
},
"picard/bedtointervallist": {
"branch": "master",
-"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
+"git_sha": "20b0918591d4ba20047d7e13e5094bcceba81447",
"installed_by": ["modules"]
},
"picard/collecthsmetrics": {
"branch": "master",
-"git_sha": "0ce3ab0ac301f160225b22254fa238478b4389f2",
-"installed_by": ["modules", "bam_qc_picard"]
+"git_sha": "20b0918591d4ba20047d7e13e5094bcceba81447",
+"installed_by": ["bam_qc_picard", "modules"]
},
"picard/collectmultiplemetrics": {
"branch": "master",
-"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
+"git_sha": "20b0918591d4ba20047d7e13e5094bcceba81447",
"installed_by": ["bam_qc_picard"]
},
"picard/collectwgsmetrics": {
"branch": "master",
-"git_sha": "735e1e04e7e01751d2d6e97055bbdb6f70683cc1",
-"installed_by": ["modules", "bam_qc_picard"]
+"git_sha": "20b0918591d4ba20047d7e13e5094bcceba81447",
+"installed_by": ["bam_qc_picard", "modules"]
},
"samtools/index": {
"branch": "master",
-"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
+"git_sha": "f4596fe0bdc096cf53ec4497e83defdb3a94ff62",
"installed_by": ["modules"]
},
"samtools/stats": {
"branch": "master",
-"git_sha": "735e1e04e7e01751d2d6e97055bbdb6f70683cc1",
+"git_sha": "f4596fe0bdc096cf53ec4497e83defdb3a94ff62",
"installed_by": ["modules"]
},
"tabix/bgziptabix": {
"branch": "master",
-"git_sha": "591b71642820933dcb3c954c934b397bd00d8e5e",
+"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["modules"]
},
+"tabix/tabix": {
+"branch": "master",
+"git_sha": "9502adb23c0b97ed8e616bbbdfa73b4585aec9a1",
+"installed_by": ["modules"]
+},
"untarfiles": {
"branch": "master",
-"git_sha": "5c460c5a4736974abde2843294f35307ee2b0e5e",
+"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["modules"]
},
"verifybamid/verifybamid2": {
"branch": "master",
-"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
+"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["modules"]
}
}
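
Several entries in `modules.json` now track the `impromptu_index` branch instead of `main` or `master`. A small script can list such pins so they are easy to revisit once the branch is merged. This is a sketch under stated assumptions: `non_default_branch_pins` and `DEFAULT_BRANCHES` are hypothetical names, and the function expects the object stored under a single repository URL, with the `modules`/`subworkflows` nesting visible in the diff.

```python
DEFAULT_BRANCHES = {"main", "master"}


def non_default_branch_pins(repo_entry):
    """List (kind, org, name, branch) for components tracking a non-default branch.

    repo_entry is the object stored under one repository URL in modules.json.
    """
    found = []
    for kind in ("modules", "subworkflows"):
        for org, components in repo_entry.get(kind, {}).items():
            for name, info in components.items():
                if info.get("branch") not in DEFAULT_BRANCHES:
                    found.append((kind, org, name, info.get("branch")))
    return found
```

Results come back in file order, so the output can be compared directly against the entries in `modules.json`.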
56 changes: 56 additions & 0 deletions modules/icgc-argo-workflows/checkinput/main.nf
@@ -0,0 +1,56 @@
process CHECKINPUT {
tag "$samplesheet"
label 'process_single'

conda "conda-forge::python=3.8.3"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/python:3.8.3' :
'quay.io/biocontainers/python:3.8.3' }"

input:
path samplesheet
val workflow_name

output:
path 'samplesheet.valid.csv', emit: csv
path "versions.yml", emit: versions

when:
task.ext.when == null || task.ext.when

script:
"""
case '$workflow_name' in
'Pre Alignment QC')
echo $workflow_name detected;
prealnqc.py \\
$samplesheet \\
samplesheet.valid.csv
;;
'DNA Alignment QC')
dnaalnqc.py \\
$samplesheet \\
samplesheet.valid.csv
;;
'DNA Alignment')
dnaaln.py \\
$samplesheet \\
samplesheet.valid.csv
;;
'Germline Variant Call')
germlinevar.py \\
$samplesheet \\
samplesheet.valid.csv
;;
*)
echo -n "Unknown workflow"
exit 1
;;
esac
cat <<-END_VERSIONS > versions.yml
"${task.process}":
python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""
}
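
The `case` statement in `CHECKINPUT` maps a workflow name to the validator script that checks the sample sheet, and exits with an error for any other name. The same dispatch can be written as a lookup table. The sketch below is illustrative only: `VALIDATORS` and `validator_command` are hypothetical names, while the workflow names, script names and output file name are copied from the process script.

```python
# Workflow name -> validator script, copied from the case branches in CHECKINPUT.
VALIDATORS = {
    "Pre Alignment QC": "prealnqc.py",
    "DNA Alignment QC": "dnaalnqc.py",
    "DNA Alignment": "dnaaln.py",
    "Germline Variant Call": "germlinevar.py",
}


def validator_command(workflow_name, samplesheet, output="samplesheet.valid.csv"):
    """Build the argument list for the validator that matches workflow_name."""
    try:
        script = VALIDATORS[workflow_name]
    except KeyError:
        # Corresponds to the default branch, which reports an unknown workflow and exits 1.
        raise ValueError(f"Unknown workflow: {workflow_name!r}") from None
    return [script, samplesheet, output]
```

The match is exact and case-sensitive, as in the shell `case`, so the workflow name passed to the process has to agree with one of the four keys character for character.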