+```
+
#### I want to generate a new formatted database
```{bash .annotate hl_lines="5"}
@@ -95,4 +102,4 @@ nextflow run fmalmeida/bacannot -profile docker,quicktest --bacannot_db ./bacann
### Annotation with bakta
-User can also perform the core generic annotation with bakta instead of prokka. Please read [the manual](manual#bakta-annotation).
+Users can also perform the core generic annotation with bakta instead of prokka. Please read [the manual](manual.md#bakta-annotation).
diff --git a/docs/reports/report_MGEs.html b/docs/reports/report_MGEs.html
index c85ef8f1..9f6e4eda 100644
--- a/docs/reports/report_MGEs.html
+++ b/docs/reports/report_MGEs.html
@@ -11,7 +11,7 @@
-
+
Annotation of mobile genetic elements
@@ -4910,7 +4910,7 @@
Annotation of mobile genetic elements
Produced with bacannot pipeline
-05 May 2023
+19 March 2023
@@ -4949,6 +4949,11 @@ About
Platon detects plasmid contigs within bacterial draft genomes from WGS short-read assemblies.
Therefore, Platon analyzes the natural distribution biases of certain protein coding genes between chromosomes and plasmids.
+MOB Suite.
+
+- Software tools for clustering, reconstruction and typing of plasmids from draft assemblies.
+- In the pipeline, only the typer tool is used.
+
IslandPath.
- IslandPath-DIMOB is a standalone tool to predict genomic islands in bacterial and archaeal genomes based on the presence of dinucleotide biases and mobility genes.
@@ -4957,6 +4962,10 @@ About
- digIS is a command-line tool for detection of insertion sequences (IS) in prokaryotic genomes.
+Integron Finder.
+
+- A command-line tool to identify integrons in DNA sequences
+
Prediction thresholds
@@ -4995,8 +5004,8 @@
Plasmidfinder
Table 1: In silico detection of plasmids with Plasmidfinder
-
-
+
+
Platon
@@ -5007,8 +5016,20 @@
Platon
Table 2: In silico detection of plasmids with Platon
-
-
+
+
+
+
+
MOB suite (typer)
+
MOB-typer provides in silico predictions of the replicon family, relaxase type, mate-pair formation type and predicted transferability of the plasmid. Using a combination of biomarkers and MOB-cluster codes, it will also provide an observed host-range of your plasmid based on its replicon, relaxase and cluster assignment. This is combined with information mined from the literature to provide a prediction of the taxonomic rank at which the plasmid is likely to be stably maintained, but it does not provide source attribution predictions.
+
+- The complete results can be found in the directory
plasmids/mob_suite
under the main output directory.
+
+
+Table 3: In silico typing of plasmids with MOB suite
+
+
+
@@ -5018,7 +5039,7 @@
Prophage detection
Phigaro
-
Phigaro is a standalone command-line application that is able to detect prophage regions taking raw genome and metagenome assemblies as an input. It also produces dynamic annotated "prophage genome maps" and marks possible transposon insertion spots inside prophages. Its results can be nicely visualized in its own html report file stored in its output directory. The genomic regions predicted as putative prophage sequences are also summarized in Table 3.
+
Phigaro is a standalone command-line application that is able to detect prophage regions taking raw genome and metagenome assemblies as an input. It also produces dynamic annotated "prophage genome maps" and marks possible transposon insertion spots inside prophages. Its results can be nicely visualized in its own html report file stored in its output directory. The genomic regions predicted as putative prophage sequences are also summarized in Table 4.
- Check it out at:
@@ -5028,37 +5049,37 @@ Phigaro
-Table 3: Putative prophage sequences annotated with phigaro software
+Table 4: Putative prophage sequences annotated with phigaro software
-
-
+
+
PhiSpy
-
PhiSpy is a standalone tool that identifies prophages in Bacterial (and probably Archaeal) genomes. Given an annotated genome it will use several approaches to identify the most likely prophage regions. The genomic regions predicted as putative prophage sequences are also summarized in Table 4.
+
PhiSpy is a standalone tool that identifies prophages in Bacterial (and probably Archaeal) genomes. Given an annotated genome it will use several approaches to identify the most likely prophage regions. The genomic regions predicted as putative prophage sequences are also summarized in Table 5.
- Check the results at
prophages/phispy
in the main output directory
-Table 4: Putative prophage sequences annotated with phispy software
+Table 5: Putative prophage sequences annotated with phispy software
-
-
+
+
PHAST database
-
All prophage genes from the PHAST database that had good alignments to the genes of the query genome are summarized in Table 5. The protein sequences of these genes were aligned against the gene sequences predicted by Prokka via BLASTp. They are all available in the genome browser provided. A good way to interrogate this annotation is to visualize the putative prophage regions predicted by phigaro and phispy, comparing them with the prophage gene annotation provided with the PHAST database.
+
All prophage genes from the PHAST database that had good alignments to the genes of the query genome are summarized in Table 6. The protein sequences of these genes were aligned against the gene sequences predicted by Prokka via BLASTp. They are all available in the genome browser provided. A good way to interrogate this annotation is to visualize the putative prophage regions predicted by phigaro and phispy, comparing them with the prophage gene annotation provided with the PHAST database.
Unfortunately, the PHASTER database has no searchable interface for visualizing its prophages. Therefore, this table has no links to external sources.
-Table 5: Prophage genes annotated using PHAST database via BLASTp
+Table 6: Prophage genes annotated using PHAST database via BLASTp
-
-
+
+
@@ -5068,28 +5089,28 @@
ICEberg database
Analysis of full-length ICEs
-
Full-length ICEs are available in the ICEberg database as nucleotide FASTAs, while the proteins found inside these ICEs are available as protein FASTAs. Since the ICEfinder script has no license to be incorporated into the pipeline, we try to search for the full-length ICEs. However, they are rarely found in their complete form in new genomes, thus they are scanned without coverage or identity thresholds. The filtering and selection of these is up to you. We have found a total of 35 alignments in the query genome, check it out in table 6.
+
Full-length ICEs are available in the ICEberg database as nucleotide FASTAs, while the proteins found inside these ICEs are available as protein FASTAs. Since the ICEfinder script has no license to be incorporated into the pipeline, we try to search for the full-length ICEs. However, they are rarely found in their complete form in new genomes, thus they are scanned without coverage or identity thresholds. The filtering and selection of these is up to you. We have found a total of 35 alignments in the query genome, check it out in table 7.
Users are advised to also use the ICEfinder tool to predict the putative genomic positions of known ICEs, since we are not allowed to include this step in the pipeline.
-Table 6: Alignment of full-length ICEs to the query genome via BLASTn
+Table 7: Alignment of full-length ICEs to the query genome via BLASTn
-
-
+
+
Analysis of ICEโs proteins
-
All query genes predicted by Prokka that have a match in the ICEberg database are shown in Table 7. The table summarizes each ICE id and all of its genes that were found in the query genome. All of them are linked to the database for further investigation.
+
All query genes predicted by Prokka that have a match in the ICEberg database are shown in Table 8. The table summarizes each ICE id and all of its genes that were found in the query genome. All of them are linked to the database for further investigation.
Take note: The fact that the genome possesses some proteins from ICEs does not necessarily mean that the ICE is present in the genome. Please check the number of proteins that the ICE of origin possesses in the ICEberg database list of ICEs, and then make inferences based on the alignments you see.
Users are advised to also use the ICEfinder tool to predict the putative genomic positions of known ICEs, since we are not allowed to include this step in the pipeline.
-Table 7: ICE genes annotated from ICEberg database via BLASTp
+Table 8: ICE genes annotated from ICEberg database via BLASTp
-
-
+
+
+
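The tables above are rendered from BLAST results already filtered by the pipeline's MGE thresholds. A sketch of overriding them at launch (parameter names taken from modules/generic/reports.nf; the values and samplesheet name are illustrative):

```bash
# Rerun with stricter MGE BLAST thresholds (illustrative values, 0-100 scale)
nextflow run fmalmeida/bacannot -profile docker \
    --input samplesheet.yml \
    --blast_MGEs_minid 85 \
    --blast_MGEs_mincov 85
```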
diff --git a/docs/reports/report_general.html b/docs/reports/report_general.html
index 911d15e7..d005400b 100644
--- a/docs/reports/report_general.html
+++ b/docs/reports/report_general.html
@@ -11,7 +11,7 @@
-
+
Generic annotation
@@ -4910,7 +4910,7 @@
Generic annotation
Produced with bacannot pipeline
-05 May 2023
+19 March 2023
@@ -4924,14 +4924,14 @@ About
RefSeq Masher
RefSeq Masher is a tool that enables you to rapidly find which NCBI RefSeq genomes match or are contained within your sequence data, using Mash MinHash with a Mash sketch database of NCBI RefSeq genomes. The results are shown below (bacannot outputs only the top 10).
-
-
+
+
MLST
Bacannot uses the mlst package to scan the PubMLST schemes available in order to classify the genome under public multilocus sequence type schemes. The results for ecoli are shown below.
-
-
+
+
Prokka
@@ -4939,21 +4939,21 @@
Prokka
In bacannot, when using prokka, the prokka database is augmented with either the TIGRFAM HMMs hosted at NCBI or, when the parameter --prokka_use_pgap is used, the extensive PGAP HMM database hosted at NCBI.
-
-
+
+
Barrnap
Barrnap is a fast ribosomal RNA predictor for bacteria, from the same developer as Prokka. It produces a GFF of the predicted rRNAs (see below).
-
-
+
+
KEGG KOs
KEGG KOs are annotated with KofamScan, which is a gene function annotation tool based on KEGG Orthology and hidden Markov models. You need the KOfam database to use this tool. An online version is available at https://www.genome.jp/tools/kofamkoala/.
After annotation, the results are plotted with
KEGGDecoder (See below).
@@ -4981,11 +4981,11 @@
CARD RGI
Table 1: RGI annotation results. The perfect hits are highlighted in yellow while the strict hits in light blue.
-
-
+
+
@@ -5053,8 +5053,8 @@
Prokka
Table 4: Generic annotation of resistance determinants by Prokka
-
-
+
+
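For the KEGG KOs step described above, a minimal standalone KofamScan call can help cross-check a single proteome. This is a sketch: the database paths assume the profiles/ko_list layout downloaded by the pipeline's kofamscan database module, and `-f mapper` emits the gene-to-KO mapping that KEGGDecoder consumes:

```bash
# Minimal KofamScan sketch (paths are assumptions based on the pipeline's DB module)
exec_annotation \
    -p kofamscan_db/profiles \
    -k kofamscan_db/ko_list \
    -f mapper \
    --cpu 4 \
    -o sample_kos.txt \
    proteins.faa
```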
diff --git a/docs/reports/report_virulence.html b/docs/reports/report_virulence.html
index 05b9a600..1ddc92c4 100644
--- a/docs/reports/report_virulence.html
+++ b/docs/reports/report_virulence.html
@@ -11,7 +11,7 @@
-
+
Annotation of virulence factors
@@ -4910,7 +4910,7 @@
Annotation of virulence factors
Produced with bacannot pipeline
-05 May 2023
+19 March 2023
@@ -4990,8 +4990,8 @@ Detailed information
Table 1: Virulence factors annotated using the VFDB database via BLASTn
-
-
+
+
@@ -5003,8 +5003,8 @@
Victors
Table 2: Virulence factors annotated using the Victors database via BLASTp
-
-
+
+
diff --git a/docs/requirements.txt b/docs/requirements.txt
index d5ff7eda..42c51dbf 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -14,4 +14,5 @@ mergedeep>=1.3.4
colorama>=0.4; platform_system == 'Windows'
mkdocs-pymdownx-material-extras
mkdocs-git-revision-date-plugin
-mkdocs-material
\ No newline at end of file
+mkdocs-material
+mkdocs-macros-plugin
\ No newline at end of file
diff --git a/lib/WorkflowBacannot.groovy b/lib/WorkflowBacannot.groovy
index 94e05db2..527e14c0 100755
--- a/lib/WorkflowBacannot.groovy
+++ b/lib/WorkflowBacannot.groovy
@@ -10,8 +10,20 @@ class WorkflowBacannot {
public static void initialise(params, log) {
// input has been given and user does not want to download databases?
- if (!params.input && !params.get_dbs) {
- log.error "Please provide an input samplesheet to the pipeline e.g. '--input samplesheet.yml'. Or select the download databases mode with --get_dbs."
+ if (!params.input && !params.get_dbs && !params.get_zenodo_db) {
+ log.error "Please provide an input samplesheet to the pipeline e.g. '--input samplesheet.yml'. Or select the download databases mode with --get_dbs or --get_zenodo_db"
+ System.exit(1)
+ }
+
+ // using incompatible parameters?
+ if (params.input && (params.get_dbs || params.get_zenodo_db)) {
+        log.error "It is not possible to both run the pipeline (--input) and download databases (--get_dbs or --get_zenodo_db). Please do one or the other."
+ System.exit(1)
+ }
+
+    // trying to use both database download modes at once?
+ if (params.get_dbs && params.get_zenodo_db) {
+ log.error "Please select either --get_dbs or --get_zenodo_db, not both at the same time."
System.exit(1)
}
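The three checks above make `--input`, `--get_dbs`, and `--get_zenodo_db` mutually exclusive, which leaves exactly three valid invocation styles (a sketch, assuming the docker profile; file names are illustrative):

```bash
# 1. annotate genomes from a samplesheet, using an existing database directory
nextflow run fmalmeida/bacannot -profile docker --input samplesheet.yml --bacannot_db ./bacannot_dbs

# 2. build all databases from scratch (today's versions)
nextflow run fmalmeida/bacannot -profile docker --get_dbs --output bacannot_dbs

# 3. download the pre-built databases from Zenodo
nextflow run fmalmeida/bacannot -profile docker --get_zenodo_db --output bacannot_dbs
```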
diff --git a/lib/WorkflowMain.groovy b/lib/WorkflowMain.groovy
index 7531147f..6ccb1c75 100755
--- a/lib/WorkflowMain.groovy
+++ b/lib/WorkflowMain.groovy
@@ -10,7 +10,7 @@ class WorkflowMain {
public static String citation(workflow) {
return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" +
"* The pipeline\n" +
- " https://doi.org/10.5281/zenodo.3627669\n\n" +
+ " https://doi.org/10.12688/f1000research.139488.1\n\n" +
"* The nf-core framework\n" +
" https://doi.org/10.1038/s41587-020-0439-x\n\n" +
"* Software dependencies\n" +
@@ -74,6 +74,30 @@ class WorkflowMain {
System.exit(0)
}
+ // Download docker config
+ if (params.get_docker_config) {
+ new File("docker.config").write(new URL ("https://github.com/fmalmeida/bacannot/raw/master/conf/docker.config").getText())
+ log.info """
+ docker.config file saved in working directory
+ After configuration, run:
+ nextflow run fmalmeida/bacannot -c ./docker.config
+ Nice code
+ """.stripIndent()
+ System.exit(0)
+ }
+
+ // Download singularity config
+ if (params.get_singularity_config) {
+ new File("singularity.config").write(new URL ("https://github.com/fmalmeida/bacannot/raw/master/conf/singularity.config").getText())
+ log.info """
+ singularity.config file saved in working directory
+ After configuration, run:
+ nextflow run fmalmeida/bacannot -c ./singularity.config
+ Nice code
+ """.stripIndent()
+ System.exit(0)
+ }
+
// Validate workflow parameters via the JSON schema
if (params.validate_params) {
NfcoreSchema.validateParameters(workflow, params, log)
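A usage sketch for the two new helper flags; as the log messages above state, each one writes a template config to the working directory and exits:

```bash
nextflow run fmalmeida/bacannot --get_docker_config       # writes ./docker.config
nextflow run fmalmeida/bacannot --get_singularity_config  # writes ./singularity.config

# after editing the downloaded template, launch the pipeline with it
nextflow run fmalmeida/bacannot -c ./docker.config --input samplesheet.yml
```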
diff --git a/main.nf b/main.nf
index f86a7eb3..7c8421ef 100644
--- a/main.nf
+++ b/main.nf
@@ -36,7 +36,7 @@ include { CREATE_DBS } from './workflows/bacannot_dbs.nf'
workflow {
- if (params.get_dbs) {
+ if (params.get_dbs || params.get_zenodo_db) {
CREATE_DBS()
} else {
if (params.input) {
diff --git a/markdown/CHANGELOG.md b/markdown/CHANGELOG.md
index b47caeed..c5bbbec5 100644
--- a/markdown/CHANGELOG.md
+++ b/markdown/CHANGELOG.md
@@ -2,6 +2,20 @@
The tracking for changes started in v2.1
+## v3.3 [01-October-2023]
+
+* [[#50](https://github.com/fmalmeida/bacannot/issues/50)] -- Add `Integron Finder` tool to the pipeline
+* [[#69](https://github.com/fmalmeida/bacannot/issues/69)] -- Change how tools use docker images in order to:
+  * make tools use public bioconda images whenever possible, to allow easy addition of tools and avoid conflicts in docker images
+  * diminish the size of, and the number of tools inside, the docker images; they are now built only for modules that cannot simply use bioconda docker images.
+* [[#81](https://github.com/fmalmeida/bacannot/issues/81)] -- Add `MOB Suite` tool to the pipeline
+* [[#85](https://github.com/fmalmeida/bacannot/issues/85)] -- Include checkup on header size for Prokka
+* [[#98](https://github.com/fmalmeida/bacannot/issues/98)] -- Add ICEberg and PHAST blastp results to json summary
+* [[#100](https://github.com/fmalmeida/bacannot/issues/100)] -- Update pipeline to use docker shasum instead of tags
+* [[#107](https://github.com/fmalmeida/bacannot/issues/107)] -- Add a parameter, `--enable_deduplication` for deduplicating input reads before assembly
+* Update the unicycler docker image to the latest '0.5.0--py310h6cc9453_3' to avoid errors originating from the previous image, which contained a buggy installation.
+* Other minor changes / updates highlighted in [[#93](https://github.com/fmalmeida/bacannot/pull/93)]
+
## v3.2 [19-December-2022]
* Fixes https://github.com/fmalmeida/bacannot/issues/68 reported by @lam-c
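For the new `--enable_deduplication` parameter listed above, a minimal launch sketch (samplesheet name illustrative):

```bash
# deduplicate input read names before assembly (v3.3+)
nextflow run fmalmeida/bacannot -profile docker \
    --input samplesheet.yml \
    --enable_deduplication
```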
diff --git a/markdown/list_of_tools.md b/markdown/list_of_tools.md
index 68d4bf21..27a1727a 100644
--- a/markdown/list_of_tools.md
+++ b/markdown/list_of_tools.md
@@ -16,10 +16,12 @@ These are the tools that wrapped inside bacannot. **Cite** the tools whenever yo
| Annotation of virulence genes | [Victors](http://www.phidias.us/victors/) and [VFDB](http://www.mgc.ac.cn/VFs/main.htm) |
| Prophage sequences and genes annotation | [PHASTER](http://phast.wishartlab.com/), [Phigaro](https://github.com/bobeobibo/phigaro) and [PhySpy](https://github.com/linsalrob/PhiSpy) |
| Annotation of integrative and conjugative elements | [ICEberg](http://db-mml.sjtu.edu.cn/ICEberg/) |
+| Annotation of bacterial integrons | [Integron Finder](https://github.com/gem-pasteur/Integron_Finder) |
| Focused detection of insertion sequences | [digIS](https://github.com/janka2012/digIS) |
-| _In silico_ detection of plasmids | [Plasmidfinder](https://cge.cbs.dtu.dk/services/PlasmidFinder/) and [Platon](https://github.com/oschwengers/platon) |
+| _In silico_ detection and typing of plasmids | [Plasmidfinder](https://cge.cbs.dtu.dk/services/PlasmidFinder/), [Platon](https://github.com/oschwengers/platon) and [MOB-typer](https://github.com/phac-nml/mob-suite) |
| Prediction and visualization of genomic islands | [IslandPath-DIMOB](https://github.com/brinkmanlab/islandpath) and [gff-toolbox](https://github.com/fmalmeida/gff-toolbox) |
| Custom annotation from formatted FASTA or NCBI protein IDs | [BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs) |
| Merge of annotation results | [bedtools](https://bedtools.readthedocs.io/en/latest/) |
| Genome Browser renderization | [JBrowse](http://jbrowse.org/) |
+| Circos plot generation | [easy_circos](https://easy_circos.readthedocs.io/en/latest/index.html) |
| Renderization of automatic reports and shiny app for results interrogation | [R Markdown](https://rmarkdown.rstudio.com/), [Shiny](https://shiny.rstudio.com/) and [SequenceServer](https://sequenceserver.com/) |
diff --git a/mkdocs.yml b/mkdocs.yml
index 58482435..1a5157ba 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -22,6 +22,8 @@ theme:
repo: fontawesome/brands/github-alt
plugins:
- git-revision-date
+ - search
+ - macros
markdown_extensions:
- pymdownx.emoji:
emoji_index: !!python/name:materialx.emoji.twemoji
diff --git a/modules/KOs/kofamscan.nf b/modules/KOs/kofamscan.nf
index 87d25450..cd07f66e 100644
--- a/modules/KOs/kofamscan.nf
+++ b/modules/KOs/kofamscan.nf
@@ -4,7 +4,7 @@ process KOFAMSCAN {
else "$filename"
}
tag "${prefix}"
- label = [ 'misc', 'process_high' ]
+ label = [ 'process_high', 'error_retry' ]
input:
tuple val(prefix), file('proteins.faa')
diff --git a/modules/MGEs/draw_gis.nf b/modules/MGEs/draw_gis.nf
index ff64cfd2..13277613 100644
--- a/modules/MGEs/draw_gis.nf
+++ b/modules/MGEs/draw_gis.nf
@@ -5,7 +5,6 @@ process DRAW_GIS {
}
tag "${prefix}"
label = [ 'misc', 'process_ultralow' ]
-
input:
tuple val(prefix), file(gff), file(gis_bed)
diff --git a/modules/MGEs/integron_finder.nf b/modules/MGEs/integron_finder.nf
new file mode 100644
index 00000000..b06fccdf
--- /dev/null
+++ b/modules/MGEs/integron_finder.nf
@@ -0,0 +1,42 @@
+process INTEGRON_FINDER {
+ publishDir "${params.output}", mode: 'copy', saveAs: { filename ->
+ if (filename.indexOf("_version.txt") > 0) "tools_versioning/$filename"
+ else "${prefix}/integron_finder/$filename"
+ }
+ tag "${prefix}"
+ label = [ 'process_medium' ]
+
+ input:
+ tuple val(prefix), file(genome)
+
+ output:
+ tuple val(prefix), path("*") , emit: all
+ tuple val(prefix), path("${prefix}_integrons.gbk"), emit: gbk, optional: true
+ path("integronfinder_version.txt")
+
+ script:
+ def args = task.ext.args ?: ''
+ """
+ # Get version
+ integron_finder --version > integronfinder_version.txt ;
+
+ # run tool
+ integron_finder \\
+ --local-max \\
+ --func-annot \\
+ --pdf \\
+ --gbk \\
+ --cpu $task.cpus \\
+ $args \\
+ $genome
+
+ # move results
+ mv Results_Integron_Finder_${prefix}/* . ;
+ rm -rf Results_Integron_Finder_${prefix} ;
+
+    # merge gbk files (if any) into a single file
+ for gbk in \$(ls *.gbk) ; do
+ cat \$gbk >> ${prefix}_integrons.gbk ;
+ done
+ """
+}
diff --git a/modules/MGEs/integron_finder_2gff.nf b/modules/MGEs/integron_finder_2gff.nf
new file mode 100644
index 00000000..6abaeab3
--- /dev/null
+++ b/modules/MGEs/integron_finder_2gff.nf
@@ -0,0 +1,24 @@
+process INTEGRON_FINDER_2GFF {
+ publishDir "${params.output}/${prefix}/integron_finder", mode: 'copy'
+ tag "${prefix}"
+ label = [ 'misc', 'process_low' ]
+
+ input:
+ tuple val(prefix), file(gbk)
+
+ output:
+ tuple val(prefix), path("${prefix}_integrons.gff"), emit: gff
+
+ script:
+ def args = task.ext.args ?: ''
+ """
+ # convert to gff if available
+ touch ${prefix}_integrons.gff ;
+ for gbk in \$(ls *.gbk) ; do
+ conda run -n perl bp_genbank2gff3 \$gbk -o - | \
+ grep 'integron_id' | \
+ sed 's|ID=.*integron_id=|ID=|g' | \
+ sed 's/GenBank/Integron_Finder/g' >> ${prefix}_integrons.gff
+ done
+ """
+}
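A toy check of the GFF rewrite above, fed with a hypothetical line mimicking bp_genbank2gff3 output: only integron features pass the grep, the generic ID is swapped for the integron_id value, and the source column is rebranded:

```bash
# Hypothetical tab-separated GFF line (columns: seqid, source, type, start, end, score, strand, phase, attributes)
printf 'contig_1\tGenBank\tregion\t55\t8821\t.\t+\t.\tID=contig_1.region.1;integron_id=integron_01\n' \
    | grep 'integron_id' \
    | sed 's|ID=.*integron_id=|ID=|g' \
    | sed 's/GenBank/Integron_Finder/g'
# -> contig_1  Integron_Finder  region  55  8821  .  +  .  ID=integron_01
```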
diff --git a/modules/MGEs/islandpath.nf b/modules/MGEs/islandpath.nf
index d7ded993..d9ef1714 100644
--- a/modules/MGEs/islandpath.nf
+++ b/modules/MGEs/islandpath.nf
@@ -1,7 +1,9 @@
process ISLANDPATH {
publishDir "${params.output}/${prefix}/genomic_islands", mode: 'copy'
tag "${prefix}"
- label = [ 'perl', 'process_low' ]
+ label = [ 'process_low' ]
+ errorStrategy = 'retry'
+ maxRetries = 5
input:
tuple val(prefix), file("annotation.gbk")
diff --git a/modules/MGEs/mob_suite.nf b/modules/MGEs/mob_suite.nf
new file mode 100644
index 00000000..14256e92
--- /dev/null
+++ b/modules/MGEs/mob_suite.nf
@@ -0,0 +1,36 @@
+process MOBSUITE {
+ publishDir "${params.output}", mode: 'copy', saveAs: { filename ->
+ if (filename.indexOf("_version.txt") > 0) "tools_versioning/$filename"
+ else "${prefix}/plasmids/mob_suite/$filename"
+ }
+ tag "${prefix}"
+ label = [ 'process_medium' ]
+
+ input:
+ tuple val(prefix), file(genome)
+
+ output:
+ tuple val(prefix), path("${prefix}_mobtyper_results.txt"), emit: results
+ path("mobtyper_version.txt")
+
+ script:
+ def args = task.ext.args ?: ''
+ """
+ # Get version
+ mob_typer --version > mobtyper_version.txt ;
+
+ # run tool
+ mob_typer \\
+ --multi \\
+ --num_threads $task.cpus \\
+ --sample_id $prefix \\
+ --infile $genome \\
+ $args \\
+ --out_file ${prefix}_mobtyper_results.txt
+ """
+}
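MOB-typer writes one tab-separated table per sample; a quick way to eyeball it outside the report (sample name illustrative):

```bash
# pretty-print the per-sample MOB-typer table produced above
column -t -s $'\t' sample1_mobtyper_results.txt | less -S
```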
diff --git a/modules/MGEs/plasmidfinder.nf b/modules/MGEs/plasmidfinder.nf
index 318e93ea..91580170 100644
--- a/modules/MGEs/plasmidfinder.nf
+++ b/modules/MGEs/plasmidfinder.nf
@@ -4,7 +4,7 @@ process PLASMIDFINDER {
else null
}
tag "${prefix}"
- label = [ 'python', 'process_low' ]
+ label = [ 'process_low' ]
input:
tuple val(prefix), file(genome)
diff --git a/modules/MGEs/platon.nf b/modules/MGEs/platon.nf
index 086ab4d3..e4be53ec 100644
--- a/modules/MGEs/platon.nf
+++ b/modules/MGEs/platon.nf
@@ -5,7 +5,7 @@ process PLATON {
else null
}
tag "${prefix}"
- label = [ 'python', 'process_medium' ]
+ label = [ 'process_medium' ]
input:
tuple val(prefix), file(genome)
diff --git a/modules/assembly/flye.nf b/modules/assembly/flye.nf
index 05e73580..d588d01e 100644
--- a/modules/assembly/flye.nf
+++ b/modules/assembly/flye.nf
@@ -4,7 +4,7 @@ process FLYE {
else if (filename == "flye_${prefix}") "assembly"
else null
}
- label 'process_high'
+ label = [ 'process_high', 'error_retry' ]
tag "${prefix}"
input:
@@ -18,14 +18,21 @@ process FLYE {
script:
lr = (lr_type == 'nanopore') ? '--nano-raw' : '--pacbio-raw'
+ dedup_lr = params.enable_deduplication ?
+ "gunzip -cf $lreads | awk '{if(NR%4==1) \$0=sprintf(\"@1_%d\",(1+i++)); print;}' | gzip -c > ${prefix}_deduplicated_reads.fastq.gz" :
+ "ln -s $lreads ${prefix}_deduplicated_reads.fastq.gz"
+
"""
# Save flye version
flye -v > flye_version.txt ;
+    # deduplicate read names (headers are renamed to be unique)
+ $dedup_lr
+
# Run flye
flye \\
${lr} \\
- $lreads \\
+ ${prefix}_deduplicated_reads.fastq.gz \\
--out-dir flye_${prefix} \\
--threads $task.cpus &> flye.log ;
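Note that the `dedup_lr` one-liner above does not drop any reads: it rewrites every FASTQ header to a sequential `@1_N` so that duplicated read names cannot collide downstream. A toy illustration:

```bash
# two reads sharing the same name receive unique sequential headers
printf '@readA\nACGT\n+\nIIII\n@readA\nTTGC\n+\nIIII\n' \
    | awk '{if(NR%4==1) $0=sprintf("@1_%d",(1+i++)); print;}'
# headers become @1_1 and @1_2; sequences and qualities are untouched
```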
diff --git a/modules/assembly/unicycler.nf b/modules/assembly/unicycler.nf
index 75883bf5..145f1c1f 100644
--- a/modules/assembly/unicycler.nf
+++ b/modules/assembly/unicycler.nf
@@ -4,7 +4,7 @@ process UNICYCLER {
else if (filename == "unicycler_${prefix}") "assembly"
else null
}
- label 'process_high'
+ label = [ 'process_high', 'error_retry' ]
tag "${prefix}"
input:
@@ -17,14 +17,55 @@ process UNICYCLER {
path('unicycler_version.txt'), emit: version
script:
- unpaired_param = (sreads.getName() != "input.3") ? "-s $sreads" : ""
- paired_param = (sread1.getName() != "input.1" && sread2.getName() != "input.2") ? "-1 $sread1 -2 $sread2" : ""
- lr_param = (lreads.getName() != "input.4") ? "-l $lreads" : ""
+ unpaired_param = ""
+ dedup_sreads = ""
+ paired_param = ""
+ dedup_paired = ""
+ lr_param = ""
+ dedup_lr = ""
+
+ // sreads
+ if (sreads.getName() != "input.3") {
+
+ dedup_sreads = params.enable_deduplication ?
+ "gunzip -cf $sreads | awk '{if(NR%4==1) \$0=sprintf(\"@1_%d\",(1+i++)); print;}' | gzip -c > ${prefix}_deduplicated_sreads.fastq.gz" :
+ "ln -s $sreads ${prefix}_deduplicated_sreads.fastq.gz"
+
+ unpaired_param = "-s ${prefix}_deduplicated_sreads.fastq.gz"
+
+ }
+
+ // paired
+ if (sread1.getName() != "input.1" && sread2.getName() != "input.2") {
+
+ dedup_paired = params.enable_deduplication ?
+ "gunzip -cf $sread1 | awk '{if(NR%4==1) \$0=sprintf(\"@1_%d\",(1+i++)); print;}' | gzip -c > ${prefix}_deduplicated_sread_R1.fastq.gz && gunzip -cf $sread2 | awk '{if(NR%4==1) \$0=sprintf(\"@1_%d\",(1+i++)); print;}' | gzip -c > ${prefix}_deduplicated_sread_R2.fastq.gz" :
+ "ln -s $sread1 ${prefix}_deduplicated_sread_R1.fastq.gz && ln -s $sread2 ${prefix}_deduplicated_sread_R2.fastq.gz"
+
+ paired_param = "-1 ${prefix}_deduplicated_sread_R1.fastq.gz -2 ${prefix}_deduplicated_sread_R2.fastq.gz"
+
+ }
+
+ // lreads
+ if (lreads.getName() != "input.4") {
+
+ dedup_lr = params.enable_deduplication ?
+ "gunzip -cf $lreads | awk '{if(NR%4==1) \$0=sprintf(\"@1_%d\",(1+i++)); print;}' | gzip -c > ${prefix}_deduplicated_lreads.fastq.gz" :
+ "ln -s $lreads ${prefix}_deduplicated_lreads.fastq.gz"
+
+        lr_param = "-l ${prefix}_deduplicated_lreads.fastq.gz"
+
+ }
"""
# Save unicycler version
unicycler --version > unicycler_version.txt
+    # deduplicate read names (headers are renamed to be unique)
+ $dedup_sreads
+ $dedup_paired
+ $dedup_lr
+
# Run unicycler
unicycler \\
$paired_param \\
diff --git a/modules/bacannot_dbs/amrfinder.nf b/modules/bacannot_dbs/amrfinder.nf
index 0d3b3955..c5039973 100644
--- a/modules/bacannot_dbs/amrfinder.nf
+++ b/modules/bacannot_dbs/amrfinder.nf
@@ -1,7 +1,7 @@
process AMRFINDER_DB {
publishDir "${params.output}/amrfinder_db", mode: 'copy', overwrite: "$params.force_update"
- label = [ 'db_download', 'process_ultralow' ]
-
+ label 'process_ultralow'
+
output:
file("*")
diff --git a/modules/bacannot_dbs/antismash.nf b/modules/bacannot_dbs/antismash.nf
index 9f0fdd2f..5e9b8962 100644
--- a/modules/bacannot_dbs/antismash.nf
+++ b/modules/bacannot_dbs/antismash.nf
@@ -1,7 +1,7 @@
process ANTISMASH_DB {
publishDir "${params.output}/antismash_db", mode: 'copy', overwrite: "$params.force_update"
label = [ 'db_download', 'process_ultralow' ]
-
+
output:
file("*")
diff --git a/modules/bacannot_dbs/argminer.nf b/modules/bacannot_dbs/argminer.nf
index bacbb91a..6b2881ea 100644
--- a/modules/bacannot_dbs/argminer.nf
+++ b/modules/bacannot_dbs/argminer.nf
@@ -1,7 +1,7 @@
process ARGMINER_DB {
publishDir "${params.output}/argminer_db", mode: 'copy', overwrite: "$params.force_update"
label = [ 'db_download', 'process_ultralow' ]
-
+
output:
file("*")
diff --git a/modules/bacannot_dbs/card.nf b/modules/bacannot_dbs/card.nf
index d28feacb..411802a0 100644
--- a/modules/bacannot_dbs/card.nf
+++ b/modules/bacannot_dbs/card.nf
@@ -1,7 +1,7 @@
process CARD_DB {
publishDir "${params.output}/card_db", mode: 'copy', overwrite: "$params.force_update"
label = [ 'db_download', 'process_ultralow' ]
-
+
output:
file("*")
@@ -11,5 +11,5 @@ process CARD_DB {
wget --tries=10 https://card.mcmaster.ca/latest/data
tar -xvf data ./card.json
rm data
- """
+ """
}
diff --git a/modules/bacannot_dbs/get_zenodo.nf b/modules/bacannot_dbs/get_zenodo.nf
new file mode 100644
index 00000000..cb9aab4f
--- /dev/null
+++ b/modules/bacannot_dbs/get_zenodo.nf
@@ -0,0 +1,19 @@
+process GET_ZENODO_DB {
+ publishDir "${params.output}", mode: 'copy', overwrite: "$params.force_update"
+ label = [ 'db_download', 'process_low' ]
+
+ tag "Downloading pre-built databases"
+
+ output:
+ file("*")
+
+ script:
+ """
+ # download database from zenodo
+ zenodo_get https://doi.org/10.5281/zenodo.7615811
+
+ # organize data
+ tar zxvf *.tar.gz && rm *.tar.gz
+ rm -rf \$( find . -name 'pipeline_info' )
+ """
+}
\ No newline at end of file
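The module above is easy to reproduce by hand when debugging; a sketch assuming the `zenodo_get` CLI is available (e.g. via `pip install zenodo-get`):

```bash
# manual equivalent of GET_ZENODO_DB
zenodo_get https://doi.org/10.5281/zenodo.7615811
tar zxvf *.tar.gz && rm *.tar.gz   # unpack the pre-built databases
```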
diff --git a/modules/bacannot_dbs/iceberg.nf b/modules/bacannot_dbs/iceberg.nf
index f7f859b8..1cebc233 100644
--- a/modules/bacannot_dbs/iceberg.nf
+++ b/modules/bacannot_dbs/iceberg.nf
@@ -1,7 +1,7 @@
process ICEBERG_DB {
publishDir "${params.output}/iceberg_db", mode: 'copy', overwrite: "$params.force_update"
label = [ 'db_download', 'process_ultralow' ]
-
+
output:
file("*")
diff --git a/modules/bacannot_dbs/kofamscan.nf b/modules/bacannot_dbs/kofamscan.nf
index 71baa955..2a5e6303 100644
--- a/modules/bacannot_dbs/kofamscan.nf
+++ b/modules/bacannot_dbs/kofamscan.nf
@@ -1,19 +1,28 @@
process KOFAMSCAN_DB {
publishDir "${params.output}/kofamscan_db", mode: 'copy', overwrite: "$params.force_update"
label = [ 'db_download', 'process_low' ]
-
+
output:
file("*")
script:
+ if (workflow.containerEngine != 'singularity') {
+ chmod_cmd = 'chmod a+rw profiles.tar.gz ko_list'
+ chown_cmd = 'chown -R root:\$(id -g) profiles'
+ tar_cmd = '--same-owner'
+ } else {
+ chmod_cmd = ''
+ chown_cmd = ''
+ tar_cmd = ''
+ }
"""
# download kofamscan database
wget --tries=10 ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz
wget --tries=10 ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
gunzip ko_list.gz
- chmod a+rw profiles.tar.gz ko_list
- tar --same-owner -xvzf profiles.tar.gz
- chown -R root:\$(id -g) profiles
+ $chmod_cmd
+ tar $tar_cmd -xvzf profiles.tar.gz
+ $chown_cmd
rm -rf profiles.tar.gz
# for the sake of size and fastness
diff --git a/modules/bacannot_dbs/mlst.nf b/modules/bacannot_dbs/mlst.nf
index 5c6401de..11de39df 100644
--- a/modules/bacannot_dbs/mlst.nf
+++ b/modules/bacannot_dbs/mlst.nf
@@ -1,7 +1,7 @@
process MLST_DB {
publishDir "${params.output}/mlst_db", mode: 'copy', overwrite: "$params.force_update"
label = [ 'db_download', 'process_ultralow' ]
-
+
output:
file("*")
diff --git a/modules/bacannot_dbs/phast.nf b/modules/bacannot_dbs/phast.nf
index 6e1771e7..0c7587ef 100644
--- a/modules/bacannot_dbs/phast.nf
+++ b/modules/bacannot_dbs/phast.nf
@@ -1,7 +1,7 @@
process PHAST_DB {
publishDir "${params.output}/phast_db", mode: 'copy', overwrite: "$params.force_update"
label = [ 'db_download', 'process_ultralow' ]
-
+
output:
file("*")
diff --git a/modules/bacannot_dbs/phigaro.nf b/modules/bacannot_dbs/phigaro.nf
index 7c9c2bb5..03f9c5a5 100644
--- a/modules/bacannot_dbs/phigaro.nf
+++ b/modules/bacannot_dbs/phigaro.nf
@@ -1,7 +1,7 @@
process PHIGARO_DB {
publishDir "${params.output}/phigaro_db", mode: 'copy', overwrite: "$params.force_update"
label = [ 'db_download', 'process_medium' ]
-
+
output:
file("*")
diff --git a/modules/bacannot_dbs/plasmidfinder.nf b/modules/bacannot_dbs/plasmidfinder.nf
index 18e778b4..05073cad 100644
--- a/modules/bacannot_dbs/plasmidfinder.nf
+++ b/modules/bacannot_dbs/plasmidfinder.nf
@@ -1,7 +1,7 @@
process PLASMIDFINDER_DB {
publishDir "${params.output}", mode: 'copy', overwrite: "$params.force_update"
label = [ 'db_download', 'process_ultralow' ]
-
+
output:
file("*")
diff --git a/modules/bacannot_dbs/platon.nf b/modules/bacannot_dbs/platon.nf
index e8e40c77..2d30881d 100644
--- a/modules/bacannot_dbs/platon.nf
+++ b/modules/bacannot_dbs/platon.nf
@@ -1,7 +1,7 @@
process PLATON_DB {
publishDir "${params.output}/platon_db", mode: 'copy', overwrite: "$params.force_update"
label = [ 'db_download', 'process_low' ]
-
+
output:
file("*")
diff --git a/modules/bacannot_dbs/prokka.nf b/modules/bacannot_dbs/prokka.nf
index 594e7f59..f82a24e2 100644
--- a/modules/bacannot_dbs/prokka.nf
+++ b/modules/bacannot_dbs/prokka.nf
@@ -1,7 +1,7 @@
process PROKKA_DB {
publishDir "${params.output}/prokka_db", mode: 'copy', overwrite: "$params.force_update"
label = [ 'db_download', 'process_low' ]
-
+
output:
file("*")
diff --git a/modules/bacannot_dbs/resfinder.nf b/modules/bacannot_dbs/resfinder.nf
index 46914a12..2c2d1f43 100644
--- a/modules/bacannot_dbs/resfinder.nf
+++ b/modules/bacannot_dbs/resfinder.nf
@@ -1,7 +1,7 @@
process RESFINDER_DB {
publishDir "${params.output}/resfinder_db", mode: 'copy', overwrite: "$params.force_update"
label = [ 'db_download', 'process_ultralow' ]
-
+
output:
file("*")
diff --git a/modules/bacannot_dbs/vfdb.nf b/modules/bacannot_dbs/vfdb.nf
index 6b9112f5..2a1673d9 100644
--- a/modules/bacannot_dbs/vfdb.nf
+++ b/modules/bacannot_dbs/vfdb.nf
@@ -1,7 +1,7 @@
process VFDB_DB {
publishDir "${params.output}/vfdb_db", mode: 'copy', overwrite: "$params.force_update"
label = [ 'db_download', 'process_ultralow' ]
-
+
output:
file("*")
diff --git a/modules/bacannot_dbs/victors.nf b/modules/bacannot_dbs/victors.nf
index 6d9aa0f9..b5c28409 100644
--- a/modules/bacannot_dbs/victors.nf
+++ b/modules/bacannot_dbs/victors.nf
@@ -1,7 +1,7 @@
process VICTORS_DB {
publishDir "${params.output}/victors_db", mode: 'copy', overwrite: "$params.force_update"
label = [ 'db_download', 'process_ultralow' ]
-
+
output:
file("*")
diff --git a/modules/generic/bakta.nf b/modules/generic/bakta.nf
index 80253324..8d673ca2 100644
--- a/modules/generic/bakta.nf
+++ b/modules/generic/bakta.nf
@@ -1,11 +1,11 @@
process BAKTA {
publishDir "${params.output}/${prefix}", mode: 'copy', saveAs: { filename ->
- if (filename.indexOf("_version.txt") > 0) "tools_versioning/$filename"
- else if (filename == "annotation") "$filename"
- else null
+ if (filename.indexOf("_version.txt") > 0) "tools_versioning/$filename"
+ else if (filename == "annotation") "$filename"
+ else null
}
tag "${prefix}"
- label = [ 'misc', 'process_medium', 'error_retry' ]
+ label = [ 'process_medium', 'error_retry' ]
input:
tuple val(prefix), val(entrypoint), file(sread1), file(sread2), file(sreads), file(lreads), val(lr_type), file(fast5), file(assembly), val(resfinder_species)
@@ -33,6 +33,9 @@ process BAKTA {
# Save bakta version
bakta --version &> bakta_version.txt ;
+ # clean headers char limit
+ awk '{ if (\$0 ~ />/) print substr(\$0,1,21) ; else print \$0 }' $assembly > cleaned_header.fasta
+
# Run bakta
bakta \\
--output annotation \\
@@ -41,7 +44,7 @@ process BAKTA {
--prefix ${prefix} \\
--strain '${prefix}' \\
--db $bakta_db \\
- $assembly
+ cleaned_header.fasta
# fix fasta headers
cut -f 1 -d ' ' annotation/${prefix}.fna > tmp.fa
diff --git a/modules/generic/barrnap.nf b/modules/generic/barrnap.nf
index cea24619..b646ab91 100644
--- a/modules/generic/barrnap.nf
+++ b/modules/generic/barrnap.nf
@@ -4,7 +4,7 @@ process BARRNAP {
else "rRNA/$filename"
}
tag "${prefix}"
- label = [ 'perl', 'process_low' ]
+ label = [ 'process_low' ]
input:
tuple val(prefix), file(genome)
diff --git a/modules/generic/circos.nf b/modules/generic/circos.nf
index 4fe3d0bf..c0c7ccfe 100644
--- a/modules/generic/circos.nf
+++ b/modules/generic/circos.nf
@@ -4,8 +4,7 @@ process CIRCOS {
else "$filename"
}
tag "$prefix"
-
- label = [ 'perl', 'process_low' ]
+ label = [ 'misc', 'process_low' ]
input:
tuple val(prefix), path(inputs, stageAs: 'results*')
diff --git a/modules/generic/gc_skew.nf b/modules/generic/gc_skew.nf
index e4827c78..68ea6170 100644
--- a/modules/generic/gc_skew.nf
+++ b/modules/generic/gc_skew.nf
@@ -1,7 +1,6 @@
process GC_SKEW {
tag "$prefix"
-
- label = [ 'python', 'process_low' ]
+ label = [ 'misc', 'process_low' ]
input:
tuple val(prefix), path(inputs)
diff --git a/modules/generic/gff2gbk.nf b/modules/generic/gff2gbk.nf
index df73cb82..c1e0ff88 100644
--- a/modules/generic/gff2gbk.nf
+++ b/modules/generic/gff2gbk.nf
@@ -10,9 +10,6 @@ process GFF2GBK {
path "*.genbank", emit: results
"""
- # Activate env
- export PATH=/opt/conda/envs/antismash/bin:\$PATH
-
# Run emboss seqret
seqret \\
-sequence $input \\
diff --git a/modules/generic/gff2sql.nf b/modules/generic/gff2sql.nf
index 536609af..0c34824a 100644
--- a/modules/generic/gff2sql.nf
+++ b/modules/generic/gff2sql.nf
@@ -33,9 +33,6 @@ process CREATE_SQL {
fi
- # Save results with better name
- mv /work/${prefix}.sqlite . ;
-
# Save parser
cp /work/bscripts/run_server.sh . ;
"""
diff --git a/modules/generic/jbrowse.nf b/modules/generic/jbrowse.nf
index ea2cc183..d0b80a62 100644
--- a/modules/generic/jbrowse.nf
+++ b/modules/generic/jbrowse.nf
@@ -4,7 +4,7 @@ process JBROWSE {
tag "${prefix}"
input:
- tuple val(prefix), file(merged_gff), file(draft), file("prokka_gff"), file(barrnap), file(gc_bedGraph), file(gc_chrSizes), file(resfinder_gff), file(phigaro), file(genomic_islands), file("methylation"), file("chr.sizes"), file(phispy_tsv), file(digIS_gff), file(antiSMASH), file(custom_annotations)
+ tuple val(prefix), file(merged_gff), file(draft), file("prokka_gff"), file(barrnap), file(gc_bedGraph), file(gc_chrSizes), file(resfinder_gff), file(phigaro), file(genomic_islands), file("methylation"), file("chr.sizes"), file(phispy_tsv), file(digIS_gff), file(antiSMASH), file(custom_annotations), file(integron_finder)
output:
path "*", emit: results
@@ -29,6 +29,7 @@ process JBROWSE {
-S chr.sizes \\
-R $resfinder_gff \\
-d $digIS_gff \\
- -A $antiSMASH
+ -A $antiSMASH \\
+ -i $integron_finder
"""
}
diff --git a/modules/generic/karyotype.nf b/modules/generic/karyotype.nf
index eb8eb8d9..cc2b37ad 100644
--- a/modules/generic/karyotype.nf
+++ b/modules/generic/karyotype.nf
@@ -1,6 +1,5 @@
process MAKE_KARYOTYPE {
tag "$prefix"
-
label = [ 'misc', 'process_low' ]
input:
diff --git a/modules/generic/mash.nf b/modules/generic/mash.nf
index 6264fdb6..b2ca8961 100644
--- a/modules/generic/mash.nf
+++ b/modules/generic/mash.nf
@@ -4,7 +4,7 @@ process REFSEQ_MASHER {
else "refseq_masher/$filename"
}
tag "${prefix}"
- label = [ 'python', 'process_low' ]
+ label = [ 'process_low' ]
input:
tuple val(prefix), path(genome)
diff --git a/modules/generic/merge_annotations.nf b/modules/generic/merge_annotations.nf
index 96d589b7..7343bd8a 100644
--- a/modules/generic/merge_annotations.nf
+++ b/modules/generic/merge_annotations.nf
@@ -4,7 +4,7 @@ process MERGE_ANNOTATIONS {
tag "${prefix}"
input:
- tuple val(prefix), file('prokka_gff'), file(kofamscan), file(vfdb), file(victors), file(amrfinder), file(resfinder), file(rgi), file(iceberg), file(phast), file('digis_gff'), file(custom_databases)
+ tuple val(prefix), file('prokka_gff'), file(kofamscan), file(vfdb), file(victors), file(amrfinder), file(resfinder), file(rgi), file(iceberg), file(phast), file('digis_gff'), file(custom_databases), file(integron_finder)
output:
tuple val(prefix), path("${prefix}.gff") , emit: gff
@@ -108,5 +108,11 @@ process MERGE_ANNOTATIONS {
cat ${prefix}.gff transposable_elements_digis.gff | bedtools sort > tmp.out.gff ;
( cat tmp.out.gff > ${prefix}.gff && rm tmp.out.gff );
fi
+
+ ### integron_finder results
+ ### integrons are unique / complete elements and should not be intersected
+ cat ${prefix}.gff $integron_finder | bedtools sort > tmp.gff ;
+ cat tmp.gff > ${prefix}.gff
+ rm tmp.gff
"""
}
diff --git a/modules/generic/merge_summaries.nf b/modules/generic/merge_summaries.nf
index d4437b10..55c50389 100644
--- a/modules/generic/merge_summaries.nf
+++ b/modules/generic/merge_summaries.nf
@@ -1,7 +1,6 @@
process MERGE_SUMMARIES {
publishDir "${params.output}", mode: 'copy'
label = [ 'misc', 'process_low' ]
-
input:
path(summaries)
diff --git a/modules/generic/mlst.nf b/modules/generic/mlst.nf
index 488327c8..6a4ced52 100644
--- a/modules/generic/mlst.nf
+++ b/modules/generic/mlst.nf
@@ -1,10 +1,10 @@
process MLST {
publishDir "${params.output}/${prefix}", mode: 'copy', saveAs: { filename ->
- if (filename.indexOf("_version.txt") > 0) "tools_versioning/$filename"
- else "MLST/$filename"
+ if (filename.indexOf("_version.txt") > 0) "tools_versioning/$filename"
+ else "MLST/$filename"
}
tag "${prefix}"
- label = [ 'perl', 'process_ultralow' ]
+ label = [ 'process_ultralow' ]
input:
tuple val(prefix), file(genome)
@@ -19,9 +19,7 @@ process MLST {
script:
"""
# update tool database
- mlst_dir=\$(which mlst | sed 's/bin\\/mlst//g')
- cp ${bacannot_db}/mlst_db/* -r \${mlst_dir}/db/pubmlst/
- ( cd \$mlst_dir/scripts && ./mlst-make_blast_db )
+ mlst-make_blast_db.sh ${bacannot_db}/mlst_db
# Save mlst tool version
mlst --version > mlst_version.txt ;
diff --git a/modules/generic/prepare_circos.nf b/modules/generic/prepare_circos.nf
index 8afb64df..eec1f513 100644
--- a/modules/generic/prepare_circos.nf
+++ b/modules/generic/prepare_circos.nf
@@ -1,6 +1,5 @@
process PREPARE_CIRCOS {
tag "$prefix"
-
label = [ 'misc', 'process_low' ]
input:
diff --git a/modules/generic/prokka.nf b/modules/generic/prokka.nf
index 61ae6d7f..bd31e81d 100644
--- a/modules/generic/prokka.nf
+++ b/modules/generic/prokka.nf
@@ -1,11 +1,11 @@
process PROKKA {
publishDir "${params.output}/${prefix}", mode: 'copy', saveAs: { filename ->
- if (filename.indexOf("_version.txt") > 0) "tools_versioning/$filename"
- else if (filename == "annotation") "$filename"
- else null
+ if (filename.indexOf("_version.txt") > 0) "tools_versioning/$filename"
+ else if (filename == "annotation") "$filename"
+ else null
}
tag "${prefix}"
- label = [ 'perl', 'process_medium' ]
+ label = [ 'process_medium' ]
input:
tuple val(prefix), val(entrypoint), file(sread1), file(sread2), file(sreads), file(lreads), val(lr_type), file(fast5), file(assembly), val(resfinder_species)
@@ -26,11 +26,13 @@ process PROKKA {
path('prokka_version.txt'), emit: version
script:
- kingdom = (params.prokka_kingdom) ? "--kingdom ${params.prokka_kingdom}" : ''
- gcode = (params.prokka_genetic_code) ? "--gcode ${params.prokka_genetic_code}" : ''
- rnammer = (params.prokka_use_rnammer) ? "--rnammer" : ''
- models = (params.prokka_use_pgap) ? "PGAP_NCBI.hmm" : "TIGRFAMs_15.0.hmm"
+ kingdom = (params.prokka_kingdom) ? "--kingdom ${params.prokka_kingdom}" : ''
+ gcode = (params.prokka_genetic_code) ? "--gcode ${params.prokka_genetic_code}" : ''
+ rnammer = (params.prokka_use_rnammer) ? "--rnammer" : ''
+ models = (params.prokka_use_pgap) ? "PGAP_NCBI.hmm" : "TIGRFAMs_15.0.hmm"
"""
+ #!/usr/bin/env bash
+
# save prokka version
prokka -v &> prokka_version.txt ;
@@ -45,6 +47,9 @@ process PROKKA {
# hmmpress
( cd prokka_db/hmm/ ; for i in *.hmm ; do hmmpress -f \$i ; done )
+ # clean headers char limit
+ awk '{ if (\$0 ~ />/) print substr(\$0,1,21) ; else print \$0 }' $assembly > cleaned_header.fasta
+
# run prokka
prokka \\
--dbdir prokka_db \\
@@ -56,7 +61,7 @@ process PROKKA {
--genus '' \\
--species '' \\
--strain \"${prefix}\" \\
- $assembly
+ cleaned_header.fasta
# remove tmp dir to gain space
rm -r prokka_db
diff --git a/modules/generic/reports.nf b/modules/generic/reports.nf
index 29e2d418..772d470a 100644
--- a/modules/generic/reports.nf
+++ b/modules/generic/reports.nf
@@ -4,7 +4,7 @@ process REPORT {
tag "${prefix}"
input:
- tuple val(prefix), file('annotation_stats.tsv'), file(gff), file(barrnap), file(mlst), file(keggsvg), file(refseq_masher_txt), file(amrfinder), file(rgi), file(rgi_parsed), file(rgi_heatmap), file(argminer_out), file(resfinder_tab), file(resfinder_point), file(resfinder_phenotable), file(vfdb_blastn), file(victors_blastp), file(phigaro_txt), file(phispy_tsv), file(iceberg_blastp), file(iceberg_blastn), file(plasmids_tsv), file(platon_tsv), file(gi_image), file(phast_blastp), file(digIS)
+ tuple val(prefix), file('annotation_stats.tsv'), file(gff), file(barrnap), file(mlst), file(keggsvg), file(refseq_masher_txt), file(amrfinder), file(rgi), file(rgi_parsed), file(rgi_heatmap), file(argminer_out), file(resfinder_tab), file(resfinder_point), file(resfinder_phenotable), file(vfdb_blastn), file(victors_blastp), file(phigaro_txt), file(phispy_tsv), file(iceberg_blastp), file(iceberg_blastn), file(plasmids_tsv), file(platon_tsv), file(mobsuite_tsv), file(gi_image), file(phast_blastp), file(digIS), file(integronfinder)
output:
path '*.html', emit: results
@@ -23,54 +23,68 @@ process REPORT {
## Generate generic Report
rmarkdown::render("report_general.Rmd" , \
- params = list( generic_annotation = "annotation_stats.tsv", \
- generic_annotator = "${generic_annotator}", \
- kegg = "$keggsvg", \
- barrnap = "$barrnap", \
- mlst = "$mlst", \
- refseq_masher = "$refseq_masher_txt", \
- query = "${prefix}")) ;
+ params = list(
+ generic_annotation = "annotation_stats.tsv", \
+ generic_annotator = "${generic_annotator}", \
+ kegg = "$keggsvg", \
+ barrnap = "$barrnap", \
+ mlst = "$mlst", \
+ refseq_masher = "$refseq_masher_txt", \
+ query = "${prefix}"
+ )
+ ) ;
## Generate Resistance Report
- rmarkdown::render("report_resistance.Rmd", params = list(\
- blast_id = ${params.blast_resistance_minid} , \
- blast_cov = ${params.blast_resistance_mincov}, \
- amrfinder = "$amrfinder", \
- query = "${prefix}", \
- rgitool = "$rgi", \
- rgiparsed = "$rgi_parsed", \
- rgi_heatmap = "$rgi_heatmap", \
- argminer_blastp = "$argminer_out", \
- resfinder_tab = "$resfinder_tab", \
- resfinder_pointfinder = "$resfinder_point", \
- resfinder_phenotype = "$resfinder_phenotable", \
- generic_annotator = "${generic_annotator}", \
- gff = "$gff")) ;
+ rmarkdown::render("report_resistance.Rmd", \
+ params = list(\
+ blast_id = ${params.blast_resistance_minid} , \
+ blast_cov = ${params.blast_resistance_mincov}, \
+ amrfinder = "$amrfinder", \
+ query = "${prefix}", \
+ rgitool = "$rgi", \
+ rgiparsed = "$rgi_parsed", \
+ rgi_heatmap = "$rgi_heatmap", \
+ argminer_blastp = "$argminer_out", \
+ resfinder_tab = "$resfinder_tab", \
+ resfinder_pointfinder = "$resfinder_point", \
+ resfinder_phenotype = "$resfinder_phenotable", \
+ generic_annotator = "${generic_annotator}", \
+ gff = "$gff"
+ )
+ ) ;
## Generate Virulence Report
rmarkdown::render("report_virulence.Rmd" , \
- params = list( blast_id = ${params.blast_virulence_minid} , \
- blast_cov = ${params.blast_virulence_mincov}, \
- vfdb_blast = "$vfdb_blastn", \
- gff = "$gff", \
- victors_blast = "$victors_blastp", \
- query = "${prefix}")) ;
+ params = list(
+ blast_id = ${params.blast_virulence_minid} , \
+ blast_cov = ${params.blast_virulence_mincov}, \
+ vfdb_blast = "$vfdb_blastn", \
+ gff = "$gff", \
+ victors_blast = "$victors_blastp", \
+ query = "${prefix}"
+ )
+ ) ;
## Generate MGEs report
rmarkdown::render("report_MGEs.Rmd", \
- params = list( blast_id = ${params.blast_MGEs_minid}, \
- blast_cov = ${params.blast_MGEs_mincov}, \
- phigaro_dir = "${params.output}/prophages/phigaro", \
- phigaro_txt = "$phigaro_txt", \
- phispy_tsv = "$phispy_tsv", \
- ice_prot_blast = "$iceberg_blastp", \
- ice_genome_blast = "$iceberg_blastn", \
- plasmid_finder_tab = "$plasmids_tsv", \
- platon_tsv = "$platon_tsv", \
- query = "${prefix}", \
- gi_image = "$gi_image", \
- digis = "$digIS", \
- gff = "$gff", \
- phast_prot_blast = "$phast_blastp" )) ;
+ params = list(
+ blast_id = ${params.blast_MGEs_minid}, \
+ blast_cov = ${params.blast_MGEs_mincov}, \
+ phigaro_dir = "${params.output}/prophages/phigaro", \
+ phigaro_txt = "$phigaro_txt", \
+ phispy_tsv = "$phispy_tsv", \
+ ice_prot_blast = "$iceberg_blastp", \
+ ice_genome_blast = "$iceberg_blastn", \
+ plasmid_finder_tab = "$plasmids_tsv", \
+ platon_tsv = "$platon_tsv", \
+ mobsuite_tsv = "$mobsuite_tsv", \
+ query = "${prefix}", \
+ gi_image = "$gi_image", \
+ digis = "$digIS", \
+ integronfinder = "$integronfinder", \
+ gff = "$gff", \
+ phast_prot_blast = "$phast_blastp"
+ )
+ ) ;
"""
}
diff --git a/modules/generic/sequenceserver.nf b/modules/generic/sequenceserver.nf
index 2663ef97..2048dffe 100644
--- a/modules/generic/sequenceserver.nf
+++ b/modules/generic/sequenceserver.nf
@@ -1,8 +1,7 @@
process SEQUENCESERVER {
publishDir "${params.output}/${prefix}/SequenceServerDBs", mode: 'copy'
tag "${prefix}"
- label = [ 'server', 'process_ultralow' ]
-
+ label = [ 'server', 'process_ultralow' ]
input:
tuple val(prefix), file(genome), file(genes), file(proteins)
diff --git a/modules/generic/summary.nf b/modules/generic/summary.nf
index eda601ad..f443ee8c 100644
--- a/modules/generic/summary.nf
+++ b/modules/generic/summary.nf
@@ -1,22 +1,34 @@
process SUMMARY {
publishDir "${params.output}/${prefix}", mode: 'copy'
tag "${prefix}"
- label = [ 'python', 'process_low' ]
-
+ label = [ 'misc', 'process_low' ]
input:
tuple val(prefix),
- file(annotation), file(stageAs: "results/${prefix}/MLST/*"),
- file(stageAs: "results/${prefix}/rRNA/*"), file(stageAs: "results/${prefix}/*"),
- file(stageAs: "results/${prefix}/plasmids/*"), file(stageAs: "results/${prefix}/plasmids/*"),
- file(stageAs: "results/${prefix}/genomic_islands/*"), file(stageAs: "results/${prefix}/virulence/vfdb/*"),
- file(stageAs: "results/${prefix}/virulence/victors/*"), file(stageAs: "results/${prefix}/prophages/phast_db/*"),
- file(stageAs: "results/${prefix}/prophages/phigaro/*"), file(stageAs: "results/${prefix}/prophages/*"),
- file(stageAs: "results/${prefix}/ICEs/*"), file(stageAs: "results/${prefix}/resistance/AMRFinderPlus/*"),
- file(stageAs: "results/${prefix}/resistance/RGI/*"), file(stageAs: "results/${prefix}/resistance/ARGMiner/*"),
- file(stageAs: "results/${prefix}/resistance/*"), file(stageAs: "results/${prefix}/methylations/*"),
- file(stageAs: "results/${prefix}/refseq_masher/*"), file(stageAs: "results/${prefix}/*"),
- file(stageAs: "results/${prefix}/*"), file(stageAs: "results/${prefix}/gffs/*")
+ file(annotation),
+ file(stageAs: "results/${prefix}/MLST/*"),
+ file(stageAs: "results/${prefix}/rRNA/*"),
+ file(stageAs: "results/${prefix}/*"),
+ file(stageAs: "results/${prefix}/plasmids/*"),
+ file(stageAs: "results/${prefix}/plasmids/*"),
+ file(stageAs: "results/${prefix}/genomic_islands/*"),
+ file(stageAs: "results/${prefix}/virulence/vfdb/*"),
+ file(stageAs: "results/${prefix}/virulence/victors/*"),
+ file(stageAs: "results/${prefix}/prophages/phast_db/*"),
+ file(stageAs: "results/${prefix}/prophages/phigaro/*"),
+ file(stageAs: "results/${prefix}/prophages/*"),
+ file(stageAs: "results/${prefix}/ICEs/*"),
+ file(stageAs: "results/${prefix}/resistance/AMRFinderPlus/*"),
+ file(stageAs: "results/${prefix}/resistance/RGI/*"),
+ file(stageAs: "results/${prefix}/resistance/ARGMiner/*"),
+ file(stageAs: "results/${prefix}/resistance/*"),
+ file(stageAs: "results/${prefix}/methylations/*"),
+ file(stageAs: "results/${prefix}/refseq_masher/*"),
+ file(stageAs: "results/${prefix}/*"),
+ file(stageAs: "results/${prefix}/*"),
+ file(stageAs: "results/${prefix}/gffs/*"),
+ file(stageAs: "results/${prefix}/integron_finder/*"),
+ file(stageAs: "results/${prefix}/plasmids/mob_suite/*")
output:
tuple val(prefix), path("${prefix}_summary.json"), emit: summaries
@@ -25,7 +37,7 @@ process SUMMARY {
"""
mkdir -p results/${prefix}/annotation
ln -rs annotation/* results/${prefix}/annotation
- source activate falmeida-py
+ sed -i 's/s:/:/g' results/${prefix}/annotation/${prefix}.txt
falmeida-py bacannot2json -i results -o ${prefix}_summary.json
"""
}
diff --git a/modules/prophages/phigaro.nf b/modules/prophages/phigaro.nf
index f172cb86..915a2ec7 100644
--- a/modules/prophages/phigaro.nf
+++ b/modules/prophages/phigaro.nf
@@ -4,7 +4,7 @@ process PHIGARO {
else "prophages/phigaro/$filename"
}
tag "${prefix}"
- label = [ 'python', 'process_medium' ]
+ label = [ 'process_medium' ]
input:
tuple val(prefix), file("assembly.fasta")
@@ -18,10 +18,7 @@ process PHIGARO {
path('phigaro_version.txt') , emit: version
script:
- """
- # activate env
- source activate phigaro
-
+ """
# get tool version
phigaro -V > phigaro_version.txt ;
diff --git a/modules/prophages/phispy.nf b/modules/prophages/phispy.nf
index 84b8cab4..ef32fc90 100644
--- a/modules/prophages/phispy.nf
+++ b/modules/prophages/phispy.nf
@@ -5,7 +5,7 @@ process PHISPY {
else null
}
tag "${prefix}"
- label = [ 'python', 'process_medium' ]
+ label = [ 'process_medium' ]
input:
tuple val(prefix), file(input)
diff --git a/modules/resistance/amrfinder.nf b/modules/resistance/amrfinder.nf
index 9aa7c3cc..6f36e228 100644
--- a/modules/resistance/amrfinder.nf
+++ b/modules/resistance/amrfinder.nf
@@ -4,7 +4,7 @@ process AMRFINDER {
else "resistance/AMRFinderPlus/$filename"
}
tag "${prefix}"
- label = [ 'misc', 'process_medium' ]
+ label = [ 'process_medium' ]
input:
tuple val(prefix), file(proteins)
diff --git a/modules/resistance/amrfinder2tsv.nf b/modules/resistance/amrfinder2tsv.nf
index 51a09c37..41f52854 100644
--- a/modules/resistance/amrfinder2tsv.nf
+++ b/modules/resistance/amrfinder2tsv.nf
@@ -1,6 +1,5 @@
process AMRFINDER2TSV {
tag "$prefix"
-
label = [ 'renv', 'process_low' ]
input:
diff --git a/modules/resistance/resfinder.nf b/modules/resistance/resfinder.nf
index 36dbee8d..9b7105e9 100644
--- a/modules/resistance/resfinder.nf
+++ b/modules/resistance/resfinder.nf
@@ -20,14 +20,11 @@ process RESFINDER {
script:
resistance_minid = params.blast_resistance_minid / 100.00
resistance_mincov = params.blast_resistance_mincov / 100.00
- if (resfinder_species.toLowerCase() != "other")
+
"""
# activate env
source activate resfinder
- # Make databases available
- ln -rs ${bacannot_db}/resfinder_db/db_* \$(dirname \$(which run_resfinder.py))
-
# Run resfinder acquired resistance
run_resfinder.py \\
--inputfasta $genome \\
@@ -35,53 +32,39 @@ process RESFINDER {
--species \"${resfinder_species}\" \\
--min_cov ${resistance_mincov} \\
--threshold ${resistance_minid} \\
+ --db_path_point ${bacannot_db}/resfinder_db/db_pointfinder \\
+ --db_path_res ${bacannot_db}/resfinder_db/db_resfinder \\
--acquired ;
# Fix name of pheno table
mv resfinder/pheno_table.txt resfinder/args_pheno_table.txt &> /dev/null ;
# Run resfinder pointfinder resistance
- run_resfinder.py \\
- --inputfasta $genome \\
- -o resfinder \\
- --species \"${resfinder_species}\" \\
- --min_cov ${resistance_mincov} \\
- --threshold ${resistance_minid} \\
- --point ;
-
- # Fix name of pheno table
- mv resfinder/pheno_table.txt resfinder/mutation_pheno_table.txt &> /dev/null ;
-
- # Convert to GFF
- resfinder2gff.py \\
- -i resfinder/ResFinder_results_tab.txt > resfinder/results_tab.gff ;
- """
-
- else if (resfinder_species.toLowerCase() == "other")
- """
- # activate env
- source activate resfinder
+ if [ \"${resfinder_species.toLowerCase()}\" != "other" ]; then
- # Make databases available
- ln -rs ${bacannot_db}/resfinder_db/db_* \$(dirname \$(which run_resfinder.py))
+ run_resfinder.py \\
+ --inputfasta $genome \\
+ -o resfinder \\
+ --species \"${resfinder_species}\" \\
+ --min_cov ${resistance_mincov} \\
+ --threshold ${resistance_minid} \\
+ --db_path_point ${bacannot_db}/resfinder_db/db_pointfinder \\
+ --db_path_res ${bacannot_db}/resfinder_db/db_resfinder \\
+ --point ;
- # Run resfinder acquired resistance
- run_resfinder.py \\
- --inputfasta $genome \\
- -o resfinder \\
- --species \"${resfinder_species}\" \\
- --min_cov ${resistance_mincov} \\
- --threshold ${resistance_minid} \\
- --acquired ;
-
- # Fix name of pheno table
- mv resfinder/pheno_table.txt resfinder/args_pheno_table.txt &> /dev/null ;
-
- # touch pointfinder
- touch resfinder/PointFinder_results.txt ;
+ # Fix name of pheno table
+ mv resfinder/pheno_table.txt resfinder/mutation_pheno_table.txt &> /dev/null ;
+
+ else
+ # touch pointfinder
+ touch resfinder/PointFinder_results.txt ;
+
+ fi
+
# Convert to GFF
resfinder2gff.py \\
-i resfinder/ResFinder_results_tab.txt > resfinder/results_tab.gff ;
"""
+
}
diff --git a/modules/resistance/rgi_annotation.nf b/modules/resistance/rgi_annotation.nf
index cd83eed7..9ae28955 100644
--- a/modules/resistance/rgi_annotation.nf
+++ b/modules/resistance/rgi_annotation.nf
@@ -5,7 +5,7 @@ process CARD_RGI {
else "resistance/RGI/$filename"
}
tag "${prefix}"
- label = [ 'python', 'process_medium' ]
+ label = [ 'process_medium' ]
input:
tuple val(prefix), path(input)
@@ -20,10 +20,7 @@ process CARD_RGI {
path("*_version.txt") , emit: version
script:
- """
- # activate env
- source activate rgi
-
+ """
# load database
rgi load --card_json ${bacannot_db}/card_db/card.json --local
diff --git a/modules/virulence/vfdb2tsv.nf b/modules/virulence/vfdb2tsv.nf
index 595e1548..3a27daa6 100644
--- a/modules/virulence/vfdb2tsv.nf
+++ b/modules/virulence/vfdb2tsv.nf
@@ -1,6 +1,5 @@
process VFDB2TSV {
tag "$prefix"
-
label = [ 'renv', 'process_low' ]
input:
diff --git a/nextflow.config b/nextflow.config
index 45c8494d..11ead557 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -13,11 +13,12 @@ includeConfig 'conf/defaults.config'
params {
// Boilerplate options
- tracedir = "${params.output}/pipeline_info"
plaintext_email = false
monochrome_logs = false
help = false
get_config = false
+ get_docker_config = false
+ get_singularity_config = false
get_samplesheet = false
validate_params = true
show_hidden_params = false
@@ -82,19 +83,19 @@ process.shell = ['/bin/bash', '-euo', 'pipefail']
def trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')
timeline {
enabled = true
- file = "${params.tracedir}/bacannot_timeline_${trace_timestamp}.html"
+ file = "${params.output}/pipeline_info/bacannot_timeline_${trace_timestamp}.html"
}
report {
enabled = true
- file = "${params.tracedir}/bacannot_report_${trace_timestamp}.html"
+ file = "${params.output}/pipeline_info/bacannot_report_${trace_timestamp}.html"
}
trace {
enabled = true
- file = "${params.tracedir}/bacannot_trace_${trace_timestamp}.txt"
+ file = "${params.output}/pipeline_info/bacannot_trace_${trace_timestamp}.txt"
}
dag {
enabled = true
- file = "${params.tracedir}/bacannot_pipeline_dag_${trace_timestamp}.svg"
+ file = "${params.output}/pipeline_info/bacannot_pipeline_dag_${trace_timestamp}.svg"
}
/*
@@ -106,8 +107,8 @@ manifest {
description = "Nextflow pipeline for bacterial genome annotation"
homePage = "https://github.com/fmalmeida/bacannot"
mainScript = "main.nf"
- nextflowVersion = ">=20.10.0"
- version = '3.2'
+ nextflowVersion = "!>=22.10.1"
+ version = '3.3'
}
// Function to ensure that resource requirements don't go beyond
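
Two behavioural notes on the nextflow.config hunk above: dropping `params.tracedir` pins all execution reports under the output directory, and the leading `!` in `nextflowVersion = "!>=22.10.1"` makes the version requirement strict (runs on older Nextflow abort instead of only printing a warning). A hedged sketch of where the reports land now; the samplesheet, database, and output names are illustrative:

```bash
nextflow run fmalmeida/bacannot \
    -profile docker \
    --input samplesheet.yml \
    --bacannot_db ./bacannot_dbs \
    --output my_results
# Nextflow logs and reports now always land in:
#   my_results/pipeline_info/bacannot_timeline_<timestamp>.html
#   my_results/pipeline_info/bacannot_report_<timestamp>.html
#   my_results/pipeline_info/bacannot_trace_<timestamp>.txt
#   my_results/pipeline_info/bacannot_pipeline_dag_<timestamp>.svg
```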
diff --git a/nextflow_schema.json b/nextflow_schema.json
index 31ad9105..e95d3c34 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -11,10 +11,16 @@
"default": "",
"properties": {
"get_dbs": {
- "type": "boolean"
+ "type": "boolean",
+        "description": "Download and build all required databases on the fly (fetches today's versions)"
},
"force_update": {
- "type": "boolean"
+ "type": "boolean",
+        "description": "Should we overwrite existing databases, if any?"
+ },
+ "get_zenodo_db": {
+ "type": "boolean",
+        "description": "Download the latest pre-built databases from Zenodo?"
}
}
},
@@ -28,6 +34,10 @@
"type": "string",
"description": "Path to input samplesheet"
},
+ "enable_deduplication": {
+ "type": "boolean",
+ "description": "Execute deduplication on reads before assembly."
+ },
"output": {
"type": "string",
"description": "Path for output directory",
@@ -53,15 +63,18 @@
"properties": {
"max_cpus": {
"type": "integer",
- "default": 16
+ "default": 16,
+        "description": "Maximum number of CPUs a single module can use."
},
"max_memory": {
"type": "string",
- "default": "20.GB"
+ "default": "20.GB",
+ "description": "Maximum memory a single module can use."
},
"max_time": {
"type": "string",
- "default": "40.h"
+ "default": "40.h",
+ "description": "Maximum time a module can run."
}
}
},
@@ -167,7 +180,7 @@
"plasmids_minid": {
"type": "number",
"description": "Identity threshold for plasmid annotation",
- "default": 90.0,
+ "default": 90,
"minimum": 0,
"maximum": 100,
"help_text": "Must be between 0 and 100",
@@ -176,7 +189,7 @@
"plasmids_mincov": {
"type": "number",
"description": "overage threshold for plasmid annotation",
- "default": 60.0,
+ "default": 60,
"minimum": 0,
"maximum": 100,
"help_text": "Must be between 0 and 100",
@@ -185,7 +198,7 @@
"blast_virulence_minid": {
"type": "number",
"description": "Identity threshold for virulence factors annotation",
- "default": 90.0,
+ "default": 90,
"minimum": 0,
"maximum": 100,
"help_text": "Must be between 0 and 100",
@@ -194,7 +207,7 @@
"blast_virulence_mincov": {
"type": "number",
"description": "overage threshold for virulence factors annotation",
- "default": 90.0,
+ "default": 90,
"minimum": 0,
"maximum": 100,
"help_text": "Must be between 0 and 100",
@@ -203,7 +216,7 @@
"blast_resistance_minid": {
"type": "number",
"description": "Identity threshold for resistance genes annotation",
- "default": 90.0,
+ "default": 90,
"minimum": 0,
"maximum": 100,
"help_text": "Must be between 0 and 100",
@@ -212,7 +225,7 @@
"blast_resistance_mincov": {
"type": "number",
"description": "overage threshold for resistance genes annotation",
- "default": 90.0,
+ "default": 90,
"minimum": 0,
"maximum": 100,
"help_text": "Must be between 0 and 100",
@@ -221,7 +234,7 @@
"blast_MGEs_minid": {
"type": "number",
"description": "Identity threshold for ICEs and prophages annotation",
- "default": 85.0,
+ "default": 85,
"minimum": 0,
"maximum": 100,
"help_text": "Must be between 0 and 100",
@@ -230,7 +243,7 @@
"blast_MGEs_mincov": {
"type": "number",
"description": "overage threshold for ICEs and prophages annotation",
- "default": 85.0,
+ "default": 85,
"minimum": 0,
"maximum": 100,
"help_text": "Must be between 0 and 100",
@@ -260,7 +273,7 @@
"blast_custom_minid": {
"type": "number",
"description": "Min. identity % for the annotation using user's custom database",
- "default": 65.0,
+ "default": 65,
"minimum": 0,
"maximum": 100,
"hidden": true
@@ -268,7 +281,7 @@
"blast_custom_mincov": {
"type": "number",
"description": "Min. gene/subject coverage % for the annotation using user's custom database",
- "default": 65.0,
+ "default": 65,
"minimum": 0,
"maximum": 100,
"hidden": true
@@ -292,6 +305,16 @@
"description": "Download template config for parameters",
"fa_icon": "fas fa-question-circle"
},
+ "get_docker_config": {
+ "type": "boolean",
+ "description": "Download template docker config for containers.",
+ "fa_icon": "fas fa-question-circle"
+ },
+ "get_singularity_config": {
+ "type": "boolean",
+ "description": "Download template singularity config for containers.",
+ "fa_icon": "fas fa-question-circle"
+ },
"get_samplesheet": {
"type": "boolean",
"fa_icon": "fas fa-question-circle",
@@ -302,13 +325,6 @@
"help_text": "Number of minimum overlapping base pairs required for merging\nNegative values, such as -20, means the number of required overlapping bases for merging.\nPositive values, such as 5, means the maximum distance accepted between features for merging.\nBy default (if Blank), this process is not executed. For execution the user needs to provide a value",
"description": "Minimum overlapping base pairs required for merging"
},
- "tracedir": {
- "type": "string",
- "description": "Directory to keep pipeline Nextflow logs and reports.",
- "default": "${params.output}/pipeline_info",
- "fa_icon": "fas fa-cogs",
- "hidden": true
- },
"validate_params": {
"type": "boolean",
"description": "Boolean whether to validate parameters against the schema at runtime",
@@ -326,17 +342,17 @@
"unicycler_version": {
"type": "string",
"description": "Select quay.io image tag for tool",
- "default": "0.4.8--py38h8162308_3"
+ "default": "0.5.0--py310h6cc9453_3"
},
"flye_version": {
"type": "string",
"description": "Select quay.io image tag for tool",
- "default": "2.9--py39h39abbe0_0"
+ "default": "2.9--py39h6935b12_1"
},
"bakta_version": {
"type": "string",
"description": "Select quay.io image tag for tool",
- "default": "1.6.1--pyhdfd78af_0"
+ "default": "1.7.0--pyhdfd78af_1"
}
}
},
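
Since the schema hunks above now document the per-module resource caps, a hedged sketch of overriding them at the command line (the values are arbitrary examples, not recommendations):

```bash
# max_cpus / max_memory / max_time cap what any single module may request,
# not the whole run; they override the schema defaults shown above.
nextflow run fmalmeida/bacannot \
    -profile docker \
    --input samplesheet.yml \
    --output results \
    --max_cpus 8 \
    --max_memory '16.GB' \
    --max_time '24.h'
```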
diff --git a/workflows/bacannot.nf b/workflows/bacannot.nf
index 57abf5d9..93087adf 100644
--- a/workflows/bacannot.nf
+++ b/workflows/bacannot.nf
@@ -14,12 +14,15 @@ include { KOFAMSCAN } from '../modules/KOs/kofamscan'
include { KEGG_DECODER } from '../modules/KOs/kegg-decoder'
include { PLASMIDFINDER } from '../modules/MGEs/plasmidfinder'
include { PLATON } from '../modules/MGEs/platon'
+include { MOBSUITE } from '../modules/MGEs/mob_suite'
include { VFDB } from '../modules/virulence/vfdb'
include { VICTORS } from '../modules/virulence/victors'
include { PHAST } from '../modules/prophages/phast'
include { PHIGARO } from '../modules/prophages/phigaro'
include { PHISPY } from '../modules/prophages/phispy'
include { ICEBERG } from '../modules/MGEs/iceberg'
+include { INTEGRON_FINDER } from '../modules/MGEs/integron_finder'
+include { INTEGRON_FINDER_2GFF } from '../modules/MGEs/integron_finder_2gff'
include { ISLANDPATH } from '../modules/MGEs/islandpath'
include { DRAW_GIS } from '../modules/MGEs/draw_gis'
include { DIGIS } from '../modules/MGEs/digIS'
@@ -120,16 +123,26 @@ workflow BACANNOT {
PLATON( annotation_out_ch.genome, dbs_ch )
platon_output_ch = PLATON.out.results
platon_all_ch = PLATON.out.all
+ // mob suite
+ MOBSUITE( annotation_out_ch.genome )
+ mobsuite_output_ch = MOBSUITE.out.results
} else {
plasmidfinder_all_ch = Channel.empty()
plasmidfinder_output_ch = Channel.empty()
platon_output_ch = Channel.empty()
platon_all_ch = Channel.empty()
+ mobsuite_output_ch = Channel.empty()
}
+    // TODO: Maybe make the MGE annotation step optional?
+
// IslandPath software
ISLANDPATH( annotation_out_ch.gbk )
+ // Integron_finder software
+ INTEGRON_FINDER( annotation_out_ch.genome )
+ INTEGRON_FINDER_2GFF( INTEGRON_FINDER.out.gbk )
+
// Virulence search
if (params.skip_virulence_search == false) {
// VFDB
@@ -286,7 +299,8 @@ workflow BACANNOT {
.join(iceberg_output_blastp_ch, remainder: true)
.join(phast_output_ch, remainder: true)
.join(DIGIS.out.gff, remainder: true)
- .join(ch_custom_annotations, remainder: true)
+ .join(ch_custom_annotations, remainder: true)
+ .join(INTEGRON_FINDER_2GFF.out.gff, remainder: true)
)
/*
@@ -326,6 +340,7 @@ workflow BACANNOT {
.join( MERGE_ANNOTATIONS.out.digis_gff )
.join( antismash_output_ch, remainder: true )
.join( MERGE_ANNOTATIONS.out.customdb_gff.groupTuple(), remainder: true )
+ .join( INTEGRON_FINDER_2GFF.out.gff, remainder: true )
)
// Render reports
@@ -357,9 +372,11 @@ workflow BACANNOT {
.join( iceberg_output_blastn_ch, remainder: true )
.join( plasmidfinder_output_ch, remainder: true )
.join( platon_output_ch, remainder: true )
+ .join( mobsuite_output_ch, remainder: true )
.join( DRAW_GIS.out.example, remainder: true )
.join( phast_output_ch, remainder: true )
.join( MERGE_ANNOTATIONS.out.digis_gff )
+ .join( INTEGRON_FINDER_2GFF.out.gff, remainder: true )
)
//
@@ -367,27 +384,29 @@ workflow BACANNOT {
//
SUMMARY(
annotation_out_ch.all
- .join( MLST.out.all , remainder: true )
- .join( BARRNAP.out.all , remainder: true )
- .join( kofamscan_all_ch , remainder: true )
- .join( plasmidfinder_all_ch , remainder: true )
- .join( platon_all_ch , remainder: true )
- .join( ISLANDPATH.out.results , remainder: true )
- .join( vfdb_all_ch , remainder: true )
- .join( victors_all_ch , remainder: true )
- .join( phast_all_ch , remainder: true )
- .join( phigaro_all_ch , remainder: true )
- .join( phispy_all_ch , remainder: true )
- .join( iceberg_all_ch , remainder: true )
- .join( amrfinder_all_ch , remainder: true )
- .join( rgi_all_ch , remainder: true )
- .join( argminer_all_ch , remainder: true )
- .join( resfinder_all_ch , remainder: true )
- .join( CALL_METHYLATION.out.all , remainder: true )
- .join( REFSEQ_MASHER.out.results, remainder: true )
- .join( DIGIS.out.all , remainder: true )
- .join( antismash_all_ch , remainder: true )
- .join( MERGE_ANNOTATIONS.out.all, remainder: true )
+ .join( MLST.out.all , remainder: true )
+ .join( BARRNAP.out.all , remainder: true )
+ .join( kofamscan_all_ch , remainder: true )
+ .join( plasmidfinder_all_ch , remainder: true )
+ .join( platon_all_ch , remainder: true )
+ .join( ISLANDPATH.out.results , remainder: true )
+ .join( vfdb_all_ch , remainder: true )
+ .join( victors_all_ch , remainder: true )
+ .join( phast_all_ch , remainder: true )
+ .join( phigaro_all_ch , remainder: true )
+ .join( phispy_all_ch , remainder: true )
+ .join( iceberg_all_ch , remainder: true )
+ .join( amrfinder_all_ch , remainder: true )
+ .join( rgi_all_ch , remainder: true )
+ .join( argminer_all_ch , remainder: true )
+ .join( resfinder_all_ch , remainder: true )
+ .join( CALL_METHYLATION.out.all , remainder: true )
+ .join( REFSEQ_MASHER.out.results , remainder: true )
+ .join( DIGIS.out.all , remainder: true )
+ .join( antismash_all_ch , remainder: true )
+ .join( MERGE_ANNOTATIONS.out.all , remainder: true )
+ .join( INTEGRON_FINDER_2GFF.out.gff, remainder: true )
+ .join( mobsuite_output_ch , remainder: true )
)
MERGE_SUMMARIES(
SUMMARY.out.summaries.map{ it[1] }.collect()
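
The new MOBSUITE and INTEGRON_FINDER outputs are wired into the report and summary channels with `remainder: true` throughout, which is what keeps a sample flowing through even when one tool produced nothing for it. A self-contained Nextflow sketch of that join semantics, using hypothetical channels:

```groovy
// join with remainder: true pads non-matching keys with null instead of
// dropping them, so a sample with no integron hits still reaches the report.
ch_gff      = Channel.of( ['sampleA', 'A.gff'], ['sampleB', 'B.gff'] )
ch_integron = Channel.of( ['sampleA', 'A_integrons.gff'] ) // sampleB: no hits

ch_gff
    .join( ch_integron, remainder: true )
    .view()
// [sampleA, A.gff, A_integrons.gff]
// [sampleB, B.gff, null]
```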
diff --git a/workflows/bacannot_dbs.nf b/workflows/bacannot_dbs.nf
index c44d4d36..ca6c76e3 100644
--- a/workflows/bacannot_dbs.nf
+++ b/workflows/bacannot_dbs.nf
@@ -16,6 +16,7 @@ include { ICEBERG_DB } from '../modules/bacannot_dbs/iceberg.nf'
include { PHAST_DB } from '../modules/bacannot_dbs/phast.nf'
include { KOFAMSCAN_DB } from '../modules/bacannot_dbs/kofamscan.nf'
include { ANTISMASH_DB } from '../modules/bacannot_dbs/antismash.nf'
+include { GET_ZENODO_DB } from '../modules/bacannot_dbs/get_zenodo.nf'
/*
DEF WORKFLOW
@@ -23,21 +24,25 @@ include { ANTISMASH_DB } from '../modules/bacannot_dbs/antismash.nf'
workflow CREATE_DBS {
- download_db("prokka", "PROKKA_DB")
- download_db("mlst", "MLST_DB")
- download_db("kofamscan", "KOFAMSCAN_DB")
- download_db("card", "CARD_DB")
- download_db("resfinder", "RESFINDER_DB")
- download_db("amrfinder", "AMRFINDER_DB")
- download_db("argminer", "ARGMINER_DB")
- download_db("platon", "PLATON_DB")
- download_db("plasmidfinder", "PLASMIDFINDER_DB")
- download_db("phigaro", "PHIGARO_DB")
- download_db("phast", "PHAST_DB")
- download_db("vfdb", "VFDB_DB")
- download_db("victors", "VICTORS_DB")
- download_db("iceberg", "ICEBERG_DB")
- download_db("antismash", "ANTISMASH_DB")
+ if ( params.get_dbs && !params.get_zenodo_db ) {
+ download_db("prokka", "PROKKA_DB")
+ download_db("mlst", "MLST_DB")
+ download_db("kofamscan", "KOFAMSCAN_DB")
+ download_db("card", "CARD_DB")
+ download_db("resfinder", "RESFINDER_DB")
+ download_db("amrfinder", "AMRFINDER_DB")
+ download_db("argminer", "ARGMINER_DB")
+ download_db("platon", "PLATON_DB")
+ download_db("plasmidfinder", "PLASMIDFINDER_DB")
+ download_db("phigaro", "PHIGARO_DB")
+ download_db("phast", "PHAST_DB")
+ download_db("vfdb", "VFDB_DB")
+ download_db("victors", "VICTORS_DB")
+ download_db("iceberg", "ICEBERG_DB")
+ download_db("antismash", "ANTISMASH_DB")
+ } else if ( !params.get_dbs && params.get_zenodo_db ) {
+ GET_ZENODO_DB()
+ }
}
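
With this branch, `--get_dbs` and `--get_zenodo_db` become mutually exclusive: passing both (or neither) matches no branch and nothing is downloaded. A hedged pair of invocations, assuming the database workflow writes into `--output`:

```bash
# Build every database from its upstream source (today's versions):
nextflow run fmalmeida/bacannot -profile docker --get_dbs --output bacannot_dbs

# Or fetch the latest pre-built archive from Zenodo instead:
nextflow run fmalmeida/bacannot -profile docker --get_zenodo_db --output bacannot_dbs
```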