bacannot, is a customisable, easy to use, pipeline that uses state-of-the-art software for comprehensively annotating prokaryotic genomes having only Docker and Nextflow as dependencies. It is able to annotate and detect virulence and resistance genes, plasmids, secondary metabolites, genomic islands, prophages, ICEs, KO, and more, while providing nice and beautiful interactive documents for results exploration.
",
"license": "other-open",
"title": "fmalmeida/bacannot: A generic but comprehensive bacterial annotation pipeline",
- "version": "v3.3.3",
+ "version": "v3.4.0",
"upload_type": "software",
"creators": [
{
diff --git a/README.md b/README.md
index 86868a3..3360952 100644
--- a/README.md
+++ b/README.md
@@ -37,7 +37,7 @@ Its main steps are:
| Analysis steps | Used software or databases |
| :------------- | :------------------------- |
| Genome assembly (if raw reads are given) | [Flye](https://github.com/fenderglass/Flye) and [Unicycler](https://github.com/rrwick/Unicycler) |
-| Identification of closest 10 NCBI Refseq genomes | [RefSeq Masher](https://github.com/phac-nml/refseq_masher) |
+| Identification of closest 10 NCBI Refseq genomes and comparison of genomes | [RefSeq Masher](https://github.com/phac-nml/refseq_masher) and [Sourmash](https://sourmash.readthedocs.io/en/latest/) |
| Generic annotation and gene prediction | [Prokka](https://github.com/tseemann/prokka) or [Bakta](https://github.com/oschwengers/bakta) |
| rRNA prediction | [barrnap](https://github.com/tseemann/barrnap) |
| Classification within multi-locus sequence types (STs) | [mlst](https://github.com/tseemann/mlst) |
diff --git a/conf/defaults.config b/conf/defaults.config
index 4bce1ca..e8c5f69 100644
--- a/conf/defaults.config
+++ b/conf/defaults.config
@@ -122,6 +122,9 @@ params {
// (NOT RUN?) antiSMASH (secondary metabolite) annotation
skip_antismash = false
+// (NOT RUN?) sourmash
+ skip_sourmash = false
+
// (NOT RUN?) integron finder tool
skip_integron_finder = false
@@ -178,6 +181,17 @@ params {
// User's custom database coverage threshold
blast_custom_mincov = 65
+ /*
+ * Sourmash configuration
+ */
+// kmer size (21, 31 or 51)
+ sourmash_kmer = 31
+
+// scale, e.g. a scale 1000 on a 5Mb genome will generate 5000 hashes
+// 1000 is generally recommended by the tool's developers
+ sourmash_scale = 1000
+
+
/*
* Resources allocation configuration
* Defaults only, expecting to be overwritten
diff --git a/conf/docker.config b/conf/docker.config
index fe8600f..7de1e86 100644
--- a/conf/docker.config
+++ b/conf/docker.config
@@ -19,12 +19,12 @@ process {
// Custom pipeline's containers with various tools for general purposes
//
withLabel: 'db_download|db_tools|misc' {
- container = 'fmalmeida/bacannot@sha256:bdb31637cacf99736656ab3b69f1f01ba1b5eb026771d5c266b4c84e96153057'
+ container = 'fmalmeida/bacannot@sha256:5c6f105157d30fe9a6ca1ad41fe884e75a29e6bd23ddb2e4fc06dd3d05854cd2'
}
// container for R tools
withLabel: 'renv' {
- container = 'fmalmeida/bacannot@sha256:952f58a2c03e50f8a376073346fb1ccda28d6249e3fdfea07a3286a6ff1adf0c'
+ container = 'fmalmeida/bacannot@sha256:23a0713d3694a10ee4c570a4e65a471045781a73711495aa08ae7d40f9b65097'
}
// container for bacannot server
@@ -113,5 +113,9 @@ process {
container = "quay.io/biocontainers/rgi:5.2.1--pyhdfd78af_1"
}
+ withName: 'SOURMASH_LCA|SOURMASH_ALL' {
+ container = "quay.io/biocontainers/sourmash:4.8.2--hdfd78af_0"
+ }
+
}
diff --git a/conf/singularity.config b/conf/singularity.config
index e57ebea..f05eb5d 100644
--- a/conf/singularity.config
+++ b/conf/singularity.config
@@ -22,12 +22,12 @@ process {
// Custom pipeline's containers with various tools for general purposes
//
withLabel: 'db_download|db_tools|misc' {
- container = 'docker://fmalmeida/bacannot@sha256:bdb31637cacf99736656ab3b69f1f01ba1b5eb026771d5c266b4c84e96153057'
+ container = 'docker://fmalmeida/bacannot@sha256:5c6f105157d30fe9a6ca1ad41fe884e75a29e6bd23ddb2e4fc06dd3d05854cd2'
}
// container for R tools
withLabel: 'renv' {
- container = 'docker://fmalmeida/bacannot@sha256:952f58a2c03e50f8a376073346fb1ccda28d6249e3fdfea07a3286a6ff1adf0c'
+ container = 'docker://fmalmeida/bacannot@sha256:23a0713d3694a10ee4c570a4e65a471045781a73711495aa08ae7d40f9b65097'
}
// container for bacannot server
@@ -117,5 +117,9 @@ process {
container = "https://depot.galaxyproject.org/singularity/rgi:5.2.1--pyhdfd78af_1"
}
+ withName: 'SOURMASH_LCA|SOURMASH_ALL' {
+ container = "https://depot.galaxyproject.org/singularity/sourmash:4.8.2--hdfd78af_0"
+ }
+
}
diff --git a/docker/misc/Dockerfile b/docker/misc/Dockerfile
index 2d6f10f..35e1a97 100644
--- a/docker/misc/Dockerfile
+++ b/docker/misc/Dockerfile
@@ -93,5 +93,8 @@ RUN python3 -m pip install cryptography==38.0.4 'biopython==1.83' 'matplotlib==3
# install get zenodo
RUN pip3 install zenodo_get
+# install unzip
+RUN apt-get install -y unzip
+
# fix permissions
RUN chmod 777 -R /work
diff --git a/docker/renv/build.sh b/docker/renv/build.sh
index a7bc212..51153a5 100644
--- a/docker/renv/build.sh
+++ b/docker/renv/build.sh
@@ -1,2 +1 @@
-source ../set_version.sh
-../../bin/build_image.sh $NEW_VERSION
+../../bin/build_image.sh $1
diff --git a/docker/renv/reports/report_general.Rmd b/docker/renv/reports/report_general.Rmd
index 0651624..54222a7 100644
--- a/docker/renv/reports/report_general.Rmd
+++ b/docker/renv/reports/report_general.Rmd
@@ -9,6 +9,7 @@ params:
barrnap:
mlst:
refseq_masher:
+ sourmash_png:
query:
output:
bookdown::html_document2:
@@ -58,6 +59,13 @@ if (file.exists(params$kegg)) {
kegg_not_null <- FALSE
}
+if (file.exists(params$sourmash_png)) {
+ sourmash_not_null <- TRUE
+ sourmash_png <- params$sourmash_png
+} else {
+ sourmash_not_null <- FALSE
+}
+
if (params$generic_annotator == "prokka") {
annotator_url <- "https://github.com/tseemann/prokka"
prokka_not_null <- TRUE
@@ -166,3 +174,6 @@ datatable(barrnap_gff,
```{r kegg_svg, echo=FALSE, results='asis', eval=kegg_not_null, child='yes_kegg.Rmd'}
```
+
+```{r sourmash_svg, echo=FALSE, results='asis', eval=sourmash_not_null, child='yes_sourmash.Rmd'}
+```
diff --git a/docker/renv/reports/yes_sourmash.Rmd b/docker/renv/reports/yes_sourmash.Rmd
new file mode 100644
index 0000000..d75e11a
--- /dev/null
+++ b/docker/renv/reports/yes_sourmash.Rmd
@@ -0,0 +1,13 @@
+## Sourmash
+
+[Sourmash](https://sourmash.readthedocs.io/en/latest/) is a command-line tool and Python/Rust library for metagenome analysis and genome comparison using k-mers. It supports the compositional analysis of metagenomes, rapid search of large sequence databases, and flexible taxonomic profiling with both NCBI and GTDB taxonomies. Sourmash works well with sequences 30kb or larger, including bacterial and viral genomes.
+
+In Bacannot, sourmash was used to compare genomes and plot a dendrogram of all the genomes given as input plus, for each input, the 10 closest genomes identified by refseq_masher.
+
+> Duplicate genomes were removed (e.g. when the same genome is among the closest to multiple inputs).
+>
+> The sourmash genome comparison results and the compositional data of each sample are given as output, so that users can build customised sourmash plots as described in the sourmash documentation.
+
+```{r, out.width='100%', fig.cap='Sourmash genome comparison', fig.align='center'}
+include_graphics(params$sourmash_png)
+```
\ No newline at end of file
diff --git a/docs/defaults.config b/docs/defaults.config
index 0d43fe4..e8c5f69 100644
--- a/docs/defaults.config
+++ b/docs/defaults.config
@@ -122,6 +122,15 @@ params {
// (NOT RUN?) antiSMASH (secondary metabolite) annotation
skip_antismash = false
+// (NOT RUN?) sourmash
+ skip_sourmash = false
+
+// (NOT RUN?) integron finder tool
+ skip_integron_finder = false
+
+// (NOT RUN?) CIRCOS tool
+ skip_circos = false
+
/*
* Custom databases can be used to annotate additional genes in the genome.
* It runs a BLAST alignment against the genome, therefore, the custom database
@@ -172,6 +181,17 @@ params {
// User's custom database coverage threshold
blast_custom_mincov = 65
+ /*
+ * Sourmash configuration
+ */
+// kmer size (21, 31 or 51)
+ sourmash_kmer = 31
+
+// scale, e.g. a scale 1000 on a 5Mb genome will generate 5000 hashes
+// 1000 is generally recommended by the tool's developers
+ sourmash_scale = 1000
+
+
/*
* Resources allocation configuration
* Defaults only, expecting to be overwritten
diff --git a/docs/index.md b/docs/index.md
index 4cb28ba..3d1b969 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -23,7 +23,7 @@ The pipeline's main steps are:
| Analysis steps | Used software or databases |
| :------------- | :------------------------- |
| Genome assembly (if raw reads are given) | [Flye](https://github.com/fenderglass/Flye) and [Unicycler](https://github.com/rrwick/Unicycler) |
-| Identification of closest 10 NCBI Refseq genomes | [RefSeq Masher](https://github.com/phac-nml/refseq_masher) |
+| Identification of closest 10 NCBI Refseq genomes and comparison of genomes | [RefSeq Masher](https://github.com/phac-nml/refseq_masher) and [Sourmash](https://sourmash.readthedocs.io/en/latest/) |
| Generic annotation and gene prediction | [Prokka](https://github.com/tseemann/prokka) or [Bakta](https://github.com/oschwengers/bakta) |
| rRNA prediction | [barrnap](https://github.com/tseemann/barrnap) |
| Classification within multi-locus sequence types (STs) | [mlst](https://github.com/tseemann/mlst) |
diff --git a/docs/manual.md b/docs/manual.md
index 1a2b1df..fdc8095 100644
--- a/docs/manual.md
+++ b/docs/manual.md
@@ -85,6 +85,15 @@ The use of this parameter sets a default value for input samples. If a sample ha
| :--------------------------------------- | :------- | :------ | :---------- |
| `--resfinder_species` | :material-close: | NA | Resfinder species panel. It activates the resfinder annotation process using the given species panel. Check the available species at [their main page](https://cge.cbs.dtu.dk/services/ResFinder/) and in [their repository page](https://bitbucket.org/genomicepidemiology/resfinder/src/master/#usage). If your species is not available in Resfinder panels, you may use it with the "Other" panel (`--resfinder_species "Other"`). |
+## Sourmash comparison
+
+The parameters below configure how [sourmash](https://sourmash.readthedocs.io/en/latest/) is executed in the pipeline. They are relatively simple and have sensible defaults.
+
+| Parameter | Required | Default | Description |
+| :--------------------------------------- | :------- | :------ | :---------- |
+| `--sourmash_kmer` | :material-close: | 31 | K-mer size for sourmash genome comparison (21, 31 or 51) |
+| `--sourmash_scale` | :material-close: | 1000 | Scale for sourmash genome comparison. A scale of 1000 on a 5 Mb genome will generate about 5000 hashes. 1000 is generally recommended by the tool's developers. |
+
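The relation between `--sourmash_scale` and sketch size can be illustrated with a rough back-of-the-envelope calculation. The snippet below is only a sketch of that arithmetic, not part of the pipeline: `expected_hashes` is a hypothetical helper name, and it assumes that a scaled sketch keeps roughly one hash per `scaled` bases of genome, which is an approximation.

```python
def expected_hashes(genome_length: int, scaled: int = 1000) -> int:
    """Approximate number of hashes kept in a scaled sketch.

    Hypothetical helper for illustration only: assumes about one
    retained hash per `scaled` bases of genome.
    """
    if scaled <= 0:
        raise ValueError("scaled must be a positive integer")
    return genome_length // scaled


# A 5 Mb genome with the default scale of 1000
print(expected_hashes(5_000_000, 1000))  # 5000
```

Under this approximation, lowering the scale gives larger sketches (more hashes per genome), while raising it gives smaller ones.
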
## On/Off processes
|