Commit

Merge pull request #126 from fmalmeida/dev
Wrap up release v3.4.0
fmalmeida authored Jul 7, 2024
2 parents e6fa674 + 66c2650 commit 4c068b6
Showing 30 changed files with 879 additions and 1,491 deletions.
2 changes: 1 addition & 1 deletion .zenodo.json
@@ -2,7 +2,7 @@
"description": "<p>The pipeline</p>\n\n<p>bacannot, is a customisable, easy to use, pipeline that uses state-of-the-art software for comprehensively annotating prokaryotic genomes having only Docker and Nextflow as dependencies. It is able to annotate and detect virulence and resistance genes, plasmids, secondary metabolites, genomic islands, prophages, ICEs, KO, and more, while providing nice an beautiful interactive documents for results exploration.</p>",
"license": "other-open",
"title": "fmalmeida/bacannot: A generic but comprehensive bacterial annotation pipeline",
"version": "v3.3.3",
"version": "v3.4.0",
"upload_type": "software",
"creators": [
{
2 changes: 1 addition & 1 deletion README.md
@@ -37,7 +37,7 @@ Its main steps are:
| Analysis steps | Used software or databases |
| :------------- | :------------------------- |
| Genome assembly (if raw reads are given) | [Flye](https://github.com/fenderglass/Flye) and [Unicycler](https://github.com/rrwick/Unicycler) |
| Identification of closest 10 NCBI Refseq genomes | [RefSeq Masher](https://github.com/phac-nml/refseq_masher) |
| Identification of closest 10 NCBI Refseq genomes and comparison of genomes | [RefSeq Masher](https://github.com/phac-nml/refseq_masher) and [Sourmash](https://sourmash.readthedocs.io/en/latest/) |
| Generic annotation and gene prediction | [Prokka](https://github.com/tseemann/prokka) or [Bakta](https://github.com/oschwengers/bakta) |
| rRNA prediction | [barrnap](https://github.com/tseemann/barrnap) |
| Classification within multi-locus sequence types (STs) | [mlst](https://github.com/tseemann/mlst) |
14 changes: 14 additions & 0 deletions conf/defaults.config
@@ -122,6 +122,9 @@ params {
// (NOT RUN?) antiSMASH (secondary metabolite) annotation
skip_antismash = false

// (NOT RUN?) sourmash
skip_sourmash = false

// (NOT RUN?) integron finder tool
skip_integron_finder = false

@@ -178,6 +181,17 @@ params {
// User's custom database coverage threshold
blast_custom_mincov = 65

/*
* Sourmash configuration
*/
// kmer size (21, 31 or 51)
sourmash_kmer = 31

// scale, e.g. a scale 1000 on a 5Mb genome will generate 5000 hashes
// 1000 is generally recommended by the tool's developers
sourmash_scale = 1000


/*
* Resources allocation configuration
* Defaults only, expecting to be overwritten
8 changes: 6 additions & 2 deletions conf/docker.config
@@ -19,12 +19,12 @@ process {
// Custom pipeline's containers with various tools for general purposes
//
withLabel: 'db_download|db_tools|misc' {
container = 'fmalmeida/bacannot@sha256:bdb31637cacf99736656ab3b69f1f01ba1b5eb026771d5c266b4c84e96153057'
container = 'fmalmeida/bacannot@sha256:5c6f105157d30fe9a6ca1ad41fe884e75a29e6bd23ddb2e4fc06dd3d05854cd2'
}

// container for R tools
withLabel: 'renv' {
container = 'fmalmeida/bacannot@sha256:952f58a2c03e50f8a376073346fb1ccda28d6249e3fdfea07a3286a6ff1adf0c'
container = 'fmalmeida/bacannot@sha256:23a0713d3694a10ee4c570a4e65a471045781a73711495aa08ae7d40f9b65097'
}

// container for bacannot server
@@ -113,5 +113,9 @@ process {
container = "quay.io/biocontainers/rgi:5.2.1--pyhdfd78af_1"
}

withName: 'SOURMASH_LCA|SOURMASH_ALL' {
container = "quay.io/biocontainers/sourmash:4.8.2--hdfd78af_0"
}

}

8 changes: 6 additions & 2 deletions conf/singularity.config
@@ -22,12 +22,12 @@ process {
// Custom pipeline's containers with various tools for general purposes
//
withLabel: 'db_download|db_tools|misc' {
container = 'docker://fmalmeida/bacannot@sha256:bdb31637cacf99736656ab3b69f1f01ba1b5eb026771d5c266b4c84e96153057'
container = 'docker://fmalmeida/bacannot@sha256:5c6f105157d30fe9a6ca1ad41fe884e75a29e6bd23ddb2e4fc06dd3d05854cd2'
}

// container for R tools
withLabel: 'renv' {
container = 'docker://fmalmeida/bacannot@sha256:952f58a2c03e50f8a376073346fb1ccda28d6249e3fdfea07a3286a6ff1adf0c'
container = 'docker://fmalmeida/bacannot@sha256:23a0713d3694a10ee4c570a4e65a471045781a73711495aa08ae7d40f9b65097'
}

// container for bacannot server
@@ -117,5 +117,9 @@ process {
container = "https://depot.galaxyproject.org/singularity/rgi:5.2.1--pyhdfd78af_1"
}

withName: 'SOURMASH_LCA|SOURMASH_ALL' {
container = "https://depot.galaxyproject.org/singularity/sourmash:4.8.2--hdfd78af_0"
}

}

3 changes: 3 additions & 0 deletions docker/misc/Dockerfile
@@ -93,5 +93,8 @@ RUN python3 -m pip install cryptography==38.0.4 'biopython==1.83' 'matplotlib==3
# install get zenodo
RUN pip3 install zenodo_get

# install unzip
RUN apt-get install -y unzip

# fix permissions
RUN chmod 777 -R /work
3 changes: 1 addition & 2 deletions docker/renv/build.sh
@@ -1,2 +1 @@
source ../set_version.sh
../../bin/build_image.sh $NEW_VERSION
../../bin/build_image.sh $1
11 changes: 11 additions & 0 deletions docker/renv/reports/report_general.Rmd
@@ -9,6 +9,7 @@ params:
barrnap:
mlst:
refseq_masher:
sourmash_png:
query:
output:
bookdown::html_document2:
@@ -58,6 +59,13 @@ if (file.exists(params$kegg)) {
kegg_not_null <- FALSE
}
if (file.exists(params$sourmash_png)) {
sourmash_not_null <- TRUE
sourmash_png <- params$sourmash_png
} else {
sourmash_not_null <- FALSE
}
if (params$generic_annotator == "prokka") {
annotator_url <- "https://github.com/tseemann/prokka"
prokka_not_null <- TRUE
@@ -166,3 +174,6 @@ datatable(barrnap_gff,

```{r kegg_svg, echo=FALSE, results='asis', eval=kegg_not_null, child='yes_kegg.Rmd'}
```

```{r sourmash_svg, echo=FALSE, results='asis', eval=sourmash_not_null, child='yes_sourmash.Rmd'}
```
13 changes: 13 additions & 0 deletions docker/renv/reports/yes_sourmash.Rmd
@@ -0,0 +1,13 @@
## Sourmash

[Sourmash](https://sourmash.readthedocs.io/en/latest/) is a command-line tool and Python/Rust library for metagenome analysis and genome comparison using k-mers. It supports the compositional analysis of metagenomes, rapid search of large sequence databases, and flexible taxonomic profiling with both NCBI and GTDB taxonomies. Sourmash works well with sequences 30kb or larger, including bacterial and viral genomes.

In Bacannot, sourmash was used to compare genomes and plot a dendrogram of all the genomes given as input, plus the 10 genomes identified as closest to each input genome based on refseq_masher results.

> Duplicate genomes were removed (same genome is closest to multiple inputs).
>
> The sourmash genome comparison results and the compositional data of each sample are given as output, so that users can reuse them to make customised sourmash plots as described in the sourmash documentation.

```{r, out.width='100%', fig.cap='Sourmash genome comparison', fig.align='center'}
include_graphics(params$sourmash_png)
```
20 changes: 20 additions & 0 deletions docs/defaults.config
@@ -122,6 +122,15 @@ params {
// (NOT RUN?) antiSMASH (secondary metabolite) annotation
skip_antismash = false

// (NOT RUN?) sourmash
skip_sourmash = false

// (NOT RUN?) integron finder tool
skip_integron_finder = false

// (NOT RUN?) CIRCOS tool
skip_circos = false

/*
* Custom databases can be used to annotate additional genes in the genome.
* It runs a BLAST alignment against the genome, therefore, the custom database
@@ -172,6 +181,17 @@ params {
// User's custom database coverage threshold
blast_custom_mincov = 65

/*
* Sourmash configuration
*/
// kmer size (21, 31 or 51)
sourmash_kmer = 31

// scale, e.g. a scale 1000 on a 5Mb genome will generate 5000 hashes
// 1000 is generally recommended by the tool's developers
sourmash_scale = 1000


/*
* Resources allocation configuration
* Defaults only, expecting to be overwritten
2 changes: 1 addition & 1 deletion docs/index.md
@@ -23,7 +23,7 @@ The pipeline's main steps are:
| Analysis steps | Used software or databases |
| :------------- | :------------------------- |
| Genome assembly (if raw reads are given) | [Flye](https://github.com/fenderglass/Flye) and [Unicycler](https://github.com/rrwick/Unicycler) |
| Identification of closest 10 NCBI Refseq genomes | [RefSeq Masher](https://github.com/phac-nml/refseq_masher) |
| Identification of closest 10 NCBI Refseq genomes and comparison of genomes | [RefSeq Masher](https://github.com/phac-nml/refseq_masher) and [Sourmash](https://sourmash.readthedocs.io/en/latest/) |
| Generic annotation and gene prediction | [Prokka](https://github.com/tseemann/prokka) or [Bakta](https://github.com/oschwengers/bakta) |
| rRNA prediction | [barrnap](https://github.com/tseemann/barrnap) |
| Classification within multi-locus sequence types (STs) | [mlst](https://github.com/tseemann/mlst) |
10 changes: 10 additions & 0 deletions docs/manual.md
@@ -85,6 +85,15 @@ The use of this parameter sets a default value for input samples. If a sample ha
| :--------------------------------------- | :------- | :------ | :---------- |
| `--resfinder_species` | :material-close: | NA | Resfinder species panel. It activates the resfinder annotation process using the given species panel. Check the available species at [their main page](https://cge.cbs.dtu.dk/services/ResFinder/) and in [their repository page](https://bitbucket.org/genomicepidemiology/resfinder/src/master/#usage). If your species is not available in Resfinder panels, you may use it with the "Other" panel (`--resfinder_species "Other"`). |

## Sourmash comparison

The parameters below configure how [sourmash](https://sourmash.readthedocs.io/en/latest/) is executed in the pipeline. They are relatively simple and have sensible defaults.

| <div style="width:160px">Parameter</div> | Required | Default | Description |
| :--------------------------------------- | :------- | :------ | :---------- |
| `--sourmash_kmer` | :material-close: | 31 | Kmer size for sourmash genome comparison |
| `--sourmash_scale` | :material-close: | 1000 | Scale for sourmash genome comparison. A scale of 1000 on a 5 Mb genome will generate about 5000 hashes. A value of 1000 is generally recommended by the tool's developers |
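
The relationship between `--sourmash_scale` and sketch size can be illustrated with a small calculation. This is only a sketch under a simplifying assumption: the number of distinct k-mers is taken to be roughly equal to the genome length, so a scale of `s` keeps about one hash per `s` base pairs. The function name `expected_hashes` is hypothetical and is not part of Bacannot or sourmash.

```python
def expected_hashes(genome_size_bp: int, scale: int) -> int:
    """Approximate number of hashes kept in a scaled sketch.

    Assumption: distinct k-mers ~ genome length, so roughly one hash
    is retained for every `scale` base pairs.
    """
    if scale <= 0:
        raise ValueError("scale must be a positive integer")
    return genome_size_bp // scale


if __name__ == "__main__":
    genome_size = 5_000_000  # the 5 Mb genome used as the example in the docs
    for scale in (100, 1000, 10000):
        print(f"scale={scale}: ~{expected_hashes(genome_size, scale)} hashes")
```

A smaller scale keeps more hashes per genome, which gives finer resolution at the cost of larger signature files. The real hash count will vary with genome composition, since repeated k-mers collapse into a single hash.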

## On/Off processes

| <div style="width:180px">Parameter</div> | Required | Default | Description |
@@ -96,6 +105,7 @@ The use of this parameter sets a default value for input samples. If a sample ha
| `--skip_prophage_search` | :material-close: | false | Tells whether or not to run prophage annotation modules |
| `--skip_kofamscan` | :material-close: | false | Tells whether or not to run KEGG orthology (KO) annotation with KofamScan |
| `--skip_antismash` | :material-close: | false | Tells whether or not to run antiSMASH (secondary metabolite) annotation. AntiSMASH is executed using only its core annotation modules in order to keep it fast. |
| `--skip_sourmash` | :material-close: | false | Tells whether or not to run sourmash to compare input genomes and closest reference genomes |
| `--skip_circos` | :material-close: | false | Tells whether or not to run the final `CIRCOS` module. When the input genome has many contigs, its results are not meaningful. |
| `--skip_integron_finder` | :material-close: | false | Tells whether or not to run the integron finder tool. |

2 changes: 2 additions & 0 deletions docs/outputs.md
@@ -15,6 +15,7 @@ After a successful execution, you will have something like this:
# Directory tree from the running dir
.
├── _ANNOTATION
│   └── sourmash_all # Sourmash results of genome comparison of all input genomes and all identified references
| └── ecoli_ref.fna
│   └── ecoli
│   ├── assembly # Assembly files (when raw reads are given)
@@ -38,6 +39,7 @@ After a successful execution, you will have something like this:
|   ├── resistance # AMR annotation results from ARGminer, AMRFinderPlus, RGI and Resfinder
|   ├── rRNA # barrnap annotation results
|   ├── SequenceServerDBs # SequenceServer pre-formatted databases to be used with SequenceServer blast application
|   ├── sourmash # Sourmash summary and signature file for the specific sample
|   ├── SQLdb # The SQLdb of the annotation used by the shiny server for rapid parsing
|   ├── tools_versioning # Versions of tools and databases used (whenever available)
|   ├── virulence # Virulence genes annotation results from Victors and VFDB databases
2 changes: 1 addition & 1 deletion docs/quickstart.md
@@ -37,7 +37,7 @@ Bacannot databases are not inside the docker images anymore to avoid huge images

#### Pre-formatted

Users can directly download pre-formatted databases from Zenodo: https://doi.org/10.5281/zenodo.7615811
Users can directly download pre-formatted databases from Zenodo: <https://doi.org/10.5281/zenodo.7615811>

Useful for standardization and also overcoming known issues that may arise when formatting databases with `singularity` profile.
