Pipeline remodelling. Issues 24, 31 and 36 (#44)
* Split out the config profiles (#43)
  * Split out the config profiles
  * Resolve the conflict using the upstream config
* trying to have a create dbs process
* found way to download dbs
* adding resfinder and plasmidfinder download rules
* added phigaro, vfdb and amrfinder download rules
* added last database download rules
* added prokka HMM download rule
* added label to use docker for tools
* fixed argminer download
* fixed victors download
* fixed iceberg download
* prokka using given database and bioconda image
* docker specific for downloading databases
* update packages
* added mlst database download rule
* add bacannot db info
* also downloads PGAP db
* added prokka and mlst
* fix conditional
* fixed pgap conditional
* added barrnap
* added 'compute_gc' module -- think about modules with two tools (see the sketch after this list)
* adding indentation
* fixing label
* first tries on conda envs
* removing label
* adding plasmidfinder
* added platon
* working until islandpath
* added: VFDB
* added: Victors
* changing name and organization of MISC image
* added: PHAST
* added: phigaro
* added: phispy
* added: iceberg
* added: kofamscan db download
* removing named outputs as it will be solved in another PR
* added: kofamscan from downloaded database
* added: kegg_decoder
* Refactor channel identifiers and process names (#45). Main change: process, channel and workflow names have had their casing changed to meet Nextflow's community standards. Commits:
  * Refactor names
  * refactor names in the workflows
  * Update workflows/bacannot.nf
  * Update workflows/bacannot.nf
  * Apply suggestions from code review
  * Update workflows/bacannot.nf wrt iceberg
  * Accommodate code review and fix the channel name
  * fix MERGE_ANNOTATIONS process case in line 331
  Co-authored-by: Felipe Marques de Almeida <felipemarques89@gmail.com>
* trying to bring amrfinder, card_rgi and argminer
* trying to fix how the scale is calculated for amrfinder
* changing scale to perl
* typo fix
* removing unnecessary comma
* Update amrfinder.nf: pass the threshold calculation, scaled to two decimals, to Groovy
* properly working until amrfinder and card_rgi
* added small dataset test profile
* fixing name of quicktest profile
* fixing urls of testing samplesheets
* removing unnecessary files
* fix missing label
* adding resfinder to miscellaneous image
* fixing gitignore
* change manifest to upcoming version
* updating db download workflow behaviour
* removing .git dirs
* fixing argminer download
* properly added resfinder
* adding script that parses nanopolish methyl calls
* removing unnecessary labels
* starting to change how images are done
* added perl tools: prokka, barrnap, mlst
* added misc module; added: gc_compute module
* adding labels
* added kofam analysis
* added main pyenv; added tools: platon, plasmidfinder
* fixed perlenv image; added tool: islandpath
* included virulence modules; they use the miscellaneous image
* pyenv image updated; added prophage tools: phast scan, phigaro, phispy
* added iceberg db module; it uses the miscellaneous image
* added py36env image; added tool: rgi
* added resistance tools; added: argminer db module, resfinder, amrfinderplus
* added nanopolish
* added refseq_masher
* added digIS; Docker image and modules were updated to run until digIS
* added antismash
* added sequence server
* added merge_annotation module
* added draw_gis modules
* added gff2gbk module
* added create_sql module
* added first resource management labels
* fixing resource label for phigaro
* little fix in env
* fixing how tuples should be passed
* arrived at jbrowse step
* fixing draw_gis module input tuple
* jbrowse added
* Create test_pr.yml: this is how new versions will be organised after the remodelling branch update, and, therefore, the test_pr must be updated
* fixed phast db incorporation
* fixing inputs on report
* Update test_pr.yml: will only act on ready-for-review PRs
* adding ENV VAR for current version
* adding scripts to automatically build images
* fixed iceberg db incorporation
* Update digIS.nf: digIS gff usage fixed
* fixed vfdb incorporation
* changing file to path input resolution
* fixed victors db incorporation
* changed channel names in main script
* fixed argminer and prokka tables
* fixed custom db annotations incorporation
* fixed bacannot server loading
* fixed custom db reports
* begin incorporation of nf-core framework
* nf-core libs have been added to the pipeline
* custom database annotation added to JBrowse
* fixed custom database gff generation
* Update Dockerfile: diminished image size by removing the antismash db from inside it
* Update docker.config: uncommenting assembler images
* creating singularity profile
* begin documentation update
* added singularity profile
* made image compatible with earlier versions
* gave 777 permissions to workdir
* adjusted default values
* fixed prokka to work with singularity
* fixed rgi for singularity usage
* updating PR test action
* update targeted branches
* limiting process resources
* updated image for singularity
* removing unused dbs in quicktest
* not using big dbs in quicktest
* Update resfinder.nf
* adding gitpod config
* fixed yml
* fixing custom db report getAttributeField snippet
* begin change to mkdocs
* add requirements
* Update .readthedocs.yml
* Update .readthedocs.yml
* Update .readthedocs.yml
* Update .readthedocs.yml
* update
* update requirements
* added index
* fixed tags
* added installation information
* now on quickstart
* changed some admonitions
* added quickstart
* added samplesheet page to mkdocs
* included dir with images
* Update samplesheet.md: fixed header levels
* outputs page added to mkdocs
* Update standard.config: removing "paralallel_jobs" definition
* Update defaults.config: fixed blast default values
* updated and tested quickstart
* updating gitpod.yml
* creates a testing dir with more space
* added profile selection information
* Update manual.md: manual updated for mkdocs and with the new --ncbi_proteins parameter
* Update nextflow_schema.json: updated --custom_db parameter description
* Update nextflow.config: removed parameters that are in defaults and should not be in the boilerplate
* added config file page
* information about custom databases added
* Update nextflow_schema.json
* defaults need to be loaded before boilerplate
* changing label of unicycler and flye
* fixed antismash installation
* fixed keggdecoder: requires py36
* fixed resfinder module: fixed db setup
* pipeline fixed for docker

Co-authored-by: Abhinav Sharma <abhi18av@users.noreply.github.com>
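As a point of reference for the 'compute_gc' item above, the calculation it names is simple. A minimal Python sketch of a GC-content computation, for illustration only, not the module's actual code:

# A minimal illustration of a GC-content calculation; this is a sketch,
# not the pipeline's actual compute_gc module.
def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    return gc / len(seq) if seq else 0.0

print(f"{gc_content('ATGCGC'):.3f}")  # -> 0.667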
Showing 214 changed files with 21,429 additions and 17,587 deletions.
.github/workflows/test_pr.yml
@@ -0,0 +1,41 @@
name: Testing new PR with singularity
on:
  pull_request:
    branches: [ master, dev, develop ]
    types: [ ready_for_review, synchronize, reopened ]

jobs:
  run_nextflow:
    name: Run pipeline for the upcoming PR
    runs-on: ubuntu-latest

    steps:

      - name: Check out pipeline code
        uses: actions/checkout@v2

      - name: Install Nextflow
        env:
          CAPSULE_LOG: none
        run: |
          wget -qO- get.nextflow.io | bash
          sudo mv nextflow /usr/local/bin/

      - name: Install Singularity
        uses: eWaterCycle/setup-singularity@v7
        with:
          singularity-version: 3.8.3

      - name: Clean environment
        run: |
          sudo rm -rf /usr/local/lib/android # will release about 10 GB if you don't need Android
          sudo rm -rf /usr/share/dotnet      # will release about 20 GB if you don't need .NET

      - name: Build bacannot database
        run: |
          nextflow run main.nf -profile singularity --get_dbs --output bacannot_dbs --max_cpus 2 --max_memory '6.GB' --max_time '6.h'
          rm -rf bacannot_dbs/antismash_db bacannot_dbs/kofamscan_db bacannot_dbs/prokka_db/PGAP_NCBI.hmm # remove unused in quicktest to diminish size

      - name: Run the pipeline
        run: |
          nextflow run main.nf -profile singularity,quicktest --bacannot_db bacannot_dbs
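The same two commands can be used to reproduce the CI quicktest locally. A minimal Python sketch, assuming Nextflow and Singularity are already on PATH and the commands are run from the repository root:

# Reproduce the CI quicktest locally (same commands as the workflow above),
# assuming Nextflow and Singularity are installed.
import subprocess

# Build the reference databases with modest resource caps...
subprocess.run(
    "nextflow run main.nf -profile singularity --get_dbs --output bacannot_dbs"
    " --max_cpus 2 --max_memory '6.GB' --max_time '6.h'",
    shell=True, check=True,
)
# ...then annotate the small test dataset against them.
subprocess.run(
    "nextflow run main.nf -profile singularity,quicktest --bacannot_db bacannot_dbs",
    shell=True, check=True,
)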
.gitignore
@@ -4,4 +4,3 @@
.Ruserdata
TESTE
docs/_html
teste
.gitpod.yml
@@ -0,0 +1,29 @@
image: nfcore/gitpod:latest

tasks:
  - before: |
      wget -qO- get.nextflow.io | bash
      chmod 777 nextflow
      sudo mv nextflow /usr/local/bin/
      pip install tiptop
      pip install nf-core
      mkdir -p /testing
      sudo chmod 777 -R /testing
      ln -rs /testing .

vscode:
  extensions: # based on nf-core.nf-core-extensionpack
    - codezombiech.gitignore                # Language support for .gitignore files
    # - cssho.vscode-svgviewer              # SVG viewer
    - davidanson.vscode-markdownlint        # Markdown/CommonMark linting and style checking for Visual Studio Code
    - eamodio.gitlens                       # Quickly glimpse into whom, why, and when a line or code block was changed
    - EditorConfig.EditorConfig             # Override user/workspace settings with settings found in .editorconfig files
    - Gruntfuggly.todo-tree                 # Display TODO and FIXME in a tree view in the activity bar
    - mechatroner.rainbow-csv               # Highlight columns in CSV files in different colors
    # - nextflow.nextflow                   # Nextflow syntax highlighting
    - oderwat.indent-rainbow                # Highlight indentation level
    - streetsidesoftware.code-spell-checker # Spelling checker for source code

ports:
  - port: 3000
    onOpen: open-preview
.readthedocs.yml
@@ -1,23 +1,20 @@
-# .readthedocs.yml
+# .readthedocs.yaml
 # Read the Docs configuration file
 # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
 
 # Required
 version: 2
 
-# Build documentation in the docs/ directory with Sphinx
-sphinx:
-  configuration: docs/conf.py
+# Set the version of Python and other tools you might need
+build:
+  os: ubuntu-20.04
+  tools:
+    python: "3.9"
 
 # Build documentation with MkDocs
-# mkdocs:
-#   configuration: mkdocs.yml
+mkdocs:
+  configuration: mkdocs.yml
 
 # Optionally build your docs in additional formats such as PDF and ePub
 formats: all
 
-# Optionally set the version of Python and requirements required to build your docs
+# Optionally declare the Python requirements required to build your docs
 python:
-  version: 3.7
-  install:
-    - requirements: docs/requirements.txt
+  install:
+    - requirements: docs/requirements.txt
addBedtoolsIntersect.R
@@ -0,0 +1,121 @@
#!/usr/bin/Rscript

# Setting Help
'usage: addBedtoolsIntersect.R [--txt=<file> --gff=<file> --type=<chr> --source=<chr> --out=<chr>]
options:
  -g, --gff=<file>    GFF file to merge annotation
  -t, --txt=<file>    Bedtools intersect file
  --type=<chr>        Feature type [default: BLAST]
  --source=<chr>      Feature source [default: CDS]
  -o, --out=<chr>     Output file name [default: out.gff]' -> doc

# Parse parameters
suppressMessages(library(docopt))
opt <- docopt(doc)

if (is.null(opt$gff)) {
  stop("At least one argument must be supplied (gff file)\n", call. = FALSE)
}

if (is.null(opt$txt)) {
  stop("At least one argument must be supplied (intersection file)\n", call. = FALSE)
}

# Load libraries
suppressMessages(library(ballgown))
suppressMessages(library(DataCombine))
suppressMessages(library(dplyr))
suppressMessages(library(stringr))
suppressMessages(library(tidyr))

# Function used to remove redundancy
reduce_row = function(i) {
  d <- unlist(strsplit(i, split = ","))
  paste(unique(d), collapse = ',')
}

# Function to get Attribute Fields
getAttributeField <- function(x, field, attrsep = ";") {
  s = strsplit(as.character(x), split = attrsep, fixed = TRUE)
  sapply(s, function(atts) {
    a = strsplit(atts, split = "=", fixed = TRUE)
    m = match(field, sapply(a, "[", 1))
    if (!is.na(m)) {
      rv = a[[m]][2]
    } else {
      rv = as.character(NA)
    }
    return(rv)
  })
}

# Operator to discard patterns found
'%ni%' <- Negate('%in%')

if (file.info(opt$txt)$size > 0) {

  # Load GFF file
  gff <- gffRead(opt$gff)

  # Create a column in the GFF with ids
  gff$ID <- getAttributeField(gff$attributes, "ID", ";")

  # Load intersection file
  bedtools_intersect <- read.csv(opt$txt, header = F, sep = "\t")
  colnames(bedtools_intersect) <-
    c("seqname1", "source1", "feature1", "start1", "end1", "score1", "strand1", "frame1", "attributes1",
      "seqname2", "source2", "feature2", "start2", "end2", "score2", "strand2", "frame2", "attributes2",
      "len")

  # Create a column in the intersection file with ids
  bedtools_intersect$ID <- getAttributeField(bedtools_intersect$attributes2, "ID", ";")

  # save ids
  ids <- bedtools_intersect$ID

  # Subset based on gene IDs
  ## Lines with our IDs
  sub <- gff %>%
    filter(ID %in% ids) %>%
    select(seqname, source, feature, start, end, score, strand, frame, attributes, ID)
  ## Lines without our IDs
  not <- gff %>%
    filter(ID %ni% ids) %>%
    select(seqname, source, feature, start, end, score, strand, frame, attributes)

  # Change fields values
  ## source
  s <- sub$source
  sn <- as.character(opt$source)
  snew <- paste(s, sn, sep = ",")
  sub$source <- snew

  ## feature
  f <- sub$feature
  fn <- as.character(opt$type)
  fnew <- paste(f, fn, sep = ",")
  sub$feature <- fnew

  ## attributes
  sub <- merge.data.frame(sub, bedtools_intersect, by = "ID", all = TRUE)
  new_ID <- paste(opt$source, "_ID=", sep = "", collapse = "")
  sub$attributes1 <- gsub(pattern = "ID=", replacement = as.character(new_ID), x = sub$attributes1)
  sub <- unite(sub, "attributes", c("attributes", "attributes1"), sep = ";") %>%
    select(seqname, source, feature, start, end, score, strand, frame, attributes)

  # Merge files
  merged_df <- merge.data.frame(sub, not, all = TRUE)
  feat <- merged_df$feature
  merged_df$feature <- sapply(feat, reduce_row)
  source <- merged_df$source
  merged_df$source <- sapply(source, reduce_row)
  merged_df <- merged_df[str_order(merged_df$attributes, numeric = TRUE), ]

  # Write output
  write.table(merged_df, file = opt$out, quote = FALSE, sep = "\t", col.names = FALSE, row.names = FALSE)

} else {
  # Load GFF file
  gff <- gffRead(opt$gff)
  # Write output
  write.table(gff, file = opt$out, quote = FALSE, sep = "\t", col.names = FALSE, row.names = FALSE)
}
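For reference, a rough Python equivalent of the getAttributeField helper above, assuming the same semicolon-separated key=value GFF attribute format. This is an illustration, not part of the pipeline:

# Rough Python equivalent of the getAttributeField helper (illustration only).
def get_attribute_field(attributes, field, attrsep=";"):
    """Return the value of `field` in a GFF attribute string, or None."""
    for att in attributes.split(attrsep):
        key, _, value = att.strip().partition("=")
        if key == field:
            return value
    return None

print(get_attribute_field("ID=PROKKA_00001;product=hypothetical protein", "ID"))
# -> PROKKA_00001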
docker/scripts/rscripts/addBlast2Gff.R → bin/addBlast2Gff.R (100644 → 100755; file renamed without changes)
docker/scripts/rscripts/addKO2Gff.R → bin/addKO2Gff.R (100644 → 100755; file renamed without changes)
docker/scripts/rscripts/addNCBIamr2Gff.R → bin/addNCBIamr2Gff.R (100644 → 100755; file renamed without changes)
docker/scripts/rscripts/addRGI2gff.R → bin/addRGI2gff.R (100644 → 100755; file renamed without changes)
@@ -0,0 +1,3 @@
#!/bin/bash
name=$(basename $(pwd))
docker build -t fmalmeida/bacannot:${1}_${name} .
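This build script tags each image as fmalmeida/bacannot:<version>_<directory-name>, taking the version from its first argument and the name from the directory it is run in. A sketch of the same tag construction in Python, where the example version and directory name are hypothetical:

# Sketch of the tag construction done by the build script above, assuming
# it is invoked from inside an image directory with the version as $1.
import os
import subprocess

def build_image(version: str, image_dir: str) -> str:
    name = os.path.basename(os.path.abspath(image_dir))  # mirrors $(basename $(pwd))
    tag = f"fmalmeida/bacannot:{version}_{name}"
    subprocess.run(["docker", "build", "-t", tag, "."], cwd=image_dir, check=True)
    return tag

# build_image("v3.0", "docker/misc")  # hypothetical version and directory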
@@ -0,0 +1,78 @@
#! /usr/bin/env python3

import sys
import csv
import argparse
import gzip

class SiteStats:
    def __init__(self, g_size, g_seq):
        self.num_reads = 0
        self.called_sites = 0
        self.called_sites_methylated = 0
        self.group_size = g_size
        self.sequence = g_seq

def update_call_stats(key, num_called_cpg_sites, is_methylated, sequence):
    if key not in sites:
        sites[key] = SiteStats(num_called_cpg_sites, sequence)
    sites[key].num_reads += 1
    sites[key].called_sites += num_called_cpg_sites
    if is_methylated > 0:
        sites[key].called_sites_methylated += num_called_cpg_sites

parser = argparse.ArgumentParser(description='Calculate methylation frequency at genomic CpG sites')
parser.add_argument('-c', '--call-threshold', type=float, required=False, default=2.0)
parser.add_argument('-s', '--split-groups', action='store_true')
args, input_files = parser.parse_known_args()
assert(args.call_threshold is not None)

sites = dict()

# iterate over input files and collect per-site stats
for f in input_files:
    if f[-3:] == ".gz":
        in_fh = gzip.open(f, 'rt')
    else:
        in_fh = open(f)
    csv_reader = csv.DictReader(in_fh, delimiter='\t')
    for record in csv_reader:
        num_sites = int(record['num_motifs'])
        llr = float(record['log_lik_ratio'])

        # Skip ambiguous calls
        if abs(llr) < args.call_threshold * num_sites:
            continue
        sequence = record['sequence']

        is_methylated = llr > 0

        # if this is a multi-cpg group and split_groups is set, break up these sites
        if args.split_groups and num_sites > 1:
            c = str(record['chromosome'])
            s = int(record['start'])
            e = int(record['end'])

            # find the position of the first CG dinucleotide
            sequence = record['sequence']
            cg_pos = sequence.find("CG")
            first_cg_pos = cg_pos
            while cg_pos != -1:
                key = (c, s + cg_pos - first_cg_pos, s + cg_pos - first_cg_pos)
                update_call_stats(key, 1, is_methylated, "split-group")
                cg_pos = sequence.find("CG", cg_pos + 1)
        else:
            key = (str(record['chromosome']), int(record['start']), int(record['end']))
            update_call_stats(key, num_sites, is_methylated, sequence)

# header
print("\t".join(["chromosome", "start", "end", "num_motifs_in_group", "called_sites", "called_sites_methylated", "methylated_frequency", "group_sequence"]))

sorted_keys = sorted(list(sites.keys()), key=lambda x: x)

for key in sorted_keys:
    if sites[key].called_sites > 0:
        (c, s, e) = key
        f = float(sites[key].called_sites_methylated) / sites[key].called_sites
        print("%s\t%s\t%s\t%d\t%d\t%d\t%.3f\t%s" % (c, s, e, sites[key].group_size, sites[key].called_sites, sites[key].called_sites_methylated, f, sites[key].sequence))
@@ -0,0 +1,17 @@
hmmer:
  bin: CHANGE_HMMSEARCH
  e_value_threshold: 0.00445
  pvog_path: CHANGE_PVOG
phigaro:
  mean_gc: 0.46354823199323625
  penalty_black: 2.2
  penalty_white: 0.7
  threshold_max_abs: 52.96
  threshold_max_basic: 46.0
  threshold_max_without_gc: 11.42
  threshold_min_abs: 50.32
  threshold_min_basic: 45.39
  threshold_min_without_gc: 11.28
  window_len: 32
prodigal:
  bin: CHANGE_PRODIGAL
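The CHANGE_HMMSEARCH, CHANGE_PVOG and CHANGE_PRODIGAL values look like placeholders to be filled with real tool and database paths when the config is deployed. A minimal sketch of such a substitution step, where every path and the template file name are assumptions for illustration:

# Hypothetical placeholder substitution for the Phigaro config template
# above; the paths and file names here are assumptions, not pipeline code.
from pathlib import Path

replacements = {
    "CHANGE_HMMSEARCH": "/usr/local/bin/hmmsearch",  # assumed hmmsearch path
    "CHANGE_PVOG":      "/opt/dbs/phigaro/pvog",     # assumed pVOG db path
    "CHANGE_PRODIGAL":  "/usr/local/bin/prodigal",   # assumed prodigal path
}

text = Path("config.yml").read_text()  # hypothetical template file name
for placeholder, value in replacements.items():
    text = text.replace(placeholder, value)
Path("config.filled.yml").write_text(text)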
docker/scripts/bscripts/draw_gis.sh → bin/draw_gis.sh (100644 → 100755; file renamed without changes)