Sort input to krakenuniq to enable retrieval of cached batch runs. #570 #576

Merged Mar 6, 2025 · 14 commits
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- [#576](https://github.com/nf-core/taxprofiler/pull/576) Sort input to krakenuniq to enable retrieval of cached batch tasks (added by @muniheart)

### `Fixed`

- [#573](https://github.com/nf-core/taxprofiler/pull/573) Improved help messages and documentation to state that many of the taxpasta-related params require taxonomy files as input (❤️ to @alexhbnr for reporting, fix by @jfy133)
1 change: 1 addition & 0 deletions conf/modules.config
@@ -603,6 +603,7 @@ process {
}

withName: KRAKENUNIQ_PRELOADEDKRAKENUNIQ {
tag = { "${meta.db_name}|${task.index}" }
ext.args = { "${meta.db_params}" }
// one run with multiple samples, so fix ID to just db name to ensure clean log name
ext.prefix = { "${meta.db_name}.krakenuniq" }
6 changes: 5 additions & 1 deletion subworkflows/local/profiling.nf
@@ -384,7 +384,8 @@ workflow PROFILING {
}

if ( params.run_krakenuniq ) {

        ch_input_for_krakenuniq = ch_input_for_profiling.krakenuniq
.map {
meta, reads, db_meta, db ->
def seqtype = (reads[0].name ==~ /.+?\.f\w{0,3}a(\.gz)?$/) ? 'fasta' : 'fastq'
@@ -395,6 +396,9 @@
}
.groupTuple(by: [0,2,3])
.flatMap { single_meta, reads, db_meta, db ->
                // Sort the reads array by its last element, the sample prefix. This keeps batch
                // membership constant across runs, enabling retrieval of cached tasks.
                reads.sort { a, b -> a[-1] <=> b[-1] }
def batches = reads.collate(params.krakenuniq_batch_size)
return batches.collect { batch ->
// We split the sample identifier from the reads again after batching.
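The core idea of the change above can be sketched outside Nextflow: if the reads list is sorted by a stable key before being split into fixed-size batches, the same inputs always yield the same batches, so a cache keyed on batch contents (as Nextflow's `-resume` is) can re-use prior task results. This is an illustrative Python sketch, not the pipeline's Groovy code; `collate` and `make_batches` are hypothetical helpers mimicking Groovy's `Collection.collate` and the `flatMap` body, and `batch_size` stands in for `params.krakenuniq_batch_size`.

```python
def collate(items, size):
    # Mimics Groovy's Collection.collate: split a list into chunks of `size`.
    return [items[i:i + size] for i in range(0, len(items), size)]

def make_batches(reads, batch_size):
    # Sort by the last element of each entry (the sample prefix), as the PR
    # does, so identical inputs in any order produce identical batches.
    ordered = sorted(reads, key=lambda r: r[-1])
    return collate(ordered, batch_size)

# Two runs that happen to enumerate the same samples in different orders:
run1 = [["s3.fq", "s3"], ["s1.fq", "s1"], ["s2.fq", "s2"]]
run2 = [["s2.fq", "s2"], ["s3.fq", "s3"], ["s1.fq", "s1"]]

# Without the sort, batch membership would depend on arrival order and the
# cache would miss; with it, both runs produce the same batches.
assert make_batches(run1, 2) == make_batches(run2, 2)
```

Without sorting, `collate` would group whatever order the channel emitted, so a re-run with a different emission order would produce different batch contents and invalidate the cached KrakenUniq tasks.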