Sort input to krakenuniq to enable retrieval of cached batch runs. #570 #576

Merged
merged 14 commits into from
Mar 6, 2025
1 change: 1 addition & 0 deletions conf/modules.config
@@ -603,6 +603,7 @@ process {
}

withName: KRAKENUNIQ_PRELOADEDKRAKENUNIQ {
tag = { "${meta.db_name}|${task.index}" }
ext.args = { "${meta.db_params}" }
// one run with multiple samples, so fix ID to just db name to ensure clean log name
ext.prefix = { "${meta.db_name}.krakenuniq" }
17 changes: 16 additions & 1 deletion subworkflows/local/profiling.nf
@@ -384,7 +384,22 @@ workflow PROFILING {
     }

     if ( params.run_krakenuniq ) {
-        ch_input_for_krakenuniq = ch_input_for_profiling.krakenuniq
+        // Collect channel into list. Sort to ensure batch membership remains constant across runs.
+        // This will enable retrieval of cached tasks. This is a blocking operation.
+        ch_input_for_krakenuniq_sorted = ch_input_for_profiling.krakenuniq
+            .collect(
+                flat: false,
+                sort: {
+                    a,b -> a[0].id <=> b[0].id ?:
+                    a[0].run_accession <=> b[0].run_accession ?:
+                    a[0].db_meta.db_name <=> b[0].db_meta.db_name ?:
+                    a[0].db <=> b[0].db
+                }
+            )
+            // Apply inverse of collect operator. Result is multi-value channel.
+            .flatMap { it -> it.toList() }
+
+        ch_input_for_krakenuniq = ch_input_for_krakenuniq_sorted
             .map {
                 meta, reads, db_meta, db ->
                     def seqtype = (reads[0].name ==~ /.+?\.f\w{0,3}a(\.gz)?$/) ? 'fasta' : 'fastq'
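
For readers who want to see why the sort stabilises batch membership, here is a minimal plain-Groovy sketch that runs without Nextflow. It is an illustration under assumptions, not the pipeline's code: the `tuple` helper, the sample IDs, and the use of `collate(2)` as a stand-in for the downstream batching step are all hypothetical, and only the first two comparator keys from the diff are reproduced.

```groovy
// Hypothetical stand-in for the [meta, reads, db_meta, db] tuples; only meta is filled in.
def tuple = { String id, String run -> [[id: id, run_accession: run], [], [:], null] }

// Same comparator shape as in the diff, shortened to two keys.
// `<=>` returns 0 on a tie, which Groovy treats as false, so `?:` falls through to the next key.
def byMeta = { a, b ->
    a[0].id <=> b[0].id ?:
    a[0].run_accession <=> b[0].run_accession
}

// Two runs that emit the same samples in a different arrival order.
def runOne = [tuple('s2', 'r1'), tuple('s1', 'r2'), tuple('s1', 'r1'), tuple('s3', 'r1')]
def runTwo = [tuple('s3', 'r1'), tuple('s1', 'r1'), tuple('s2', 'r1'), tuple('s1', 'r2')]

// Sort a copy of the list, then cut it into fixed-size batches.
// collate(2) is an assumed placeholder for however the real workflow forms batches.
def batches = { List items ->
    items.sort(false, byMeta).collate(2).collect { batch ->
        batch.collect { "${it[0].id}:${it[0].run_accession}".toString() }
    }
}

// Batch membership is identical regardless of arrival order.
assert batches(runOne) == batches(runTwo)
println batches(runOne)
```

Without the sort, the two runs would put different samples in the first batch, so the task inputs would differ between runs and a resumed run could not reuse the earlier result.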