Unified parallel GBWT construction #4221

Merged
merged 11 commits into from
Feb 9, 2024

Contributor

@jltsiren jltsiren commented Feb 7, 2024

Changelog Entry

To be copied to the draft changelog by merger:

  • Multithreaded path cover / local haplotypes GBWT construction.

Description

This PR updates GBWTGraph to include unified support for multithreaded GBWT construction (see jltsiren/gbwtgraph#36). The main idea is to partition the graph into GBWT construction jobs using gbwt_construction_jobs(), create the final metadata with MetadataBuilder, pass reference paths with assign_paths() and insert_paths(), build partial GBWTs for the jobs in parallel, merge the GBWTs, and add metadata from MetadataBuilder.

  • Path cover / local haplotypes GBWT construction uses this now, both in vg gbwt and vg autoindex. As a byproduct, we get parallelization for them.
  • GBWT/GBZ construction from GFA already used this.
  • Haplotype sampling in vg haplotypes uses this to the extent possible.
  • rebuild_gbwt() still uses the old parallelization scheme, as it's built into the interface.
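
The partition / build in parallel / merge flow from the description can be illustrated with a small self-contained sketch. This is not the GBWTGraph API: `Component`, `Job`, `PartialIndex`, `partition_into_jobs()`, `build_partial_indexes()`, and `merge_indexes()` are hypothetical stand-ins invented for illustration, a list of path names stands in for a partial GBWT, and the metadata and reference path steps are left out. It assumes jobs are formed by greedily packing graph components up to a size bound, which may differ from what gbwt_construction_jobs() actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are gbwtgraph types.
struct Component {
    std::size_t nodes;              // size of the graph component
    std::vector<std::string> paths; // paths contained in the component
};
using Job = std::vector<std::size_t>;          // indexes of components in one job
using PartialIndex = std::vector<std::string>; // stands in for a partial GBWT

// Greedily pack consecutive components into jobs of at most size_bound nodes.
// A component larger than the bound ends up in a job of its own.
std::vector<Job> partition_into_jobs(const std::vector<Component>& components,
                                     std::size_t size_bound) {
    assert(size_bound > 0);
    std::vector<Job> jobs;
    std::size_t current_size = 0;
    for (std::size_t i = 0; i < components.size(); i++) {
        if (jobs.empty() || current_size + components[i].nodes > size_bound) {
            jobs.emplace_back();
            current_size = 0;
        }
        jobs.back().push_back(i);
        current_size += components[i].nodes;
    }
    return jobs;
}

// Build one partial index per job. Thread t handles jobs t, t + num_threads, ...
// Each job writes only to its own slot, so no locking is needed.
std::vector<PartialIndex> build_partial_indexes(const std::vector<Component>& components,
                                                const std::vector<Job>& jobs,
                                                std::size_t num_threads) {
    std::vector<PartialIndex> partial(jobs.size());
    num_threads = std::max<std::size_t>(1, std::min(num_threads, jobs.size()));
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; t++) {
        workers.emplace_back([&, t]() {
            for (std::size_t j = t; j < jobs.size(); j += num_threads) {
                for (std::size_t c : jobs[j]) {
                    partial[j].insert(partial[j].end(),
                                      components[c].paths.begin(),
                                      components[c].paths.end());
                }
            }
        });
    }
    for (std::thread& worker : workers) { worker.join(); }
    return partial;
}

// Merge the partial indexes in job order, so the result does not depend on
// how the jobs were scheduled across threads.
PartialIndex merge_indexes(const std::vector<PartialIndex>& partial) {
    PartialIndex merged;
    for (const PartialIndex& p : partial) {
        merged.insert(merged.end(), p.begin(), p.end());
    }
    return merged;
}
```

Calling `partition_into_jobs()`, then `build_partial_indexes()`, then `merge_indexes()` gives the same merged result for any thread count, because each job owns its output slot and merging follows job order rather than completion order.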

@jltsiren jltsiren merged commit 18bf9cd into master Feb 9, 2024
2 checks passed
@jltsiren jltsiren deleted the parallel-gbwt-construction branch February 9, 2024 18:38