Deconstruct option to cluster similar alleles together #4301

Merged 4 commits on May 29, 2024

glennhickey

Changelog Entry

To be copied to the draft changelog by merger:

  • Experimental option -L added to vg deconstruct to cluster similar allele traversals together. The value given is a (length-weighted) threshold on the Jaccard coefficient between the oriented nodes of two traversals. For example, if -L 0.75 is given, alleles whose similarity based on their graph positions is >= 0.75 are merged into one. Two new FORMAT fields are added to keep track of the difference: TS (Jaccard distance) and TL (length difference). Clustering is done greedily, starting with the selected reference paths.
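
To make the changelog entry concrete, here is a rough Python sketch of a length-weighted Jaccard coefficient and a greedy clustering pass. It is not the code in this PR: the names `weighted_jaccard`, `cluster_traversals`, `node_len`, and `ref_indexes` are made up for illustration, and it assumes that each traversal is compared as a set of oriented nodes, that a traversal joins the first cluster whose representative passes the threshold, and that reference traversals are simply visited first.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: an oriented node is (node_id, is_reverse),
# and a traversal is a sequence of oriented nodes.
OrientedNode = Tuple[int, bool]
Traversal = Sequence[OrientedNode]


def weighted_jaccard(a: Traversal, b: Traversal, node_len: Dict[int, int]) -> float:
    """Jaccard coefficient of two oriented-node sets, weighting each node by its length."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    union_weight = sum(node_len[node_id] for node_id, _ in union)
    if union_weight == 0:
        # Two empty (or zero-length) traversals are treated as identical.
        return 1.0
    inter_weight = sum(node_len[node_id] for node_id, _ in set_a & set_b)
    return inter_weight / union_weight


def cluster_traversals(
    travs: Sequence[Traversal],
    node_len: Dict[int, int],
    min_jaccard: float,
    ref_indexes: Sequence[int] = (),
) -> List[List[int]]:
    """Greedily cluster traversal indexes; the first member of a cluster represents it."""
    ref_set = set(ref_indexes)
    # Assumption: reference traversals are visited first so they seed the clusters.
    order = list(ref_indexes) + [i for i in range(len(travs)) if i not in ref_set]
    clusters: List[List[int]] = []
    for i in order:
        for cluster in clusters:
            representative = travs[cluster[0]]
            if weighted_jaccard(representative, travs[i], node_len) >= min_jaccard:
                cluster.append(i)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    # Toy bubble: two 10 bp flanking nodes and two alternative 1 bp nodes.
    node_len = {1: 10, 2: 1, 3: 1, 4: 10}
    ref = [(1, False), (2, False), (4, False)]
    snp = [(1, False), (3, False), (4, False)]
    deletion = [(1, False), (4, False)]
    print(cluster_traversals([ref, snp, deletion], node_len, 0.75, ref_indexes=[0]))
```

On this toy input a threshold of 0.75 puts all three traversals in one cluster, because the long shared flanking nodes dominate the weighting. That is also why a length-weighted measure can make the VCF much simpler on real graphs.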

Description

I don't think the clustering is especially useful on its own (though it can be used to make much simpler VCFs), but it's an important part of improved multi-level VCF support (finally coming in the next PR). It's also why I'm clustering on the graph path and not the actual DNA string: only the paths themselves can be used to anchor the child snarls.

glennhickey merged commit 2fea419 into master on May 29, 2024
2 checks passed