Non-canonical syncmers and randstrobes #474

Draft: wants to merge 9 commits into main

marcelm (Collaborator) commented Dec 19, 2024

This PR switches to non-canonical randstrobes and non-canonical syncmers. It is a bit incomplete (only single-end mapping works), needs some cleanup, and needs further improvements because it also reduces accuracy. I mainly want a single place where this change can be discussed and where I can keep a to-do list.
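To make the canonical vs. non-canonical distinction concrete, here is a minimal Python sketch of open-syncmer selection. It assumes a syncmer is a k-mer whose smallest s-mer hash sits at the centre offset, uses zlib.crc32 as a stand-in hash, and uses illustrative parameters k=7, s=3; none of these are strobealign's actual hash function or defaults, and the function names are hypothetical. With canonical=True each s-mer is hashed as the minimum over both strands, so a sequence and its reverse complement select the same k-mers; with canonical=False only the forward s-mer is hashed, so the two strands can select different k-mers.

```python
import zlib

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]

def smer_hash(smer: str, canonical: bool) -> int:
    """Hash of an s-mer; the canonical variant is identical on both strands."""
    h = zlib.crc32(smer.encode())
    if canonical:
        h = min(h, zlib.crc32(revcomp(smer).encode()))
    return h

def open_syncmers(seq: str, k: int = 7, s: int = 3, canonical: bool = True) -> list:
    """Start positions of k-mers whose minimal s-mer hash lies at the centre offset."""
    # Centre offset (assumption); with k - s even, the centre maps to itself
    # on the reverse strand, which is what makes canonical syncmers symmetric.
    t = (k - s) // 2
    hashes = [smer_hash(seq[i:i + s], canonical) for i in range(len(seq) - s + 1)]
    positions = []
    for i in range(len(seq) - k + 1):
        window = hashes[i:i + k - s + 1]  # the k - s + 1 s-mers inside this k-mer
        if window[t] == min(window):
            positions.append(i)
    return positions

# A forward k-mer at position i appears at position n - k - i on the reverse strand.
seq = "ACGTTGCATGCCATAGGCTA"
n, k = len(seq), 7
fwd = open_syncmers(seq, k=k)
rev = open_syncmers(revcomp(seq), k=k)
print(sorted(n - k - i for i in fwd) == sorted(rev))  # True: canonical syncmers are strand-symmetric
```

Calling open_syncmers(..., canonical=False) on the same pair of sequences gives no such guarantee, which is the root of the per-strand differences discussed in this PR.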

To Do

  • Make paired-end mapping work
  • Remove reverse_nam_if_needed
  • Remove first_strobe_is_main (it’s always true)
  • To alleviate the problem that filtering now works differently for forward and reverse-complemented sequences, we could perhaps also look at the reverse-complemented sequence when checking whether a randstrobe is filtered, that is, make the filter work as it does with canonical syncmers/randstrobes. Not sure how.
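One possible reading of the last to-do item, as a hedged sketch: when deciding whether to filter a seed, add the occurrence count of its reverse complement to its own count, so that a seed and its reverse complement are always filtered together. Seeds are plain strings here and counts live in a Counter keyed by sequence; the real index stores randstrobe hashes, and the reverse complement of a randstrobe need not itself be an indexed randstrobe, so this only illustrates the idea. is_filtered and max_count are hypothetical names.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]

def is_filtered(seed: str, counts: Counter, max_count: int) -> bool:
    """Filter a seed if it and its reverse complement together occur too often.

    Summing both orientations gives the same decision for a seed and its
    reverse complement, as with canonical seeds. A reverse-complement
    palindrome is counted only once.
    """
    rc = revcomp(seed)
    total = counts[seed] if rc == seed else counts[seed] + counts[rc]
    return total > max_count

counts = Counter({"ACG": 3, "CGT": 3})
print(counts["ACG"] > 4, is_filtered("ACG", counts, 4))  # False True
```

With a forward-only check, "ACG" (3 occurrences) would survive a threshold of 4 while the combined count of 6 filters it in both orientations.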

ksahlin (Owner) commented Dec 19, 2024

Good idea with this PR.

About the fourth point:

A very vague idea, but: if a read has at least as many distinct hits in the index in the masked direction as in the non-masked direction, it could be flagged as suspicious. For suspicious reads, we could do some extra work (e.g., collect some matches in both directions) to assess which direction is best.
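The suspicious-read heuristic could look roughly like this. It is a sketch under assumptions: the "masked direction" is taken to be the read orientation in which more seeds were filtered, hits are counted as distinct (ref_id, ref_pos) pairs, and DirectionStats and is_suspicious are hypothetical names. Reads flagged this way would then get the extra step of collecting matches in both orientations.

```python
from dataclasses import dataclass, field

@dataclass
class DirectionStats:
    """Seeding summary for one read orientation (hypothetical structure)."""
    hits: set = field(default_factory=set)  # distinct (ref_id, ref_pos) hits
    n_filtered: int = 0  # seeds skipped because they were filtered (masked)

def is_suspicious(fwd: DirectionStats, rev: DirectionStats) -> bool:
    """Flag a read whose masked orientation still has at least as many hits."""
    if fwd.n_filtered == rev.n_filtered:
        return False  # neither orientation stands out as the masked one
    if fwd.n_filtered > rev.n_filtered:
        masked, unmasked = fwd, rev
    else:
        masked, unmasked = rev, fwd
    return len(masked.hits) >= len(unmasked.hits)

fwd = DirectionStats(hits={(0, 100), (0, 200)}, n_filtered=5)
rev = DirectionStats(hits={(0, 150)}, n_filtered=0)
print(is_suspicious(fwd, rev))  # True: the masked forward direction has 2 >= 1 hits
```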

Are the seeds in the index also sorted w.r.t. chromosome ID and position, or how does the pdqsort step work again? If they were, we could try something like storing the first hit of each masked seed, in the hope that these will map to the same beginning of a repetitive region (this is not very robust, though).
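If the index were sorted by the full tuple (hash, reference ID, position), the first hit of a masked seed could be found with a single binary search, as in the sketch below. Whether strobealign's index is ordered this way is exactly the open question; note that pdqsort is not a stable sort, so unless the comparison includes reference ID and position, the order among entries with equal hashes is not guaranteed. build_index and first_hit are hypothetical helpers, and entries are plain (hash, ref_id, ref_pos) tuples.

```python
from bisect import bisect_left

def build_index(entries):
    """Sort (hash, ref_id, ref_pos) entries lexicographically.

    Sorting by the full tuple puts all hits of one hash in (ref_id, ref_pos)
    order, so the first entry for a hash is its leftmost hit.
    """
    index = sorted(entries)
    hashes = [h for h, _, _ in index]  # parallel list for binary search
    return index, hashes

def first_hit(index, hashes, h):
    """Return (ref_id, ref_pos) of the first hit for hash h, or None."""
    i = bisect_left(hashes, h)
    if i < len(hashes) and hashes[i] == h:
        _, ref_id, ref_pos = index[i]
        return ref_id, ref_pos
    return None

index, hashes = build_index([(5, 1, 100), (3, 0, 50), (5, 0, 200), (5, 0, 10)])
print(first_hit(index, hashes, 5))  # (0, 10)
```

For a masked seed, only this first hit would be kept instead of skipping the seed entirely.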
