Commit affcebf

Changelog and readme section for mcs
1 parent 6d63f03 commit affcebf

3 files changed, +32 -3 lines changed

CHANGES.md

+8 -2
@@ -2,14 +2,20 @@

 ## development version

+* #476: Significantly improve speed and accuracy by enabling by default a new
+variant of multi-context seeds: When no regular seeds - which consist
+of two strobes - can be found for the entire query, strobealign attempts to find
+single-strobe ("partial") seeds.
+The `--mcs` option is still available for now. It is a bit slower, but
+slightly more accurate.
 * #468: Be less strict when checking reference sequence names.

 ## v0.15.0 (2024-12-13)

 * #388 and #426: Increase accuracy and mapping rate for reads shorter than
 about 200 bp by introducing multi-context seeds.
-Previously, seeds always consisted of two k-mers and would only be found if
-both occur in query and reference.
+Previously, seeds always consisted of two k-mers ("strobes") and would only
+be found if both occur in query and reference.
 With this change, strobealign falls back to looking up just one of the k-mers
 when appropriate.
 This feature is currently *experimental* and only enabled when using the

README.md

+23
@@ -223,6 +223,29 @@ actual mapping:
 - Index files are about four times as large as the reference.


+## Explanation
+
+### Multi-context seeds
+
+Strobealign uses randstrobes as seeds, which in our case consist of two k-mers
+("strobes") that are somewhat close to each other. When a seed is looked up
+in the index, it is only found if both strobes match. By changing the way in
+which the index is stored in v0.15.0, it became possible to support
+*multi-context seeds*. With those changes, strobealign falls back to looking
+up only one of the strobes (a "partial seed") if the full seed cannot be found.
+This results in better mapping rate and accuracy for read lengths of up to
+about 200 nt.
+
+Usage of multi-context seeds is enabled by default in strobealign since v0.16.0.
+The strategy is to first search for all full seeds of the query and fall back to
+partial seeds if *no* seeds could be found.
+
+A slightly more accurate, but slower mode of using multi-context seeds is
+available by using option `--mcs`: With it, the strategy is changed to a
+fallback *per seed*: If an individual full seed cannot be found, its partial
+version is looked up in the index.
+
+
 ## Changelog

 See [Changelog](CHANGES.md).
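
The README text added in this commit describes two lookup strategies: the default (fall back to partial seeds only if no full seed of the query is found) and `--mcs` (fall back per seed). The difference can be sketched roughly as below. This is a toy illustration, not strobealign's implementation: `Seed`, `Hit`, `ToyIndex`, `find_hits_default` and `find_hits_mcs` are hypothetical names, the index is a plain `std::set`, and treating the first strobe as the partial seed is an assumption made for the example.

```cpp
#include <cassert>  // only needed if you add checks like those in the note below
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy types for illustration; not strobealign's real data structures.
struct Seed {
    std::uint64_t first;   // hash of the first strobe
    std::uint64_t second;  // hash of the second strobe
};

enum class HitKind { Full, Partial };

struct Hit {
    std::size_t query_seed;  // position of the seed in the query's seed list
    HitKind kind;            // whether the full or the partial seed matched
};

class ToyIndex {
public:
    void add(const Seed& s) {
        full_.insert(std::make_pair(s.first, s.second));
        partial_.insert(s.first);  // assumption: partial seed = first strobe only
    }
    bool has_full(const Seed& s) const {
        return full_.count(std::make_pair(s.first, s.second)) > 0;
    }
    bool has_partial(const Seed& s) const {
        return partial_.count(s.first) > 0;
    }

private:
    std::set<std::pair<std::uint64_t, std::uint64_t>> full_;
    std::set<std::uint64_t> partial_;
};

// Default strategy as described in the README text: look up all full seeds
// first; only if none is found for the whole query, retry with partial seeds.
std::vector<Hit> find_hits_default(const ToyIndex& index, const std::vector<Seed>& query) {
    std::vector<Hit> hits;
    for (std::size_t i = 0; i < query.size(); ++i) {
        if (index.has_full(query[i])) {
            hits.push_back({i, HitKind::Full});
        }
    }
    if (!hits.empty()) {
        return hits;
    }
    for (std::size_t i = 0; i < query.size(); ++i) {
        if (index.has_partial(query[i])) {
            hits.push_back({i, HitKind::Partial});
        }
    }
    return hits;
}

// `--mcs` strategy as described in the README text: fall back per seed, so a
// partial lookup happens whenever an individual full seed is not found.
std::vector<Hit> find_hits_mcs(const ToyIndex& index, const std::vector<Seed>& query) {
    std::vector<Hit> hits;
    for (std::size_t i = 0; i < query.size(); ++i) {
        if (index.has_full(query[i])) {
            hits.push_back({i, HitKind::Full});
        } else if (index.has_partial(query[i])) {
            hits.push_back({i, HitKind::Partial});
        }
    }
    return hits;
}
```

With a toy index holding seeds (1, 2) and (3, 4) and a query with seeds (1, 2) and (3, 9), the default sketch returns a single full hit, while the per-seed sketch also returns a partial hit for the second seed. The extra lookups are consistent with the changelog's description of `--mcs` as slightly more accurate but slower.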

src/cmdline.cpp

+1 -1
@@ -54,7 +54,7 @@ CommandLineOptions parse_command_line_arguments(int argc, char **argv) {
 args::ValueFlag<int> end_bonus(parser, "INT", "Soft clipping penalty [10]", {'L'});

 args::Group search(parser, "Search parameters:");
-args::Flag mcs(parser, "mcs", "Use multi-context seeds for finding hits", {"mcs"});
+args::Flag mcs(parser, "mcs", "Use extended multi-context seed mode for finding hits. Slightly more accurate, but slower", {"mcs"});
 args::ValueFlag<float> f(parser, "FLOAT", "Top fraction of repetitive strobemers to filter out from sampling [0.0002]", {'f'});
 args::ValueFlag<float> S(parser, "FLOAT", "Try candidate sites with mapping score at least S of maximum mapping score [0.5]", {'S'});
 args::ValueFlag<int> M(parser, "INT", "Maximum number of mapping sites to try [20]", {'M'});
