Reference-less contig stitching #1221

Draft: wants to merge 339 commits into base: master
Commits
33288ca
Remove redundant calculation in referenceless
Donaim Jan 10, 2025
4ea1fd5
Improve scoring in referenceless
Donaim Jan 10, 2025
a159da1
Fix scoring in referenceless
Donaim Jan 10, 2025
40f3c88
Revert to non-adaptable scoring in referenceless
Donaim Jan 10, 2025
26ef042
Revert to before Score class in referenceless
Donaim Jan 10, 2025
8aea4be
Remove redundant comments in referenceless
Donaim Jan 10, 2025
393180c
Wrap max_acceptable_prob in referenceless
Donaim Jan 10, 2025
865bc32
Limit number of contigs in referenceless
Donaim Jan 11, 2025
e63d9e4
Improve MaximumAcceptableProbability calculation in referenceless
Donaim Jan 11, 2025
cdf16f2
Do not clear MaximumAcceptableProbability pool in referenceless
Donaim Jan 11, 2025
c93be63
Revert "Do not clear MaximumAcceptableProbability pool in referenceless"
Donaim Jan 11, 2025
d60e474
Add a final loop in referenceless to collect covered contigs
Donaim Jan 13, 2025
e040cb6
Add another heuristic to referenceless stitcher
Donaim Jan 13, 2025
60e1cb7
Rename classes in referenceless
Donaim Jan 13, 2025
45e50dc
Small improvements to pool structure in referenceless
Donaim Jan 13, 2025
a1a4eca
Reduce logging in referenceless
Donaim Jan 13, 2025
d846b08
Further improvements to the pool structure of referenceless
Donaim Jan 13, 2025
24825e0
Replace Fractions by ints in referenceless
Donaim Jan 13, 2025
bbf87b0
Use correlate instead of convolve in find_maximum_overlap
Donaim Jan 13, 2025
1225d33
Use oaconvolve in find_maximum_overlap
Donaim Jan 13, 2025
1639dad
Use np.convolve in find_maximum_overlap
Donaim Jan 13, 2025
2e03d9d
Do not use scipy in find_maximum_overlap
Donaim Jan 13, 2025
ce5971e
Use faster method for conversion of strings into np.arrays
Donaim Jan 13, 2025
94017ad
Improve concordance calculations in contig stitcher
Donaim Jan 14, 2025
4de1569
Make probability the only score in referenceless
Donaim Jan 14, 2025
d058a17
Remove redundant method in referenceless
Donaim Jan 14, 2025
8483bbe
Do not use len(SortedList)
Donaim Jan 14, 2025
56ff67e
Fix initial filtering in referenceless
Donaim Jan 14, 2025
95d85b1
Improve logging in referenceless
Donaim Jan 14, 2025
bc49847
Shortcut referenceless final laps
Donaim Jan 14, 2025
be0090e
Fix punctuation in referenceless logging
Donaim Jan 14, 2025
0260a02
Do not print average concordance in contig stitcher
Donaim Jan 14, 2025
983cd8e
Optimize smallest score access in referenceless
Donaim Jan 14, 2025
9a8c6e5
Fix caching in referenceless
Donaim Jan 14, 2025
569e3c5
Fix some logging in referenceless
Donaim Jan 14, 2025
1e3ccbf
Optimize conversion into numpy arrays in find_maximum_overlap
Donaim Jan 14, 2025
f1204b9
Reduce amount of caching in referenceless
Donaim Jan 14, 2025
abe0be2
Remove CombineCache from referenceless completely
Donaim Jan 14, 2025
e6bd7e8
Implement a seed idea for referenceless
Donaim Jan 14, 2025
1dee289
Add singleton method to ContigsPath in referenceless
Donaim Jan 14, 2025
85c5d78
Bring back combine_cache to referenceless
Donaim Jan 14, 2025
262ba71
Factor out cutoffs handling in referenceless
Donaim Jan 14, 2025
6262410
Make cutoffs cache global again in referenceless
Donaim Jan 14, 2025
fe6c8d7
Make OverlapFinder global in referenceless
Donaim Jan 14, 2025
5b53ccc
Introduce ALIGN_CACHE into referenceless
Donaim Jan 14, 2025
06d37f4
Improve heuristic that skips overlap search in referenceless
Donaim Jan 14, 2025
6d071f6
Remove useless checks in referenceless
Donaim Jan 14, 2025
3e7f20c
Reorder heuristics calculations in referenceless
Donaim Jan 14, 2025
0b40c4b
Make sure to start with longer seeds first in referenceless
Donaim Jan 16, 2025
98a2807
Add cat utility
Donaim Jan 16, 2025
8fb4f7d
Use cat in sample.py
Donaim Jan 16, 2025
4951ac4
Remove all options from cat utility
Donaim Jan 16, 2025
66acb77
Factor out code related to concordance calculations
Donaim Jan 20, 2025
83463ca
Merge remote-tracking branch 'origin/master' into contig-merger
Donaim Jan 21, 2025
4b0352c
Simplify exp_accumulate_array code
Donaim Jan 21, 2025
237bf45
Make sure that exp_accumulate_array receives a binary array
Donaim Jan 21, 2025
92a4d28
Improve typing in exp_accumulate_array
Donaim Jan 21, 2025
466e473
Rewrite concordance calculations in numpy
Donaim Jan 21, 2025
6c74dd0
Factor out exp_accumulate_array in overlap_stitcher
Donaim Jan 21, 2025
751d20f
Optimize numpy.fromiter calls
Donaim Jan 21, 2025
cce2d95
Fix cat.py exceptional behaviour
Donaim Jan 22, 2025
b8f1d9e
Allow cat.py to accept 0 files
Donaim Jan 22, 2025
d7e6d83
Uncomment perfectly good test cases
Donaim Jan 23, 2025
e84baf7
Move overlap stitcher tests into a separate file
Donaim Jan 23, 2025
f710943
Fix logical error in overlap_stitcher
Donaim Jan 23, 2025
64f418c
Fall back to scipy.signal.convolve for very big arrays
Donaim Jan 24, 2025
a2ae0be
Make version check more reliable in MiCall
Donaim Jan 31, 2025
d17e42c
Merge remote-tracking branch 'origin/master' into contig-merger
Donaim Mar 25, 2025
cf20410
Improve logging in referenceless
Donaim Mar 25, 2025
e787c82
Simplify loop code in referenceless
Donaim Mar 25, 2025
3724cba
Factor out several helpful methods in referenceless
Donaim Mar 26, 2025
2054b52
Implement find_maximum_overlap local to ContigWithAligner
Donaim Mar 26, 2025
b49c334
Use the local find_maximum_overlap
Donaim Mar 26, 2025
4a4f439
Remove numba dependency
Donaim Mar 26, 2025
c52fb20
Implement exponential dropoff function
Donaim Mar 26, 2025
7cdeef9
Use exp_dropoff_array in referenceless
Donaim Mar 26, 2025
609a3f2
Make find_maximum_overlap value a float
Donaim Mar 26, 2025
c3868f2
Adjust exp_dropoff factor in referenceless
Donaim Mar 27, 2025
fac1ef6
Improve precision in fasta_to_fastq
Donaim Mar 27, 2025
a3de8c1
Improve typing in fasta_to_fastq
Donaim Mar 27, 2025
a078dc6
Improve precision in fasta_to_fastq again
Donaim Mar 27, 2025
d88517f
Remove unused variable
Donaim Mar 27, 2025
dac0eca
Fix threshold bug in referenceless
Donaim Mar 27, 2025
e2d05b0
Improve score calculation in referenceless
Donaim Mar 28, 2025
f54dc8d
Adjust factors in referenceless
Donaim Mar 28, 2025
acf0bcc
Further improve score calculation in referenceless
Donaim Mar 28, 2025
ef26209
Add a tool for calculating kmer frequencies
Donaim Mar 28, 2025
e90f32c
Improve accuracy of kmer frequencies calculation
Donaim Mar 28, 2025
b23e852
Remove unreachable code
Donaim Mar 28, 2025
d833233
Fix off by 1 error in kmer frequencies calculation
Donaim Mar 28, 2025
68ed6a9
Fix another off by 1 error in kmer frequencies calculation
Donaim Mar 28, 2025
0863847
Improve code in kmer frequencies calculator
Donaim Mar 28, 2025
638b728
Improve logging in kmer frequencies script
Donaim Mar 28, 2025
24e725c
Fix another off by 1 error in kmer frequencies calculation
Donaim Mar 28, 2025
8934f04
Add tests for calculate kmer frequencies
Donaim Mar 28, 2025
aeb9ee0
Fix small issues in tests
Donaim Mar 28, 2025
3a1d957
Rewrite multiple inputs handling in kmer frequency calculation
Donaim Mar 28, 2025
4600ee2
Update documentation in calc frequency calculator
Donaim Mar 28, 2025
0ca8d32
Small code improvements in kmer frequencies calculator
Donaim Mar 28, 2025
e3b3aae
Factor out SortedRing structure
Donaim Apr 1, 2025
bc7a260
Revert "Factor out SortedRing structure"
Donaim Apr 1, 2025
24e922f
Add fastq_to_fasta tool
Donaim Apr 4, 2025
34a8240
Improve logging in several utils
Donaim Apr 4, 2025
89eea75
Add a randomized test for referenceless contig stitcher
Donaim Apr 4, 2025
52f8a3b
Ensure to always cover the whole genome in fasta_to_fastq
Donaim Apr 4, 2025
afd8fa9
Test perfect reconstruction in referenceless fuzzer
Donaim Apr 4, 2025
baa3caa
Remove unused variable
Donaim Apr 4, 2025
ed3b645
Shuffle in fasta_to_fastq
Donaim Apr 4, 2025
c28a5ca
Remove extraction_num argument from fasta_to_fastq
Donaim Apr 4, 2025
085b0f8
Split StitcherContext into two
Donaim Apr 4, 2025
76ce05a
Make end-to-end referenceless stitcher test more comprehensive
Donaim Apr 7, 2025
69c19a4
Make end-to-end referenceless stitcher test more demanding
Donaim Apr 7, 2025
b014a88
Simplify overlap score calculation
Donaim Apr 7, 2025
58a1088
Cache overlap score calculations
Donaim Apr 7, 2025
5b6a232
Simplify minimum score calculations
Donaim Apr 7, 2025
bd30dd6
Fix off-by-1 error in referenceless
Donaim Apr 7, 2025
59aa761
Fix overlap size calculation
Donaim Apr 8, 2025
c4522c2
Implement initial version of find_maximum_overlap_length
Donaim Apr 8, 2025
07c00ea
Fix tests of referenceless
Donaim Apr 8, 2025
4252b0d
Use find_max_overlap_length
Donaim Apr 8, 2025
4476471
Rename contig stitcher events to referencefull ones
Donaim Apr 8, 2025
cd5b585
Greatly improve logging in referenceless
Donaim Apr 8, 2025
0180213
Refactor code in referenceless tests
Donaim Apr 8, 2025
fcfad27
Add an exact logger test to referenceless
Donaim Apr 8, 2025
9c6fa8e
Factor out a context-inheriting version of main in referenceless
Donaim Apr 8, 2025
f59e240
Improve logger tests in referenceless contig stitcher
Donaim Apr 8, 2025
9a2b4cb
Fix file format in logger test of referenceless stitcher
Donaim Apr 8, 2025
1996aee
Add more logs to referenceless stitcher
Donaim Apr 8, 2025
4a8252f
Improve debuggability of referenceless stitcher tests
Donaim Apr 8, 2025
c8173a6
Merge branch 'master' into contig-merger
Donaim Apr 9, 2025
371abce
Split stitcher registry from the stitcher main context
Donaim Apr 9, 2025
4a1bc1a
Improve context stitcher context handling
Donaim Apr 9, 2025
16636ae
Add more logging to referenceless
Donaim Apr 9, 2025
4d3ffde
Improve find_max_overlap_length implementation
Donaim Apr 9, 2025
8bfbdfe
Implement the idea of reference reduction
Donaim Apr 9, 2025
8dc3dda
Check logs first in referenceless tests
Donaim Apr 9, 2025
799cbd8
Remove a redundant step in referenceless
Donaim Apr 9, 2025
c21e1bc
Log cutoffs values in referenceless
Donaim Apr 9, 2025
3895a33
Fix length opportunistic optimization in referenceless
Donaim Apr 9, 2025
7b32e49
Improve test performance in referenceless
Donaim Apr 9, 2025
ff9f0eb
Refactor fuzz tests for referenceless
Donaim Apr 9, 2025
2a652ef
Add quicker tests for referenceless
Donaim Apr 9, 2025
5840b81
Improve the small values test in referenceless
Donaim Apr 9, 2025
d26a417
Improve stitcher context structure
Donaim Apr 11, 2025
59788dc
Fix typing in contig stitcher context
Donaim Apr 11, 2025
68fd517
Split contig stitcher contexts
Donaim Apr 11, 2025
f0e254b
Improve staging in contig stitcher contexts
Donaim Apr 11, 2025
607b12f
Further improve typing of contig stitcher context
Donaim Apr 11, 2025
cf2ac1a
Add a lower debug level to referenceless
Donaim Apr 11, 2025
1838772
Small improvement to coverage test
Donaim Apr 11, 2025
2293f55
Use debug2 level in referenceless
Donaim Apr 11, 2025
9b67df4
Use debug2 level in some referenceless tests
Donaim Apr 11, 2025
2ba2fc9
Improve format of serialized events in referenceless tests
Donaim Apr 11, 2025
3d14e96
Improve referenceless test case generation
Donaim Apr 11, 2025
f879402
Fix handling of completely covered contigs in referenceless
Donaim Apr 11, 2025
8227d8d
Fix find_maximum_overlap when sequences are truly random
Donaim Apr 12, 2025
86d26f9
Add more cases to referenceless tests
Donaim Apr 12, 2025
24bd3be
Finish implementation of reference size reduction optimization for re…
Donaim Apr 12, 2025
1af2b8f
Do not always try to reduce reference in referenceless
Donaim Apr 12, 2025
2fd2c0f
Improve map_overlap contract in referenceless
Donaim Apr 15, 2025
836badc
Improve failing tests in referenceless
Donaim Apr 15, 2025
ef1964c
Require all negative tests in referenceless to fail
Donaim Apr 15, 2025
b2db31e
Change test cases in referenceless big test
Donaim Apr 15, 2025
731bf9f
Adopt gotoh for overlap adjustments in referenceless
Donaim Apr 15, 2025
bd7f3ae
Revert "Adopt gotoh for overlap adjustments in referenceless"
Donaim Apr 15, 2025
6f76abd
Adopt BioPython aligner for overlap adjustments in referenceless
Donaim Apr 15, 2025
d840eb2
Revert "Adopt BioPython aligner for overlap adjustments in referencel…
Donaim Apr 15, 2025
c1aab1e
Improve code quality of referenceless alignment adjustment code
Donaim Apr 15, 2025
bf011db
Abstract MappyAligner in referenceless
Donaim Apr 16, 2025
e400d93
Another try at the Biopython aligner for overlap adjustments in refer…
Donaim Apr 16, 2025
464d3a2
Revert "Another try at the Biopython aligner for overlap adjustments …
Donaim Apr 16, 2025
b1e2d7b
Try to fix mappy by adding padding
Donaim Apr 16, 2025
7852548
Change tests order in referenceless tests
Donaim Apr 16, 2025
6f3bd5d
Add an assert in referenceless
Donaim Apr 16, 2025
b3b450c
Implement a pessimistic improvement to overlap adjuster in referenceless
Donaim Apr 16, 2025
da25a6e
Fix overlap scoring function in referenceless
Donaim Apr 16, 2025
43ed0a2
Rename "calc_overlap_pvalue" to "calculate_overlap_score"
Donaim Apr 16, 2025
fb25682
Improve typing in overlap_stitcher.py
Donaim Apr 16, 2025
0c77671
Change from language of probabilities to language of scores in refere…
Donaim Apr 16, 2025
0c40c0b
Remove size restriction for overlaps in referenceless
Donaim Apr 16, 2025
2f63545
Adjust score calculations in referenceless
Donaim Apr 16, 2025
01ac50e
Ensure that small covered overlaps are not discarded in referenceless
Donaim Apr 16, 2025
07a93f7
Fix covered overlap cutoffs calculation
Donaim Apr 17, 2025
8b2e877
Optimize find_max_overlap resulting calculation
Donaim Apr 17, 2025
265fdda
Improve find_maximum_overlap final calculation
Donaim Apr 17, 2025
3400ebd
Improve implementation of calc_overlap_score and its usage in referen…
Donaim Apr 17, 2025
d2434f9
Improve overlap size calculations depending on whether covered or not
Donaim Apr 18, 2025
3b3910f
Use calculate_overlap_score in find_maximum_overlap
Donaim Apr 17, 2025
8c903d8
Improve get_overlap_results precision for covered contigs
Donaim Apr 18, 2025
810332f
Increase ACCEPTABLE_STITCHING_SCORE in referenceless to 15,14
Donaim Apr 18, 2025
534fcfe
Add comments for test_referenceless_contig_stitcher
Donaim Apr 23, 2025
33c7826
Factor out SortedRing data structure
Donaim Apr 23, 2025
c9d6768
Remove Pool.resize method in referenceless
Donaim Apr 23, 2025
fb50688
Do not assume that Pool.paths.capacity can be None
Donaim Apr 23, 2025
07bf739
Return bool from SortedRing.insert
Donaim Apr 23, 2025
4ada03e
Simplify add method in referenceless Pool
Donaim Apr 23, 2025
db017ce
Fix docstring of SortedRing
Donaim Apr 23, 2025
4270811
Optimize SortedRing insert operation
Donaim Apr 23, 2025
8d59183
Optimize SortedRing insert operation [2]
Donaim Apr 23, 2025
a659319
Add docstrings to referenceless and make small code improvements
Donaim Apr 24, 2025
4b7a464
Use Biopython aligner instead of gotoh in referenceless
Donaim Apr 24, 2025
353e605
Use Biopython's default aligner scoring in referenceless
Donaim Apr 24, 2025
0b6e6a8
Optimize align_queries implementation by using a shared aligner
Donaim Apr 24, 2025
9eb5940
Ensure that we are not asking for every alignment in align_queries
Donaim Apr 24, 2025
e8e5998
Disable acceptance probability check in referenceless tests
Donaim Apr 24, 2025
aab8100
Increase ACCEPTABLE_STITCHING_SCORE in referenceless to 71,70
Donaim Apr 24, 2025
d9e2e59
Add missing period in a log message of referenceless
Donaim Apr 24, 2025
b81d12f
Merge branch 'master' into contig-merger
Donaim Apr 30, 2025
51a87b3
Remove download entrypoint from analyze_kive_batches.py
Donaim Apr 30, 2025
9511442
Do not download unfinished runs in download.py
Donaim Apr 30, 2025
899ee9c
Update call to contig stitcher
Donaim Apr 30, 2025
cbebafd
Implement intrapolation of MAX_ALTERNATIVES
Donaim Apr 30, 2025
7057c25
Adjust implementation of intrapolate_number_of_alternatives
Donaim Apr 30, 2025
dccb9c2
Download conseq files too
Donaim Apr 30, 2025
88327aa
Extract final list of runs during download stage
Donaim May 1, 2025
eff7c1c
Skip previously failed downloads
Donaim May 1, 2025
1150660
Support comments in batch names
Donaim May 1, 2025
0f8d5b3
Fix runs.txt format
Donaim May 1, 2025
e764987
Skip useless runs in download
Donaim May 1, 2025
c2adfcc
Do not always overwrite runs_txt
Donaim May 1, 2025
d4a2bb6
Increase verbosity of some process_info messages
Donaim May 1, 2025
70ecb55
Redownload active runs
Donaim May 1, 2025
962bc9a
Do not redownload if already exists
Donaim May 1, 2025
3a33043
Speed-up download process
Donaim May 1, 2025
75c6b0c
Simplify download code
Donaim May 1, 2025
a9ff61d
Set kivecli logger into debug mode
Donaim May 1, 2025
b4ab704
Improve downloads
Donaim May 2, 2025
383d1ad
Simplify code
Donaim May 2, 2025
590344a
Only call kivecli via its API
Donaim May 2, 2025
a4607d2
Write all files atomically
Donaim May 2, 2025
d0611a1
Improve download logic
Donaim May 2, 2025
65d635d
Simplify download code
Donaim May 2, 2025
1f4783b
Fix download logic
Donaim May 5, 2025
49cef61
Improve batch reading
Donaim May 5, 2025
cee6349
bump kivecli
Donaim May 5, 2025
ac61b2e
do not filter in extract_run_ids
Donaim May 5, 2025
eac44e6
use extract_runs_ids
Donaim May 5, 2025
2bda5f5
Fix serialization in get-batch
Donaim May 5, 2025
dc043d0
Fix typing errors
Donaim May 5, 2025
f1e373b
Fix loading of KiveRuns
Donaim May 5, 2025
8e9a06e
Fix failed flag handling
Donaim May 5, 2025
731ccd5
improve the download workflow
Donaim May 6, 2025
aab2019
Fix type error in download
Donaim May 6, 2025
f115ad4
Return without keeping temp directory
Donaim May 6, 2025
3ee1d38
Make sure to break if still processing
Donaim May 6, 2025
a55baee
Switch away from iterators
Donaim May 6, 2025
4577aa1
Make sure we are not overwriting local constants
Donaim May 6, 2025
ceaabab
Factor out helper procedures
Donaim May 6, 2025
8ffae39
Simplify fetch code
Donaim May 6, 2025
e00b9b6
Improve laziness of download.py
Donaim May 6, 2025
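
Several of the commits above switch find_maximum_overlap between correlate, convolve and a scipy fallback. The implementation itself is not rendered on this page, so the following is only a minimal sketch of the general idea, assuming the overlap is found by counting matching positions at every relative shift via cross-correlation. The names count_matches_by_shift and best_shift are hypothetical, the inputs are assumed to be non-empty ASCII strings, and the scoring used in the PR (calculate_overlap_score) is not reproduced here.

```python
from typing import Tuple

import numpy as np


def count_matches_by_shift(left: str, right: str) -> np.ndarray:
    """Count agreeing positions for every relative shift of `right` against `left`.

    Index i of the result corresponds to shift k = i - (len(right) - 1),
    meaning that right[n] is compared with left[n + k].
    """
    left_codes = np.frombuffer(left.encode("ascii"), dtype=np.uint8)
    right_codes = np.frombuffer(right.encode("ascii"), dtype=np.uint8)
    total = np.zeros(len(left) + len(right) - 1, dtype=np.int64)
    for base in b"ACGT":  # iterating bytes yields the integer character codes
        # One indicator vector per nucleotide; their cross-correlation counts
        # the positions where both sequences carry this nucleotide.
        left_mask = (left_codes == base).astype(np.int64)
        right_mask = (right_codes == base).astype(np.int64)
        total += np.correlate(left_mask, right_mask, mode="full")
    return total


def best_shift(left: str, right: str) -> Tuple[int, int]:
    """Return (shift, matches) for the shift with the most agreeing positions."""
    counts = count_matches_by_shift(left, right)
    index = int(np.argmax(counts))  # the first maximum wins on ties
    return index - (len(right) - 1), int(counts[index])


shift, matches = best_shift("AAACGT", "CGTTTT")
print(shift, matches)
```

A raw match count favours long, noisy overlaps over short, exact ones, which is presumably why the commit history keeps revisiting the scoring function rather than the correlation step.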
714 changes: 52 additions & 662 deletions micall/core/contig_stitcher.py

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions micall/core/plot_contigs.py
@@ -1,5 +1,5 @@
import typing
-from typing import Dict, Tuple, List, Set, Iterable, NoReturn
+from typing import Dict, Tuple, List, Set, Iterable, NoReturn, Sequence
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter, FileType
from collections import Counter, defaultdict
from csv import DictReader
@@ -22,8 +22,8 @@
from micall.core.project_config import ProjectConfig
from micall.utils.alignment_wrapper import align_nucs
from micall.utils.contig_stitcher_contigs import Contig, GenotypedContig, AlignedContig
-from micall.utils.contig_stitcher_context import StitcherContext
-import micall.utils.contig_stitcher_events as events
+from micall.utils.contig_stitcher_context import ReferencefullStitcherContext
+import micall.utils.referencefull_contig_stitcher_events as events
from micall.data.landmark_reader import LandmarkReader


@@ -401,7 +401,7 @@ def build_coverage_figure(genome_coverage_csv, blast_csv=None, use_concordance=F


def plot_stitcher_coverage(logs: Iterable[events.EventType], genome_coverage_svg_path: str):
-with StitcherContext.stage():
+with ReferencefullStitcherContext.stage():
f = build_stitcher_figure(logs)
f.show(w=970).save_svg(genome_coverage_svg_path, context=draw.Context(invert_y=True))
return f
@@ -555,11 +555,11 @@ def hit_to_insertions(contig: GenotypedContig, hit: CigarHit):
yield CigarHit.from_default_alignment(q_st=hit.q_ei + 1, q_ei=len(contig.seq) - 1,
r_st=hit.r_ei + 1, r_ei=hit.r_ei)

-def hits_to_insertions(contig: GenotypedContig, hits: List[CigarHit]):
+def hits_to_insertions(contig: GenotypedContig, hits: Iterable[CigarHit]):
for hit in hits:
yield from hit_to_insertions(contig, hit)

-def record_initial_hit(contig: GenotypedContig, hits: List[CigarHit]):
+def record_initial_hit(contig: GenotypedContig, hits: Sequence[CigarHit]):
insertions = [gap for gap in hits_to_insertions(contig, hits)]
unaligned_map[contig.id] = insertions
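
The annotation change in this hunk widens what callers may pass: a function that only loops over its argument can accept any Iterable, while one that wants a concrete, sized collection can ask for a Sequence (presumably so that tuples are accepted as well as lists). A small sketch of that distinction, using a hypothetical Span tuple in place of CigarHit and hypothetical function names:

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Span = Tuple[int, int]  # hypothetical stand-in for CigarHit: (start, end), inclusive


def gaps_between(spans: Iterable[Span]) -> Iterator[Span]:
    """Yield the uncovered gaps between consecutive spans; one pass is enough."""
    previous_end = None
    for start, end in spans:
        if previous_end is not None and start > previous_end + 1:
            yield (previous_end + 1, start - 1)
        previous_end = end


def record_gaps(spans: Sequence[Span]) -> List[Span]:
    """Accepts any read-only sequence (list, tuple, ...) and materializes the gaps."""
    return list(gaps_between(spans))


from_list = record_gaps([(0, 4), (10, 12)])
from_tuple = record_gaps(((0, 4), (5, 8)))
from_generator = list(gaps_between(span for span in [(0, 1), (5, 6)]))
```

Annotating with the narrowest abstract type a function needs keeps a type checker from rejecting callers that pass tuples or generators.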

27 changes: 21 additions & 6 deletions micall/drivers/sample.py
@@ -9,7 +9,6 @@
from micall.core.aln2counts import aln2counts
from micall.core.amplicon_finder import write_merge_lengths_plot, merge_for_entropy
from micall.core.cascade_report import CascadeReport
-from micall.core.contig_stitcher import contig_stitcher
from micall.core.coverage_plots import coverage_plot, concordance_plot
from micall.core.plot_contigs import plot_genome_coverage
from micall.core.prelim_map import prelim_map
@@ -21,6 +20,10 @@
from micall.g2p.fastq_g2p import fastq_g2p, DEFAULT_MIN_COUNT, MIN_VALID, MIN_VALID_PERCENT
from micall.utils.driver_utils import makedirs
from micall.utils.fasta_to_csv import fasta_to_csv
+from micall.utils.csv_to_fasta import csv_to_fasta, NoContigsInCSV
+from micall.utils.referencefull_contig_stitcher import referencefull_contig_stitcher
+from micall.utils.referenceless_contig_stitcher import referenceless_contig_stitcher
+from micall.utils.cat import cat as concatenate_files
from contextlib import contextmanager

logger = logging.getLogger(__name__)
@@ -150,7 +153,7 @@ def get_default_path(self, output_name):
if self.scratch_path is None:
raise AttributeError(
'Unknown output {} and no scratch path.'.format(output_name))
-for extension in ('csv', 'fastq', 'pdf', 'svg', 'png'):
+for extension in ('csv', 'fastq', 'pdf', 'svg', 'png', 'fasta'):
if output_name.endswith('_'+extension):
file_name = output_name[:-(len(extension)+1)] + '.' + extension
break
@@ -426,18 +429,30 @@ def run_denovo(self, excluded_seeds):
merged_contigs_csv,
)

+        with open(self.merged_contigs_csv, 'r'):
+            try:
+                csv_to_fasta(self.merged_contigs_csv, Path(self.merged_contigs_fasta))
+            except NoContigsInCSV:
+                Path(self.merged_contigs_fasta).touch()
+
+        concatenate_files(inputs=[self.unstitched_contigs_fasta,
+                                  self.merged_contigs_fasta],
+                          output=self.combined_contigs_fasta)
+
+        with open(self.combined_contigs_fasta, 'r') as combined_contigs_fasta, \
+                open(self.stitched_contigs_fasta, 'w') as stitched_contigs_fasta:
+            referenceless_contig_stitcher(combined_contigs_fasta, stitched_contigs_fasta)
+
         with open(self.unstitched_contigs_csv, 'w') as unstitched_contigs_csv, \
-                open(self.merged_contigs_csv, 'r') as merged_contigs_csv, \
                 open(self.blast_csv, 'w') as blast_csv:
-            fasta_to_csv(Path(self.unstitched_contigs_fasta),
+            fasta_to_csv(Path(self.stitched_contigs_fasta),
                          unstitched_contigs_csv,
-                         merged_contigs_csv,
                          blast_csv=blast_csv,
                          )

         with open(self.unstitched_contigs_csv, 'r') as unstitched_contigs_csv, \
                 open(self.contigs_csv, 'w') as contigs_csv:
-            contig_stitcher(unstitched_contigs_csv, contigs_csv, self.stitcher_plot_svg)
+            referencefull_contig_stitcher(unstitched_contigs_csv, contigs_csv, self.stitcher_plot_svg)

logger.info('Running remap on %s.', self)
if self.debug_remap:
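
The hunk above feeds the referenceless stitcher a single FASTA built from the unstitched and merged contigs, and touches an empty merged file when the CSV holds no contigs. The cat utility itself is not shown on this page, so this is only a sketch of how such a concatenation step could behave, including the zero-input case mentioned in the commit list. The function name concatenate and the file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable


def concatenate(inputs: Iterable[Path], output: Path) -> None:
    """Write the bytes of every input, in order, into output.

    With no inputs the output is still created, as an empty file.
    """
    with open(output, "wb") as destination:
        for path in inputs:
            with open(path, "rb") as source:
                shutil.copyfileobj(source, destination)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    unstitched = root / "unstitched.fasta"
    merged = root / "merged.fasta"
    combined = root / "combined.fasta"
    unstitched.write_text(">contig1\nACGT\n")
    merged.touch()  # stands in for the "no merged contigs" fallback
    concatenate([unstitched, merged], combined)
    combined_text = combined.read_text()
```

An empty placeholder file keeps the downstream steps uniform: they always receive a readable FASTA path and need no special case for a missing input.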
11 changes: 8 additions & 3 deletions micall/main.py
@@ -88,9 +88,14 @@
"micall/monitor/update_qai.py",
"micall/monitor/micall_watcher.py",
"micall/tcr/igblast.py",
"micall/utils/find_maximum_overlap.py",
"micall/utils/csv_to_fasta.py",
"micall/utils/cat.py",
"micall/utils/fasta_to_fastq.py",
"micall/utils/append_primers.py",
"micall/utils/randomize_fastq.py",
"micall/utils/calculate_kmer_frequencies.py",
"micall/utils/fastq_to_fasta.py",
"micall/utils/analyze_kive_batches/analyze_kive_batches.py",
]

@@ -120,10 +125,10 @@ def execute_module_as_main(module_name: str, arguments: Sequence[str]) -> int:


def get_version() -> str:
-    if __package__ is None:
-        return "development"
-    else:
+    try:
         return str(version(__package__))
+    except BaseException:
+        return "development"


def get_parser() -> argparse.ArgumentParser:
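
The get_version change above replaces the `__package__ is None` check with a catch-all fallback. As a point of comparison only, and not what the PR does, a narrower variant could catch just the "package metadata not found" case from importlib.metadata. The function name get_version_or_default and its default parameter are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def get_version_or_default(package: Optional[str], default: str = "development") -> str:
    """Return the installed version of `package`, or `default` if it is unknown."""
    if not package:
        # __package__ may be None or "" when a file is run as a plain script.
        return default
    try:
        return version(package)
    except PackageNotFoundError:
        # The code is importable but was never installed as a distribution.
        return default


print(get_version_or_default(None))
```

The trade-off is that `except BaseException` also swallows KeyboardInterrupt and SystemExit, whereas the narrower clause lets unexpected failures surface.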