-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CRAM writer (alpha) #313
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #313 +/- ##
==========================================
+ Coverage 88.85% 89.28% +0.43%
==========================================
Files 93 97 +4
Lines 8091 8841 +750
Branches 465 481 +16
==========================================
+ Hits 7189 7894 +705
- Misses 437 466 +29
- Partials 465 481 +16 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this feature!
I think both the design and the implementation are excellent 👍👍
I have added one minor comment about the document, so please check it 🙏
src/cljam/io/cram/data_series.clj
Outdated
a map {<data series name> <encoder>}, where: | ||
- <data series name>: a keyword representing the data series name | ||
- <encoding>: a map representing the encoding of the data series | ||
- <encoder>: a function that takes one argument to encode" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The encoder
is actually an overloaded function that acts as both a single-argument function that performs encoding and a zero-argument function that returns the encoding result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! It slipped my mind when I was writing these docstrings. Thanks for pointing that out 🙏
Fixed in 800eecf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review.
LGTM!
This PR adds an implementation of the CRAM writer, based on CRAM v3.1.
Like the initial implementation of the CRAM reader, the current CRAM writer is quite naive and supports only a limited set of features. Additionally, its performance has not been tuned yet.
The structure of the CRAM writer mirrors that of the CRAM reader in many ways, making it easy to understand for those familiar with the CRAM reader's implementation.
The biggest difference can be found in
cljam.io.cram.writer
, which scans and aggregates the records in slices and containers to calculate various header components. Some of these might be moved to separate namespaces in the future.Currently, the following features are not supported (roughly ordered by priority):
AP
data seriesAlso, the PR contains the following commits for miscellaneous fixes within the CRAM reader:
:external-id
to:content-id
for consistency with the CRAM writer:bases
case forrecord-cigar