Support for delta encoding of AP data series for sorted CRAM files #323
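As a rough illustration of the `AP` delta encoding this PR adds, here is a minimal Python sketch. It is not the PR's code, and the function names (`ap_delta_enabled`, `slice_alignment_start`, `encode_ap`, `decode_ap`) are hypothetical. It assumes a single-reference slice of mapped records whose 1-based positions are already in non-decreasing order, so every delta is non-negative. The enable rule mirrors the one this PR uses: delta encoding is on only when the `@HD` header line declares `SO:coordinate`.

```python
from typing import List, Sequence


def ap_delta_enabled(sam_header_text: str) -> bool:
    """Return True if the @HD line declares SO:coordinate (the enable rule assumed here)."""
    for line in sam_header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs after the record type.
            return "SO:coordinate" in line.split("\t")[1:]
    return False


def slice_alignment_start(positions: Sequence[int]) -> int:
    """Leftmost alignment start in the slice; it must be known before AP is encoded."""
    return min(positions)


def encode_ap(positions: Sequence[int], slice_start: int, delta: bool) -> List[int]:
    """Encode the AP data series, optionally as deltas between consecutive records."""
    if not delta:
        return list(positions)
    encoded = []
    previous = slice_start  # the first record is encoded relative to the slice alignment start
    for pos in positions:
        encoded.append(pos - previous)
        previous = pos
    return encoded


def decode_ap(values: Sequence[int], slice_start: int, delta: bool) -> List[int]:
    """Invert encode_ap by accumulating the deltas from the slice alignment start."""
    if not delta:
        return list(values)
    decoded = []
    current = slice_start
    for value in values:
        current += value
        decoded.append(current)
    return decoded


if __name__ == "__main__":
    header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000"
    positions = [100, 100, 105, 230]
    use_delta = ap_delta_enabled(header)
    start = slice_alignment_start(positions)
    encoded = encode_ap(positions, start, use_delta)
    print(encoded)  # [0, 0, 5, 125]
    assert decode_ap(encoded, start, use_delta) == positions
```

Because the first delta is taken against the slice alignment start, that value has to be computed before any record is encoded, which is why the sketch derives it up front rather than while encoding records.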
This PR adds support for delta encoding of the `AP` data series for position-sorted CRAM files.

The CRAM specification says that if the `AP` flag is set in the container's compression header, delta encoding is enabled for the `AP` data series. When enabled, the `AP` data series encodes the difference in the `pos` field between consecutive records. For the first record in a slice, the difference is calculated relative to the alignment start in the slice header. See the following sections in the CRAM specification for more details:

In this implementation, `AP` delta encoding is enabled if (and only if) the CRAM file is declared as `SO:coordinate` in the CRAM header. With this change, alignment stats, which were previously calculated during record encoding, must be calculated during preprocessing, since the slice alignment start must be known before the first record's `AP` value can be encoded. The calculated alignment stats are now passed to record encoding via the container context and slice contexts.

To cover more comprehensive testing scenarios for `AP` delta encoding, the existing test resource `medium.cram` has been re-encoded with `AP` delta encoding enabled. The re-encoding was done using the following script, which is a modified version of the script previously used: