Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace remaining LSB protocol implementations with ordinary functions #293

Merged
merged 8 commits into from
Dec 11, 2023

Conversation

athos
Copy link
Member

@athos athos commented Dec 4, 2023

This is the last step of #289.

This PR adds the following implementations of LSB protocols as ordinary functions:

  • reader
    • DataInput (defined in cljam.io.util.lsb.data-io)
    • InputStream (defined in cljam.io.util.lsb.io-stream)
  • writer
    • OutputStream (defined in cljam.io.util.lsb.io-stream)

The cljam.io.util.lsb namespace has some other implementations as well, but they are no longer in use.

The following table shows the implementations used before and after the change:

  • reader

    before change after change
    cljam.io.bam.core DataInputStream (DataInput) cljam.io.util.lsb.data-io
    cljam.io.bam.reader DataInputStream (DataInput) cljam.io.util.lsb.data-io
    cljam.io.bam-index.reader FileInputStream (InputStream) cljam.io.util.lsb.io-stream
    cljam.io.bcf.reader BGZFInputStream (InputStream) cljam.io.util.lsb.io-stream
    cljam.io.bigwig RandomAccessFile (DataInput) cljam.io.util.lsb.data-io
    cljam.io.csi DataInputStream (DataInput) cljam.io.util.lsb.data-io
    cljam.io.tabix DataInputStream (DataInput) cljam.io.util.lsb.data-io
  • writer

    before change after change
    cljam.algo.convert ByteBuffer cljam.io.util.lsb.io-stream
    cljam.io.bam.encoder DataOutputStream cljam.io.util.lsb.io-stream
    cljam.io.bam.writer DataOutputStream cljam.io.util.lsb.io-stream
    cljam.io.bam-index.writer DataOutputStream cljam.io.util.lsb.io-stream
    cljam.io.bcf.writer DataOutputStream cljam.io.util.lsb.io-stream
    cljam.io.csi DataOutputStream cljam.io.util.lsb.io-stream

I've confirmed that this change doesn't have significant negative impact on the performance of the related readers/writers (This PR also includes a tweak of the benchmark code for the BAM reader to make it more stable and reliable by reducing extra overhead of sequence allocation in the benchmark code itself):

before change after change
BAM reader 5.835351 s 5.677128 s
BAM writer 12.149866 s 12.076689 s
BCF reader 6.469314 s 6.468553 s
BCF writer 1.641983 s 1.676683 s
cljam.algo.convert 182.333152 ms 180.971159 ms
BigWig reader 84.117612 μs 83.471144 μs
CSI reader 22.210736 μs 21.966407 μs
CSI writer 127.886597 μs 127.627769 μs
Tabix reader 24.434094 μs 25.902402 μs

See below for more about the benchmarking details:

benchmarking details

For those with existing benchmark code, I benchmarked them using the following lein libra commands:

target benchmarking command
BAM reader lein libra :only cljam.io.sam-bench/decode-bam-alignment-long-bench
BAM writer lein libra :only cljam.io.sam-bench/encode-alignment-long-bench
BCF reader lein libra :only cljam.io.vcf-bench/decode-variant-large-bcf-bench
BCF writer lein libra :only cljam.io.vcf-bench/encode-variant-large-bcf-bench
cljam.algo.convert lein libra :only cljam.algo.convert-bench

For the others, I used the following tiny benchmark code:

(require '[criterium.core :as cr]))

;; BigWig reader
(require '[cljam.io.bigwig :as bigwig])
(cr/quick-bench
 (with-open [r (bigwig/reader "test-resources/bigwig/test-bedgraph.bigWig")]
   (dorun (bigwig/read-tracks r))))

(require '[cljam.io.csi :as csi])
;; CSI reader
(cr/quick-bench (csi/read-index "test-resources/csi/test-v4_3-complex.vcf.gz.csi"))
;; CSI writer
(let [idx (csi/read-index "test-resources/csi/test-v4_3-complex.vcf.gz.csi")]
  (cr/quick-bench (csi/write-index "out.csi" idx)))

;; Tabix reader
(require '[cljam.io.tabix :as tbi])
(cr/quick-bench (tbi/read-index "test-resources/tabix/test.gtf.gz.tbi"))

@athos athos self-assigned this Dec 4, 2023
@athos athos mentioned this pull request Dec 4, 2023
3 tasks
Copy link

codecov bot commented Dec 4, 2023

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (d6d1af8) 88.31% compared to head (8612caa) 87.94%.

Files Patch % Lines
src/cljam/io/csi.clj 73.91% 0 Missing and 6 partials ⚠️
src/cljam/io/util/lsb/io_stream.clj 98.87% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #293      +/-   ##
==========================================
- Coverage   88.31%   87.94%   -0.37%     
==========================================
  Files          81       83       +2     
  Lines        7058     7192     +134     
  Branches      488      488              
==========================================
+ Hits         6233     6325      +92     
- Misses        337      379      +42     
  Partials      488      488              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@athos athos force-pushed the feature/2bit-writer-rewrite branch from 231298a to a2a001d Compare December 5, 2023 01:31
@athos athos force-pushed the feature/lsb-revamp branch 2 times, most recently from 8fb577a to a8993b8 Compare December 5, 2023 03:04
@athos athos changed the base branch from feature/2bit-writer-rewrite to master December 5, 2023 03:04
@athos athos marked this pull request as ready for review December 5, 2023 03:07
@athos athos requested review from alumi and a team as code owners December 5, 2023 03:07
@athos athos requested review from niyarin and removed request for a team December 5, 2023 03:07
Copy link
Member

@alumi alumi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍
Now that cljam.io.util.lsb is no longer used, we can either remove the ns and cut a new release 0.9.0, or deprecate functions in it and keep the version 0.8.5.
I've searched our internal repositories and found just one function depending on the ns, so I think it's almost safe to remove it. Could you tell me what you think about this?

@athos athos force-pushed the feature/lsb-revamp branch from a8993b8 to 61749bb Compare December 8, 2023 02:50
@athos athos force-pushed the feature/lsb-revamp branch from 61749bb to 8612caa Compare December 8, 2023 02:51
@athos
Copy link
Member Author

athos commented Dec 8, 2023

I’d prefer more graceful deprecation, so I've added the :deprecated attr to cljam.io.util.lsb.

I also added another commit to add notes that warn use of the new lsb namespaces to discourage undesired abuse of them from outside cljam in the future.

Copy link
Contributor

@niyarin niyarin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@niyarin niyarin merged commit a514b2e into master Dec 11, 2023
17 checks passed
@niyarin niyarin deleted the feature/lsb-revamp branch December 11, 2023 02:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants