papers.json (3234 lines, 353 KB)
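Each record below follows the same flat schema: an abstract, an author string, a paper id, "oral"/"short"/"melba" flags stored as the strings "True"/"False", a poster location, a schedule string, a title, and optional link fields ("pmlr_url", "slides", "yt_full"). A minimal sketch of how the file might be consumed is given here; the script and its filename argument are illustrative assumptions, not part of the dataset:

```python
import json

# Load the paper records; each top-level key is a paper ID such as "M004" or "O006".
with open("papers.json", encoding="utf-8") as f:
    papers = json.load(f)

# Example query: print every oral presentation with its poster slot and title.
for pid, rec in papers.items():
    if rec.get("oral") == "True":  # boolean flags are stored as strings in this file
        print(f"{pid} [{rec.get('poster_loc', '?')}] {rec['title']}")
```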
{
"M004": {
"abstract": "Using state-of-the-art deep learning (DL) models to diagnose cancer from histology data presents several challenges related to the nature and availability of labeled histology images, including image size, stain variations, and label ambiguity. In addition, cancer grading and the localization of regions of interest (ROIs) in such images normally rely on both image- and pixel-level labels, with the latter requiring a costly annotation process. Deep weakly-supervised object localization (WSOL) methods provide different strategies for low-cost training of DL models. Given only image-class annotations, these methods can be trained to simultaneously classify an image, and yield class activation maps (CAMs) for ROI localization. This paper provides a review of deep WSOL methods to identify and locate diseases in histology images, without the need for pixel-level annotations. We propose a taxonomy in which these methods are divided into bottom-up and top-down methods according to the information flow in models. Although the latter have seen only limited progress, recent bottom-up methods are currently driving a lot of progress with the use of deep WSOL methods. Early works focused on designing different spatial pooling functions. However, those methods quickly peaked in term of localization accuracy and revealed a major limitation, namely, \u2013 the under-activation of CAMs, which leads to high false negative localization. Subsequent works aimed to alleviate this shortcoming and recover the complete object from the background, using different techniques such as perturbation, self-attention, shallow features, pseudo-annotation, and task decoupling.<br>In the present paper, representative deep WSOL methods from our taxonomy are also evaluated and compared in terms of classification and localization accuracy using two challenging public histology datasets \u2013 one for colon cancer (GlaS), and a second, for breast cancer (CAMELYON16). Overall, the results indicate poor localization performance, particularly for generic methods that were initially designed to process natural images. Methods designed to address the challenges posed by histology data often use priors such as ROI size, or additional pixel-wise supervision estimated from a pre-trained classifier, allowing them to achieve better results. However, all the methods suffer from high false positive/negative localization. Classification performance is mainly affected by the model selection process, which uses either the classification or the localization metric. Finally, four key challenges are identified in the application of deep WSOL methods in histology, namely, \u2013 under-/over-activation of CAMs, sensitivity to thresholding, and model selection \u2013 and research avenues are provided to mitigate them. Our code is publicly available at <a href='https://github.com/jeromerony/survey_wsl_histology'>https://github.com/jeromerony/survey_wsl_histology</a>",
"authors": "J\u00e9r\u00f4me Rony, Soufiane Belharbi, Jose Dolz, Ismail, Ben Ayed, Luke McCaffrey, Eric Granger",
"award": null,
"id": "4",
"melba": "True",
"or_id": "",
"oral": "False",
"pmlr_url": "",
"poster_loc": "W46",
"schedule": "Wednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Deep Weakly-Supervised Learning Methods for Classification and Localization in Histology Images: A Survey",
"yt_full": ""
},
"M018": {
"abstract": "We propose neural network layers that explicitly combine frequency and image feature representations and show that they can be used as a versatile building block for reconstruction from frequency space data. Our work is motivated by the challenges arising in MRI acquisition where the signal is a corrupted Fourier transform of the desired image. The proposed joint learning schemes enable both correction of artifacts native to the frequency space and manipulation of image space representations to reconstruct coherent image structures at every layer of the network. This is in contrast to most current deep learning approaches for image reconstruction that treat frequency and image space features separately and often operate exclusively in one of the two spaces. We demonstrate the advantages of joint convolutional learning for a variety of tasks, including motion correction, denoising, reconstruction from undersampled acquisitions, and combined undersampling and motion correction on simulated and real world multicoil MRI data. The joint models produce consistently high quality output images across all tasks and datasets. When integrated into a state of the art unrolled optimization network with physics-inspired data consistency constraints for undersampled reconstruction, the proposed architectures significantly improve the optimization landscape, which yields an order of magnitude reduction of training time. This result suggests that joint representations are particularly well suited for MRI signals in deep learning networks. Our code and pretrained models are publicly available at <a href='https://github.com/nalinimsingh/interlacer'>https://github.com/nalinimsingh/interlacer</a>.",
"authors": "Nalini M., Singh, Juan Eugenio, Iglesias, Elfar Adalsteinsson, Adrian V., Dalca, Polina Golland",
"award": null,
"id": "18",
"melba": "True",
"or_id": "",
"oral": "False",
"pmlr_url": "",
"poster_loc": "W45",
"schedule": "Wednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Joint Frequency and Image Space Learning for MRI Reconstruction and Analysis",
"yt_full": ""
},
"O006": {
"abstract": "Interpreting deep learning models typically relies on post-hoc saliency map techniques. However, these techniques often fail to serve as actionable feedback to clinicians, and they do not directly explain the decision mechanism. Here, we propose an inherently interpretable model that combines the feature extraction capabilities of deep neural networks with advantages of sparse linear models in interpretability. Our approach relies on straightforward but effective changes to a deep bag-of-local-features model (BagNet). These modifications lead to fine-grained and sparse class evidence maps which, by design, correctly reflect the model's decision mechanism. Our model is particularly suited for tasks which rely on characterising regions of interests that are very small and distributed over the image. In this paper, we focus on the detection of Diabetic Retinopathy, which is characterised by the progressive presence of small retinal lesions on fundus images. We observed good classification accuracy despite our added sparseness constraint. In addition, our model precisely highlighted retinal lesions relevant for the disease grading task and excluded irrelevant regions from the decision mechanism. The results suggest our sparse BagNet model can be a useful tool for clinicians as it allows efficient inspection of the model predictions and facilitates clinicians' and patients' trust.",
"authors": "Kerol R. Djoumessi Donteu, Indu Ilanchezian, Laura K\u00fchlewein, Hanna Faber, Christian F. Baumgartner, Bubacarr Bah, Philipp Berens, Lisa M. Koch",
"award": "",
"id": "6",
"melba": "False",
"or_id": "us8BFTsWOq",
"oral": "True",
"pmlr_url": "",
"poster_loc": "W08",
"schedule": "Wednesday, July 12: Oral session 8 - Computer-assisted diagnosis \u2014 14:00\u201315:00\nWednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Sparse Activations for Interpretable Disease Grading",
"yt_full": ""
},
"O007": {
"abstract": "Breast cancer is the most commonly diagnosed cancer and the use of artificial intelligence (AI) to help diagnose the disease from digital pathology images has the potential to greatly improve patient outcomes. However, current methods for detecting, segmenting, and sub-typing breast neoplasms and other proliferative lesions often rely on costly and time-consuming manual annotation efforts, which can be impractical for large-scale datasets. In this work, we propose an annotation-free learning framework to jointly detect, segment, and subtype breast neoplasms. Our approach leverages top-k multiple instance learning to train an initial neoplasm detection backbone network from weakly-labeled whole slide images, which is then used to automatically generate pixel-level pseudo-labels for whole slides with only one subtype. A second network is trained using these pseudo-labels, and slide-level classification is performed by training an aggregator network that fuses the embeddings from both backbone networks. We trained and validated our framework on large-scale datasets with more than 100k whole slide images and demonstrate its effectiveness on tasks including breast neoplasms detection, segmentation, and subtyping.",
"authors": "Adam Casson, Siqi Liu, Ran A Godrich, Hamed Aghdam, Brandon Rothrock, Kasper Malfroid, Christopher Kanan, Thomas Fuchs",
"award": "",
"id": "7",
"melba": "False",
"or_id": "rXVtHHFLRIz",
"oral": "True",
"pmlr_url": "",
"poster_loc": "W26",
"schedule": "Wednesday, July 12: Oral session 7 - Segmentation 2 \u2014 9:30\u201310:15\nWednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Joint Breast Neoplasm Detection and Subtyping using Multi-Resolution Network Trained on Large-Scale H&E Whole Slide Images with Weak Labels",
"yt_full": ""
},
"O021": {
"abstract": "The main benefit of unsupervised anomaly detection is the ability to identify arbitrary, rare data instances of pathologies even in the absence of training labels or sufficient examples of the rare class(es). In the clinical workflow, such methods have the potential to assist in screening and pre-filtering exams for potential pathologies and thus meaningfully support radiologists. Even though much work has been done on using auto-encoders (AE) for anomaly detection, there are still two critical challenges to overcome: First, learning compact and detailed representations of the healthy distribution is cumbersome. Recent work shows that AEs can reconstruct some types of anomalies even better than actual samples from the training distribution. Second, while the majority of unsupervised algorithms are tailored to detect hyperintense lesions on FLAIR brain MR scans, recent improvements in basic intensity thresholding techniques have outperformed them. Moreover, we found that even state-of-the-art (SOTA) AEs fail to detect several classes of non-hyperintense anomalies on T1w brain MRIs, such as brain atrophy, edema, or resections. In this work, we propose reversed AEs (RA) to generate pseudo-healthy reconstructions and localize various brain pathologies. We extensively evaluate our method on T1w brain scans and increase the detection of global pathology and artefacts from 73.1 to 89.4 AUROC and the amount of detected local pathologies from 52.6% to 86.0% compared to SOTA methods.",
"authors": "Cosmin I. Bercea, Benedikt Wiestler, Daniel Rueckert, Julia A Schnabel",
"award": "",
"id": "21",
"melba": "False",
"or_id": "8ojx-Ld3yjR",
"oral": "True",
"pmlr_url": "",
"poster_loc": "M06",
"schedule": "Monday, July 10: Oral session 2 - Unsupervised/weakly supervised methods \u2014 14:00\u201315:00\nMonday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Generalizing Unsupervised Anomaly Detection: Towards Unbiased Pathology Screening",
"yt_full": ""
},
"O024": {
"abstract": "This paper explores training medical vision-language models (VLMs) -- where the visual and language inputs are embedded into a common space -- with a particular focus on scenarios where training data is limited, as is often the case in clinical datasets. We explore several candidate methods to improve low-data performance, including: (i) adapting generic pre-trained models to novel image and text domains (i.e.\\ medical imaging and reports) via unimodal self-supervision; (ii) using local (e.g.\\ GLoRIA) \\& global (e.g. InfoNCE) contrastive loss functions as well as a combination of the two; (iii) extra supervision during VLM training, via: (a) image- and text-only self-supervision, and (b) creating additional positive image-text pairs for training through augmentation and nearest-neighbour search. Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports. Combined, they significantly improve retrieval compared to fine-tuning CLIP, roughly equivalent to training with $10\\times$ the data. A similar pattern is found in the downstream task classification of CXR-related conditions with our method outperforming CLIP and also BioVIL, a strong CXR VLM benchmark, in the zero-shot and linear probing settings. We conclude with a set of recommendations for researchers aiming to train vision-language models on other medical imaging modalities when training data is scarce. To facilitate further research, we will make our code and models publicly available. ",
"authors": "Rhydian Windsor, Amir Jamaludin, Timor Kadir, Andrew Zisserman",
"award": "",
"id": "24",
"melba": "False",
"or_id": "2XVITHcQCfj",
"oral": "True",
"pmlr_url": "",
"poster_loc": "T08",
"schedule": "Tuesday, July 11: Oral session 5 - Semi-supervised/self-supervised methods \u2014 14:00\u201315:00\nTuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime",
"yt_full": ""
},
"O026": {
"abstract": "Multiplex Immunohistochemistry (mIHC) is a cost-effective and accessible method for in situ labeling of multiple protein biomarkers in a tissue sample. By assigning a different stain to each biomarker, it allows the visualization of different types of cells within the tumor vicinity for downstream analysis. However, to detect different types of stains in a given mIHC image is a challenging problem, especially when the number of stains is high. Previous deep-learning-based methods mostly assume full supervision; yet the annotation can be costly. In this paper, we propose a novel unsupervised stain decomposition method to detect different stains simultaneously. Our method does not require any supervision, except for color samples of different stains. A main technical challenge is that the problem is underdetermined and can have multiple solutions. To conquer this issue, we propose a novel inversion regulation technique, which eliminates most undesirable solutions. On a 7-plexed IHC image dataset, the proposed method achieves high quality stain decomposition results without human annotation.",
"authors": "Shahira Abousamra, Danielle Fassler, Jiachen Yao, Rajarsi R. Gupta, Tahsin Kurc, Luisa Escobar-Hoyos, Dimitris Samaras, Kenneth Shroyer, Joel Saltz, Chao Chen",
"award": "",
"id": "26",
"melba": "False",
"or_id": "J0VD-I2IOOg",
"oral": "True",
"pmlr_url": "",
"poster_loc": "M02",
"schedule": "Monday, July 10: Oral session 2 - Unsupervised/weakly supervised methods \u2014 14:00\u201315:00\nMonday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Unsupervised Stain Decomposition via Inversion Regulation for Multiplex Immunohistochemistry Images",
"yt_full": ""
},
"O029": {
"abstract": "Concerns about the reproducibility of deep learning research are more prominent than ever, with no clear solution in sight. The Medical Imaging with Deep Learning (MIDL) conference has made advancements in employing empirical rigor with regards to reproducibility by advocating open access, and recently also recommending authors to make their code public---both aspects being adopted by the majority of the conference submissions. We have evaluated all accepted full paper submissions to MIDL between 2018 and 2022 using established, but adjusted guidelines addressing the reproducibility and quality of the public repositories. The evaluations show that publishing repositories and using public datasets are becoming more popular, which helps traceability, but the quality of the repositories shows room for improvement in every aspect. Merely 22% of all submissions contain a repository that was deemed repeatable using our evaluations. From the commonly encountered issues during the evaluations, we propose a set of guidelines for machine learning-related research for medical imaging applications, adjusted specifically for future submissions to MIDL. We presented our results to future MIDL authors who were eager to continue an open discussion on the topic of code reproducibility.",
"authors": "Attila Simk\u00f3, Anders Garpebring, Joakim Jonsson, Tufve Nyholm, Tommy L\u00f6fstedt",
"award": "",
"id": "29",
"melba": "False",
"or_id": "_P59zCfXOt",
"oral": "True",
"pmlr_url": "",
"poster_loc": "M07",
"schedule": "Monday, July 10: MIDL board special session \u2014 10:30\u201311:00\nMonday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\nWednesday, July 12: Virtual poster session - 8:00\u20139:00\n",
"short": "False",
"slides": "/virtual/poster/O029.pdf",
"title": "Reproducibility of the Methods in Medical Imaging with Deep Learning.",
"yt_full": "https://youtu.be/bNSsm0ptYmQ"
},
"O037": {
"abstract": "Due to the low signal-to-noise ratio and limited resolution of functional MRI data, and the high complexity of natural images, reconstructing a visual stimulus from human brain fMRI measurements is a challenging task. In this work, we propose a novel approach for this task, which we call Cortex2Image, to decode visual stimuli with high semantic fidelity and rich fine-grained detail. In particular, we train a surface-based convolutional network model that maps from brain response to semantic image features first (Cortex2Semantic). We then combine this model with a high-quality image generator (Instance-Conditioned GAN) to train another mapping from brain response to fine-grained image features using a variational approach (Cortex2Detail). Image reconstructions obtained by our proposed method achieve state-of-the-art semantic fidelity, while yielding good fine-grained similarity with the ground-truth stimulus. Our code is available on \\url{https://github.com/zijin-gu/meshconv-decoding.git}.",
"authors": "Zijin Gu, Keith Jamison, Amy Kuceyeski, Mert R. Sabuncu",
"award": "",
"id": "37",
"melba": "False",
"or_id": "V5vvti2Y9PA",
"oral": "True",
"pmlr_url": "",
"poster_loc": "T05",
"schedule": "Tuesday, July 11: Oral session 4 - Neuroimaging \u2014 9:00\u201310:15\nTuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Decoding natural image stimuli from fMRI data with a surface-based convolutional network",
"yt_full": ""
},
"O039": {
"abstract": "Three-dimensional segmentation in magnetic resonance images (MRI), which reflects the true shape of the objects, is challenging since high-resolution isotropic MRIs are rare and typical MRIs are anisotropic, with the out-of-plane dimension having a much lower resolution. A potential remedy to this issue lies in the fact that often multiple sequences are acquired on different planes. However, in practice, these sequences are not orthogonal to each other, limiting the applicability of many previous solutions to reconstruct higher-resolution images from multiple lower-resolution ones. We propose a novel deep learning-based solution to generating high-resolution masks from multiple low-resolution images. Our method combines segmentation and unsupervised registration networks by introducing two new regularizations to make registration and segmentation reinforce each other. Finally, we introduce a multi-view fusion method to generate high-resolution target object masks. The experimental results on two datasets show the superiority of our methods. Importantly, the advantage of not using high-resolution images in the training process makes our method applicable to a wide variety of MRI segmentation tasks.",
"authors": "Hanxue Gu, Hongyu He, Roy Colglazier, Jordan Axelrod, Robert French, Maciej A Mazurowski",
"award": "",
"id": "39",
"melba": "False",
"or_id": "oi5psB9R_l",
"oral": "True",
"pmlr_url": "",
"poster_loc": "M05",
"schedule": "Monday, July 10: Oral session 1 - Segmentation 1 \u2014 9:30\u201310:00\nMonday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "SuperMask: Generating High-resolution object masks from multi-view, unaligned low-resolution MRIs",
"yt_full": ""
},
"O047": {
"abstract": "Tagged magnetic resonance imaging (MRI) has been used for decades to observe and quantify the detailed motion of deforming tissue. However, this technique faces several challenges such as tag fading, large motion, long computation times, and difficulties in obtaining diffeomorphic incompressible flow fields. To address these issues, this paper presents a novel unsupervised phase-based 3D motion estimation technique for tagged MRI. We introduce two key innovations. First, we apply a sinusoidal transformation to the harmonic phase input, which enables end-to-end training and avoids the need for phase interpolation. Second, we propose a Jacobian determinant-based learning objective to encourage incompressible flow fields for deforming biological tissues. Our method efficiently estimates 3D motion fields that are accurate, dense, and approximately diffeomorphic and incompressible. The efficacy of the method is assessed using human tongue motion during speech, and includes both healthy controls and patients that have undergone glossectomy. We show that the method outperforms existing approaches, and also exhibits improvements in speed, robustness to tag fading, and large tongue motion. The code is available.",
"authors": "Zhangxing Bian, Fangxu Xing, Jinglun Yu, Muhan Shao, Yihao Liu, Aaron Carass, Jonghye Woo, Jerry L Prince",
"award": "",
"id": "47",
"melba": "False",
"or_id": "jkSC4UHHVzy",
"oral": "True",
"pmlr_url": "",
"poster_loc": "M08",
"schedule": "Monday, July 10: Oral session 2 - Unsupervised/weakly supervised methods \u2014 14:00\u201315:00\nMonday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "DRIMET: Deep Registration-based 3D Incompressible Motion Estimation in Tagged-MRI with Application to the Tongue",
"yt_full": ""
},
"O058": {
"abstract": "Grading precancerous lesions on whole slide images is a challenging task: the continuous space of morphological phenotypes makes clear-cut decisions between different grades often difficult, leading to low inter- and intra-rater agreements. More and more Artificial Intelligence (AI) algorithms are developed to help pathologists perform and standardize their diagnosis. However, those models can render their prediction without consideration of the ambiguity of the classes and can fail without notice which prevent their wider acceptance in a clinical context. In this paper, we propose a new score to measure the confidence of AI models in grading tasks. Our confidence score is specifically adapted to ordinal output variables, is versatile and does not require extra training or additional inferences nor particular architecture changes. Comparison to other popular techniques such as Monte Carlo Dropout and deep ensembles shows that our method provides state-of-the art results, while being simpler, more versatile and less computationally intensive. The score is also easily interpretable and consistent with real life hesitations of pathologists. We show that the score is capable of accurately identifying mispredicted slides and that accuracy for high confidence decisions is significantly higher than for low-confidence decisions (gap in AUC of 17.1\\% on the test set). We believe that the proposed confidence score could be leveraged by pathologists directly in their workflow and assist them on difficult tasks such as grading precancerous lesions.",
"authors": "Melanie Lubrano, Ya\u00eblle Bellahsen Harrar, Rutger RH Fick, C\u00e9cile Badoual, Thomas Walter",
"award": "",
"id": "58",
"melba": "False",
"or_id": "DA1hOTvcMWa",
"oral": "True",
"pmlr_url": "",
"poster_loc": "W06",
"schedule": "Wednesday, July 12: Oral session 8 - Computer-assisted diagnosis \u2014 14:00\u201315:00\nWednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Simple and Efficient Confidence Score for Grading Whole Slide Images",
"yt_full": ""
},
"O063": {
"abstract": "Accuracy validation of cortical thickness measurement is a difficult problem due to the lack of ground truth data. To address this need, many methods have been developed to synthetically induce gray matter (GM) atrophy in an MRI via deformable registration, creating a set of images with known changes in cortical thickness. However, these methods often cause blurring in atrophied regions, and cannot simulate realistic atrophy within deep sulci where cerebrospinal fluid (CSF) is obscured or absent. In this paper, we present a solution using a self-supervised inpainting model to generate CSF in these regions and create images with more plausible GM/CSF boundaries. Specifically, we introduce a novel, 3D GAN model that incorporates patch-based dropout training, edge map priors, and sinusoidal positional encoding, all of which are established methods previously limited to 2D domains. We show that our framework significantly improves the quality of the resulting synthetic images and is adaptable to unseen data with fine-tuning. We also demonstrate that our resulting dataset can be employed for accuracy validation of cortical segmentation and thickness measurement.",
"authors": "Jiacheng Wang, Kathleen Larson, Ipek Oguz",
"award": "",
"id": "63",
"melba": "False",
"or_id": "HR1GtDQnuw",
"oral": "True",
"pmlr_url": "",
"poster_loc": "T06",
"schedule": "Tuesday, July 11: Oral session 5 - Semi-supervised/self-supervised methods \u2014 14:00\u201315:00\nTuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Self-Supervised CSF Inpainting for Improved Accuracy Validation of Cortical Surface Analyses ",
"yt_full": ""
},
"O069": {
"abstract": "We focus on the problem of producing well-calibrated out-of-distribution (OOD) detectors, in order to enable safe deployment of medical image classifiers. Motivated by the difficulty of curating suitable calibration datasets, synthetic augmentations have become highly prevalent for inlier/outlier specification. While there have been rapid advances in data augmentation techniques, this paper makes a striking finding that the space in which the inliers and outliers are synthesized, in addition to the type of augmentation, plays a critical role in calibrating OOD detectors. Using the popular energy-based OOD detection framework, we find that the optimal protocol is to synthesize latent-space inliers along with diverse pixel-space outliers. Based on empirical studies with multiple medical imaging benchmarks, we demonstrate that our approach consistently leads to superior OOD detection ($15\\% - 35\\%$ in AUROC) over the state-of-the-art in a variety of open-set recognition settings.",
"authors": "Vivek Narayanaswamy, Yamen Mubarka, Rushil Anirudh, Deepta Rajan, Andreas Spanias, Jayaraman J. Thiagarajan",
"award": "",
"id": "69",
"melba": "False",
"or_id": "RU7fr0-M8N",
"oral": "True",
"pmlr_url": "",
"poster_loc": "T12",
"schedule": "Tuesday, July 11: Oral session 6 - Synthesis \u2014 16:00\u201316:45\nTuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Know Your Space: Inlier and Outlier Construction for Calibrating Medical OOD Detectors",
"yt_full": ""
},
"O077": {
"abstract": "We harness a Transformer-based model and a pre-training procedure for fingerprinting on fMRI data, to enhance the accuracy of stress predictions. Our model, called MetricFMRI, first optimizes a pixel-based reconstruction loss. In a second unsupervised training phase, a triplet loss is used to encourage fMRI sequences of the same subject to have closer representations, while sequences from different subjects are pushed away from each other. Finally, supervised learning is used for the target task, based on the learned representation. We evaluate the performance of our model and other alternatives and conclude that the triplet training for the fingerprinting task is key to the improved accuracy of our method for the task of stress prediction. To obtain insights regarding the learned model, gradient-based explainability techniques are used, indicating that sub-cortical brain regions that are known to play a central role in stress-related processes are highlighted by the model. ",
"authors": "Gony Rosenman, Itzik Malkiel, Ayam Greental, Talma Hendler, Lior Wolf",
"award": "",
"id": "77",
"melba": "False",
"or_id": "W9qI8DwoUFF",
"oral": "True",
"pmlr_url": "",
"poster_loc": "T09",
"schedule": "Tuesday, July 11: Oral session 4 - Neuroimaging \u2014 9:00\u201310:15\nTuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Pre-Training Transformers for Fingerprinting to Improve Stress Prediction in fMRI",
"yt_full": ""
},
"O088": {
"abstract": "Colon resection is often the treatment of choice for colorectal cancer (CRC) patients. However, especially for minimally invasive cancer, such as pT1, simply removing the polyps may be enough to stop cancer progression. Different histopathological risk factors such as tumor grade and invasion depth currently found the basis for the need for colon resection in pT1 CRC patients. Here, we investigate two additional risk factors, tumor budding and lymphocyte infiltration at the invasive front, which are known to be clinically relevant. We capture the spatial layout of tumor buds and T-cells and use graph-based deep learning to investigate them as potential risk predictors. Our pT1 Hotspot Tumor Budding T-cell Graph (pT1-HBTG) dataset consists of 626 tumor budding hotspots from 575 patients. We propose and compare three different graph structures, as well as combinations of the node labels. The best-performing Graph Neural Network architecture is able to increase specificity by 20% compared to the currently recommended risk stratification based on histopathological risk factors, without losing any sensitivity. We believe that using a graph-based analysis can help to assist pathologists in making risk assessments for pT1 CRC patients, and thus decrease the number of patients undergoing potentially unnecessary surgery. Both the code and dataset are made publicly available.",
"authors": "Linda Studer, JM Bokhorst, I Nagtegaal, Inti Zlobec, Heather Dawson, Andreas Fischer",
"award": "",
"id": "88",
"melba": "False",
"or_id": "ruaXPgZCk6i",
"oral": "True",
"pmlr_url": "",
"poster_loc": "M12",
"schedule": "Monday, July 10: Oral session 3 - Graph-based methods \u2014 16:00\u201316:45\nMonday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Tumor Budding T-cell Graphs: Assessing the Need for Resection in pT1 Colorectal Cancer Patients",
"yt_full": ""
},
"O100": {
"abstract": "Deep learning models benefit from training with a large dataset (labeled or unlabeled). Following this motivation, we present an approach to learn a deep learning model for the automatic segmentation of Organs at Risk (OARs) in cervical cancer radiation treatment from a large clinically available dataset of Computed Tomography (CT) scans containing data inhomogeneity, label noise, and missing annotations. We employ simple heuristics for automatic data cleaning to minimize data inhomogeneity and label noise. Further, we develop a semi-supervised learning approach utilizing a teacher-student setup, annotation imputation, and uncertainty-guided training to learn in presence of missing annotations. Our experimental results show that learning from a large dataset with our approach yields a significant improvement in the test performance despite missing annotations in the data. Further, the contours generated from the segmentation masks predicted by our model are found to be equally clinically acceptable as manually generated contours.",
"authors": "Monika Grewal, Dustin van Weersel, Henrike Westerveld, Peter Bosman, Tanja Alderliesten",
"award": "",
"id": "100",
"melba": "False",
"or_id": "uPRFWdz03_",
"oral": "True",
"pmlr_url": "",
"poster_loc": "M01",
"schedule": "Monday, July 10: Oral session 1 - Segmentation 1 \u2014 9:30\u201310:00\nMonday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Learning Clinically Acceptable Segmentation of Organs at Risk in Cervical Cancer Radiation Treatment from Clinically Available Annotations",
"yt_full": ""
},
"O125": {
"abstract": "Single-cell high-throughput microscopy images contain key biological information underlying normal and pathological cellular processes. Image-based analysis and profiling are powerful and promising for extracting this information but are made difficult due to substantial complexity and heterogeneity in cellular phenotype. Hand-crafted methods and machine learning models are popular ways to extract cell image information. Representations extracted via machine learning models, which often exhibit good reconstruction performance, lack biological interpretability. Hand-crafted representations, on the contrary, have clear biological meanings and thus are interpretable. Whether these hand-crafted representations can also generate realistic images is not clear. In this paper, we propose a CellProfiler to image (CP2Image) model that can directly generate realistic cell images from CellProfiler representations. We also demonstrate most biological information encoded in the CellProfiler representations is well-preserved in the generating process. This is the first time hand-crafted representations be shown to have generative ability and provide researchers with an intuitive way for their further analysis.",
"authors": "Yanni Ji, Marie Cutiongco, Bj\u00f8rn Sand Jensen, Ke Yuan",
"award": "",
"id": "125",
"melba": "False",
"or_id": "4LqtcKZoKeB",
"oral": "True",
"pmlr_url": "",
"poster_loc": "T10",
"schedule": "Tuesday, July 11: Oral session 6 - Synthesis \u2014 16:00\u201316:45\nTuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "CP2Image: Generating high-quality single-cell images using CellProfiler representations",
"yt_full": ""
},
"O130": {
"abstract": "In this paper, we propose a novel two-component loss for biomedical image segmentation tasks called the Instance-wise and Center-of-Instance (ICI) loss, a loss function that addresses the instance imbalance problem commonly encountered when using pixel-wise loss functions such as the Dice loss. The Instance-wise component improves the detection of small instances or ``blobs\" in image datasets with both large and small instances. The Center-of-Instance component improves the overall detection accuracy. We compared the ICI loss with two existing losses, the Dice loss and the blob loss, in the task of stroke lesion segmentation using the ATLAS R2.0 challenge dataset from MICCAI 2022. Compared to the other losses, the ICI loss provided a better balanced segmentation, and significantly outperformed the Dice loss with an improvement of $1.7-3.7\\%$ and the blob loss by $0.6-5.0\\%$ in terms of the Dice similarity coefficient on both validation and test set, suggesting that the ICI loss is a potential solution to the instance imbalance problem.",
"authors": "Febrian Rachmadi, Charissa Poon, henrik skibbe",
"award": "",
"id": "130",
"melba": "False",
"or_id": "8o83y0_YtE",
"oral": "True",
"pmlr_url": "",
"poster_loc": "W03",
"schedule": "Wednesday, July 12: Oral session 7 - Segmentation 2 \u2014 9:30\u201310:15\nWednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Improving Segmentation of Objects with Varying Sizes in Biomedical Images using Instance-wise and Center-of-Instance Segmentation Loss Function",
"yt_full": ""
},
"O134": {
"abstract": "We present Roto-Translation Equivariant Spherical Deconvolution (RT-ESD), an $E(3)\\times SO(3)$ equivariant framework for sparse deconvolution of volumes where each voxel contains a spherical signal. Such 6D data naturally arises in diffusion MRI (dMRI), a medical imaging modality widely used to measure microstructure and structural connectivity. As each dMRI voxel is typically a mixture of various overlapping structures, there is a need for blind deconvolution to recover crossing anatomical structures such as white matter tracts. Existing dMRI work takes either an iterative or deep learning approach to sparse spherical deconvolution, yet it typically does not account for relationships between neighboring measurements. This work constructs equivariant deep learning layers which respect to symmetries of spatial rotations, reflections, and translations, alongside the symmetries of voxelwise spherical rotations. As a result, RT-ESD improves on previous work across several tasks including fiber recovery on the DiSCo dataset, deconvolution-derived partial volume estimation on real-world in vivo human brain dMRI, and improved downstream reconstruction of fiber tractograms on the Tractometer dataset. The code will be released.",
"authors": "Axel Elaldi, Guido Gerig, Neel Dey",
"award": "",
"id": "134",
"melba": "False",
"or_id": "lri_iAbpn_r",
"oral": "True",
"pmlr_url": "",
"poster_loc": "T07",
"schedule": "Tuesday, July 11: Oral session 4 - Neuroimaging \u2014 9:00\u201310:15\nTuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "E(3) x SO(3)-Equivariant Networks for Spherical Deconvolution in Diffusion MRI",
"yt_full": ""
},
"O140": {
"abstract": "The reconstruction of graph representations from Images (Image-to-graph) is a frequent task, especially vessel graph extraction from biomedical images. Traditionally, this problem is tackled by a two-stage process: segmentation followed by skeletonization. However, the ambiguity in the heuristic-based pruning of the centerline graph from the skeleta makes it hard to achieve a compact yet faithful graph representation. Recently, \\textit{Relationformer} proposed an end-to-end solution to extract graphs directly from images. However, it does not consider edge features, particularly radius information, which is crucial in many applications such as flow simulation. Further, Relationformer predicts only patch-based graphs. In this work, we address these two shortcomings. We propose a task-specific token, namely radius-token, which explicitly focuses on capturing radius information between two nodes. Second, we propose an efficient algorithm to infer a large 3D graph from patch inference. Finally, we show experimental results on a synthetic vessel dataset and achieve the first 3D complete graph prediction. Code is available at \\url{https://github.com/****}. ",
"authors": "Chinmay Prabhakar, Suprosanna Shit, Johannes C. Paetzold, Ivan Ezhov, Rajat Koner, Hongwei Li, Florian Sebastian Kofler, bjoern menze",
"award": "",
"id": "140",
"melba": "False",
"or_id": "X_AJqHfE1H",
"oral": "True",
"pmlr_url": "",
"poster_loc": "M14",
"schedule": "Monday, July 10: Oral session 3 - Graph-based methods \u2014 16:00\u201316:45\nMonday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Vesselformer: Towards Complete 3D Vessel Graph Generation from Images",
"yt_full": ""
},
"O159": {
"abstract": "We present a novel approach to transcranial ultrasound computed tomography that utilizes normalizing flows to improve the speed of imaging and provide Bayesian uncertainty quantification. Our method combines physics-informed methods and data-driven methods to accelerate the reconstruction of the final image. We make use of a physics-informed summary statistic to incorporate the known ultrasound physics with the goal of compressing large incoming observations. This compression enables efficient training of the normalizing flow and standardizes the size of the data regardless of imaging configurations. The combinations of these methods results in fast uncertainty-aware image reconstruction that generalizes to a variety of transducer configurations. We evaluate our approach with in silico experiments and demonstrate that it can significantly improve the imaging speed while quantifying uncertainty. We validate the quality of our image reconstructions by comparing against the traditional physics-only method and also verify that our provided uncertainty is calibrated with the error. ",
"authors": "Rafael Orozco, Mathias Louboutin, Ali Siahkoohi, Gabrio Rizzuti, Tristan van Leeuwen, Felix Johan Herrmann",
"award": "",
"id": "159",
"melba": "False",
"or_id": "LoJG-lUIlk",
"oral": "True",
"pmlr_url": "",
"poster_loc": "T01",
"schedule": "Tuesday, July 11: Oral session 4 - Neuroimaging \u2014 9:00\u201310:15\nTuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Amortized Normalizing Flows for Transcranial Ultrasound with Uncertainty Quantification",
"yt_full": ""
},
"O160": {
"abstract": "Longitudinal studies, where a series of images from the same set of individuals are acquired at different time-points, represent a popular technique for studying and characterizing temporal dynamics in biomedical applications. The classical approach for longitudinal comparison involves normalizing for nuisance variations, such as image orientation or contrast differences, via pre-processing. Statistical analysis is, in turn, conducted to detect changes of interest, either at the individual or population level. This classical approach can suffer from pre-processing issues and limitations of the statistical modeling. For example, normalizing for nuisance variation might be hard in settings where there are a lot of idiosyncratic changes. In this paper, we present a simple machine learning-based approach that can alleviate these issues. In our approach, we train a deep learning model (called PaIRNet, for Pairwise Image Ranking Network) to compare pairs of longitudinal images, with or without supervision. In the self-supervised setup, for instance, the model is trained to temporally order the images, which requires learning to recognize time-irreversible changes. Our results from four datasets demonstrate that PaIRNet can be very effective in localizing and quantifying meaningful longitudinal changes while discounting nuisance variation. Our code is available at \\url{https://github.com/heejong-kim/learning-to-compare-longitudinal-images}",
"authors": "Heejong Kim, Mert R. Sabuncu",
"award": "",
"id": "160",
"melba": "False",
"or_id": "l17YFzXLP53",
"oral": "True",
"pmlr_url": "",
"poster_loc": "T04",
"schedule": "Tuesday, July 11: Oral session 5 - Semi-supervised/self-supervised methods \u2014 14:00\u201315:00\nTuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Learning to Compare Longitudinal Images",
"yt_full": ""
},
"O177": {
"abstract": "Motion artifacts are a pervasive problem in MRI, leading to misdiagnosis or mischaracterization in population-level imaging studies. Current retrospective rigid intra-slice motion correction techniques jointly optimize estimates of the image and the motion parameters. In this paper, we use a deep network to reduce the joint image-motion parameter search to a search over rigid motion parameters alone. Our network produces a reconstruction as a function of two inputs: corrupted k-space data and motion parameters. We train the network using simulated, motion-corrupted k-space data generated from known motion parameters. At test-time, we estimate unknown motion parameters by minimizing a data consistency loss between the motion parameters, the network-based image reconstruction given those parameters, and the acquired measurements. Intra-slice motion correction experiments on simulated and realistic 2D fast spin echo brain MRI achieve high reconstruction fidelity while retaining the benefits of explicit data consistency-based optimization.",
"authors": "Nalini M Singh, Neel Dey, Malte Hoffmann, Bruce Fischl, Elfar Adalsteinsson, Robert Frost, Adrian V Dalca, Polina Golland",
"award": "",
"id": "177",
"melba": "False",
"or_id": "KolMbwNBNGv",
"oral": "True",
"pmlr_url": "",
"poster_loc": "T03",
"schedule": "Tuesday, July 11: Oral session 4 - Neuroimaging \u2014 9:00\u201310:15\nTuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Data Consistent Deep Rigid MRI Motion Correction",
"yt_full": ""
},
"O190": {
"abstract": "We present a physics-enhanced implicit neural representation (INR) for ultrasound (US) imaging that learns tissue properties from overlapping US sweeps. Our proposed method leverages a ray-tracing-based neural rendering for novel view US synthesis. Recent publications demonstrated that INR models could encode a representation of a three-dimensional scene from a set of two-dimensional US frames. However, these models fail to consider the view-dependent changes in appearance and geometry intrinsic to US imaging. In our work, we discuss direction-dependent changes in the scene and show that a physics-inspired rendering improves the fidelity of US image synthesis. In particular, we demonstrate experimentally that our proposed method generates geometrically accurate B-mode images for regions with ambiguous representation owing to view-dependent differences of the US images. We conduct our experiments using simulated B-mode US sweeps of the liver and acquired US sweeps of a spine phantom tracked with a robotic arm. The experiments corroborate that our method generates US frames that enable consistent volume compounding from previously unseen views. To the best of our knowledge, the presented work is the first to address view-dependent US image synthesis using INR.",
"authors": "Magdalena Wysocki, Mohammad Farid Azampour, Christine Eilers, Benjamin Busam, Mehrdad Salehi, Nassir Navab",
"award": "",
"id": "190",
"melba": "False",
"or_id": "x4McMBwVyi",
"oral": "True",
"pmlr_url": "",
"poster_loc": "T14",
"schedule": "Tuesday, July 11: Oral session 6 - Synthesis \u2014 16:00\u201316:45\nTuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Ultra-NeRF: Neural Radiance Fields for Ultrasound Imaging",
"yt_full": ""
},
"O191": {
"abstract": "The electrocardiogram (ECG) is one of the most commonly used non-invasive, convenient medical monitoring tools that assist in the clinical diagnosis of heart diseases. Recently, deep learning (DL) techniques, particularly self-supervised learning (SSL), have demonstrated great potential in the classification of ECGs. SSL pre-training has achieved competitive performance with only a small amount of annotated data after fine-tuning. However, current SSL methods rely on the availability of annotated data and are unable to predict labels not existing in fine-tuning datasets. To address this challenge, we propose \\textbf{M}ultimodal \\textbf{E}CG-\\textbf{T}ext \\textbf{S}elf-supervised pre-training (METS), \\textbf{the first work} to utilize the auto-generated clinical reports to guide ECG SSL pre-training. We use a trainable ECG encoder and a frozen language model to embed paired ECGs and automatically machine-generated clinical reports separately, then the ECG embedding and paired report embedding are compared with other unpaired embeddings. In downstream classification tasks, METS achieves around 10\\% improvement in performance without using any annotated data via zero-shot classification, compared to other supervised and SSL baselines that rely on annotated data. Furthermore, METS achieves the highest recall and F1 scores on the MIT-BIH dataset, despite MIT-BIH containing different classes of ECGs compared to the pre-trained dataset. The extensive experiments have demonstrated the advantages of using ECG-Text multimodal self-supervised learning in terms of generalizability and effectiveness.",
"authors": "Jun Li, Che Liu, Sibo Cheng, Rossella Arcucci, Shenda Hong",
"award": "",
"id": "191",
"melba": "False",
"or_id": "UAr59yTUWR2",
"oral": "True",
"pmlr_url": "",
"poster_loc": "W04",
"schedule": "Wednesday, July 12: Oral session 8 - Computer-assisted diagnosis \u2014 14:00\u201315:00\nWednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Frozen Language Model Helps ECG Zero-Shot Learning",
"yt_full": ""
},
"O198": {
"abstract": "To fully extract the feature information of lung parenchyma in Chest X-ray images and realize the auxiliary diagnosis of COVID-19 pneumonia, this paper proposed an end-to-end deep learning model, which is mainly composed of object detection, depth feature generation, and multi-channel fusion classification. Firstly, the convolutional neural network (CNN) and region proposal network (RPN)-based object detection module was adopted to detect chest cavity region of interest (ROI). Then, according to the obtained coordinate information of ROI and the convolution feature map of original image, the new convolution feature maps of ROI were obtained with number of 13. By screening 4 representative feature maps form 4 convolution layers with different receptive fields and combining with original ROI image, the 5-dimensional (5D) feature maps were generated as the multi-channel input of classification module. Moreover, in each channel of classification module, three pyramidal recursive MLPs were employed to achieve cross-scale and cross-channel feature analysis. Finally, the correlation analysis of multi-channel output was realized by bi-directional long short memory (Bi-LSTM) module, and the auxiliary diagnosis of pneumonia disease was realized through fully connected layer and SoftMax function. Experimental results show that the proposed model has better classification performance and diagnosis effect than previous methods, with great clinical application potential.",
"authors": "Yiwen Liu, Wenyu Xing, Mingbo Zhao, MINGQUAN LIN",
"award": "",
"id": "198",
"melba": "False",
"or_id": "2cgTLIy1Zx",
"oral": "True",
"pmlr_url": "",
"poster_loc": "W02",
"schedule": "Wednesday, July 12: Oral session 8 - Computer-assisted diagnosis \u2014 14:00\u201315:00\nWednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "An end-to-end framework for diagnosing COVID-19 pneumonia via Parallel Recursive MLP module and Bi-LTSM correlation",
"yt_full": ""
},
"O216": {
"abstract": "Generative statistical models have a wide range of applications in the modelling of anatomies. In-silico clinical trials of medical devices, for instance, require the development of virtual populations of anatomy that capture enough variability while remaining plausible. Model construction and use are heavily influenced by the correspondence problem and establishing shape matching over a large number of training data.This study focuses on generating virtual cohorts of left ventricle geometries resembling different-sized shape populations, suitable for in-silico experiments. We present an unsupervised data-driven probabilistic generative model for shapes. This framework incorporates an attention-based shape matching procedure using graph neural networks, coupled with a $\\beta-$VAE generation model, eliminating the need for initial shape correspondence. Left ventricle shapes derived from cardiac magnetic resonance images available in the UK Biobank are utilized for training and validating the framework. We investigate our method\u2019s generative capabilities in terms of generalisation and specificity and show that it is able to synthesise virtual populations of realistic shapes with volumetric measurements in line with actual clinical indices. Moreover, results show our method outperforms joint registration-PCA-based models.",
"authors": "Soodeh Kalaie, Andrew J. Bulpitt, Alejandro F. Frangi, Ali Gooya",
"award": "",
"id": "216",
"melba": "False",
"or_id": "Ao0D2HMB8P",
"oral": "True",
"pmlr_url": "",
"poster_loc": "M10",
"schedule": "Monday, July 10: Oral session 3 - Graph-based methods \u2014 16:00\u201316:45\nMonday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "A Geometric Deep Learning Framework for Generation of Virtual Left Ventricles as Graphs",
"yt_full": ""
},
"O221": {
"abstract": "Image augmentations are quintessential for effective visual representation learning across self-supervised learning techniques. While augmentation strategies for natural imaging have been studied extensively, medical images are vastly different from their natural counterparts. Thus, it is unknown whether common augmentation strategies employed in Siamese representation learning generalize to medical images and to what extent. To address this challenge, in this study, we systematically assess the effect of various augmentations on the quality and robustness of the learned representations. We train and evaluate Siamese Networks for abnormality detection on chest X-Rays across three large datasets (MIMIC-CXR, CheXpert and VinDr-CXR). We investigate the efficacy of the learned representations through experiments involving linear probing, fine-tuning, zero-shot transfer, and data efficiency. Finally, we identify a set of augmentations that yield robust representations that generalize well to both out-of-distribution data and diseases, while outperforming supervised baselines using just zero-shot transfer and linear probes by up to 20%.",
"authors": "Rogier Van der Sluijs, Nandita Bhaskhar, Daniel Rubin, Curtis Langlotz, Akshay S Chaudhari",
"award": "",
"id": "221",
"melba": "False",
"or_id": "xkmhsBITaCw",
"oral": "True",
"pmlr_url": "",
"poster_loc": "T02",
"schedule": "Tuesday, July 11: Oral session 5 - Semi-supervised/self-supervised methods \u2014 14:00\u201315:00\nTuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Exploring Image Augmentations for Siamese Representation Learning with Chest X-Rays",
"yt_full": ""
},
"O236": {
"abstract": "Smoothing hard-label assignments has emerged as a popular strategy in training discriminative models. Nevertheless, most existing approaches are typically designed for classification tasks, ignoring underlying properties of dense prediction problems, such as medical image segmentation. First, these strategies often ignore the spatial relations between a given pixel and its neighbours. And second, the image context associated with each label is overlooked, which can convey important information about potential errors or ambiguities in the segmentation masks. To address these limitations, we propose in this work geodesic label smoothing (GeoLS), which integrates image information into the label smoothing process by leveraging the geodesic distance transform of the images. As the resulting label assignment is based on the computed geodesic map, class-wise relationships in the soft-labels are better modeled, as it considers image gradients at the boundary of two or more categories. Furthermore, spatial pixel-wise relationships are captured in the geodesic distance transform, integrating richer information than resorting to the Euclidean distance between pixels. We evaluate our method on two publicly available segmentation benchmarks and compare them to popular segmentation loss functions that directly modify the standard hard-label assignments. The proposed geodesic label smoothing improves the segmentation accuracy over existing soft-labeling strategies, demonstrating the validity of integrating image information into the label smoothing process. The code to reproduce our results is available at: https://github.com/anonymous35783578/GeoLS.",
"authors": "Sukesh Adiga Vasudeva, Jose Dolz, Herve Lombaert",
"award": "",
"id": "236",
"melba": "False",
"or_id": "mTIP1bkmR0q",
"oral": "True",
"pmlr_url": "",
"poster_loc": "W01",
"schedule": "Wednesday, July 12: Oral session 7 - Segmentation 2 \u2014 9:30\u201310:15\nWednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "GeoLS: Geodesic Label Smoothing for Image Segmentation",
"yt_full": ""
},
"P002": {
"abstract": "Crohn\u2019s Disease (CD) and Ulcerative Colitis (UC) are the two main Inflammatory Bowel Disease (IBD) types. We developed interpretable deep learning models to identify histolog- ical disease features for both CD and UC using only endoscopic labels. We explored fine- tuning and end-to-end training of two state-of-the-art self-supervised models for predicting three different endoscopic categories (i) CD vs UC (AUC=0.87), (ii) normal vs lesional (AUC=0.81), (iii) low vs high disease severity score (AUC=0.80). With the support of a pathologist, we explored the relationship between endoscopic labels, model predictions and histological evaluations qualitatively and quantitatively and identified cases where the pathologist\u2019s descriptions of inflammation were consistent with regions of high attention. In parallel, we used a model trained on the Colon Nuclei Identification and Counting (CoNIC) dataset to predict and explore 6 cell populations. We observed consistency between areas enriched with the predicted immune cells in biopsies and the pathologist\u2019s feedback on the attention maps. Finally, we identified several cell level features indicative of disease severity in CD and UC. These models can enhance our understanding about the pathology behind IBD and can shape our strategies for patient stratification in clinical trials.",
"authors": "Ricardo Mokhtari, Azam Hamidinekoo, Daniel James Sutton, Arthur Lewis, Bastian Angermann, Ulf Gehrmann, P\u00e5l Lundin, Hibret Adissu, Junmei Cairns, Jessica Neisen, Emon Khan, Daniel Marks, Nia Khachapuridze, Talha Qaiser, Nikolay Burlutskiy",
"award": "",
"id": "2",
"melba": "False",
"or_id": "m-f1SNDhde",
"oral": "False",
"pmlr_url": "",
"poster_loc": "M09",
"schedule": "Monday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Interpretable histopathology-based prediction of disease relevant features in Inflammatory Bowel Disease biopsies using weakly-supervised deep learning",
"yt_full": ""
},
"P003": {
"abstract": "Active learning promises to improve annotation efficiency by iteratively selecting the most important data to be annotated first. However, we uncover a striking contradiction to this promise: at the first few choices, active learning fails to select data as efficiently as random selection. We identify this as the cold start problem in active learning, caused by a biased and outlier initial query. This paper seeks to address the cold start problem and develops a novel active querying strategy, named HaCon, that can exploit the three advantages of contrastive learning: (1) no annotation is required; (2) label diversity is ensured by pseudo-labels to mitigate bias; (3) typical data is determined by contrastive features to reduce outliers. Experiments on three public medical datasets show that HaCon not only significantly outperforms existing active querying strategies but also surpasses random selection by a large margin. Code is available at https://github.com/liangyuch/CSVAL.",
"authors": "Liangyu Chen, Yutong Bai, Siyu Huang, Yongyi Lu, Bihan Wen, Alan Yuille, Zongwei Zhou",
"award": "",
"id": "3",
"melba": "False",
"or_id": "5iSBMWm3ln",
"oral": "False",
"pmlr_url": "",
"poster_loc": "T13",
"schedule": "Tuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Making Your First Choice: To Address Cold Start Problem in Medical Active Learning",
"yt_full": ""
},
"P008": {
"abstract": "Magnetic resonance (MR) images from multiple sources often show differences in image contrast related to acquisition settings or the used scanner type. For long-term studies, longitudinal comparability is essential but can be impaired by these contrast differences, leading to biased results when using automated evaluation tools. This study presents a diffusion model-based approach for contrast harmonization. We use a data set consisting of scans of 18 Multiple Sclerosis patients and 22 healthy controls. Each subject was scanned in two MR scanners of different magnetic field strengths (1.5 T and 3 T), resulting in a paired data set that shows scanner-inherent differences. We map images from the source contrast to the target contrast for both directions, from 3 T to 1.5 T and from 1.5 T to 3 T. As we only want to change the contrast, not the anatomical information, our method uses the original image to guide the image-to-image translation process by adding structural information. The aim is that the mapped scans display increased comparability with scans of the target contrast for downstream tasks. We evaluate this method for the task of segmentation of cerebrospinal fluid, grey matter and white matter. Our method achieves good and consistent results for both directions of the mapping.",
"authors": "Alicia Durrer, Julia Wolleb, Florentin Bieder, Tim Sinnecker, Matthias Weigel, Robin Sandkuehler, Cristina Granziera, \u00d6zg\u00fcr Yaldizli, Philippe C. Cattin",
"award": "",
"id": "8",
"melba": "False",
"or_id": "Xs_Hd23_PP",
"oral": "False",
"pmlr_url": "",
"poster_loc": "W07",
"schedule": "Wednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Diffusion Models for Contrast Harmonization of Magnetic Resonance Images",
"yt_full": ""
},
"P009": {
"abstract": "Denoising diffusion models have recently achieved state-of-the-art performance in many image-generation tasks. They do, however, require a large amount of computational resources. This limits their application to medical tasks, where we often deal with large 3D volumes, like high-resolution three-dimensional data. In this work, we present a number of different ways to reduce the resource consumption for 3D diffusion models and apply them to a dataset of 3D images. The main contribution of this paper is the memory-efficient patch-based diffusion model PatchDDM, which can be applied to the total volume during inference while the training is performed only on patches. Without limiting the application of the proposed diffusion model for image generation, we evaluate the method on the tumor segmentation task of the BraTS2020 dataset and demonstrate that we can generate meaningful three-dimensional segmentations.",
"authors": "Florentin Bieder, Julia Wolleb, Alicia Durrer, Robin Sandkuehler, Philippe C. Cattin",
"award": "",
"id": "9",
"melba": "False",
"or_id": "neXqIGpO-tn",
"oral": "False",
"pmlr_url": "",
"poster_loc": "M13",
"schedule": "Monday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Denoising Diffusion Models for Memory-efficient Processing of 3D Medical Images",
"yt_full": ""
},
"P010": {
"abstract": "Detection of pathologies is a fundamental task in medical imaging and the evaluation of algorithms that can perform this task automatically is crucial. However, current object detection metrics for natural images do not reflect the specific clinical requirements in pathology detection sufficiently. To tackle this problem, we propose Robust Detection Outcome (RoDeO); a novel metric for evaluating algorithms for pathology detection in medical images, especially in chest X-rays. RoDeO evaluates different errors directly and individually, and reflects clinical needs better than current metrics. Extensive evaluation on the ChestX-ray8 dataset shows the superiority of our metrics compared to existing ones. We released the code at [https://github.com/FeliMe/RoDeO](https://github.com/FeliMe/RoDeO) and published RoDeO as pip package ($rodeometric$).",
"authors": "Felix Meissen, Philip M\u00fcller, Georgios Kaissis, Daniel Rueckert",
"award": "",
"id": "10",
"melba": "False",
"or_id": "zyiJi4sJ7dZ",
"oral": "False",
"pmlr_url": "",
"poster_loc": "T29",
"schedule": "Tuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\nWednesday, July 12: Virtual poster session - 8:00\u20139:00\n",
"short": "False",
"slides": "/virtual/poster/P010.pdf",
"title": "Robust Detection Outcome: A Metric for Pathology Detection in Medical Images",
"yt_full": "https://youtu.be/A_pOOc8lKFY"
},
"P011": {
"abstract": "One major problem in deep learning-based solutions for medical imaging is the drop in performance when a model is tested on a data distribution different from the one that it is trained on. Adapting the source model to target data distribution at test-time is an efficient solution for the data-shift problem. Previous methods solve this by adapting the model to target distribution by using techniques like entropy minimization or regularization. In these methods, the models are still updated by back-propagation using an unsupervised loss on complete test data distribution. In real-world clinical settings, it makes more sense to adapt a model to a new test image on-the-fly and avoid model update during inference due to privacy concerns and lack of computing resource at deployment. To this end, we propose a new setting - On-the-Fly Adaptation which is zero-shot and episodic i.e., the model is adapted to a single image at a time and also does not perform any back-propagation during test-time). To achieve this, we propose a new framework called Adaptive UNet where each convolutional block is equipped with an adaptive batch normalization layer to adapt the features with respect to a domain code. The domain code is generated using a pre-trained encoder trained on a large corpus of medical images. During test-time, the model takes in just the new test image and generates a domain code to adapt the features of source model according to the test data. We validate the performance on both 2D and 3D data distribution shifts where we get a better performance compared to previous test-time adaptation methods.",
"authors": "Jeya Maria Jose Valanarasu, Pengfei Guo, Vibashan VS, Vishal M. Patel",
"award": "",
"id": "11",
"melba": "False",
"or_id": "UQDalTzrEg",
"oral": "False",
"pmlr_url": "",
"poster_loc": "W05",
"schedule": "Wednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "On-the-Fly Test-time Adaptation for Medical Image Segmentation",
"yt_full": ""
},
"P012": {
"abstract": "Deep Learning (DL) based methods for magnetic resonance (MR) image reconstruction have been shown to produce superior performance. However, previous methods either only leverage under-sampled data or require a paired fully-sampled auxiliary MR sequence to perform guidance-based reconstruction. Consequently, existing approaches neglect to explore attention mechanisms that can transfer texture from reference data to under-sampled data within a single MR sequence, which either limits the performance of these approaches or increases the difficulty of data acquisition. In this paper, we propose a novel $\\textbf{T}$exture $\\textbf{T}$ransformer $\\textbf{M}$odule ($\\textbf{TTM}$) for the reference-based MR image reconstruction. The TTM facilitates joint feature learning across under-sampled and reference data, so feature correspondences can be discovered by attention and accurate texture features can be leveraged during reconstruction. Notably, TTM can be stacked on prior MRI reconstruction methods to improve their performance. In addition, a $\\textbf{R}$ecurrent $\\textbf{T}$ransformer $\\textbf{R}$econstruction backbone ($\\textbf{RTR}$) is proposed to further improve the performance in a unified framework. Extensive experiments demonstrate the effectiveness of TTM and show that RTR can achieve the state-of-the-art results on multiple datasets. Implementation code and pre-trained weights will be made public after the review process. ",
"authors": "Pengfei Guo, Vishal M. Patel",
"award": "",
"id": "12",
"melba": "False",
"or_id": "EoEWcHFHJ1W",
"oral": "False",
"pmlr_url": "",
"poster_loc": "M11",
"schedule": "Monday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Reference-based MRI Reconstruction Using Texture Transformer",
"yt_full": ""
},
"P014": {
"abstract": "The deep image prior (DIP) is a well-established unsupervised deep learning method for image reconstruction; yet it is far from being flawless. The DIP overfits to noise if not early stopped, or optimized via a regularized objective. We build on the regularized fine-tuning of a pretrained DIP, by adopting a novel strategy that restricts the learning to the adaptation of singular values. The proposed SVD-DIP uses ad hoc convolutional layers whose pretrained parameters are decomposed via the singular value decomposition. Optimizing the DIP then solely consists in the fine-tuning of the singular values, while keeping the left and right singular vectors fixed. We thoroughly validate the proposed method on real-measured \u03bcCT data of a lotus root as well as two medical datasets (LoDoPaB and Mayo). We report significantly improved stability of the DIP optimization, by overcoming the overfitting to noise.",
"authors": "Marco Nittscher, Michael Falk Lameter, Riccardo Barbano, Johannes Leuschner, Bangti Jin, Peter Maass",
"award": "",
"id": "14",
"melba": "False",
"or_id": "ivC7VP2mof",
"oral": "False",
"pmlr_url": "",
"poster_loc": "T11",
"schedule": "Tuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "SVD-DIP: Overcoming the Overfitting Problem in DIP-based CT Reconstruction",
"yt_full": ""
},
"P015": {
"abstract": "Cell detection in histopathology images facilitates clinical diagnosis, and deep learning methods have been applied to the detection problem with substantially improved performance. However, cell detection methods based on deep learning usually require a large number of annotated training samples, which are costly and time-consuming to obtain, and it is desirable to develop methods where detection networks can be adequately trained with only a few annotated training samples. Since unlabeled data is much less expensive to obtain, it is possible to address this problem with semi-supervised learning, where abundant unlabeled data is combined with the limited annotated training samples for network training. In this work, we propose a semi-supervised object detection method for cell detection in histopathology images, which is based on and improves the mean teacher framework. In standard mean teacher, the detection results on unlabeled data given by the teacher model can be noisy, which may negatively impact the learning of the student model. To address this problem, we propose to suppress the noise in the detection results of the teacher model by mixing the unlabeled training images with labeled training images of which the ground truth detection results are available. In addition, we propose to further incorporate a loss term that is robust to noise when the the student model learns from the teacher model. To evaluate the proposed method, experiments were performed on a publicly available dataset for multi-class cell detection, and the experimental results show that our method improves the performance of cell detection in histopathology images in the semi-supervised setting.",
"authors": "Ziqi Wen, Chuyang Ye",
"award": "",
"id": "15",
"melba": "False",
"or_id": "PEWigppmw3b",
"oral": "False",
"pmlr_url": "",
"poster_loc": "Virtual only",
"schedule": "Wednesday, July 12: Virtual poster session - 8:00\u20139:00\n",
"short": "False",
"slides": "/virtual/poster/P015.pdf",
"title": "A Robust Mean Teacher Framework for Semi-Supervised Cell Detection in Histopathology Images",
"yt_full": "https://youtu.be/SKtcmrrD6xo"
},
"P016": {
"abstract": "Deep learning systems have been proposed to improve the objectivity and efficiency of Ki-67 PI scoring. The challenge is that deep learning techniques, while very accurate, suffer from reduced performance when applied to out-of-domain data. This is a critical challenge for clinical translation, as models are typically trained using data available to the vendor, which is not from the target domain. To address this challenge, this study proposes a domain adaptation pipeline that employs an unsupervised framework to generate silver standard (pseudo) labels in the target domain, which is used to augment the gold standard (GS) source domain data. Five training regimes were tested on two validated Ki-67 scoring architectures (UV-Net and piNET), (1) SS Only: trained on target silver standard (SS) labels, (2) GS Only: trained on source GS labels, (3) Mixed: trained on target SS and source GS labels, (4) GS+SS: trained on source GS labels and fine-tuned on target SS labels, and our proposed method (5) SS+GS: trained on source SS labels and fine-tuned on source GS labels. The SS+GS method yielded significantly ($p<0.05$) higher PI accuracy ($95.9\\%$) and more consistent results compared to the GS Only model on target data. Analysis of t-SNE plots showed features learned by the SS+GS models are more aligned for source and target data which results in improved generalization. The proposed pipeline provides an efficient method for learning the target distribution without the need for manual annotations, which are time-consuming and costly to generate for medical images. This framework can be applied to any target site as a per-laboratory calibration method, for widescale deployment.",
"authors": "Amanda Dy, Ngoc-Nhu Jennifer Nguyen, Seyed Hossein Mirjahanmardir, Dimitrios Androutsos, Melanie Dawe, Anthony Fyles, Wei Shi, Fei-Fei Liu, Susan Done, April Khademi",
"award": "",
"id": "16",
"melba": "False",
"or_id": "-ahfrpMo9ui",
"oral": "False",
"pmlr_url": "",
"poster_loc": "M15",
"schedule": "Monday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Domain Adaptation using Silver Standard Labels for Ki-67 Scoring in Digital Pathology A Step Closer to Widescale Deployment",
"yt_full": ""
},
"P023": {
"abstract": "Self-supervised learning has attracted increasing attention as it learns data-driven representation from data without annotations. Vision transformer-based autoencoder (ViT-AE) by He et al. (2021) is a recent self-supervised learning technique that employs a patch-masking strategy to learn a meaningful latent space. In this paper, we focus on improving ViT-AE (nicknamed ViT-AE++) for a more effective representation of both 2D and 3D medical images. We propose two new loss functions to enhance the representation during the training stage. The first loss term aims to improve self-reconstruction by considering the structured dependencies and hence indirectly improving the representation. The second loss term leverages contrastive loss to directly optimize the representation from two randomly masked views. As an independent contribution, we extended ViT-AE++ to a 3D fashion for volumetric medical images. We extensively evaluate ViT-AE++ on both natural images and medical images, demonstrating consistent improvement over vanilla ViT-AE and its superiority over other contrastive learning approaches.",
"authors": "Chinmay Prabhakar, Hongwei Li, Jiancheng Yang, Suprosanna Shit, Benedikt Wiestler, bjoern menze",
"award": "",
"id": "23",
"melba": "False",
"or_id": "2Aoi0VKPOWT",
"oral": "False",
"pmlr_url": "",
"poster_loc": "T15",
"schedule": "Tuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "ViT-AE++: Improving Vision Transformer Autoencoder for Self-supervised Medical Image Representations",
"yt_full": ""
},
"P031": {
"abstract": "Multi-scale representations have proven to be a powerful tool since they can take into account both the fine-grained details of objects in an image as well as the broader context. Inspired by this, we propose a novel dual-branch transformer network that operates on two different scales to encode global contextual dependencies while preserving local information. To learn in a self-supervised fashion, our approach considers the semantic dependency that exists between different scales to generate a supervisory signal for inter-scale consistency and also imposes a spatial stability loss within the scale for self-supervised content clustering. While intra-scale and inter-scale consistency losses aim to increase features similarly within the cluster, we propose to include a cross-entropy loss function on top of the clustering score map to effectively model each cluster distribution and increase the decision boundary between clusters. Iteratively our algorithm learns to assign each pixel to a semantically related cluster to produce the segmentation map. Extensive experiments on skin lesion and lung segmentation datasets show the superiority of our method compared to the state-of-the-art (SOTA) approaches. ",
"authors": "Sanaz Karimijafarbigloo, Reza Azad, Amirhossein Kazerouni, Dorit Merhof",
"award": "",
"id": "31",
"melba": "False",
"or_id": "pp2raGSU3Wx",
"oral": "False",
"pmlr_url": "",
"poster_loc": "Virtual only",
"schedule": "Wednesday, July 12: Virtual poster session - 8:00\u20139:00\n",
"short": "False",
"slides": "/virtual/poster/P031.pdf",
"title": "MS-Former: Multi-Scale Self-Guided Transformer for Medical Image Segmentation",
"yt_full": "https://youtu.be/8zK8o7b8Bw4"
},
"P032": {
"abstract": "Incorporating either rotation equivariance or scale equivariance into CNNs has proved to be effective in improving models\u2019 generalization performance. However, jointly integrating rotation and scale equivariance into CNNs has not been widely explored. Digital histology imaging of biopsy tissue can be captured at arbitrary orientation and magnification and stored at different resolutions, resulting in cells appearing in different scales. When conventional CNNs are applied to histopathology image analysis, the generalization performance of models is limited because 1) a part of the parameters of filters are trained to fit rotation transformation, thus decreasing the capability of learning other discriminative features; 2) fixed-size filters trained on images at a given scale fail to generalize to those at different scales. To deal with these issues, we propose the Rotation-Scale Equivariant Steerable Filter (RSESF), which incorporates steerable filters and scale-space theory. The RSESF contains copies of filters that are linear combinations of Gaussian filters, whose direction is controlled by directional derivatives and whose scale parameters are trainable but constrained to span disjoint scales in successive layers of the network. Extensive experiments on two gland segmentation datasets demonstrate that our method outperforms other approaches, with much fewer trainable parameters and fewer GPU resources required. The source code is available at: https://github.com/ynulonger/RSESF.",
"authors": "Yilong Yang, Srinandan Dasmahapatra, Sasan Mahmoodi",
"award": "",
"id": "32",
"melba": "False",
"or_id": "A0MyiAwE_E4",
"oral": "False",
"pmlr_url": "",
"poster_loc": "Virtual only",
"schedule": "Wednesday, July 12: Virtual poster session - 8:00\u20139:00\n",
"short": "False",
"slides": "/virtual/poster/P032.pdf",
"title": "Rotation-Scale Equivariant Steerable Filters",
"yt_full": "https://youtu.be/8PtZLo2Ihkw"
},
"P033": {
"abstract": "Recent advances in MRI have led to the creation of large datasets. With the increase in data volume, it has become difficult to locate previous scans of the same patient within these datasets (a process known as re-identification). To address this issue, we propose an AI-powered medical imaging retrieval framework called DeepBrainPrint, which is designed to retrieve brain MRI scans of the same patient. Our framework is a semi-self-supervised contrastive deep learning approach with three main innovations. First, we use a combination of self-supervised and supervised paradigms to create an effective brain fingerprint from MRI scans that can be used for real-time image retrieval. Second, we use a special weighting function to guide the training and improve model convergence. Third, we introduce new imaging transformations to improve retrieval robustness in the presence of intensity variations (i.e. different scan contrasts), and to account for age and disease progression in patients. We tested DeepBrainPrint on a large dataset of T1-weighted brain MRIs from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and on a synthetic dataset designed to evaluate retrieval performance with different image modalities. Our results show that DeepBrainPrint outperforms previous methods, including simple similarity metrics and more advanced contrastive deep learning frameworks.",
"authors": "Lemuel Puglisi, Arman Eshaghi, Geoff Parker, Frederik Barkhof, Daniel C. Alexander, Daniele Ravi",
"award": "",
"id": "33",
"melba": "False",
"or_id": "i5khDI1te1M",
"oral": "False",
"pmlr_url": "",
"poster_loc": "M16",
"schedule": "Monday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "DeepBrainPrint: A Novel Contrastive Framework for Brain MRI Re-Identification",
"yt_full": ""
},
"P035": {
"abstract": "Deformable image registration is a crucial component in the analysis of motion in time series. In medical data, the deformation fields are often predictable to a certain degree: the muscles and other tissues causing the motion-of-interest form shapes that may be used as a geometric prior. Using an Implicit Neural Representation to parameterize a deformation field allows the coordinate space to be chosen arbitrarily. We propose to curve this coordinate space around anatomical structures that influence the motion in our time series, yielding a space that is aligned with the expected motion. The geometric information is therefore explicitly encoded into the neural representation, reducing the complexity of the optimized deformation function. We design and evaluate this concept using an abdominal 3D cine-MRI dataset, where the motion of interest is bowel motility. We align the coordinate system of the neural representations with automatically extracted centerlines of the small intestine. We show that explicitly encoding the intestine geometry in the neural representations improves registration accuracy for bowel loops with active motility when compared to registration using neural representations in the original coordinate system. Additionally, we show that registration accuracy can be further improved using a model that combines a neural representation in image coordinates with a separate neural representation that operates in the proposed tangent coordinate system. This approach may improve the efficiency of deformable registration when describing motion-of-interest that is influenced by the shape of anatomical structures.",
"authors": "Louis van Harten, Rudolf Leonardus Mirjam Van Herten, Jaap Stoker, Ivana Isgum",
"award": "",
"id": "35",
"melba": "False",
"or_id": "Pj9vtDIzSCE",
"oral": "False",
"pmlr_url": "",
"poster_loc": "T16",
"schedule": "Tuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Deformable Image Registration with Geometry-informed Implicit Neural Representations",
"yt_full": ""
},
"P038": {
"abstract": "Magnetic resonance imaging (MRI) is a common non-invasive imaging technique with high soft tissue contrast. Different MRI modalities are used for the diagnosis of various conditions including T1-weighted and T2-weighted MRI. In this paper, we introduce MTSR-MRI, a novel method that can not only upscale low-resolution scans but also translates between the T1-weighted and T2-weighted modalities. This will potentially reduce the scan time or repeat scans by taking low-resolution inputs in one modality and returning plausible high-resolution output in another modality. Due to the ambiguity that persists in image-to-image translation tasks, we consider the distribution of possible outputs in a conditional generative setting. The mapping is distilled in a low-dimensional latent distribution which can be randomly sampled at test time, thus allowing us to generate multiple plausible high-resolution outputs from a given low-resolution input. We validate the proposed method on the BraTS-18 dataset qualitatively and quantitatively using a variety of similarity measures. The implementation of this work will be available at https://github.com/AvirupJU/MTSR-MRI . ",
"authors": "Avirup Dey, Mehran Ebrahimi",
"award": "",
"id": "38",
"melba": "False",
"or_id": "mUPIsk20oGt",
"oral": "False",
"pmlr_url": "",
"poster_loc": "W09",
"schedule": "Wednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "MTSR-MRI: Combined Modality Translation and Super-Resolution of Magnetic Resonance Images",
"yt_full": ""
},
"P046": {
"abstract": "Pretraining on large natural image classification datasets such as ImageNet has aided model development on data-scarce 2D medical tasks. 3D medical tasks often have much less data than 2D medical tasks, prompting practitioners to rely on pretrained 2D models to featurize slices. However, these 2D models have been surpassed by 3D models on 3D computer vision benchmarks since they do not natively leverage cross-sectional or temporal information. In this study, we explore whether natural video pretraining for 3D models can enable higher performance on smaller datasets for 3D medical tasks. We demonstrate video pretraining improves the average performance of seven 3D models on two chest CT datasets, regardless of finetuning dataset size, and that video pretraining allows 3D models to outperform 2D baselines. Lastly, we observe that pretraining on the large-scale out-of-domain Kinetics dataset improves performance more than pretraining on a typically-sized in-domain CT dataset. Our results show consistent benefits of video pretraining across a wide array of architectures, tasks, and training dataset sizes, supporting a shift from small-scale in-domain pretraining to large-scale out-of-domain pretraining for 3D medical tasks.",
"authors": "Alexander Ke, Shih-Cheng Huang, Chloe P O'Connell, Michal Klimont, Serena Yeung, Pranav Rajpurkar",
"award": "",
"id": "46",
"melba": "False",
"or_id": "zhpP7zluLBk",
"oral": "False",
"pmlr_url": "",
"poster_loc": "Virtual only",
"schedule": "Wednesday, July 12: Virtual poster session - 8:00\u20139:00\n",
"short": "False",
"slides": "/virtual/poster/P046.pdf",
"title": "Video pretraining advances 3D deep learning on chest CT tasks",
"yt_full": ""
},
"P048": {
"abstract": "Automatic 3-dimensional tooth segmentation on intraoral scans (IOS) plays a pivotal role in computer-aided orthodontic treatments. In practice, deploying existing well-trained models to different medical centers suffers from two main problems: (1) the data distribution shifts between existing and new centers, (2) the data in the existing center is usually not allowed to share while annotating additional data in the new center is time-consuming and expensive. In this paper, we propose a Model Adaptive Tooth Segmentation (MATS) framework to alleviate these issues. Taking the trained model from a source center as input, MATS adapts it to different target centers without data transmission or additional annotations, as inspired by the source data-free domain adaptation (SFDA) paradigm. The model adaptation in MATS is realized by a tooth-level feature prototype learning module, a progressive pseudo-labeling module and a tooth-prior regularized information maximization loss. Experiments on a dataset with tooth abnormalities and a real-world cross-center dataset show that MATS can consistently surpass existing baselines. The effectiveness is further verified with extensive ablation studies and statistical analysis, demonstrating its applicability for privacy-preserving tooth segmentation in real-world digital dentistry. ",
"authors": "Ruizhe Chen, Jianfei Yang, YANG FENG, Jin Hao, Zuozhu Liu",
"award": "",
"id": "48",
"melba": "False",
"or_id": "O2DerS5oQ1",
"oral": "False",
"pmlr_url": "",
"poster_loc": "Virtual only",
"schedule": "Wednesday, July 12: Virtual poster session - 8:00\u20139:00\n",
"short": "False",
"slides": "/virtual/poster/P048.pdf",
"title": "Model Adaptive Tooth Segmentation",
"yt_full": "https://youtu.be/bkF1lfFwWd0"
},
"P050": {
"abstract": "In recent years, Transformer-based models have gained attention in the field of medical image segmentation, with research exploring ways to integrate them with established architectures such as Unet. However, the high computational demands of these models have led most current approaches to focus on segmenting 2D slices of MRI or CT images, which can limit the ability of the model to learn semantic information in the depth axis and result in output with uneven edges. Additionally, the small size of medical image datasets, particularly those for brain tumor segmentation, poses a challenge for training transformer models. To address these issues, we propose 3D Medical Axial Transformer (MAT), a lightweight, end-to-end model for 3D brain tumor segmentation that employs an axial attention mechanism to reduce computational demands and self-distillation to improve performance on small datasets. Results indicate that our approach, which has fewer parameters and a simpler structure than other models, achieves superior performance and produces clearer output boundaries, making it more suitable for clinical applications.",
"authors": "Cheng Liu, Hisanor Kiryu",
"award": "",
"id": "50",
"melba": "False",
"or_id": "PX-jt92kQUM",
"oral": "False",
"pmlr_url": "",
"poster_loc": "T17",
"schedule": "Tuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "3D Medical Axial Transformer: A Lightweight Transformer Model for 3D Brain Tumor Segmentation",
"yt_full": ""
},
"P053": {
"abstract": "We propose an image synthesis mechanism for multi-sequence prostate MR images conditioned on text, to control lesion presence and sequence, as well as to generate paired bi-parametric images conditioned on images e.g. for generating diffusion-weighted MR from T2-weighted MR for paired data, which are two challenging tasks in pathological image synthesis. Our proposed mechanism utilises and builds upon the recent stable diffusion model by proposing image-based conditioning for paired data generation. We validate our method using 2D image slices from real suspected prostate cancer patients. The realism of the synthesised images is validated by means of a blind expert evaluation for identifying real versus fake images, where a radiologist with 4 years experience reading urological MR only achieves 59.4\\% accuracy across all tested sequences (where chance is 50\\%). For the first time, we evaluate the realism of the generated pathology by blind expert identification of the presence of suspected lesions, where we find that the clinician performs similarly for both real and synthesised images, with a 2.9 percentage point difference in lesion identification accuracy between real and synthesised images, demonstrating the potentials in radiological training purposes. Furthermore, we also show that a machine learning model, trained for lesion identification, shows better performance (76.2\\% vs 70.4\\%, statistically significant improvement) when trained with real data augmented by synthesised data as opposed to training with only real images, demonstrating usefulness for model training.",
"authors": "Shaheer U. Saeed, Tom Syer, Wen Yan, Qianye Yang, Mark Emberton, Shonit Punwani, Matthew John Clarkson, Dean Barratt, Yipeng Hu",
"award": "",
"id": "53",
"melba": "False",
"or_id": "3QnxUSzR7iu",
"oral": "False",
"pmlr_url": "",
"poster_loc": "Virtual only",
"schedule": "Wednesday, July 12: Virtual poster session - 8:00\u20139:00\n",
"short": "False",
"slides": "/virtual/poster/P053.pdf",
"title": "Bi-parametric prostate MR image synthesis using pathology and sequence-conditioned stable diffusion",
"yt_full": "https://youtu.be/4lqgp8BKcfg"
},
"P054": {
"abstract": "The image acquisition parameters (IAPs) used to create MRI scans are central to defining the appearance of the images. Deep learning models trained on data acquired using certain parameters might not generalize well to images acquired with different parameters. Being able to recover such parameters directly from an image could help determine whether a deep learning model is applicable, and could assist with data harmonization and/or domain adaptation. Here, we introduce a neural network model that can predict many complex IAPs used to generate an MR image with high accuracy solely using the image, with a single forward pass. These predicted parameters include field strength, echo and repetition times, acquisition matrix, scanner model, scan options, and others. Even challenging parameters such as contrast agent type can be predicted with good accuracy. We perform a variety of experiments and analyses of our model's ability to predict IAPs on many MRI scans of new patients, and demonstrate its usage in a realistic application. Predicting IAPs from the images is an important step toward better understanding the relationship between image appearance and IAPs. This in turn will advance the understanding of many concepts related to the generalizability of neural network models on medical images, including domain shift, domain adaptation, and data harmonization. ",
"authors": "Nicholas Konz, Maciej A Mazurowski",
"award": "",
"id": "54",
"melba": "False",
"or_id": "JZBNeVLAqp",
"oral": "False",
"pmlr_url": "",
"poster_loc": "W10",
"schedule": "Wednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Reverse Engineering Breast MRIs: Predicting Acquisition Parameters Directly from Images",
"yt_full": ""
},
"P055": {
"abstract": "Renal transplantation emerges as the most effective solution for end-stage renal disease. Occurring from complex causes, a substantial risk of transplant chronic dysfunction persists and may lead to graft loss. Medical imaging plays a substantial role in renal transplant monitoring in clinical practice. However, graft supervision is multi-disciplinary, notably joining nephrology, urology, and radiology, while identifying robust biomarkers from such high-dimensional and complex data for prognosis is challenging. In this work, taking inspiration from the recent success of Large Language Models (LLMs), we propose MEDIMP -- Medical Images and clinical Prompts -- a model to learn meaningful multi-modal representations of renal transplant Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE MRI) by incorporating structural clinicobiological data after translating them into text prompts. MEDIMP is based on contrastive learning from joint text-image paired embeddings to perform this challenging task. Moreover, we propose a framework that generates medical prompts using automatic textual data augmentations from LLMs. Our goal is to learn meaningful manifolds of renal transplant DCE MRI, interesting for the prognosis of the transplant or patient status (2, 3, and 4 years after the transplant), fully exploiting the limited available multi-modal data most efficiently. Extensive experiments and comparisons with other renal transplant representation learning methods with limited data prove the effectiveness of MEDIMP in a relevant clinical setting, giving new directions toward medical prompts. Our code is available at https://github.com/leomlck/MEDIMP.",
"authors": "Leo Milecki, Vicky Kalogeiton, Sylvain Bodard, Dany Anglicheau, Jean-Michel Correas, Marc-Olivier Timsit, Maria Vakalopoulou",
"award": "",
"id": "55",
"melba": "False",
"or_id": "jt-ochRhqG",
"oral": "False",
"pmlr_url": "",
"poster_loc": "W22",
"schedule": "Wednesday, July 12: Virtual poster session - 8:00\u20139:00\nWednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "/virtual/poster/P055.pdf",
"title": "MEDIMP: 3D Medical Images and clinical Prompts for renal transplant representation learning",
"yt_full": "https://youtu.be/PgGGqhpntTs"
},
"P062": {
"abstract": "Brain surface-based image registration, an important component of brain image analysis, establishes spatial correspondence between cortical surfaces. Existing iterative and learning-based approaches focus on accurate registration of folding patterns of the cerebral cortex, and assume that geometry predicts function and thus functional areas will also be well aligned. However, structure/functional variability of anatomically corresponding areas across subjects has been widely reported. In this work, we introduce a learning-based cortical registration framework, JOSA, which jointly aligns folding patterns and functional maps while simultaneously learning an optimal atlas. We demonstrate that JOSA can substantially improve registration performance in both anatomical and functional domains over existing methods. By employing a semi-supervised training strategy, the proposed framework obviates the need for functional data during inference, enabling its use in broad neuroscientific domains where functional data may not be observed. The source code of JOSA will be released to the public at https://voxelmorph.net.",
"authors": "Jian Li, Greta Tuckute, Evelina Fedorenko, Brian L Edlow, Bruce Fischl, Adrian V Dalca",
"award": "",
"id": "62",
"melba": "False",
"or_id": "n9v_BuIcY7G",
"oral": "False",
"pmlr_url": "",
"poster_loc": "T18",
"schedule": "Tuesday, July 11: Posters \u2014 10:30\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Joint cortical registration of geometry and function using semi-supervised learning",
"yt_full": ""
},
"P067": {
"abstract": "Recently, various deep learning methods have shown significant successes in medical image analysis, especially in the detection of cancer metastases in hematoxylin and eosin (H&E) stained whole-slide images (WSIs). However, in order to obtain good performance, these research achievements rely on hundreds of well-annotated WSIs. In this study, we tackle the tumor localization and detection problem under the setting of few labeled whole slide images and introduce a patch-based analysis pipeline based on the latest reverse knowledge distillation architecture. To address the extremely unbalanced normal and tumorous samples in training sample collection, we applied the focal loss formula to the representation similarity metric for model optimization. Compared with prior arts, our method achieves similar performance by less than ten percent of training samples on the public Camelyon16 dataset. In addition, this is the first work that show the great potential of the knowledge distillation models in computational histopathology. Our python implementation will be publically accessible upon paper acceptance.",
"authors": "Yinsheng He, Xingyu Li",
"award": "",
"id": "67",
"melba": "False",
"or_id": "BF4NpwMuei",
"oral": "False",
"pmlr_url": "",
"poster_loc": "W11",
"schedule": "Wednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Whole-slide-imaging Cancer Metastases Detection and Localization with Limited Tumorous Data",
"yt_full": ""
},
"P068": {
"abstract": "Mild cognitive impairment (MCI), as a transitional state between normal cognition and Alzheimer's disease (AD), is crucial for taking preventive interventions in order to slow down AD progression. Given the high relevance of brain atrophy and the neurodegeneration process of AD, we propose a novel mesh-based pooling module, RegionPool, to investigate the morphological changes in brain shape regionally. We then present a geometric deep learning framework with the RegionPool and graph attention convolutions to perform binary classification on MCI subtypes (EMCI/LMCI). Our model does not require feature engineering and relies only on the relevant geometric information of T1-weighted magnetic resonance imaging (MRI) signals. Our evaluation reveals the state-of-the-art classification capabilities of our network and shows that current empirically derived MCI subtypes cannot identify heterogeneous patterns of cortical atrophy at the MCI stage. The class activation maps (CAMs) generated from the correct predictions provide additional visual evidence for our model's decisions and are consistent with the atrophy patterns reported by the relevant literature.",
"authors": "Jiaqi Guo, Emanuel Azcona, Santiago Lopez-Tapia, Aggelos Katsaggelos",
"award": "",
"id": "68",
"melba": "False",
"or_id": "J4JWTCq14u",
"oral": "False",
"pmlr_url": "",
"poster_loc": "M18",
"schedule": "Monday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Stage Detection of Mild Cognitive Impairment: Region-dependent Graph Representation Learning on Brain Morphable Meshes",
"yt_full": ""
},
"P070": {
"abstract": "While there have been many studies on using deep learning for medical image analysis, the lack of manually annotated data remains a challenge in training a deep learning model for segmentation of medical images. This work shows how the kaleidoscope transform (KT) can be applied to a 3D convolutional neural network to improve its generalizability when the training set is extremely small. In this study, the KT was applied to a context aggregation network (CAN) for semantic segmentation of anatomical structures in knee MR images. In the proposed model, KAN3D, the input image is rearranged into a batch of downsampled images (KT) before the convolution operations, and then the voxels are rearranged back to their original positions (inverse KT) after the convolution operations to produce the predicted segmentation mask for the input image. Compared to the CAN3D (without the KT), the KAN3D was able to reduce overfitting without data augmentation while maintaining a fast training and inference time. The paper discusses the observed advantages and disadvantages of KAN3D.",
"authors": "Boyeong Woo, Marlon Bran Lorenzana, Craig Engstrom, William Baresic, Jurgen Fripp, Stuart Crozier, Shekhar S. Chandra",
"award": "",
"id": "70",
"melba": "False",
"or_id": "80ZHtBKHKHo",
"oral": "False",
"pmlr_url": "",
"poster_loc": "Virtual only",
"schedule": "Wednesday, July 12: Virtual poster session - 8:00\u20139:00\n",
"short": "False",
"slides": "/virtual/poster/P070.pdf",
"title": "Semantic Segmentation of 3D Medical Images Through a Kaleidoscope: Data from the Osteoarthritis Initiative",
"yt_full": "https://youtu.be/03tWDzQfWlg"
},
"P071": {
"abstract": "High content imaging assays can capture rich phenotypic response data for large sets of compound treatments, aiding in the characterization and discovery of novel drugs. However, extracting representative features from high content images that can capture subtle nuances in phenotypes remains challenging. The lack of high-quality labels makes it difficult to achieve satisfactory results with supervised deep learning. Self-Supervised learning methods have shown great success on natural images, and offer an attractive alternative also to microscopy images. However, we find that self-supervised learning techniques underperform on high content imaging assays. One challenge is the undesirable domain shifts present in the data known as batch effects, which may be caused by biological noise or uncontrolled experimental conditions. To this end, we introduce Cross-Domain Consistency Learning (CDCL), a novel approach that is able to learn in the presence of batch effects. CDCL enforces the learning of biological similarities while disregarding undesirable batch-specific signals, which leads to more useful and versatile representations. These features are organised according to their morphological changes and are more useful for downstream tasks - such as distinguishing treatments and mechanism of action.",
"authors": "Johan Fredin Haslum, Christos Matsoukas, Karl-Johan Leuchowius, Erik M\u00fcllers, Kevin Smith",
"award": "",
"id": "71",
"melba": "False",
"or_id": "PzzhiSNnyF8",
"oral": "False",
"pmlr_url": "",
"poster_loc": "W35",
"schedule": "Wednesday, July 12: Posters \u2014 10:15\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "Metadata-guided Consistency Learning for High Content Images",
"yt_full": ""
},
"P073": {
"abstract": "Interpretability is essential for machine learning algorithms in high-stakes application fields such as medical image analysis. However, high-performing black-box neural networks do not provide explanations for their predictions, which can lead to mistrust and suboptimal human-ML collaboration. Post-hoc explanation techniques, which are widely used in practice, have been shown to suffer from severe conceptual problems. Furthermore, as we show in this paper, current explanation techniques do not perform adequately in the multi-label scenario, in which multiple medical findings may co-occur in a single image. We propose Attri-Net, an inherently interpretable model for multi-label classification. Attri-Net is a powerful classifier that provides transparent, trustworthy, and human-understandable explanations. The model first generates class-specific attribution maps based on counterfactuals to identify which image regions correspond to certain medical findings. Then a simple logistic regression classifier is used to make predictions based solely on these attribution maps. We compare Attri-Net to five post-hoc explanation techniques and one inherently interpretable classifier on three chest X-ray datasets. We find that Attri-Net produces high-quality multi-label explanations consistent with clinical knowledge and has comparable classification performance to state-of-the-art classification models.",
"authors": "Susu Sun, Stefano Woerner, Andreas Maier, Lisa M. Koch, Christian F. Baumgartner",
"award": "",
"id": "73",
"melba": "False",
"or_id": "OpmQOkfizzM",
"oral": "False",
"pmlr_url": "",
"poster_loc": "M19",
"schedule": "Monday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "/virtual/poster/P073.pdf",
"title": "Inherently Interpretable Multi-Label Classification Using Class-Specific Counterfactuals",
"yt_full": "https://youtu.be/x44gbPrkWl8"
},
"P075": {
"abstract": "Federated learning and its application to medical image segmentation have recently become a popular research topic. This training paradigm suffers from statistical heterogeneity between participating institutions\u2019 local datasets, incurring convergence slowdown as well as potential accuracy loss compared to classical training. To mitigate this effect, federated personalization emerged as the federated optimization of one model per distribution. We propose a novel personalization algorithm tailored to the feature shift induced by the usage of different scanners and acquisition parameters by different institutions. This method is the first to account for both inter and intra-institution feature shifts (multiple scanners used in a single institution). It is based on the computation, within each centre, of a series of radiomic features capturing the global texture of each 3D image volume, followed by a clustering analysis pooling all feature vectors transferred from the local institutions to the central server. Each computed clustered decentralized dataset (potentially including data from different institutions) then serves to finetune a global model obtained through classical federated learning. We validate our approach on the Federated Brain Tumor Segmentation 2022 Challenge dataset (FeTS2022).",
"authors": "Matthis Manthe, Stefan Duffner, Carole Lartizien",
"award": "",
"id": "75",
"melba": "False",
"or_id": "1CyXExO15K",
"oral": "False",
"pmlr_url": "",
"poster_loc": "Virtual only",
"schedule": "Wednesday, July 12: Virtual poster session - 8:00\u20139:00\n",
"short": "False",
"slides": "/virtual/poster/P075.pdf",
"title": "Whole brain radiomics for clustered federated personalization in brain tumor segmentation",
"yt_full": "https://youtu.be/NCbSwfMRPzg"
},
"P076": {
"abstract": "Automated generation of clinically accurate radiology reports can improve patient care. Previous report generation methods that rely on image captioning models often generate incoherent and incorrect text due to their lack of relevant domain knowledge, while retrieval-based attempts frequently retrieve reports that are irrelevant to the input image. In this work, we propose Contrastive X-Ray REport Match (X-REM), a novel retrieval-based radiology report generation module that uses an image-text matching score to measure the similarity of a chest X-ray image and radiology report for report retrieval. We observe that computing the image-text matching score with a language-image model can effectively capture the fine-grained interaction between image and text that is often lost when using cosine similarity. X-REM outperforms multiple prior radiology report generation modules in terms of both natural language and clinical metrics. Human evaluation of the generated reports suggests that X-REM increased the number of zero-error reports and decreased the average error severity compared to the baseline retrieval approach. Our code is available at: https://github.com/rajpurkarlab/X-REM",
"authors": "Jaehwan Jeong, Katherine Tian, Andrew Li, Sina Hartung, Subathra Adithan, Fardad Behzadi, Juan Calle, David Osayande, Michael Pohlen, Pranav Rajpurkar",
"award": "",
"id": "76",
"melba": "False",
"or_id": "aZ0OuYMSMMZ",
"oral": "False",
"pmlr_url": "",
"poster_loc": "Virtual only",
"schedule": "Wednesday, July 12: Virtual poster session - 8:00\u20139:00\n",
"short": "False",
"slides": "/virtual/poster/P076.pdf",
"title": "Multimodal Image-Text Matching Improves Retrieval-based Chest X-Ray Report Generation",
"yt_full": "https://youtu.be/jqlCFoU-w2U"
},
"P078": {
"abstract": "This work proposes MultiTask Learning for accelerated-MRI Reconstruction and Segmentation (MTLRS). Unlike the common single-task approaches, MultiTask Learning identifies relations between multiple tasks to improve the performance of all tasks. The proposed MTLRS consists of a unique cascading architecture, where a recurrent reconstruction network and a segmentation network inform each other through hidden states. The features of the two networks are shared and implicitly enforced as inductive bias. To evaluate the benefit of MTLRS, we compare performing the two tasks of accelerated-MRI reconstruction and MRI segmentation with pre-trained, sequential, end-to-end, and joint approaches. A synthetic multicoil dataset is used to train, validate, and test all approaches with five-fold cross-validation. The dataset consists of 3D FLAIR brain data of relapsing-remitting Multiple Sclerosis patients with known white matter lesions. The acquisition is prospectively undersampled by approximately 7.5 times compared to clinical standards. Reconstruction performance is evaluated by Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR). Segmentation performance is evaluated by Dice score for combined brain tissue and white matter lesion segmentation and by per lesion Dice score. Results show that MTLRS outperforms other evaluated approaches, providing high-quality reconstructions and accurate white matter lesion segmentation. A significant correlation was found between the performance of both tasks (SSIM and per lesion Dice score, $\\rho=0.92$, $p=0.0005$). Our proposed MTLRS demonstrates that accelerated-MRI reconstruction and MRI segmentation can be effectively combined to improve performance on both tasks, potentially benefiting clinical settings.",
"authors": "Dimitrios Karkalousos, Ivana Isgum, Henk Marquering, Matthan W. A. Caan",
"award": "",
"id": "78",
"melba": "False",
"or_id": "ci2Fg31H0T",
"oral": "False",
"pmlr_url": "",
"poster_loc": "M20",
"schedule": "Monday, July 10: Posters \u2014 11:00\u201312:00 & 15:00\u201316:00\n",
"short": "False",
"slides": "",
"title": "MultiTask Learning for accelerated-MRI Reconstruction and Segmentation of Brain Lesions in Multiple Sclerosis",
"yt_full": ""
},
"P082": {
"abstract": "Recently, the study of multi-modal brain networks has dramatically facilitated the efficiency in brain disorder diagnosis by characterizing multiple types of connectivity of brain networks and their intrinsic complementary information. Despite the promising performance achieved by multi-modal technologies, most existing multi-modal approaches can only learn from samples with complete modalities, which wastes a considerable amount of mono-modal data. Otherwise, most existing data imputation approaches still rely on a large number of samples with complete modalities. In this study, we propose a modal-mixup data imputation method by randomly sampling incomplete samples and synthesizing them into complete data for auxiliary training. Moreover, to mitigate the noise in the complementary information between unpaired modalities in the synthesized data, we introduce a bilateral network with deep supervision for improving and regularizing mono-modal representations with disease-specific information. Experiments on the ADNI dataset demonstrate the superiority of our proposed method for disease classification in terms of different rates of samples with complete modalities.",
"authors": "Yanwu Yang, Hairui Chen, Zhikai Chang, Yang Xiang, Chenfei Ye, Ting Ma",
"award": "",
"id": "82",
"melba": "False",
"or_id": "WjrcYNTPunQ",
"oral": "False",
"pmlr_url": "",
"poster_loc": "Virtual only",
"schedule": "Wednesday, July 12: Virtual poster session - 8:00\u20139:00\n",
"short": "False",
"slides": "/virtual/poster/P082.pdf",
"title": "Incomplete learning of multi-modal connectome for brain disorder diagnosis via modal-mixup and deep supervision",
"yt_full": "https://youtu.be/4eLh_bwT_Tk"
},
"P083": {
"abstract": "The use of supervised deep learning techniques to detect pathologies in brain MRI scans can be challenging due to the diversity of brain anatomy and the need for annotated data sets. An alternative approach is to use unsupervised anomaly detection, which only requires sample-level labels of healthy brains to create a reference representation. This reference representation can then be compared to unhealthy brain anatomy in a pixel-wise manner to identify abnormalities. To accomplish this, generative models are needed to create anatomically consistent MRI scans of healthy brains. While recent diffusion models have shown promise in this task, accurately generating the complex structure of the human brain remains a challenge. In this paper, we propose a method that reformulates the generation task of diffusion models as a patch-based estimation of healthy brain anatomy, using spatial context to guide and improve reconstruction. We evaluate our approach on data of tumors and multiple sclerosis lesions and demonstrate a relative improvement of 25.1% compared to existing baselines.",
"authors": "Finn Behrendt, Debayan Bhattacharya, Julia Kr\u00fcger, Roland Opfer, Alexander Schlaefer",
"award": "",
"id": "83",
"melba": "False",
"or_id": "O-uZr5S1tJE",