<!--
Dependencies:
jQuery JavaScript Library v1.4.2
http://jquery.com/
Copyright 2010, John Resig
Dual licensed under the MIT or GPL Version 2 licenses.
http://jquery.org/license
-----------------------------------------------------
Bootstrap
Copyright (c) 2011-2019 Twitter, Inc.
Copyright (c) 2011-2019 The Bootstrap Authors
https://github.com/twbs/bootstrap/blob/v4.3.1/LICENSE
-----------------------------------------------------
Vanta
https://github.com/tengbao/vanta
-----------------------------------------------------
Author: Joshua Cao | yuchenca@andrew.cmu.edu
Website: 16726 course website
-->
<!doctype html>
<html>
<head>
<!-- icon -->
<link rel="icon" href="./media/joshua.ico" type="image/x-icon">
<link rel="shortcut icon" href="./media/joshua.ico" type="image/x-icon">
<link rel="bookmark" href="./media/joshua.ico" type="image/x-icon">
<!-- title -->
<title>16726-Yuchenca</title>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0" name="viewport">
<meta name="description" content="16726-Yuchenca">
<meta name="keywords" lang="de" content="16726-Yuchenca">
<!-- jQuery -->
<script type="text/javascript" src="./js/jQuery-3.3.1.js"></script>
<!-- Bootstrap -->
<link type="text/css" href="./css/bootstrap.min.css" rel='stylesheet'>
<script type="text/javascript" src="./js/bootstrap.min.js"></script>
<!-- main -->
<link type="text/css" href="./css/main.css" rel='stylesheet'>
<script type="text/javascript" src="./js/main.js"></script>
<!-- vanta -->
<script type="text/javascript" src="./js/three.min.js"></script>
<script type="text/javascript" src="./js/ring.min.js"></script>
<script type="text/javascript" src="./js/cloud.min.js"></script>
</head>
<body>
<div style="width:100%;height:100%">
<!-- title -->
<div id="cloud">
<div class="row">
<div class="col-12 mx-12">
<div id="title" class="text-center">
<span id="title-top" class="font-weight-bold">16726 Learning-based Image Synthesis Spring 22</span>
<br>
<span id="title-bot">Joshua Cao | Carnegie Mellon University</span>
</div>
</div>
</div>
<br><br>
<!-- navigation -->
<ul id="navigation" class="nav nav-tabs font-weight-bold text-center" role="tablist">
<li class="nav-item">
<a class="nav-link" id="home-tab" data-toggle="tab" href="#home" role="tab" aria-controls="home" aria-selected="false">Home</a>
</li>
<li class="nav-item">
<a class="nav-link" id="a1-tab" data-toggle="tab" href="#a1" role="tab" aria-controls="a1" aria-selected="false">Assignment1</a>
</li>
<li class="nav-item">
<a class="nav-link" id="a2-tab" data-toggle="tab" href="#a2" role="tab" aria-controls="a2" aria-selected="false">Assignment2</a>
</li>
<li class="nav-item">
<a class="nav-link" id="a3-tab" data-toggle="tab" href="#a3" role="tab" aria-controls="a3" aria-selected="true">Assignment3</a>
</li>
<li class="nav-item">
<a class="nav-link" id="a4-tab" data-toggle="tab" href="#a4" role="tab" aria-controls="a4" aria-selected="false">Assignment4</a>
</li>
<li class="nav-item">
<a class="nav-link" id="a5-tab" data-toggle="tab" href="#a5" role="tab" aria-controls="a5" aria-selected="false">Assignment5</a>
</li>
<li class="nav-item">
<a class="nav-link active" id="final-tab" data-toggle="tab" href="#final" role="tab" aria-controls="final" aria-selected="false">Final</a>
</li>
</ul>
</div>
<div class="tab-content container" id="TabContent">
<div class="tab-pane fade" id="home" role="tabpanel" aria-labelledby="home-tab">
<div class="row">
<div class="col-12 my-5 mx-3">
<h2>About Course</h2>
<hr class="col-xs-12 mr-4">
<p class="col-xs-12 mr-4">
<a class="underline" href="https://learning-image-synthesis.github.io/sp22/" target="_blank">16-726 Learning-Based Image Synthesis / Spring 2022</a> is led by <a class="underline" href="https://www.cs.cmu.edu/~junyanz/" target="_blank">Professor Jun-yan Zhu</a>, and assisted by TAs <a class="underline" href="https://peterwang512.github.io/" target="_blank">Sheng-Yu Wang</a> and <a class="underline" href="https://linzhiqiu.github.io/" target="_blank">Zhi-Qiu Lin</a>.
</p>
<p class="col-xs-12 mr-4">
This course introduces machine learning methods for image and video synthesis. The objectives of synthesis research vary from modeling statistical distributions of visual data, through realistic picture-perfect recreations of the world in graphics, and all the way to providing interactive tools for artistic expression. Key machine learning algorithms will be presented, ranging from classical learning methods (e.g., nearest neighbor, PCA, Markov Random Fields) to deep learning models (e.g., ConvNets, deep generative models, such as GANs and VAEs). We will also introduce image and video forensics methods for detecting synthetic content. In this class, students will learn to build practical applications and create new visual effects using their own photos and videos.
</p>
<h2>Assignment Summary</h2>
<hr class="col-xs-12 mr-4">
<div class="row mx-1">
<div class="col-12">
<p class="col-xs-12">
<table class="table">
<thead>
<tr>
<th scope="col"></th>
<th scope="col">Topic</th>
<th scope="col">Abstract</th>
<th scope="col">Reference</th>
</tr>
</thead>
<tbody>
<tr>
<th scope="row">A1</th>
<td>Colorizing the Prokudin-Gorskii Photo Collection</td>
<td>Implement SSD, pyramid structure, USM, auto crop, contrast methods </td>
<td>
<a class="underline" href="http://lcweb2.loc.gov/master/pnp/prok/" target="_blank">Dataset</a><br>
<a class="underline" href="https://en.wikipedia.org/wiki/Unsharp_masking" target="_blank">USM</a><br>
<a class="underline" href="https://en.wikipedia.org/wiki/Hough_transform" target="_blank">Hough Transform</a><br>
</td>
</tr>
<tr>
<th scope="row">A2</th>
<td>Gradient Domain Fusion</td>
<td>Implement Poisson Blending, Mixed Blending, Color2Gray</td>
<td>
<a class="underline" href="https://erkaman.github.io/posts/poisson_blending.html" target="_blank">Poisson Blending</a><br>
</td>
</tr>
<tr>
<th scope="row">A3</th>
<td>When Cats meet GANs</td>
<td>Implement DCGAN, CycleGAN</td>
<td>
<a class="underline" href="https://data-efficient-gans.mit.edu/datasets/" target="_blank">Dataset</a><br>
<a class="underline" href="https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix" target="_blank">CycleGAN-pix2pix</a><br>
</td>
</tr>
<tr>
<th scope="row">A4</th>
<td>Neural Style Transfer</td>
<td>Vgg-19, style transfer</td>
<td></td>
</tr>
<tr>
<th scope="row">A5</th>
<td>GAN Photo Editing</td>
<td>Inverted GAN, StyleGAN2, Interpolation, Sketch2Image</td>
<td>
<a class="underline" href="https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/images/generated-afhqwild/" target="_blank">AdaWild Dataset</a><br>
<a class="underline" href="https://drive.google.com/file/d/1p9SlAZ_lwtewEM-UU6GvYEdQWTV1K-_g/view" target="_blank">256/128 Resolution Cat Dataset</a>
</td>
</tr>
</tbody>
</table>
</p>
</div>
</div>
<h2>Copyright</h2>
<hr class="col-xs-12 mr-4">
<p class="col-xs-12 mr-4">
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png"></a> All datasets, teaching resources and training networks on this page are copyright by Carnegie Mellon University and published under the <a class="underline" rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. This means that you must attribute the work in the manner specified by the authors, you may not use this work for commercial purposes and if you alter, transform, or build upon this work, you may distribute the resulting work only under the same license.
</p>
</div>
</div>
</div>
<div class="tab-pane fade" id="a1" role="tabpanel" aria-labelledby="a1-tab">
<div class="row">
<div class="col-12 my-5 mx-3">
<h2>Assignment #1</h2>
<hr class="col-xs-12 mr-4">
<h3>Introduction</h3>
<p class="col-xs-12 mr-4">
The Prokudin-Gorskii image collection from the Library of Congress is a series of glass plate negative photographs taken by Sergei Mikhailovich Prokudin-Gorskii. To view these photographs in color digitally, one must overlay the three images and display them in their respective RGB channels. However, due to the technology used to take these images, the three photos are not perfectly aligned. The goal of this project is to automatically align, clean up, and display a single color photograph from a glass plate negative.
</p>
<hr class="col-xs-12 mr-4">
<h3>Direct Method</h3>
<p class="col-xs-12 mr-4">
Before diving into algorithms, I blend the R, G, B images directly to get an intuitive feel for the task; this also provides a baseline to show how far my algorithm can improve on it. The given dataset contains 1 small JPEG and 9 large TIFF images. Their direct blends are shown below:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a1/direct/cathedral.jpg" height="180px" alt="image_left">
<img src="./media/a1/direct/emir.jpg" height="180px" alt="image_left">
<img src="./media/a1/direct/harvesters.jpg" height="180px" alt="image_left">
<img src="./media/a1/direct/icon.jpg" height="180px" alt="image_left">
<img src="./media/a1/direct/lady.jpg" height="180px" alt="image_left">
<img src="./media/a1/direct/self_portrait.jpg" height="180px" alt="image_left">
<img src="./media/a1/direct/three_generations.jpg" height="180px" alt="image_left">
<img src="./media/a1/direct/train.jpg" height="180px" alt="image_left">
<img src="./media/a1/direct/turkmen.jpg" height="180px" alt="image_left">
<img src="./media/a1/direct/village.jpg" height="180px" alt="image_left">
</div>
</div>
<hr class="col-xs-12 mr-4">
<h3>SSD & NCC Alignment</h3>
<p class="col-xs-12 mr-4">
I implement both SSD and NCC to compare the similarity of small patches for alignment, with a search range of [-15, 15]; both work well on the small JPEG image shown below. To speed up the calculation, I also crop 20% from each side of the image so the borders contribute less. However, for the larger images the search range is not large enough and the computation takes extremely long. This is where the pyramid structure comes in.
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a1/pyramid/cathedral.jpg" height="280px" alt="image_left">
</div>
</div>
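<p class="col-xs-12 mr-4">
A minimal sketch of the single-scale SSD search described above (the function and parameter names are illustrative, not the exact code used for these results); NCC would simply swap the scoring function:
</p>
<pre class="col-xs-12 mr-4"><code>import numpy as np

def ssd(a, b):
    # Sum of squared differences between two equally sized crops.
    return np.sum((a - b) ** 2)

def align_exhaustive(channel, reference, search=15, margin=0.2):
    # Try every shift in [-search, search] on both axes and keep the one
    # with the lowest SSD; 20% is cropped from each side so the ragged
    # borders do not dominate the score.
    h, w = channel.shape
    dh, dw = int(h * margin), int(w * margin)
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted[dh:h - dh, dw:w - dw], reference[dh:h - dh, dw:w - dw])
            if best_score > score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
</code></pre>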
<hr class="col-xs-12 mr-4">
<h3>Pyramid Alignment</h3>
<p class="col-xs-12 mr-4">
I use log2(min(image.shape)) to find the maximum number of layers the image can have, and only apply the pyramid algorithm to images larger than 512*512; smaller images can use the direct [-15,15] SSD search. For large images, the starting layer is about 256 pixels on a side (2^8 as the first layer), and I search exhaustively down to the original image size, because skipping the final layer always left a visible color bias (even a few pixels of channel misalignment is easy to spot). My first implementation took 180s per image. To speed it up, I recursively halve the search region at each level, which brings it down to about 55 seconds per image: since the center of the search box is determined by the previous layer, the deeper the algorithm searches, the smaller the range it needs to find the best alignment. The output is listed below. Most of the images are aligned quite well, but images like the portrait of Sergei Mikhailovich Prokudin-Gorskii turn out even worse than direct alignment because the brightness of the three plates differs; I therefore use a USM (Unsharp Mask) algorithm to fix the issue.
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a1/pyramid/cathedral.jpg" height="180px" alt="image_left">
<img src="./media/a1/pyramid/emir.jpg" height="180px" alt="image_left">
<img src="./media/a1/pyramid/harvesters.jpg" height="180px" alt="image_left">
<img src="./media/a1/pyramid/icon.jpg" height="180px" alt="image_left">
<img src="./media/a1/pyramid/lady.jpg" height="180px" alt="image_left">
<img src="./media/a1/pyramid/self_portrait.jpg" height="180px" alt="image_left">
<img src="./media/a1/pyramid/three_generations.jpg" height="180px" alt="image_left">
<img src="./media/a1/pyramid/train.jpg" height="180px" alt="image_left">
<img src="./media/a1/pyramid/turkmen.jpg" height="180px" alt="image_left">
<img src="./media/a1/pyramid/village.jpg" height="180px" alt="image_left">
</div>
</div>
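<p class="col-xs-12 mr-4">
A minimal sketch of the recursion, reusing the align_exhaustive function from the sketch above (the 512-pixel cutoff matches the description; the refinement window size is illustrative):
</p>
<pre class="col-xs-12 mr-4"><code>import numpy as np

def align_pyramid(channel, reference, search=15):
    # Recurse on half-resolution copies until the image is small enough for
    # a full [-search, search] scan, then refine with a small window while
    # walking back up; shifts double at every level.
    if min(channel.shape) > 512:
        dy0, dx0 = align_pyramid(channel[::2, ::2], reference[::2, ::2], search)
        dy0, dx0 = 2 * dy0, 2 * dx0
        window = 2          # only refine around the coarse estimate
    else:
        dy0, dx0, window = 0, 0, search
    rolled = np.roll(channel, (dy0, dx0), axis=(0, 1))
    dy, dx = align_exhaustive(rolled, reference, window)
    return dy0 + dy, dx0 + dx
</code></pre>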
<hr class="col-xs-12 mr-4">
<h2>Extra Credits</h2>
<h3>USM Unsharp Mask</h3>
<p class="col-xs-12 mr-4">
The USM algorithm sharpens or softens image edges, which lets the SSD comparison find a more accurate alignment. It is called in each recursion of the pyramid alignment: it first applies a Gaussian blur to the single-channel (gray) image and subtracts the blurred copy from the original; then the regions of the difference above a certain threshold are scaled by a constant and subtracted from the original image. I use subtraction here because some edges need to be softened rather than sharpened: a few distracting edges stand out so much in the original image that they pull the SSD toward the wrong alignment. The USM specifically improves the quality of this image:
</p>
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a1/usm/emir.jpg" height="280px" alt="image_left">
</div>
</div>
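<p class="col-xs-12 mr-4">
A minimal sketch of this blur-and-subtract step (sigma, threshold and amount are illustrative values, not the ones used for the results above):
</p>
<pre class="col-xs-12 mr-4"><code>import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_adjust(gray, sigma=3.0, threshold=0.05, amount=0.5):
    # Detail layer = image minus its Gaussian-blurred copy; only the strong
    # detail (above the threshold) is kept, scaled, and subtracted, which
    # softens the edges that otherwise mislead the SSD alignment.
    detail = gray - gaussian_filter(gray, sigma)
    strong = np.where(np.abs(detail) > threshold, detail, 0.0)
    return np.clip(gray - amount * strong, 0.0, 1.0)
</code></pre>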
<hr class="col-xs-12 mr-4">
<h3>Crop Image</h3>
<p class="col-xs-12 mr-4">
To get rid of the borders, I use two cropping methods. The first keeps only the area where all three channels have content and removes the blank regions introduced by alignment, which is implemented by retrieving the shift of each single-channel image. However, this method cannot handle regions that are originally black or white outside the image. I therefore also compute an MSE (Mean Square Error) score for each row and column, and when three adjacent rows or columns all fall below a certain threshold, that area is cropped. The result is shown below:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a1/crop/cathedral.jpg" height="200px" alt="image_left">
<img src="./media/a1/crop/emir.jpg" height="200px" alt="image_left">
<img src="./media/a1/crop/harvesters.jpg" height="200px" alt="image_left">
<img src="./media/a1/crop/icon.jpg" height="200px" alt="image_left">
<img src="./media/a1/crop/lady.jpg" height="200px" alt="image_left">
<img src="./media/a1/crop/self_portrait.jpg" height="200px" alt="image_left">
<img src="./media/a1/crop/three_generations.jpg" height="200px" alt="image_left">
<img src="./media/a1/crop/train.jpg" height="200px" alt="image_left">
<img src="./media/a1/crop/turkmen.jpg" height="200px" alt="image_left">
<img src="./media/a1/crop/village.jpg" height="200px" alt="image_left">
</div>
</div>
<hr class="col-xs-12 mr-4">
<h3>Add Contrast</h3>
<p class="col-xs-12 mr-4">
The contrast method is straightforward: I compute the cumulative histogram of the image, map the 5% and 95% points to 0 and 255 respectively, and stretch the values in between so that the contrast of the image increases.
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a1/contrast/cathedral.jpg" height="200px" alt="image_left">
<img src="./media/a1/contrast/emir.jpg" height="200px" alt="image_left">
<img src="./media/a1/contrast/harvesters.jpg" height="200px" alt="image_left">
<img src="./media/a1/contrast/icon.jpg" height="200px" alt="image_left">
<img src="./media/a1/contrast/lady.jpg" height="200px" alt="image_left">
<img src="./media/a1/contrast/self_portrait.jpg" height="200px" alt="image_left">
<img src="./media/a1/contrast/three_generations.jpg" height="200px" alt="image_left">
<img src="./media/a1/contrast/train.jpg" height="200px" alt="image_left">
<img src="./media/a1/contrast/turkmen.jpg" height="200px" alt="image_left">
<img src="./media/a1/contrast/village.jpg" height="200px" alt="image_left">
</div>
</div>
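<p class="col-xs-12 mr-4">
A minimal sketch of this percentile stretch (names are illustrative):
</p>
<pre class="col-xs-12 mr-4"><code>import numpy as np

def stretch_contrast(img, low_pct=5, high_pct=95):
    # Map the 5th / 95th percentiles of the cumulative histogram to 0 / 255
    # and linearly stretch everything in between.
    lo, hi = np.percentile(img, [low_pct, high_pct])
    out = (img.astype(np.float64) - lo) / max(hi - lo, 1e-8)
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)
</code></pre>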
<hr class="col-xs-12 mr-4">
<h3>Other Dataset</h3>
<p class="col-xs-12 mr-4">
I found another similar <a class="underline" href="http://lcweb2.loc.gov/master/pnp/prok/" target="_blank">dataset</a> with fairly large TIFF images to test the algorithm. The results are shown below:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a1/outdata_crop/00001a.jpg" height="220px" alt="image_left">
<img src="./media/a1/outdata_crop/00002u.jpg" height="220px" alt="image_left">
<img src="./media/a1/outdata_crop/00004a.jpg" height="220px" alt="image_left">
<img src="./media/a1/outdata_crop/00005u.jpg" height="220px" alt="image_left">
</div>
</div>
</div>
</div>
</div>
<div class="tab-pane fade" id="a2" role="tabpanel" aria-labelledby="a2-tab">
<div class="row">
<div class="col-12 my-5 mx-3">
<h2>Assignment #2</h2>
<hr class="col-xs-12 mr-4">
<h3>Introduction</h3>
<p class="col-xs-12 mr-4">
This project explores gradient-domain processing for image blending, tone mapping and non-photorealistic rendering, focusing mainly on the Poisson Blending algorithm. The tasks include a basic gradient minimization, 4-neighbour Poisson blending, mixed-gradient Poisson blending and a Color2Gray method that preserves grayscale intensity. The whole project is implemented in Python.
</p>
<hr class="col-xs-12 mr-4">
<h3>Toy Problem</h3>
<p class="col-xs-12 mr-4">
The toy problem is a simplified version of the Poisson blending algorithm, so it helps a lot in understanding Poisson blending. There are three main objectives: match the gradients along the x axis, match the gradients along the y axis, and pin the top-left corner of the image:
</p>
<p class="col-xs-12 mr-4">
<b>
((v(x+1,y) − v(x,y)) − (s(x+1,y) − s(x,y)))<sup>2</sup><br>
((v(x,y+1) − v(x,y)) − (s(x,y+1) − s(x,y)))<sup>2</sup><br>
(v(1,1) − s(1,1))<sup>2</sup>
</b>
</p>
<p class="col-xs-12 mr-4">
We loop through each pixel of the source image with these equations to construct A and b. Solving the least-squares problem Av=b gives the synthesized image v. For a gray image of size H by W, A has roughly <b>2*H*W+1</b> rows and <b>H*W</b> columns (one column per unknown pixel), and b is a <b>2*H*W+1</b> by <b>1</b> vector. The result is essentially a test of whether we can reproduce the original image, as shown below:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a2/toy.png" height="400px" alt="image_left">
</div>
</div>
<p class="col-xs-12 mr-4">
I implemented this both with and without explicit loops. Interestingly, the loop version takes only about 0.4s, whereas the vectorized version, which is usually faster in Python, takes around 10s. I think the main reason is that arithmetic on sparse matrices is more expensive than directly assigning values to coordinates. In the non-loop method I mainly use lil_matrix to construct the sparse matrix, and np.roll and np.transpose to construct A. In the end I use the loop method for the rest of the tasks.
</p>
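<p class="col-xs-12 mr-4">
A minimal sketch of the loop-based construction (indexing helpers and names are my own, not the exact assignment code):
</p>
<pre class="col-xs-12 mr-4"><code>import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr

def toy_reconstruct(s):
    # s: H x W grayscale image in [0, 1]. One equation per x / y gradient
    # plus one corner constraint; one unknown v per pixel.
    H, W = s.shape
    def idx(y, x):
        return y * W + x
    A = lil_matrix((2 * H * W + 1, H * W))
    b = np.zeros(2 * H * W + 1)
    e = 0
    for y in range(H):
        for x in range(W):
            if x + 1 != W:                      # x-gradient equation
                A[e, idx(y, x + 1)], A[e, idx(y, x)] = 1, -1
                b[e] = s[y, x + 1] - s[y, x]
                e += 1
            if y + 1 != H:                      # y-gradient equation
                A[e, idx(y + 1, x)], A[e, idx(y, x)] = 1, -1
                b[e] = s[y + 1, x] - s[y, x]
                e += 1
    A[e, idx(0, 0)] = 1                         # pin the top-left corner
    b[e] = s[0, 0]
    v = lsqr(A.tocsr()[:e + 1], b[:e + 1])[0]
    return v.reshape(H, W)
</code></pre>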
<hr class="col-xs-12 mr-4">
<h3>Poisson Blending</h3>
<p class="col-xs-12 mr-4">
Following the toy problem, Poisson Blending looks at the four neighbours of each pixel, using the equation below, where <b>v</b> is the synthesized vector we need to solve for, <b>s</b> is the source image placed in the target image's coordinates (we only care about its masked region), and <b>t</b> is the target image:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a2/poisson.png" height="80px" alt="image_left">
</div>
</div>
<p class="col-xs-12 mr-4">
In the equation, each <b>i</b> is paired with its 4 neighbours <b>j</b>, i.e. the same <b>i</b> appears in 4 terms, one per neighbour. The left part covers the case where a neighbour of i is still inside the mask, and the right part the case where a neighbour falls outside the mask. The difference in code is that in the right part, <b>tj</b> is a known value and goes into <b>b</b>, whereas in the left part, <b>vj</b> is an unknown and contributes to <b>A</b>.
</p>
<p class="col-xs-12 mr-4">
Also, the given images now have 3 RGB channels, so each channel is solved separately. Since the <b>A</b> matrix is the same for all 3 channels and only <b>b</b> differs, I build <b>A</b> once and <b>b</b> three times to speed up the algorithm. Moreover, since we only need to generate pixels inside the masked region of the source image, I only loop over that area. Finally I merge the three solved channels into a new RGB image. The speed depends on the image size; for the given example of a <b>130x107</b> source image and a <b>250x333</b> target image, it generally takes around 20s.
</p>
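<p class="col-xs-12 mr-4">
A minimal per-channel sketch of this construction (loop-based, with illustrative names; it assumes the mask stays away from the image border so every neighbour index is valid):
</p>
<pre class="col-xs-12 mr-4"><code>import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr

def poisson_blend_channel(src, tgt, mask):
    # src, tgt: H x W float channels (src already placed in the target frame);
    # mask: boolean H x W, True inside the blend region.
    H, W = tgt.shape
    ids = -np.ones((H, W), dtype=int)
    ids[mask] = np.arange(mask.sum())            # one unknown v_i per masked pixel
    n = int(mask.sum())
    A = lil_matrix((4 * n, n))
    b = np.zeros(4 * n)
    e = 0
    for y, x in zip(*np.nonzero(mask)):
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            A[e, ids[y, x]] = 1
            if mask[ny, nx]:
                A[e, ids[ny, nx]] = -1           # neighbour inside the mask: v_j unknown
                b[e] = src[y, x] - src[ny, nx]
            else:                                # neighbour outside: t_j is known
                b[e] = src[y, x] - src[ny, nx] + tgt[ny, nx]
            e += 1
    v = lsqr(A.tocsr(), b)[0]
    out = tgt.copy()
    out[mask] = np.clip(v, 0.0, 1.0)
    return out
</code></pre>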
<p class="col-xs-12 mr-4">
However, naive Poisson Blending has trouble blending images seamlessly, because considering only the source gradients is not enough when the target and source images differ strongly. As you can see in the image below, the inner part of the balloon house is a little blurry and not blended well with the background. Therefore, we need Mixed Blending.
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a2/house.jpg" height="150px">
<img src="./media/a2/house_mask.png" height="150px">
<img src="./media/a2/mountain.jpg" height="350px">
<img src="./media/a2/ballon_blend.png" height="600px">
</div>
</div>
<h2>Extra Credits</h2>
<h3>Mixed Gradients</h3>
<p class="col-xs-12 mr-4">
Mixed gradients is a straightforward extension of Poisson Blending: instead of always taking the gradient of the source image <b>s</b>, we compare the gradients of <b>s</b> and <b>t</b> and keep the larger one, which blends better when the source and target differ a lot. A comparison on the same image is shown below (the left one is blended without mixed gradients):
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a2/ballon_blend.png" height="500px">
<img src="./media/a2/ballon.png" height="500px">
</div>
</div>
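<p class="col-xs-12 mr-4">
In terms of the Poisson sketch above, the only change is which gradient goes into <b>b</b>; something along these lines (again an illustrative sketch, not the exact code):
</p>
<pre class="col-xs-12 mr-4"><code>def mixed_gradient(src, tgt, y, x, ny, nx):
    # Keep whichever of the source / target gradients is larger in magnitude.
    g_src = src[y, x] - src[ny, nx]
    g_tgt = tgt[y, x] - tgt[ny, nx]
    return g_tgt if abs(g_tgt) > abs(g_src) else g_src
</code></pre>
<p class="col-xs-12 mr-4">
So the line that previously used src[y, x] - src[ny, nx] would call mixed_gradient instead (plus the known tgt[ny, nx] term when the neighbour lies outside the mask).
</p>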
<p class="col-xs-12 mr-4">
The given example of bear and pool is below:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a2/source_01.jpg" height="150px">
<img src="./media/a2/source_01_mask.png" height="150px">
<img src="./media/a2/target_01.jpg" height="350px">
<img src="./media/a2/poisson_blend.png" height="600px">
</div>
</div>
<p class="col-xs-12 mr-4">
There are some more generated blending images:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a2/obama.jpg" height="150px">
<img src="./media/a2/obama_mask.png" height="150px">
<img src="./media/a2/mona_lisa.jpg" height="300px">
<img src="./media/a2/obama.png" height="600px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a2/swimmer.jpg" height="100px">
<img src="./media/a2/swimmer_mask.png" height="100px">
<img src="./media/a2/road.jpg" height="250px">
<img src="./media/a2/swimmer.png" height="600px">
</div>
</div>
<p class="col-xs-12 mr-4">
And there is a failure case where the colorful Patrick cannot blend well with the nearly grayscale moon image:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a2/patrick.jpg" height="100px">
<img src="./media/a2/patrick_mask.png" height="100px">
<img src="./media/a2/moon.jpg" height="200px">
<img src="./media/a2/patrick.png" height="600px">
</div>
</div>
<p class="col-xs-12 mr-4">
This is mainly because the blending algorithm has an upper limit on how much it can adjust colors: if the source and target images differ too much, the least-squares solution can no longer reconcile them.
</p>
<h3>Color2Gray</h3>
<p class="col-xs-12 mr-4">
The Color2Gray method first converts the RGB image to the HSV color space and considers only the S and V channels, representing color contrast and intensity respectively. In this way we can keep the color contrast of the RGB image while preserving the grayscale intensity. The algorithm runs like the mixed-gradient blending, with S and V playing the roles of source and target. The result is shown below:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a2/colorBlindTest.png" height="300px">
<img src="./media/a2/color2gray.png" height="600px">
</div>
</div>
</div>
</div>
</div>
<div class="tab-pane fade" id="a3" role="tabpanel" aria-labelledby="a3-tab">
<div class="row">
<div class="col-12 my-5 mx-3">
<h2>Assignment #3</h2>
<hr class="col-xs-12 mr-4">
<h3>Introduction</h3>
<p class="col-xs-12 mr-4">
This project implements two famous GAN architectures: DCGAN and CycleGAN. It is programmed in PyTorch; the main code includes building the discriminator and generator networks, the loss functions, and the forward and backward passes. It also explores methods that help GANs generate better results, such as data augmentation, Differentiable Augmentation, different loss functions and different discriminators, and tests them on different datasets to check the robustness of the networks.
</p>
<hr class="col-xs-12 mr-4">
<h3>Part I: Deep Convolutional GAN</h3>
<p class="col-xs-12 mr-4"><b>Implement Data Augmentation</b></p>
<p class="col-xs-12 mr-4">
In PyTorch, data augmentation is applied each time a mini-batch is drawn. Its purpose is to add variety to the dataset, which is especially useful when the dataset is small or the samples are very similar; in this DCGAN training we only have 204 images. I mainly use Resize + RandomCrop, RandomHorizontalFlip and RandomRotation (10 degrees). Examples of the original and augmented datasets, and GAN samples trained on each, can be seen here:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/real-basic.png" height="300px">
<img src="./media/a3/real-basic_diff.png" height="300px">
<img src="./media/a3/real-deluxe.png" height="300px">
<img src="./media/a3/real-deluxe-diff.png" height="300px">
</div>
</div>
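<p class="col-xs-12 mr-4">
A minimal sketch of the two transform pipelines with torchvision (the image size and intermediate resize value are illustrative, not necessarily the ones used for these results):
</p>
<pre class="col-xs-12 mr-4"><code>import torchvision.transforms as T

# "basic" just resizes; "deluxe" adds the random crop / flip / rotation
# described above.
basic = T.Compose([
    T.Resize((64, 64)),
    T.ToTensor(),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
deluxe = T.Compose([
    T.Resize(70),
    T.RandomCrop(64),
    T.RandomHorizontalFlip(),
    T.RandomRotation(10),
    T.ToTensor(),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
</code></pre>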
<p class="col-xs-12 mr-4"><b>The Discriminator</b></p>
<p class="col-xs-12 mr-4">
The discriminator in this DCGAN is a convolutional neural network with the following architecture:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/discriminator.png" height="300px">
</div>
</div>
<p class="col-xs-12 mr-4">
According to the formula <b>(N-F+2P)/S + 1 = M</b> (ignoring dilation), where <b>N</b> is the input spatial size, <b>M</b> is the output spatial size, <b>S</b> is the stride, <b>P</b> is the padding and <b>F</b> is the filter/kernel size, since we want <b>N=2M</b> (each layer halves the spatial resolution) and we use a filter size of 4 with stride 2, the padding <b>P</b> works out to 1.
Besides, I use a softmax on the output layer and a mean squared error loss later on; since the discriminator is a classification problem, this combination is a reasonable choice.
</p>
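<p class="col-xs-12 mr-4">
A minimal sketch of one such downsampling block (the normalization and activation choices here are illustrative; the figure above defines the actual architecture):
</p>
<pre class="col-xs-12 mr-4"><code>import torch.nn as nn

def down_block(in_ch, out_ch):
    # Kernel F=4, stride S=2, padding P=1 gives (N - F + 2P)/S + 1 = N/2,
    # so each block halves the spatial size.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    )
</code></pre>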
<p class="col-xs-12 mr-4"><b>The Generator</b></p>
<p class="col-xs-12 mr-4">
The generator in this DCGAN is a convolutional neural network with the following architecture:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/generator.png" height="450px">
</div>
</div>
<p class="col-xs-12 mr-4">
In the generator network, I use a transposed convolution with a filter of size 4, stride 1 and padding 0 for the very first layer, which maps the 100x1x1 input to a 256x4x4 output. The remaining layers each upsample by 2 and use a convolution with filter size 3, stride 1 and padding 1, so the output spatial size is exactly twice the input. The first layer is a transposed convolution because it performs better than directly upsampling the noise.
</p>
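<p class="col-xs-12 mr-4">
A minimal sketch of these two kinds of layers (channel counts beyond the first layer are illustrative):
</p>
<pre class="col-xs-12 mr-4"><code>import torch.nn as nn

# First layer: 100 x 1 x 1 noise -> 256 x 4 x 4 via a transposed convolution
# (kernel 4, stride 1, padding 0).
first = nn.ConvTranspose2d(100, 256, kernel_size=4, stride=1, padding=0)

# Later layers: upsample by 2, then a 3x3 convolution with stride 1 and
# padding 1 keeps the doubled spatial size.
up = nn.Sequential(
    nn.Upsample(scale_factor=2),
    nn.Conv2d(256, 128, kernel_size=3, stride=1, padding=1),
)
</code></pre>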
<p class="col-xs-12 mr-4"><b>The Training Loop</b></p>
<p class="col-xs-12 mr-4">
The training loop, including the loss functions and backpropagation, follows the algorithm in the image below:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/gan_algo.png" height="450px">
</div>
</div>
<p class="col-xs-12 mr-4">
One thing to notice is that we normally backpropagate and update the discriminator first, then the generator, because the loss flows back through the discriminator before reaching the generator. To train the generator's weights well, we prefer the discriminator's gradients and weights to be fixed during that step rather than changing.
</p>
<p class="col-xs-12 mr-4">
Similarly, when training the discriminator we do not want gradients to flow all the way back into the generator, so the fake images produced by the generator and fed to the discriminator must be cut off from the generator's graph. In my case, I use torch.detach().
</p>
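<p class="col-xs-12 mr-4">
A minimal sketch of one training step with this ordering and the detach trick (a least-squares-style loss; G, D, the two optimizers and a batch of real images are assumed to exist, and the names are illustrative):
</p>
<pre class="col-xs-12 mr-4"><code>import torch

def train_step(G, D, g_opt, d_opt, real, device="cuda"):
    # Update D first on real images and *detached* fake images, then update G;
    # detach() keeps the discriminator loss from back-propagating into G.
    z = torch.randn(real.size(0), 100, 1, 1, device=device)
    fake = G(z)

    d_opt.zero_grad()
    d_loss = ((D(real) - 1) ** 2).mean() + (D(fake.detach()) ** 2).mean()
    d_loss.backward()
    d_opt.step()

    g_opt.zero_grad()
    g_loss = ((D(fake) - 1) ** 2).mean()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
</code></pre>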
<p class="col-xs-12 mr-4"><b>The Differentiable Augmentation</b></p>
<p class="col-xs-12 mr-4">
Differentiable Augmentation processes the data during training; it slows training down somewhat but can significantly improve the quality of the outputs. It is illustrated below:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/method.jpg" height="100px">
</div>
</div>
<p class="col-xs-12 mr-4">
I apply Differentiable Augmentation to both the real images and the fake images produced by the generator during training.
</p>
<p class="col-xs-12 mr-4"><b>Results</b></p>
<p class="col-xs-12 mr-4">
After training with learning rate = 0.0002, beta1 = 0.5, beta2 = 0.999, epochs = 500 and batch_size = 16, I get the results below. From left to right: basic data augmentation, basic plus Differentiable Augmentation, deluxe data augmentation, and deluxe plus Differentiable Augmentation.
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/basic.png" height="300px">
<img src="./media/a3/basic_diff.png" height="300px">
<img src="./media/a3/deluxe.png" height="300px">
<img src="./media/a3/deluxe_diff.png" height="300px">
</div>
</div>
<p class="col-xs-12 mr-4">
The results are interesting. The basic method clearly produces clumsy, hard-to-recognize cats. The differentiable method shows some color intensity and white balance shift, which actually demonstrates its robustness to RGB color intensity: it produces a more general color tone and better-recognized cat shapes. The deluxe result has better detail than the differentiable method. Finally, it is hard to say whether mixing deluxe and differentiable augmentation beats either single method, since some generations have both good detail and color tone while others are blurrier or distorted; but if we only want the single best result from a group of samples, the mixed method definitely works better than either method alone.
</p>
<p class="col-xs-12 mr-4">
In the same order as above, I also plot the loss curves, sampled every 200 iterations:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/loss_basic.png" height="250px">
<img src="./media/a3/loss_basic_diff.png" height="250px">
<img src="./media/a3/loss_deluxe.png" height="250px">
<img src="./media/a3/loss_deluxe_diff.png" height="250px">
</div>
</div>
<p class="col-xs-12 mr-4">
From the loss curves, we can tell that both basic and deluxe tend to overfit by the end of training, though deluxe reaches a relatively better loss (where G and D are both close to 0.5). With Differentiable Augmentation the loss stays closer to this ideal value, meaning the network generalizes better.
</p>
<hr class="col-xs-12 mr-4">
<h3>Part II: CycleGAN</h3>
<p class="col-xs-12 mr-4"><b>The Dual Generator</b></p>
<p class="col-xs-12 mr-4">
The generator of CycleGAN is a convolutional neural network with the following architecture:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/cyclegan_generator.png" height="450px">
</div>
</div>
<p class="col-xs-12 mr-4">
In the generator network there are two generators, for X->Y and Y->X; their inputs and outputs are mirrored and their architectures are identical in this project. The filter sizes are the same as in DCGAN for the convolution and up-convolution layers respectively. One major addition is the ResnetBlock, used 3 times here, which helps ensure that characteristics of the output image (e.g., the shapes of objects) do not differ too much from the input. Another major difference from the DCGAN generator is that the input is no longer noise but an image.
</p>
<p class="col-xs-12 mr-4"><b>The PatchDiscriminator</b></p>
<p class="col-xs-12 mr-4">
The major difference between this PatchDiscriminator and the DCGAN Discriminator is that it outputs a 4x4 patch of scores instead of a single value, so the output layer no longer needs a softmax; the rest is largely the same.
</p>
<p class="col-xs-12 mr-4"><b>The Training Loop</b></p>
<p class="col-xs-12 mr-4">
The training loop, including the loss functions and backpropagation, follows the algorithm in the image below:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/cyclegan_algo.png" height="550px">
</div>
</div>
<p class="col-xs-12 mr-4">
The main loss functions and pipeline are similar to DCGAN, except that for the generators, since there are two of them and we want to train them at the same time, we add both the X->Y and Y->X losses; the discriminators are likewise trained on the fake images generated in both directions.
</p>
<p class="col-xs-12 mr-4"><b>The Cycle Consistency</b></p>
<p class="col-xs-12 mr-4">
In my experiments the cycle-consistency term is the soul of CycleGAN: it requires that an image translated to the other domain and then mapped back reconstructs the original, which pushes the two generators to preserve the content of the input rather than invent new content.
</p>
<p class="col-xs-12 mr-4"><b>The CycleGAN Experiments</b></p>
<p class="col-xs-12 mr-4">
My CycleGAN training uses learning rate = 0.0002, beta1 = 0.5, beta2 = 0.999 and batch_size = 16, the same as DCGAN, and I always use Differentiable Augmentation since it improves robustness to RGB intensity and other factors. I first test CycleGAN with epoch = 1000, with and without cycle consistency. The results are below (left: without cycle consistency, right: with; top row: X->Y, bottom row: Y->X):
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/sample-001000-X-Y.png" height="350px">
<img src="./media/a3/sample-cycle-001000-X-Y.png" height="350px">
</div>
</div>
<br>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/sample-001000-Y-X.png" height="350px">
<img src="./media/a3/sample-cycle-001000-Y-X.png" height="350px">
</div>
</div>
<p class="col-xs-12 mr-4">
Clearly 1000 epochs are not enough for good outputs, but from the rough shapes and the loss curves below we can tell that training longer is the right direction (left: without cycle consistency, right: with). We can also see that cycle consistency gives a slightly better output.
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/loss_10000_cat_nocycle.png" height="350px">
<img src="./media/a3/loss_10000_cat_allon.png" height="350px">
</div>
</div>
<p class="col-xs-12 mr-4">
Next I extend training to 10000 epochs and train CycleGAN on two datasets, also comparing with and without cycle consistency, and PatchDiscriminator versus DCDiscriminator. The results are listed below (left: patch + cycle, middle: patch without cycle, right: dc + cycle):
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/cycle-sample-010000-X-Y.png" height="200px">
<img src="./media/a3/nocycle-sample-010000-X-Y.png" height="200px">
<img src="./media/a3/dc-cycle-sample-010000-X-Y.png" height="200px">
</div>
</div>
<br>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/cycle-sample-010000-Y-X.png" height="200px">
<img src="./media/a3/nocycle-sample-010000-Y-X.png" height="200px">
<img src="./media/a3/dc-cycle-sample-010000-Y-X.png" height="200px">
</div>
</div>
<br>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/fruit-cycle-sample-010000-X-Y.png" height="200px">
<img src="./media/a3/fruit-nocycle-sample-010000-X-Y.png" height="200px">
<img src="./media/a3/fruit-dc-sample-010000-X-Y.png" height="200px">
</div>
</div>
<br>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/fruit-cycle-sample-010000-Y-X.png" height="200px">
<img src="./media/a3/fruit-nocycle-sample-010000-Y-X.png" height="200px">
<img src="./media/a3/fruit-dc-sample-010000-Y-X.png" height="200px">
</div>
</div>
<p class="col-xs-12 mr-4">
Observing the results above, without cycle consistency (middle column) the generated results have odd colors and unclear shapes. The DCDiscriminator (right column) and PatchDiscriminator (left column) both achieve relatively good results whose generated images closely resemble the real ones, and both show this color re-mapping effect; by comparison, the PatchDiscriminator produces a slightly stronger color shift in all regions of interest, which suggests it performs pattern re-matching better.
</p>
<hr class="col-xs-12 mr-4">
<h3>Bonus</h3>
<p class="col-xs-12 mr-4"><b>Extra dataset</b></p>
<p class="col-xs-12 mr-4">
I chose one dataset from <a href="https://data-efficient-gans.mit.edu/datasets/">https://data-efficient-gans.mit.edu/datasets/</a> to apply to DCGAN; the output looks like:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a3/obama-dc.png" height="400px">
</div>
</div>
</div>
</div>
</div>
<div class="tab-pane fade" id="a4" role="tabpanel" aria-labelledby="a4-tab">
<div class="row">
<div class="col-12 my-5 mx-3">
<h2>Assignment #4</h2>
<hr class="col-xs-12 mr-4">
<h3>Introduction</h3>
<p class="col-xs-12 mr-4">
Neural Style Transfer is a VGG-19 based neural network that uses MSE regression losses and LBFGS to optimize the input image (noise). It uses only the feature-extraction part of VGG-19, and only for evaluation (no gradient updates for these layers); the optimization instead happens at the two ends, the loss functions and the input. The loss consists of two parts, content loss and style loss; we implement them separately first and then combine them with assigned weights.
</p>
<hr class="col-xs-12 mr-4">
<h3>Part 1: Content Reconstruction</h3>
<p class="col-xs-12 mr-4">
Content reconstruction regresses a noise image toward an input content image so that the noise gradually comes to resemble the content; the loss is therefore a straightforward MSE, and since we do not want to update VGG-19, the targets are detached at the end of VGG-19. I first apply the content loss at each single layer to get a general sense of how the noise approximates the content image under features of different depths. Here are the results with dancing.jpeg as input:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/dancing.jpeg" height="300px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/Reconstructed_Image_Conv_1.png" height="200px">
<img src="./media/a4/Reconstructed_Image_Conv_2.png" height="200px">
<img src="./media/a4/Reconstructed_Image_Conv_3.png" height="200px">
<img src="./media/a4/Reconstructed_Image_Conv_4.png" height="200px">
<img src="./media/a4/Reconstructed_Image_Conv_5.png" height="200px">
</div>
</div>
<p class="col-xs-12 mr-4">
Compared with the input content image, it is not hard to tell that the shallower the layer the content loss is placed at, the closer the result ends up to the content image. I find the deeper-layer outputs more interesting, so I tried several combinations of the deep layers 3-5. The results are below:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/Reconstructed_Image_Conv_34.png" height="200px">
<img src="./media/a4/Reconstructed_Image_Conv_35.png" height="200px">
<img src="./media/a4/Reconstructed_Image_Conv_45.png" height="200px">
<img src="./media/a4/Reconstructed_Image_Conv_345.png" height="200px">
<img src="./media/a4/Reconstructed_Image_Conv_12345.png" height="200px">
</div>
</div>
<p class="col-xs-12 mr-4">
Looking first at the last image, if we apply the content loss at every layer, the result is very similar to layers 3-4 alone or layer 3 alone, which means the shallow layers' loss dominates the output. Layers 35, 45 and 345 are all close to layer 4 with different color saturation; the difference is trivial and the choice is more of a style preference. I personally like the faint look, so I apply the content loss to layer 4, since a low-saturation content image seems easier to blend with a style. I also generate the content image from two different white-noise initializations, and the results are nearly identical.
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/Reconstructed_Image_Conv_4(1).png" height="300px">
<img src="./media/a4/Reconstructed_Image_Conv_4.png" height="300px">
</div>
</div>
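<p class="col-xs-12 mr-4">
As a reference for the content loss described in this part, here is a minimal sketch of the loss module (a simplified sketch of the standard PyTorch pattern, not the exact assignment code):
</p>
<pre class="col-xs-12 mr-4"><code>import torch
import torch.nn as nn

class ContentLoss(nn.Module):
    # Stores a detached target feature map and records the MSE to it each
    # time the network runs; the module passes its input through unchanged.
    def __init__(self, target):
        super().__init__()
        self.target = target.detach()   # do not optimize VGG-19 or the target
        self.loss = 0.0

    def forward(self, x):
        self.loss = nn.functional.mse_loss(x, self.target)
        return x
</code></pre>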
<hr class="col-xs-12 mr-4">
<h3>Part 2: Texture Synthesis</h3>
<p class="col-xs-12 mr-4">
Style synthesis has much the same structure as content reconstruction, except that the VGG-19 outputs are first turned into Gram matrices before the MSE loss is computed. As before, I first try each single layer with picasso.jpeg:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/picasso.jpeg" height="300px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/Synthesized_Texture_Conv_1.png" height="200px">
<img src="./media/a4/Synthesized_Texture_Conv_2.png" height="200px">
<img src="./media/a4/Synthesized_Texture_Conv_3.png" height="200px">
<img src="./media/a4/Synthesized_Texture_Conv_4.png" height="200px">
<img src="./media/a4/Synthesized_Texture_Conv_5.png" height="200px">
</div>
</div>
<p class="col-xs-12 mr-4">
We can see that the first layer has the strongest impact on the style, while the last layer retains only a little similarity to the original style image. Shallow layers give a dense, blurry style; deep layers give a light, sharp style. Since we want the style strong enough to overwrite the content's original style, the first layer's result is closer to what we want. So I focus on the early layers and incrementally add more layers to the first one; the results are below:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/Synthesized_Texture_Conv_12.png" height="200px">
<img src="./media/a4/Synthesized_Texture_Conv_123.png" height="200px">
<img src="./media/a4/Synthesized_Texture_Conv_1234.png" height="200px">
<img src="./media/a4/Synthesized_Texture_Conv_12345.png" height="200px">
</div>
</div>
<p class="col-xs-12 mr-4">
From the results, my general impression is that applying the style loss to every layer gives the output both good sharpness and a dense style, so layers 12345 are my preference. Testing with different input noise, the generated styles are visibly different even at a glance, meaning each run gives a similar but distinct style.
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/Synthesized_Texture_Conv_12345.png" height="300px">
<img src="./media/a4/Synthesized_Texture_Conv_12345(1).png" height="300px">
</div>
</div>
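<p class="col-xs-12 mr-4">
As a reference for the Gram-matrix style loss described in this part, here is a minimal sketch (illustrative names, mirroring the content-loss sketch above):
</p>
<pre class="col-xs-12 mr-4"><code>import torch
import torch.nn as nn

def gram_matrix(feat):
    # feat: (B, C, H, W) feature map -> (B, C, C) Gram matrix,
    # normalized by the number of elements per channel map.
    b, c, h, w = feat.size()
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

class StyleLoss(nn.Module):
    def __init__(self, target_feat):
        super().__init__()
        self.target = gram_matrix(target_feat).detach()
        self.loss = 0.0

    def forward(self, x):
        self.loss = nn.functional.mse_loss(gram_matrix(x), self.target)
        return x
</code></pre>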
<hr class="col-xs-12 mr-4">
<h3>Part 3: Style Transfer</h3>
<p class="col-xs-12 mr-4">
After deciding on the content-loss and style-loss structure, it is time to merge them in one network. To handle style and content images of different sizes, I first resize the smaller dimension of both images to 512 when using CUDA. Because the content image should keep its original size and aspect ratio, I fix its size and first pad the style image when its longer dimension matches the content image's shorter dimension, since an empty (black) region in the style image would badly degrade the style features. I then center-crop the style image to exactly the content image's size, since we care more about the pattern and texture of the style than about its completeness.
</p>
<p class="col-xs-12 mr-4">
The main hyper-parameters are the weights of the content and style losses. I fix the content weight at 1 and vary the style weight; the results are shown below:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/Output_Image_from_Noise_SW1_CW1_PD_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW100_CW1_PD_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW10000_CW1_PD_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW100000_CW1_PD_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW150000_CW1_PD_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW250000_CW1_PD_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW500000_CW1_PD_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW600000_CW1_PD_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW750000_CW1_PD_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW1000000_CW1_PD_C4_S12345.png" height="200px">
</div>
</div>
<p class="col-xs-12 mr-4">
The style weight spans a huge range: for weights below 10^4 the style barely shows, while above 10^6 it overwrites the content. A good weight therefore lies between 10^4 and 10^6, but there is no single best weight for every input; the best value differs per style and content image. My usual choice is around 10^5, and for the dancing image with the Picasso style I use a style weight of 1.5 * 10^5.
</p>
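<p class="col-xs-12 mr-4">
For reference, a minimal sketch of the LBFGS optimization loop that combines the two weighted losses (it assumes a VGG-19 based model wired with the ContentLoss / StyleLoss modules sketched in the earlier parts; the step count is illustrative):
</p>
<pre class="col-xs-12 mr-4"><code>import torch

def run_style_transfer(model, input_img, content_losses, style_losses,
                       style_weight=1e5, content_weight=1, steps=50):
    # LBFGS optimizes the pixels of input_img directly; running the model
    # refreshes the .loss fields of the inserted loss modules.
    input_img.requires_grad_(True)
    optimizer = torch.optim.LBFGS([input_img])

    def closure():
        with torch.no_grad():
            input_img.clamp_(0, 1)
        optimizer.zero_grad()
        model(input_img)
        loss = (style_weight * sum(sl.loss for sl in style_losses)
                + content_weight * sum(cl.loss for cl in content_losses))
        loss.backward()
        return loss

    for _ in range(steps):
        optimizer.step(closure)
    with torch.no_grad():
        input_img.clamp_(0, 1)
    return input_img
</code></pre>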
<p class="col-xs-12 mr-4">
To test the robustness of the implementation, I mix 3 content images and 3 style images to get 9 group of style transfer. The content & style input and the noise-based & content-based output are listed below:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/waterfall.png" height="200px">
<img src="./media/a4/boy2.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW50000_CW1_FF_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW50000_CW1_FF_C4_S12345.png" height="200px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/waterfall.png" height="200px">
<img src="./media/a4/sphere2.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW50000_CW1_SD_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW50000_CW1_AB_C4_S12345.png" height="200px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/waterfall.png" height="200px">
<img src="./media/a4/star.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW50000_CW1_SW_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW50000_CW1_SW_C4_S12345.png" height="200px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/god.png" height="200px">
<img src="./media/a4/boy3.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW100000_CW1_FWally_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW100000_CW1_FWally_C4_S12345.png" height="200px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/god.png" height="200px">
<img src="./media/a4/sphere3.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW100000_CW1_SWally_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW100000_CW1_SWally_C4_S12345.png" height="200px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/god.png" height="200px">
<img src="./media/a4/star3.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW100000_CW1_StarWally_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW100000_CW1_StarWally_C4_S12345.png" height="200px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/garden.png" height="200px">
<img src="./media/a4/boy.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW50000_CW1_FP_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW50000_CW1_FP_C4_S12345.png" height="200px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/garden.png" height="200px">
<img src="./media/a4/sphere.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW50000_CW1_EP_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW50000_CW1_EP_C4_S12345.png" height="200px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/garden.png" height="200px">
<img src="./media/a4/star2.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW50000_CW1_SPP_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW50000_CW1_SPP_C4_S12345.png" height="200px">
</div>
</div>
<p class="col-xs-12 mr-4">
From the experiments, images generated from noise and from the content image are different, and the weight affects both; I export relatively good results for different image combinations with different weights. I think the right weight depends on the strength of the style image and on the initial color saturation and intensity of the content image. By comparison, the noise-initialized output has a stronger style pattern, which tends to overwrite the content features, whereas the content-initialized output keeps both a good style and the content shapes; I therefore mostly initialize from the content image in the later sections. The running time varies from image to image, but on average it is around 23s for a noise start and 19s for a content start, because the regression loss computation is more time-consuming in the noise case.
</p>
<p class="col-xs-12 mr-4">
I am quite interested in architecture, so I collected a group of CMU campus photos and applied style transfer to them. Here are the results:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/campus3.png" height="200px">
<img src="./media/a4/cyberpunk.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW50000_CW1_CyberCampus_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW50000_CW1_CyberCampus_C4_S12345.png" height="200px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/campus1.png" height="200px">
<img src="./media/a4/gibli.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW100000_CW1_GibliCampus_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW100000_CW1_GibliCampus_C4_S12345.png" height="200px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/campus2.png" height="200px">
<img src="./media/a4/postfuture.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW100000_CW1_PostCampus_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW50000_CW1_PostCampus_C4_S12345.png" height="200px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/campus4.png" height="200px">
<img src="./media/a4/titan.png" height="200px">
<img src="./media/a4/Output_Image_from_Noise_SW100000_CW1_TitanCampus_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW100000_CW1_TitanCampus_C4_S12345.png" height="200px">
</div>
</div>
<hr class="col-xs-12 mr-4">
<h3>Bells & Whistles</h3>
<p class="col-xs-12 mr-4">
<b>Previous Assignments as Content Image</b>
</p>
<p class="col-xs-12 mr-4">
I choose two outputs from previous assignments of Poisson Blending and CycleGAN, and apply style transfer to them:
</p>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/deluxe_diff.png" height="150px">
<img src="./media/a4/Output_Image_from_Content_Image_SW100000_CW1_ScreamCat_C4_S12345.png" height="200px">
</div>
</div>
<div class="row mr-2">
<div class="text-center col-12">
<img src="./media/a4/ballon.jpeg" height="150px">
<img src="./media/a4/Output_Image_from_Content_Image_SW100000_CW1_StarBallon_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW100000_CW1_ScreamBallon_C4_S12345.png" height="200px">
<img src="./media/a4/Output_Image_from_Content_Image_SW100000_CW1_PicassoBallon_C4_S12345.png" height="200px">
</div>
</div>
<hr class="col-xs-12 mr-4">
</div>
</div>
</div>
<div class="tab-pane fade" id="a5" role="tabpanel" aria-labelledby="a5-tab">
<div class="row">
<div class="col-12 my-5 mx-3">
<h2>Assignment #5</h2>
<hr class="col-xs-12 mr-4">
<h3>Introduction</h3>
<p class="col-xs-12 mr-4">
In this assignment, we implement a few different techniques to manipulate images on the manifold of natural images. First, we invert a pre-trained generator to find a latent variable that closely reconstructs the given real image. In the second part of the assignment, we take a hand-drawn sketch and generate an image that fits the sketch accordingly.
</p>
<hr class="col-xs-12 mr-4">
<h3>Part I: Inverting the Generator</h3>
<p class="col-xs-12 mr-4">
For the first part of the assignment, we solve an optimization problem to reconstruct an image from a latent code. The loss consists of two parts: a pixel-wise Lp loss between the generated image and the target (source) image, and an MSE loss between their content features. In other words, we optimize the difference between the StyleGAN2 generator's output and the target image to find a better latent input z (I test both L1 and L2 losses), while at the same time minimizing the distance between their content features on the manifold, which in our case means feeding both through VGG-19's feature layers and computing the loss on the conv_3 and conv_4 outputs.
</p>
<p class="col-xs-12 mr-4">
One remarkable feature of StyleGAN2 is that its mapping layers transform the Gaussian-distributed z into w or w+, giving the noise a distribution that better fits the model. To compare generators we experiment with both DCGAN and StyleGAN. For noise sampling we also test the influence of the distribution by sampling with and without mean normalization, using N=10000 samples to estimate the mean. For the optimization I use LBFGS; a minimal sketch of the loop is below, followed by a group of ablation experiments:
</p>
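<p class="col-xs-12 mr-4">
A minimal sketch of this latent optimization (the latent dimension, weights and perc_loss helper are illustrative assumptions, not the exact assignment code):
</p>
<pre class="col-xs-12 mr-4"><code>import torch

def invert(G, target, perc_loss, steps=1000, l1_weight=10.0, perc_weight=0.1):
    # Optimize a latent code so that G(latent) matches the target image:
    # pixel-wise L1 loss plus a perceptual loss on VGG-19 conv_3 / conv_4
    # features (perc_loss is assumed to compute that).
    latent = torch.randn(1, 512, device=target.device, requires_grad=True)
    optimizer = torch.optim.LBFGS([latent])

    def closure():
        optimizer.zero_grad()
        fake = G(latent)
        loss = l1_weight * (fake - target).abs().mean() + perc_weight * perc_loss(fake, target)
        loss.backward()
        return loss

    for _ in range(steps):
        optimizer.step(closure)
    return latent.detach()
</code></pre>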
<div class="row mr-2">
<div class="text-center images col-12">
<figure>
<img src="./media/a5/part1/0_data.png" height="200px">
<figcaption>Source 0</figcaption>
</figure>
<figure>
<img src="./media/a5/part1/0_vanilla_z_0.1_mean_l1_750.png" height="200px">
<figcaption>0_vanilla_z_0.1_mean_l1_750</figcaption>
</figure>
</div>
</div>
<div class="row mr-2">
<div class="text-center images col-12">
<figure>
<img src="./media/a5/part1/0_vanilla_z_0.1_no_mean_l1_1000.png" height="200px">
<figcaption>0_vanilla_z_0.1_no_mean_l1_1000</figcaption>
</figure>
<figure>
<img src="./media/a5/part1/0_vanilla_z_0.5_no_mean_l1_1000.png" height="200px">
<figcaption>0_vanilla_z_0.5_no_mean_l1_1000</figcaption>
</figure>
<figure>
<img src="./media/a5/part1/0_vanilla_z_1_no_mean_l1_1000.png" height="200px">
<figcaption>0_vanilla_z_1_no_mean_l1_1000</figcaption>
</figure>
<figure>
<img src="./media/a5/part1/0_vanilla_z_0.1_no_mean_l2_1000.png" height="200px">
<figcaption>0_vanilla_z_0.1_no_mean_l2_1000</figcaption>
</figure>
</div>
</div>
<div class="row mr-2">
<div class="text-center images col-12">
<figure>
<img src="./media/a5/part1/0_stylegan_w_0.1_mean_l1_1000.png" height="200px">
<figcaption>0_stylegan_w_0.1_mean_l1_1000</figcaption>
</figure>
<figure>
<img src="./media/a5/part1/0_stylegan_w_0.1_no_mean_l1_1000.png" height="200px">
<figcaption>0_stylegan_w_0.1_no_mean_l1_1000</figcaption>
</figure>
<figure>
<img src="./media/a5/part1/0_stylegan_w+_0.1_mean_l1_1000.png" height="200px">
<figcaption>0_stylegan_w+_0.1_mean_l1_1000</figcaption>
</figure>
<figure>
<img src="./media/a5/part1/0_stylegan_w+_0.1_mean_l2_1000.png" height="200px">
<figcaption>0_stylegan_w+_0.1_mean_l2_1000</figcaption>
</figure>
</div>
</div>
<div class="row mr-2">
<div class="text-center images col-12">
<figure>
<img src="./media/a5/part1/0_stylegan_z_0.1_mean_l1_1000.png" height="200px">
<figcaption>0_stylegan_z_0.1_mean_l1_1000</figcaption>
</figure>
<figure>
<img src="./media/a5/part1/0_stylegan_z_0.1_no_mean_l1_1000.png" height="200px">
<figcaption>0_stylegan_z_0.1_no_mean_l1_1000</figcaption>
</figure>
<figure>
<img src="./media/a5/part1/0_stylegan_z_0.1_no_mean_l2_1000.png" height="200px">
<figcaption>0_stylegan_z_0.1_no_mean_l2_1000</figcaption>
</figure>
<figure>
<img src="./media/a5/part1/0_stylegan_w+_0.1_no_mean_l1_1000.png" height="200px">
<figcaption>0_stylegan_w+_0.1_no_mean_l1_1000</figcaption>
</figure>
</div>
</div>
<p class="col-xs-12 mr-4">
Considering the results above, we decided to use mean w+ sampling, StyleGAN2, the L1 norm, an L1 weight of 10 and a perceptual weight of 0.1 as hyper-parameters to generate the cats. Some examples are shown below:
</p>
<div class="row mr-2">
<div class="text-center images col-12">
<figure>
<img src="./media/a5/part1/1_data.png" height="200px">
<figcaption>Source 1</figcaption>
</figure>
<figure>
<img src="./media/a5/part1/1_stylegan_w+_0.1_mean_l1_1000.png" height="200px">
<figcaption>1_stylegan_w+_0.1_mean_l1_1000</figcaption>
</figure>
<figure>
<img src="./media/a5/part1/2_data.png" height="200px">
<figcaption>Source 2</figcaption>
</figure>
<figure>
<img src="./media/a5/part1/2_stylegan_w+_0.1_mean_l1_1000.png" height="200px">
<figcaption>2_stylegan_w+_0.1_mean_l1_1000</figcaption>
</figure>
</div>
</div>
<div class="row mr-2">
<div class="text-center images col-12">
<figure>