From 8e0b2de8daf5802920d47a80f863e147ff7d5090 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Sun, 10 Nov 2024 17:55:39 -0500 Subject: [PATCH] update quiz2 and 3 readmes --- examples/quiz2/README.md | 50 +++------------------------------------- examples/quiz3/README.md | 36 +++++++++++++++++++++++++---- 2 files changed, 35 insertions(+), 51 deletions(-) diff --git a/examples/quiz2/README.md b/examples/quiz2/README.md index 24490030..c0bf51e2 100644 --- a/examples/quiz2/README.md +++ b/examples/quiz2/README.md @@ -4,7 +4,7 @@ This quiz is about loop fission bugs and debugging via printing cursors. ## Incorrect output (compiler error) As written, the schedule has a bug which attempts to incorrectly fission a loop. -``` +```bash Traceback (most recent call last): File "/home/yuka/.local/bin/exocc", line 8, in sys.exit(main()) @@ -33,7 +33,7 @@ exo.rewrite.new_eff.SchedulingError: <<>>: Will not fission h ## Correct Output The correct output will divide the computation into individual, vectorizable loops. -``` +```python def scaled_add_scheduled(N: size, a: f32[N] @ DRAM, b: f32[N] @ DRAM, c: f32[N] @ DRAM): assert N % 8 == 0 @@ -64,57 +64,13 @@ def scaled_add_scheduled(N: size, a: f32[N] @ DRAM, b: f32[N] @ DRAM, ## Solution -To understand the bug, let's first try printing right before the error. -Put `print(vector_assign.after())` after line 37. -``` - for io in seq(0, N / 8): - vec: R[8] @ DRAM - for ii in seq(0, 8): - vec_1: R @ DRAM - vec_1 = 2 - [GAP - After] - ... -``` -This is showing the code is trying to fission at `[GAP - After]` location, which is unsafe because the `vec_1: R` allocation is in the `ii` loop and before the fissioning point, which means if `vec_1` is used after the fission point that'll be an error. - -Change -```python - for i in range(num_vectors): - vector_reg = p.find(f"vec: _ #{i}") - p = expand_dim(p, vector_reg, 8, "ii") - p = lift_alloc(p, vector_reg) - - vector_assign = p.find(f"vec = _ #{i}") - p = fission(p, vector_assign.after()) -``` - -to -```python - for i in range(num_vectors): - vector_reg = p.find(f"vec: _ #{i}") - p = expand_dim(p, vector_reg, 8, "ii") - p = lift_alloc(p, vector_reg) - - for i in range(num_vectors): - vector_assign = p.find(f"vec = _ #{i}") - p = fission(p, vector_assign.after()) -``` - -So that you lift all the allocations out of the loop before fissioning. - -Here is the rewritten version with improved clarity and GitHub Markdown syntax: - -## Solution - To understand the bug, let's first try printing right before the error. Add the following line after line 37: - ```python print(vector_assign.after()) ``` This will output: - -``` +```python for io in seq(0, N / 8): vec: R[8] @ DRAM for ii in seq(0, 8): diff --git a/examples/quiz3/README.md b/examples/quiz3/README.md index d7d9948a..82fc54b2 100644 --- a/examples/quiz3/README.md +++ b/examples/quiz3/README.md @@ -1,10 +1,12 @@ # Quiz3!! +This quiz explores fixing subtle cursor navigation bugs. + ## Correct Output This code makes the optimization of shrinking the `blur_x` memory allocation from (H+2, W) to (34, 256). Since the code has been tiled, we don't need to store the entire intermediate `blur_x` buffer in memory. Instead, we can just reuse the same intermediate buffer for each tile. To do so, the schedule tries to sink the allocation within the tile, reduce the memory size to the bare minimum necessary for computing that tile, and then lift the allocation back up to the top level scope. -``` +```python def tile_and_fused_blur(W: size, H: size, blur_y: ui16[H, W] @ DRAM, inp: ui16[H + 2, W + 2] @ DRAM): assert H % 32 == 0 @@ -30,8 +32,8 @@ def tile_and_fused_blur(W: size, H: size, blur_y: ui16[H, W] @ DRAM, ``` ## Incorrect Output -This output is partially correct: it manages to reduce the height dimension from H+2 to 34. However, it wasn't able to reduce the memory in the width direction. -``` +This output is partially correct: it manages to reduce the height dimension from `H+2` to `34`. However, it fails to reduce the memory usage in the width direction. +```python def tile_and_fused_blur(W: size, H: size, blur_y: ui16[H, W] @ DRAM, inp: ui16[H + 2, W + 2] @ DRAM): assert H % 32 == 0 @@ -57,5 +59,31 @@ def tile_and_fused_blur(W: size, H: size, blur_y: ui16[H, W] @ DRAM, --- ## Solution -Change `loops = []` to `loops = [cursor]`. +To understand the bug, let's insert print statements in these places: + +```python +print(xo_loop) +loops_to_lower_allocation_into = get_loops_at_or_above(xo_loop) +for i, loop in enumerate(loops_to_lower_allocation_into): + print(i, loop) + ... +``` + +The `xo_loop` points to: +```python +for yo in seq(0, H / 32): + for xo in seq(0, W / 256): # <-- NODE + ... +``` + +And the first (and only) iteration of the `loop` points to: +```python +for yo in seq(0, H / 32): # <-- NODE + for xo in seq(0, W / 256): + ... +``` + +This reveals that the implementation of `get_loops_at_or_above` has a bug because it only contains "loops above" the `xo_loop` (which is `yo` loop), not including the `xo_loop` itself. + +To fix this bug, change `loops = []` to `loops = [cursor]` in the implementation of `get_loops_at_or_above`.