Skip to content

Commit

Permalink
update quiz2 and 3 readmes
Browse files Browse the repository at this point in the history
  • Loading branch information
yamaguchi1024 committed Nov 10, 2024
1 parent 4d06b84 commit 8e0b2de
Show file tree
Hide file tree
Showing 2 changed files with 35 additions and 51 deletions.
50 changes: 3 additions & 47 deletions examples/quiz2/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ This quiz is about loop fission bugs and debugging via printing cursors.

## Incorrect output (compiler error)
As written, the schedule has a bug which attempts to incorrectly fission a loop.
```
```bash
Traceback (most recent call last):
File "/home/yuka/.local/bin/exocc", line 8, in <module>
sys.exit(main())
Expand Down Expand Up @@ -33,7 +33,7 @@ exo.rewrite.new_eff.SchedulingError: <<<unknown directive>>>: Will not fission h
## Correct Output
The correct output will divide the computation into individual, vectorizable loops.
```
```python
def scaled_add_scheduled(N: size, a: f32[N] @ DRAM, b: f32[N] @ DRAM,
c: f32[N] @ DRAM):
assert N % 8 == 0
Expand Down Expand Up @@ -64,57 +64,13 @@ def scaled_add_scheduled(N: size, a: f32[N] @ DRAM, b: f32[N] @ DRAM,
## Solution
To understand the bug, let's first try printing right before the error.
Put `print(vector_assign.after())` after line 37.
```
for io in seq(0, N / 8):
vec: R[8] @ DRAM
for ii in seq(0, 8):
vec_1: R @ DRAM
vec_1 = 2
[GAP - After]
...
```
This is showing the code is trying to fission at `[GAP - After]` location, which is unsafe because the `vec_1: R` allocation is in the `ii` loop and before the fissioning point, which means if `vec_1` is used after the fission point that'll be an error.

Change
```python
for i in range(num_vectors):
vector_reg = p.find(f"vec: _ #{i}")
p = expand_dim(p, vector_reg, 8, "ii")
p = lift_alloc(p, vector_reg)

vector_assign = p.find(f"vec = _ #{i}")
p = fission(p, vector_assign.after())
```

to
```python
for i in range(num_vectors):
vector_reg = p.find(f"vec: _ #{i}")
p = expand_dim(p, vector_reg, 8, "ii")
p = lift_alloc(p, vector_reg)

for i in range(num_vectors):
vector_assign = p.find(f"vec = _ #{i}")
p = fission(p, vector_assign.after())
```

So that you lift all the allocations out of the loop before fissioning.

Here is the rewritten version with improved clarity and GitHub Markdown syntax:

## Solution

To understand the bug, let's first try printing right before the error. Add the following line after line 37:

```python
print(vector_assign.after())
```
This will output:

```
```python
for io in seq(0, N / 8):
vec: R[8] @ DRAM
for ii in seq(0, 8):
Expand Down
36 changes: 32 additions & 4 deletions examples/quiz3/README.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,12 @@
# Quiz3!!

This quiz explores fixing subtle cursor navigation bugs.

## Correct Output
This code makes the optimization of shrinking the `blur_x` memory allocation from (H+2, W) to (34, 256). Since the code has been tiled, we don't need to store the entire intermediate `blur_x` buffer in memory. Instead, we can just reuse the same intermediate buffer for each tile.

To do so, the schedule tries to sink the allocation within the tile, reduce the memory size to the bare minimum necessary for computing that tile, and then lift the allocation back up to the top level scope.
```
```python
def tile_and_fused_blur(W: size, H: size, blur_y: ui16[H, W] @ DRAM,
inp: ui16[H + 2, W + 2] @ DRAM):
assert H % 32 == 0
Expand All @@ -30,8 +32,8 @@ def tile_and_fused_blur(W: size, H: size, blur_y: ui16[H, W] @ DRAM,
```

## Incorrect Output
This output is partially correct: it manages to reduce the height dimension from H+2 to 34. However, it wasn't able to reduce the memory in the width direction.
```
This output is partially correct: it manages to reduce the height dimension from `H+2` to `34`. However, it fails to reduce the memory usage in the width direction.
```python
def tile_and_fused_blur(W: size, H: size, blur_y: ui16[H, W] @ DRAM,
inp: ui16[H + 2, W + 2] @ DRAM):
assert H % 32 == 0
Expand All @@ -57,5 +59,31 @@ def tile_and_fused_blur(W: size, H: size, blur_y: ui16[H, W] @ DRAM,
---

## Solution
Change `loops = []` to `loops = [cursor]`.

To understand the bug, let's insert print statements in these places:

```python
print(xo_loop)
loops_to_lower_allocation_into = get_loops_at_or_above(xo_loop)
for i, loop in enumerate(loops_to_lower_allocation_into):
print(i, loop)
...
```

The `xo_loop` points to:
```python
for yo in seq(0, H / 32):
for xo in seq(0, W / 256): # <-- NODE
...
```

And the first (and only) iteration of the `loop` points to:
```python
for yo in seq(0, H / 32): # <-- NODE
for xo in seq(0, W / 256):
...
```

This reveals that the implementation of `get_loops_at_or_above` has a bug because it only contains "loops above" the `xo_loop` (which is `yo` loop), not including the `xo_loop` itself.

To fix this bug, change `loops = []` to `loops = [cursor]` in the implementation of `get_loops_at_or_above`.

0 comments on commit 8e0b2de

Please sign in to comment.