From d56c09fcaf2002ef469f7209be2d51cdc41b37b7 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Sun, 20 Oct 2024 21:45:04 -0400 Subject: [PATCH 01/26] update README --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 25c05b61..6d080c41 100644 --- a/README.md +++ b/README.md @@ -11,6 +11,7 @@ If you're just using Exo, install it using `pip`: ```sh $ pip install exo-lang ``` +In case of `ModuleNotFoundError: No module named 'attrs'` please upgrade your attrs module by `pip install --upgrade attrs`. ## Compile Exo From 43c7d6713ead543ac5158eb992211a28cee477a8 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Mon, 21 Oct 2024 14:13:37 -0400 Subject: [PATCH 02/26] update documentations --- docs/API.md | 12 +- docs/configurations.md | 3 + docs/externs.md | 171 ++++++++++++++++ docs/instructions.md | 240 +++++++++++++++++++++++ docs/memories.md | 159 +++++++++++++++ docs/object_code.md | 279 +++++++++++++++++++++++++++ docs/{ => primitives}/backend_ops.md | 0 docs/{ => primitives}/buffer_ops.md | 0 docs/{ => primitives}/config_ops.md | 0 docs/{ => primitives}/loop_ops.md | 0 docs/{ => primitives}/other_ops.md | 0 docs/{ => primitives}/subproc_ops.md | 0 12 files changed, 858 insertions(+), 6 deletions(-) create mode 100644 docs/configurations.md create mode 100644 docs/externs.md create mode 100644 docs/instructions.md create mode 100644 docs/memories.md create mode 100644 docs/object_code.md rename docs/{ => primitives}/backend_ops.md (100%) rename docs/{ => primitives}/buffer_ops.md (100%) rename docs/{ => primitives}/config_ops.md (100%) rename docs/{ => primitives}/loop_ops.md (100%) rename docs/{ => primitives}/other_ops.md (100%) rename docs/{ => primitives}/subproc_ops.md (100%) diff --git a/docs/API.md b/docs/API.md index 64b267e9..62c9467b 100644 --- a/docs/API.md +++ b/docs/API.md @@ -37,9 +37,9 @@ Cursors can be obtained by querying patterns on a procedure. All the Cursor rela We have classified scheduling primitives into six categories. Here are the links to each: -- [Buffer Transformations](buffer_ops.md) -- [Loop and Scope Transformations](loop_ops.md) -- [Configuration States](config_ops.md) -- [Subprocedure Operations](subproc_ops.md) -- [Memory, Precision, and Parallelism Transformations](backend_ops.md) -- [Other Operations](other_ops.md) +- [Buffer Transformations](primitives/buffer_ops.md) +- [Loop and Scope Transformations](primitives/loop_ops.md) +- [Configuration States](primitives/config_ops.md) +- [Subprocedure Operations](primitives/subproc_ops.md) +- [Memory, Precision, and Parallelism Transformations](primitives/backend_ops.md) +- [Other Operations](primitives/other_ops.md) diff --git a/docs/configurations.md b/docs/configurations.md new file mode 100644 index 00000000..a683cefe --- /dev/null +++ b/docs/configurations.md @@ -0,0 +1,3 @@ +# Configurations + +Debug https://github.com/exo-lang/exo/issues/732 first... diff --git a/docs/externs.md b/docs/externs.md new file mode 100644 index 00000000..cc4dd933 --- /dev/null +++ b/docs/externs.md @@ -0,0 +1,171 @@ +# Externs + +Externs in Exo provide a mechanism to interface with external functions and libraries directly from your Exo code. By defining custom extern functions, you can extend the capabilities of Exo and leverage existing code written in other languages like C or C++. Externs can be used as expressions in your code, particularly on the right-hand side (RHS) of assignment and reduction statements. + +## Defining Externs in User Code + +Extern functions are defined by subclassing the `Extern` class provided by Exo. This allows you to specify how the extern function should behave, including type checking, compilation, and any global code it might require. + +### Step-by-Step Guide + +#### 1. Import the Extern Class + +Before you can define an extern function, you need to import the `Extern` class and the `_EErr` exception from `exo.core.extern`. + +```python +from exo.core.extern import Extern, _EErr +``` + +- `Extern`: The base class for creating custom extern functions. +- `_EErr`: An exception class used for error handling during type checking. + +#### 2. Subclass the Extern Class + +Create a new class that inherits from `Extern`. This class represents your custom extern function. + +```python +class _Sin(Extern): + # Implementation details will go here +``` + +#### 3. Implement Required Methods + +Your subclass must implement several methods to define the behavior of the extern function. + +##### `__init__(self)` + +Initialize your extern function with its name. + +```python +def __init__(self): + super().__init__("sin") +``` + +- `"sin"`: The name of the external function as it will appear in the generated code. + +##### `typecheck(self, args)` + +Define how the function checks the types of its arguments. + +```python +def typecheck(self, args): + if len(args) != 1: + raise _EErr(f"expected 1 argument, got {len(args)}") + + arg_type = args[0].type + if not arg_type.is_real_scalar(): + raise _EErr( + f"expected argument to be a real scalar value, but got type {arg_type}" + ) + return arg_type +``` + +- Checks that there is exactly one argument. +- Ensures the argument is a real scalar type (e.g., `float`, `double`). +- Returns the type of the argument as the return type of the function. + +##### `compile(self, args, prim_type)` + +Define how the function is compiled into target code. + +```python +def compile(self, args, prim_type): + return f"sin(({prim_type}){args[0]})" +``` + +- Generates the code that calls the external function, ensuring proper casting to the primitive type. + +##### `globl(self, prim_type)` + +Provide any global code or headers needed. + +```python +def globl(self, prim_type): + return "#include " +``` + +- Includes necessary headers required for the external function (e.g., `` for mathematical functions). + +##### `interpret(self, args)` (Optional) + +Define how the function behaves during interpretation (useful for testing or interactive sessions). + +```python +def interpret(self, args): + import math + return math.sin(args[0]) +``` + +- Allows the extern function to be executed in environments that support interpretation. + +#### 4. Instantiate the Extern Function + +Create an instance of your extern class to make it usable in your code. + +```python +sin = _Sin() +``` + +- `sin` now represents the extern function and can be used like any other expression in Exo. + +## Using Externs as Expressions + +Externs can be used as expressions on the RHS of assignment and reduction statements. This allows you to incorporate external functions seamlessly into your Exo computations. + +### Example: Using `sin` in an Expression + +Here's a complete example demonstrating how to define and use the `sin` extern function within an expression. + +```python +from __future__ import annotations +from exo import * +from exo.core.extern import Extern, _EErr + +class _Sin(Extern): + def __init__(self): + super().__init__("sin") + + def typecheck(self, args): + if len(args) != 1: + raise _EErr(f"expected 1 argument, got {len(args)}") + + arg_type = args[0].type + if not arg_type.is_real_scalar(): + raise _EErr( + f"expected argument to be a real scalar value, but got type {arg_type}" + ) + return arg_type + + def compile(self, args, prim_type): + return f"sin(({prim_type}){args[0]})" + + def globl(self, prim_type): + return "#include " + + def interpret(self, args): + import math + return math.sin(args[0]) + +# Instantiate the extern function +sin = _Sin() + +# Define an Exo procedure using the extern function in an expression +@proc +def foo(x: f32): + x = sin(x) * 3.0 + +print(foo) +``` + +### Output + +When you run the code above with `exocc`, the generated C code will be: +```c +#include +// foo( +// x : f32 @DRAM +// ) +void foo( void *ctxt, float* x ) { + *x = sin((float)*x) * 3.0f; +} +``` diff --git a/docs/instructions.md b/docs/instructions.md new file mode 100644 index 00000000..f91d3e05 --- /dev/null +++ b/docs/instructions.md @@ -0,0 +1,240 @@ +# External Instruction Definitions + +Exo allows users to define custom hardware instructions within their code using the `@proc` annotation. These user-defined instructions can be leveraged during the scheduling process to replace specific code fragments with calls to hardware-optimized instructions. This feature enables fine-grained control over code optimization and hardware acceleration, making it easier to target specific architectures like SIMD units or custom accelerators. + +## Overview + +- **Custom Instructions**: Define hardware-specific instructions as procedures using the `@proc` decorator. +- **Replacement**: Use the `replace` primitive to substitute code fragments with calls to these instructions. +- **Pattern Matching**: Exo uses pattern matching to unify code fragments with instruction definitions. +- **Code Generation**: Custom instructions can emit arbitrary C code, including inline assembly, with placeholders for arguments. + +## Defining Custom Instructions + +Custom instructions are defined as procedures annotated with `@proc` and further decorated with `@instr`. The `@instr` decorator allows you to specify the C code to be emitted when the instruction is called, including placeholders for arguments. + +### Syntax + +```python +@instr("C code with placeholders") +@proc +def instruction_name(args): + # Specification of the instruction's behavior +``` + +- **`@instr`**: Decorator that specifies the C code to emit. +- **`@proc`**: Indicates that the function is an Exo procedure. +- **`instruction_name`**: The name of your custom instruction. +- **`args`**: Arguments to the instruction. +- **Specification**: A high-level description of what the instruction does, used for pattern matching. + +### Placeholders in C Code + +In the string provided to `@instr`, you can include placeholders wrapped in `{}`. These placeholders will be replaced with the names of the arguments when the code is compiled. + +### Example: Defining a NEON Load Instruction + +Below is an example of defining a NEON load instruction that loads four `f32` values into NEON memory. + +```python +from exo import * +from exo.core.proc import instr + +@instr("{dst_data} = vld1q_f32(&{src_data});") +@proc +def neon_vld_4xf32(dst: [f32][4] @ Neon, src: [f32][4] @ DRAM): + assert stride(src, 0) == 1 + assert stride(dst, 0) == 1 + + for i in seq(0, 4): + dst[i] = src[i] +``` + +#### Explanation + +- **`@instr("{dst_data} = vld1q_f32(&{src_data});")`**: Specifies the C code to emit when this instruction is called. + - `{dst_data}` and `{src_data}` are placeholders that will be replaced with the actual argument names. +- **`dst: [f32][4] @ Neon`**: Declares `dst` as a 4-element array of `f32` in `Neon` memory. +- **`src: [f32][4] @ DRAM`**: Declares `src` as a 4-element array of `f32` in `DRAM`. +- **Assertions**: Ensure that the strides of `src` and `dst` are 1 for correct memory access. +- **Loop**: The loop specifies the semantics of the instruction, copying elements from `src` to `dst`. + +### Defining the Memory Annotation `Neon` + +The `Neon` memory type can be defined similarly to how custom memories are defined, as explained in [memories.md](memories.md). + +```python +class Neon(Memory): + @classmethod + def global_(cls): + return "#include " + + # Implement other required methods +``` + +## Using Custom Instructions + +Once you've defined a custom instruction, you can use it to replace code fragments in your procedures. + +### Step 1: Define Your Procedure + +Define your Exo procedure as usual. + +```python +@proc +def foo(src: [f32][4] @ DRAM, dst: [f32][4] @ Neon): + for i in seq(0, 4): + dst[i] = src[i] +``` + +### Step 2: Use `replace` to Substitute the Instruction + +Use the `replace` primitive to substitute the loop with the custom instruction. + +```python +# Instantiate the procedure +p = foo + +# Replace the loop with the custom instruction +p = replace(p, "for i in _:_", neon_vld_4xf32) +``` + +#### Explanation + +- **`replace(p, "for i in _:_", neon_vld_4xf32)`**: + - **`p`**: The procedure in which to perform the replacement. + - **`"for i in _:_"`**: A cursor pointing to the loop to replace. + - **`neon_vld_4xf32`**: The instruction to replace the loop with. + +### How `replace` Works + +- **Pattern Matching**: Exo attempts to unify the code fragment (the loop) with the body of `neon_vld_4xf32`. +- **Automatic Argument Determination**: If successful, Exo replaces the fragment with a call to `neon_vld_4xf32`, automatically determining the correct arguments. +- **Semantics Preservation**: The specification in the instruction's body ensures that the replacement is semantically correct. + +### Step 3: Compile and Generate Code + +Compile your procedure to generate the optimized C code. + +```python +print(p) +``` + +### Generated C Code + +```c +void foo(float src[4], float32x4_t dst) { + dst = vld1q_f32(&src[0]); +} +``` + +- **`dst = vld1q_f32(&src[0]);`**: The custom instruction is emitted as specified in the `@instr` decorator, with placeholders replaced. + +## Understanding the Magic + +By defining the behavior of hardware instructions in Python using Exo procedures, you can express the semantics of your accelerator or specialized hardware. The `replace` primitive allows Exo to reason about whether it's safe to offload certain computations to hardware instructions based on their specifications. + +- **No Compiler Backend Needed**: The heavy lifting is done within Exo, eliminating the need for a separate compiler backend. +- **Semantics Encoding**: The instruction's body acts as a specification, encoding its semantics for Exo's pattern matching. +- **Flexible and Extensible**: Users can define any instruction and specify how it should be matched and replaced. + +## The `replace` Primitive + +The `replace` primitive is used to substitute a fragment of code within a procedure with a call to another procedure (e.g., a custom instruction). + +### Syntax + +```python +replace(proc, cursor_path, subproc) +``` + +- **`proc`**: The procedure containing the code to be replaced. +- **`cursor_path`**: A string or cursor pointing to the code fragment. +- **`subproc`**: The procedure whose body will replace the code fragment. + +### Documentation + +The `replace` primitive is documented in [primitives/subproc_ops.md](primitives/subproc_ops.md). + +## Practical Example: RISC-V Matrix Multiply + +### Step 1: Define the Instruction + +```python +@instr("{dst} = asm_rvm_macc({src_a}, {src_b}, {dst});") +@proc +def rvm_macc(dst: f32 @ RVM, src_a: f32 @ RVM, src_b: f32 @ RVM): + dst += src_a * src_b +``` + +- **`asm_rvm_macc`**: Hypothetical assembly function for RISC-V multiply-accumulate. +- **Specification**: The procedure specifies that `dst += src_a * src_b`. + +### Step 2: Use the Instruction in a Procedure + +```python +@proc +def matmul_rvm(A: f32[M, K], B: f32[K, N], C: f32[M, N]): + for i in seq(0, M): + for j in seq(0, N): + for k in seq(0, K): + C[i, j] += A[i, k] * B[k, j] +``` + +### Step 3: Optimize Using `replace` + +```python +p = matmul_rvm + +# Apply transformations to expose the computation pattern +... + +# Replace the innermost loop with the custom instruction +p = replace(p, "for k in _:_", rvm_macc) +``` + +### Step 4: Compile and Generate Code + +```python +print(p) +``` + +### Generated C Code + +```c +void matmul_rvm(float A[M][K], float B[K][N], float C[M][N]) { + for (int i = 0; i < M; i++) { + for (int j = 0; j < N; j++) { + C[i][j] = asm_rvm_macc(A[i][k], B[k][j], C[i][j]); + } + } +} +``` + +## Further Reading and Examples + +- **RVM Tutorial**: [https://exo-lang.dev/tutorial.html](https://exo-lang.dev/tutorial.html) +- **Running Code Examples**: [examples/rvm_conv1d/exo/conv1d.py](https://github.com/exo-lang/exo/blob/main/examples/rvm_conv1d/exo/conv1d.py) + +## Tips and Best Practices + +- **Define Clear Specifications**: Ensure that the body of your instruction accurately represents its semantics. +- **Use Assertions**: Include assertions in your instruction definitions to enforce constraints and ensure correctness. +- **Leverage Memory Annotations**: Use custom memory annotations to model hardware-specific memory behaviors (e.g., `Neon`, `RVM`). +- **Pattern Matching**: Structure your code to facilitate pattern matching with instruction definitions. +- **Test Thoroughly**: Verify that replacements are correct and that the generated code behaves as expected. + +## Conclusion + +By defining custom instructions and using the `replace` primitive, Exo provides a powerful mechanism to optimize code for specific hardware architectures directly within the user code. This approach offers flexibility and control, enabling developers to harness hardware acceleration without the need for extensive compiler support. + +**Key Takeaways**: + +- **Custom Instructions**: Define hardware-specific instructions with precise semantics. +- **Pattern Matching**: Use Exo's pattern matching to replace code fragments safely. +- **Code Generation**: Emit custom C code, including inline assembly, tailored to your hardware. +- **Optimization**: Optimize existing code by replacing computational patterns with hardware-accelerated instructions. + +--- + +**Note**: The examples provided are illustrative and may need adjustments to fit your specific hardware and use cases. Ensure that any external functions or assembly code used in the `@instr` decorator are properly defined and compatible with your target architecture. diff --git a/docs/memories.md b/docs/memories.md new file mode 100644 index 00000000..a48fe9a2 --- /dev/null +++ b/docs/memories.md @@ -0,0 +1,159 @@ +# External Memory Definitions + +Exo allows users to define custom memory types external to the compiler. This feature enables modeling of specialized memory systems, such as vector machines and hardware accelerator memories, directly within your Exo code. By defining custom memories, you can optimize your programs for specific hardware architectures and achieve better performance. + +## Overview + +- **Custom Memories**: Define your own memory types by subclassing the `Memory` class. +- **Usage**: Use custom memories as annotations in your Exo code or apply them during scheduling. + +## Defining Custom Memories + +To define a custom memory, you need to create a class that inherits from `Memory` and implement the required methods. Below is an example of defining an `AVX512` memory, which models the AVX-512 vector registers. + +### Example: Defining AVX512 Memory + +```python +class AVX512(Memory): + @classmethod + def global_(cls): + return "#include " + + @classmethod + def can_read(cls): + return False + + @classmethod + def alloc(cls, new_name, prim_type, shape, srcinfo): + if not shape: + raise MemGenError(f"{srcinfo}: AVX512 vectors are not scalar values") + if not prim_type == "float": + raise MemGenError(f"{srcinfo}: AVX512 vectors must be f32 (for now)") + if not _is_const_size(shape[-1], 16): + raise MemGenError(f"{srcinfo}: AVX512 vectors must be 16-wide") + shape = shape[:-1] + if shape: + result = f'__m512 {new_name}[{"][".join(map(str, shape))}];' + else: + result = f"__m512 {new_name};" + return result + + @classmethod + def free(cls, new_name, prim_type, shape, srcinfo): + return "" + + @classmethod + def window(cls, basetyp, baseptr, indices, strides, srcinfo): + assert strides[-1] == "1" + idxs = indices[:-1] or "" + if idxs: + idxs = "[" + "][".join(idxs) + "]" + return f"{baseptr}{idxs}" +``` + +#### Explanation of Methods + +- **`global_(cls)`**: Returns any global code or headers needed. Here, it includes the AVX-512 intrinsic header. + + ```python + @classmethod + def global_(cls): + return "#include " + ``` + +- **`can_read(cls)`**: Controls whether the memory can be read directly. Setting it to `False` means you cannot read/write directly to this memory using standard array access. + + ```python + @classmethod + def can_read(cls): + return False + ``` + +- **`alloc(cls, new_name, prim_type, shape, srcinfo)`**: Defines how memory allocation is handled. For `AVX512`, it ensures that the allocated memory represents 16-wide vectors of `float` type. + + ```python + @classmethod + def alloc(cls, new_name, prim_type, shape, srcinfo): + # Validation checks and allocation code + ``` + +- **`free(cls, new_name, prim_type, shape, srcinfo)`**: Handles memory deallocation. For `AVX512`, no action is needed. + + ```python + @classmethod + def free(cls, new_name, prim_type, shape, srcinfo): + return "" + ``` + +- **`window(cls, basetyp, baseptr, indices, strides, srcinfo)`**: Defines how to access elements in the memory. + + ```python + @classmethod + def window(cls, basetyp, baseptr, indices, strides, srcinfo): + # Windowing logic for memory access + ``` + +## Understanding `can_read` + +The `can_read` method controls whether direct array access is allowed for the memory type. When `can_read` is set to `False`, you cannot read or write to the memory using standard array indexing in Exo or the generated C code. This models hardware that requires special instructions for memory access, such as vector registers. + +### Invalid Usage + +Attempting to read or write directly results in an error. + +```python +x: f32[16] @ AVX512 +x[0] = 3.0 # Invalid when can_read() is False +``` + +### Valid Usage + +To interact with the memory, you must use specific instructions or operations designed for that memory type (e.g., AVX-512 intrinsics). + +```python +# Use AVX-512 instructions to manipulate x +x: f32[16] @ AVX512 +mm512_loadu_ps(x, inp[16*i : 16*i+16]) +``` +- **Instructions Documentation**: [instructions.md](instructions.md) + +## Using Custom Memories + +There are two primary ways to use custom memories in Exo: + +1. **Direct Annotation**: Annotate variables with the custom memory type using the `@` symbol. +2. **Scheduling Primitive**: Change the memory annotation during scheduling using `set_memory`. + +### 1. Direct Annotation + +Annotate buffers at the time of declaration. +```python +from exo import * +from exo.libs.memories import AVX512 + +@proc +def foo(x: f32[16] @ AVX512): + y: f32[16] @ AVX512 + # Function body +``` + +- **`x: f32[16] @ AVX512`**: Declares `x` as a 16-element array of `f32` stored in `AVX512` memory. +- **`y: f32[16] @ AVX512`**: Similarly declares `y` in `AVX512` memory. + +### 2. Changing Memory During Scheduling + +Use the `set_memory` primitive to change the memory annotation of a variable during scheduling. +- **`set_memory(p, "C", AVX512)`**: Changes the memory of variable `C` in procedure `p` to `AVX512`. +- This is common when optimizing existing code for specific hardware. + +#### Documentation for `set_memory` + +The `set_memory` primitive is documented in [primitives/buffer_ops.md](primitives/buffer_ops.md). + + +## Additional Examples + +- **Memory Definitions**: More examples of custom memory definitions can be found in [src/exo/libs/memories.py](https://github.com/exo-lang/exo/blob/main/src/exo/libs/memories.py). +- **Usage in Applications**: Examples of using custom memories in real applications are available in [examples/rvm_conv1d/exo/conv1d.py](https://github.com/exo-lang/exo/blob/main/examples/rvm_conv1d/exo/conv1d.py). + + diff --git a/docs/object_code.md b/docs/object_code.md new file mode 100644 index 00000000..a3ada455 --- /dev/null +++ b/docs/object_code.md @@ -0,0 +1,279 @@ +# Exo Object Code Syntax + +Exo is a programming language designed for performance-critical code, providing fine-grained control over code generation and optimization. In Exo, object code can be defined using Python-like syntax with specific annotations and constructs to model low-level programming concepts. + +This documentation explains Exo's object code syntax using the following example of a 1D convolution operation: + +```python +@proc +def generic_conv1d( + data: i32[IC, N] @ DRAM, + kernels: i32[OC, IC, W] @ DRAM, + out: i32[OC, N] @ DRAM, +): + # Perform the convolution + for i in seq(0, OC): + for j in seq(0, N): + # Zero out the output memory + out[i, j] = 0.0 + for c in seq(0, IC): + for r in seq(0, W): + y: i32 + if j + r < N: + y = data[c, j + r] + else: + y = 0 + out[i, j] += kernels[i, c, r] * y +``` + +## Table of Contents + +- [Annotations and Decorators](#annotations-and-decorators) + - [`@proc` Decorator](#proc-decorator) + - [Type and Memory Annotations](#type-and-memory-annotations) +- [Procedure Arguments](#procedure-arguments) +- [Variable Declarations](#variable-declarations) +- [Memory Spaces](#memory-spaces) +- [Loops](#loops) + - [`for` Loop Syntax](#for-loop-syntax) +- [Conditional Statements](#conditional-statements) +- [Operations and Assignments](#operations-and-assignments) +- [Understanding the Example](#understanding-the-example) +- [Conclusion](#conclusion) + +## Annotations and Decorators + +### `@proc` Decorator + +The `@proc` decorator is used to define an Exo procedure (analogous to a function in other programming languages). It indicates that the following function definition should be treated as Exo object code, which can be further optimized and transformed. + +```python +@proc +def function_name(arguments): + # Function body +``` + +### Type and Memory Annotations + +In Exo, types and memory spaces are explicitly annotated to provide precise control over data representation and placement. The syntax for annotations is: + +```python +name: type[size] @ memory +``` + +- **`name`**: The variable name. +- **`type`**: The data type (e.g., `i32`, `f32`). +- **`[size]`**: The dimensions of the array (optional for scalars). +- **`@ memory`**: The memory space where the variable resides. + +## Procedure Arguments + +Procedure arguments are declared with their types, sizes, and memory spaces. They can have dependent sizes based on other arguments. + +Example from the code: + +```python +data: i32[IC, N] @ DRAM +``` + +- **`data`**: The name of the argument. +- **`i32`**: The data type (32-bit integer). +- **`[IC, N]`**: A 2D array with dimensions `IC` and `N`. +- **`@ DRAM`**: Specifies that `data` resides in DRAM memory. + +## Variable Declarations + +Variables within the procedure are declared similarly to arguments but without the `@` annotation if they reside in default memory. + +Example: + +```python +y: i32 +``` + +- **`y`**: The variable name. +- **`i32`**: The data type (32-bit integer). +- **No memory annotation**: Defaults to a standard memory space (e.g., registers). + +## Memory Spaces + +Memory spaces in Exo are used to model different hardware memory regions, such as DRAM, caches, or specialized memories. The `@` symbol is used to specify the memory space. + +Common memory spaces: + +- **`@ DRAM`**: Main memory. +- **`@ SRAM`**: Static RAM or cache. +- **`@ Registers`**: CPU registers. + +Example: + +```python +out: i32[OC, N] @ DRAM +``` + +- **`out`**: Output array. +- **Resides in DRAM memory.** + +## Loops + +### `for` Loop Syntax + +Exo uses explicit loop constructs to model iteration. The `for` loop syntax is: + +```python +for loop_variable in seq(start, end): + # Loop body +``` + +- **`loop_variable`**: The loop counter variable. +- **`seq(start, end)`**: Generates a sequence from `start` to `end - 1`. + +Example from the code: + +```python +for i in seq(0, OC): + # Iterates i from 0 to OC - 1 +``` + +## Conditional Statements + +Conditional logic is expressed using `if` and `else` statements. + +Syntax: + +```python +if condition: + # True branch +else: + # False branch +``` + +Example: + +```python +if j + r < N: + y = data[c, j + r] +else: + y = 0 +``` + +- Checks if `j + r` is less than `N`. +- Assigns `y` accordingly. + +## Operations and Assignments + +- **Assignment (`=`)**: Assigns a value to a variable. + + ```python + y = data[c, j + r] + ``` + +- **In-place Addition (`+=`)**: Adds a value to a variable and stores the result back. + + ```python + out[i, j] += kernels[i, c, r] * y + ``` + +- **Array Access**: Uses square brackets to access array elements. + + ```python + data[c, j + r] + ``` + +## Understanding the Example + +Let's break down the example code step by step. + +### Procedure Definition + +```python +@proc +def generic_conv1d( + data: i32[IC, N] @ DRAM, + kernels: i32[OC, IC, W] @ DRAM, + out: i32[OC, N] @ DRAM, +): +``` + +- **`generic_conv1d`**: The procedure name. +- **Arguments**: + - **`data`**: Input data array of shape `[IC, N]` in DRAM. + - **`kernels`**: Kernel weights array of shape `[OC, IC, W]` in DRAM. + - **`out`**: Output data array of shape `[OC, N]` in DRAM. +- **Variables**: + - **`IC`**, **`OC`**, **`N`**, **`W`**: Dimensions, assumed to be defined elsewhere or passed as parameters. + +### Loop Nest + +```python +for i in seq(0, OC): + for j in seq(0, N): + # Zero out the output memory + out[i, j] = 0.0 + for c in seq(0, IC): + for r in seq(0, W): + y: i32 + if j + r < N: + y = data[c, j + r] + else: + y = 0 + out[i, j] += kernels[i, c, r] * y +``` + +#### Outer Loops + +- **`for i in seq(0, OC):`**: Iterates over the output channels. +- **`for j in seq(0, N):`**: Iterates over the spatial dimension of the output. + +#### Initialization + +- **`out[i, j] = 0.0`**: Initializes the output element at `(i, j)` to zero. + +#### Inner Loops + +- **`for c in seq(0, IC):`**: Iterates over the input channels. +- **`for r in seq(0, W):`**: Iterates over the kernel width. + +#### Conditional Data Access + +```python +y: i32 +if j + r < N: + y = data[c, j + r] +else: + y = 0 +``` + +- **Purpose**: Handles boundary conditions where the kernel extends beyond the input data. +- **`y`**: Temporary variable to hold the input data or zero. +- **Condition**: + - **If `j + r < N`**: Valid index; assign `data[c, j + r]` to `y`. + - **Else**: Out-of-bounds; assign `0` to `y`. + +#### Accumulation + +```python +out[i, j] += kernels[i, c, r] * y +``` + +- **Operation**: Accumulates the product of the kernel weight and the input data into the output. +- **`kernels[i, c, r]`**: Kernel weight for output channel `i`, input channel `c`, at position `r`. +- **`y`**: The input data value or zero. + +## Conclusion + +This example demonstrates how Exo's object code syntax allows for precise and expressive definitions of computations, particularly for performance-critical operations like convolutions. By understanding the annotations, loops, and operations, you can write efficient Exo procedures that can be further optimized and transformed for specific hardware targets. + +### Key Points + +- **Annotations**: Use `name: type[size] @ memory` to declare variables with explicit types and memory spaces. +- **Loops**: Utilize `for` loops with `seq(start, end)` for controlled iteration. +- **Conditionals**: Implement boundary checks and other logic using `if` and `else`. +- **Operations**: Perform computations using standard arithmetic operators, with support for in-place updates. + +### Further Reading + +- **Exo Documentation**: Explore more about Exo's syntax and capabilities in the official documentation. +- **Optimizations**: Learn how to apply scheduling primitives and transformations to optimize Exo procedures. + +By leveraging Exo's powerful syntax and features, you can develop high-performance code tailored to specific hardware architectures, enabling efficient execution of complex algorithms. diff --git a/docs/backend_ops.md b/docs/primitives/backend_ops.md similarity index 100% rename from docs/backend_ops.md rename to docs/primitives/backend_ops.md diff --git a/docs/buffer_ops.md b/docs/primitives/buffer_ops.md similarity index 100% rename from docs/buffer_ops.md rename to docs/primitives/buffer_ops.md diff --git a/docs/config_ops.md b/docs/primitives/config_ops.md similarity index 100% rename from docs/config_ops.md rename to docs/primitives/config_ops.md diff --git a/docs/loop_ops.md b/docs/primitives/loop_ops.md similarity index 100% rename from docs/loop_ops.md rename to docs/primitives/loop_ops.md diff --git a/docs/other_ops.md b/docs/primitives/other_ops.md similarity index 100% rename from docs/other_ops.md rename to docs/primitives/other_ops.md diff --git a/docs/subproc_ops.md b/docs/primitives/subproc_ops.md similarity index 100% rename from docs/subproc_ops.md rename to docs/primitives/subproc_ops.md From c2c42bd9164512041c9d70ff02ccb2a7325db269 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Mon, 21 Oct 2024 19:19:04 -0400 Subject: [PATCH 03/26] Add files --- docs/Cursors.md | 4 +++ docs/Design.md | 55 ++++++++++++++++++++++++++++++++++ docs/Imports.md | 2 ++ docs/{API.md => Procedures.md} | 0 docs/README.md | 11 +++++++ docs/System.md | 3 ++ docs/configurations.md | 3 -- examples/cursors/README.md | 1 + 8 files changed, 76 insertions(+), 3 deletions(-) create mode 100644 docs/Design.md create mode 100644 docs/Imports.md rename docs/{API.md => Procedures.md} (100%) create mode 100644 docs/README.md create mode 100644 docs/System.md delete mode 100644 docs/configurations.md create mode 100644 examples/cursors/README.md diff --git a/docs/Cursors.md b/docs/Cursors.md index 317ea7a7..6f4bc568 100644 --- a/docs/Cursors.md +++ b/docs/Cursors.md @@ -370,3 +370,7 @@ p2 = reorder_scope(p1, p1.forward(c).next(), ...) In this code, the navigation `.next()` is applied to the forwarded cursor `p1.forward(c)`. Attempting to change `p1.forward(c).next()` to `p1.forward(c.next())` will result in incorrect behavior. This is because navigation and forwarding are *not commutative*. +## further +More details can be found in our ASPLOS '25 paper or Kevin Qian's master thesis + + diff --git a/docs/Design.md b/docs/Design.md new file mode 100644 index 00000000..9f6c56c3 --- /dev/null +++ b/docs/Design.md @@ -0,0 +1,55 @@ +# Design document for Exo + +Here is a summary of the key design decisions of the Exo language in github markdown format: + +# Exo: A Language for Hardware-Accelerated Kernel Libraries + +Exo is a domain-specific language designed to enable productive development of high-performance kernel libraries that target specialized hardware accelerators. The key design principles of Exo are: + +## Exocompilation: Externalizing Hardware Targets + +One of the main ideas behind Exo is **exocompilation**, which allows users to define hardware targets externally to the compiler in user-level libraries. This has several advantages: + +- Hardware vendors can support new accelerators without maintaining compiler forks +- The cost of adding support for new hardware is significantly reduced +- Proprietary details of hardware can be protected + +Users can model custom memories, instructions, and configuration state in libraries to target a specific accelerator. These hardware abstractions can then be used to write hand-optimized code or as building blocks for higher-level scheduling transformations. + +## Fine-Grained Primitives for Performance Control + +Exo offers a set of fine-grained scheduling primitives that give users low-level control over performance-critical details. These primitives can be composed to build complex transformation schedules. Some examples of these primitives include: + +- `split` and `reorder` for loop transformations +- `stage_mem` for explicit data movement between memories +- `replace` for mapping code fragments to custom instructions + +Having explicit control over these low-level details enables Exo to achieve performance competitive with highly-tuned vendor libraries and hand-optimized assembly code. + +## User-Defined Scheduling Operations + +While the flexibility of fine-grained primitives is necessary for achieving peak performance, directly using them can be verbose and laborious. To address this, Exo allows users to define new higher-level scheduling operations by composing the core primitives. + +These user-defined scheduling operations can encapsulate common optimization patterns and hardware-specific transformations, greatly improving productivity. They can be put together in reusable libraries, further enabling modularity and portability. + +## The AIR Abstraction: Action, Inspection, Reference + +To enable user-defined scheduling operations, Exo introduces a powerful abstraction called AIR, which stands for Action, Inspection, and Reference. + +- **Actions** are the scheduling primitives that transform the code (e.g., `split`, `reorder`). +- **Inspection** queries properties of the code (e.g., loop bounds, memory access patterns). +- **References** point to specific parts of the code to apply actions to. + +Together, AIR allows scheduling operations to be defined as composable rewrites on the code. The language implementation guarantees the correctness of these rewrites with a set of effect analyses. + +## Cursors: Enabling Relative References + +A novel feature in Exo's design is the concept of cursors, which serve as relative references into the code. Similar to a text editing cursor, an Exo cursor identifies a specific location in the program AST, such as a statement, loop nest, or even the gap between statements. + +Cursors support navigation operations such as `next`, `prev`, `parent`, enabling powerful code transformations using relative positions. Multiple cursors can coexist, allowing different parts of the code to be referenced and modified simultaneously. + +Using cursors, complex scheduling operations can be built using simple navigation and rewrite rules, with the cursor abstracting away the details of manual AST manipulation. + +## Evaluation + +The effectiveness of Exo's design is demonstrated through case studies targeting specialized accelerators like Gemmini and x86 CPUs with AVX-512 extensions. With Exo, state-of-the-art performance is achieved on key computational kernels like matrix multiplication and convolution, using an order of magnitude fewer lines of code compared to handwritten libraries. diff --git a/docs/Imports.md b/docs/Imports.md new file mode 100644 index 00000000..1a2041c0 --- /dev/null +++ b/docs/Imports.md @@ -0,0 +1,2 @@ +Document about how to import different modules + diff --git a/docs/API.md b/docs/Procedures.md similarity index 100% rename from docs/API.md rename to docs/Procedures.md diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 00000000..ef566ad6 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,11 @@ +write where is what documentaiton + +Learn about primitives, learn about external hardware definition, learn about Cursors (with link to all of them) in the main README + + + +# Further read +thesis +papers +tutorial +examples diff --git a/docs/System.md b/docs/System.md new file mode 100644 index 00000000..f132b609 --- /dev/null +++ b/docs/System.md @@ -0,0 +1,3 @@ +# System overview + +Talk about compilation process. diff --git a/docs/configurations.md b/docs/configurations.md deleted file mode 100644 index a683cefe..00000000 --- a/docs/configurations.md +++ /dev/null @@ -1,3 +0,0 @@ -# Configurations - -Debug https://github.com/exo-lang/exo/issues/732 first... diff --git a/examples/cursors/README.md b/examples/cursors/README.md new file mode 100644 index 00000000..249f01f1 --- /dev/null +++ b/examples/cursors/README.md @@ -0,0 +1 @@ +# Cursor step-by-step example! From dd46691abe858d2db55630cd97a02ca67525ee2b Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Mon, 21 Oct 2024 19:59:46 -0400 Subject: [PATCH 04/26] add boilerplate text --- docs/Imports.md | 165 +++++++++++++++++++++++++++++ docs/System.md | 61 +++++++++++ examples/cursors/README.md | 209 +++++++++++++++++++++++++++++++++++++ 3 files changed, 435 insertions(+) diff --git a/docs/Imports.md b/docs/Imports.md index 1a2041c0..86c7b84b 100644 --- a/docs/Imports.md +++ b/docs/Imports.md @@ -1,2 +1,167 @@ Document about how to import different modules +# Explanation of Imports in Exo Language Script + +This document provides an overview of the import statements used in an Exo language script. [Exo](https://github.com/exo-lang/exo) is a programming system that facilitates the development of high-performance code, particularly for hardware accelerators and specialized computing platforms. + +--- + +## Table of Contents + +1. [Standard Python Future Import](#1-standard-python-future-import) +2. [Core Exo Module](#2-core-exo-module) +3. [Memory Libraries](#3-memory-libraries) +4. [Platform-Specific Modules](#4-platform-specific-modules) +5. [Frontend Syntax Utilities](#5-frontend-syntax-utilities) +6. [Standard Library Modules](#6-standard-library-modules) +7. [External Interfaces](#7-external-interfaces) +8. [API Cursors](#8-api-cursors) + +--- + +## 1. Standard Python Future Import + +```python +from __future__ import annotations +``` + +- **Purpose**: Enables postponed evaluation of type annotations, allowing you to use forward references in type hints without causing issues during runtime. +- **Context**: This is a standard Python feature that improves compatibility and performance when using type hints in your code. + +--- + +## 2. Core Exo Module + +```python +from exo import * +``` + +- **Purpose**: Imports all core functionalities from the Exo language. +- **Includes**: Fundamental classes and functions necessary for defining and manipulating high-performance computational kernels. + +--- + +## 3. Memory Libraries + +### 3.1 Importing `DRAM_STATIC` + +```python +from exo.libs.memories import DRAM_STATIC +``` + +- **Purpose**: Provides access to a static DRAM memory model. +- **Usage**: Used for declaring and managing statically allocated memory regions in DRAM. + +### 3.2 Importing Multiple Memory Classes and Errors + +```python +from exo.libs.memories import MDRAM, MemGenError, StaticMemory, DRAM_STACK +``` + +- **Components**: + - `MDRAM`: Multi-dimensional DRAM memory abstraction. + - `MemGenError`: Exception class for memory generation errors. + - `StaticMemory`: Base class for statically allocated memory types. + - `DRAM_STACK`: Represents a stack allocated in DRAM. +- **Usage**: Facilitates advanced memory management and error handling in performance-critical code. + +--- + +## 4. Platform-Specific Modules + +### 4.1 x86 Platform Optimizations + +```python +from exo.platforms.x86 import * +``` + +- **Purpose**: Imports optimizations and definitions specific to x86 architectures. +- **Usage**: Enables the generation of optimized code tailored for x86 CPUs, including SIMD instructions and cache management. + +### 4.2 ARM NEON Platform Optimizations + +```python +from exo.platforms.neon import * +``` + +- **Purpose**: Provides ARM NEON-specific functionalities. +- **Usage**: Allows for optimization of code on ARM architectures that support NEON instructions, enhancing performance on mobile and embedded devices. + +--- + +## 5. Frontend Syntax Utilities + +```python +from exo.frontend.syntax import * +``` + +- **Purpose**: Imports utilities for parsing and manipulating Exo's frontend syntax. +- **Usage**: Used when extending or customizing the language's syntax for domain-specific applications. + +--- + +## 6. Standard Library Modules + +### 6.1 Scheduling Utilities + +```python +from exo.stdlib.scheduling import * +``` + +- **Purpose**: Provides functions for scheduling and transforming computational kernels. +- **Includes**: Loop transformations, tiling, unrolling, and other optimization techniques. + +### 6.2 Standard Library Functions + +```python +from exo.stdlib.stdlib import * +``` + +- **Purpose**: Imports standard library functions and classes. +- **Usage**: Offers a collection of common utilities and helpers used across various Exo programs. + +--- + +## 7. External Interfaces + +```python +from exo.libs.externs import * +``` + +- **Purpose**: Facilitates interaction with external libraries and functions not defined within Exo. +- **Usage**: Allows for the integration of external code, such as C functions or hardware-specific routines, into Exo programs. + +--- + +## 8. API Cursors + +```python +from exo.API_cursors import * +``` + +- **Purpose**: Provides cursor-based APIs for navigating and modifying code structures. +- **Usage**: Enables advanced code introspection and manipulation, useful for metaprogramming and automated optimizations. + +--- + +# Conclusion + +The imports listed are essential for setting up an Exo environment tailored for high-performance computing. They collectively provide: + +- Core language functionalities. +- Advanced memory management. +- Platform-specific optimizations for x86 and ARM NEON architectures. +- Utilities for syntax manipulation and code scheduling. +- Integration capabilities with external codebases. +- Advanced APIs for code transformation. + +Understanding each import helps in leveraging Exo's full potential for developing optimized computational kernels and applications. + +--- + +# References + +- [Exo Language Repository](https://github.com/exo-lang/exo) +- [Python `__future__` Module Documentation](https://docs.python.org/3/library/__future__.html) +- [Exo Documentation (if available)](https://github.com/exo-lang/exo/wiki) + diff --git a/docs/System.md b/docs/System.md index f132b609..6b0e7fc2 100644 --- a/docs/System.md +++ b/docs/System.md @@ -1,3 +1,64 @@ # System overview Talk about compilation process. + +# Exo System Overview + +Exo is a programming language and compiler designed for productive development of high-performance kernel libraries targeting specialized hardware accelerators. This document provides an overview of the Exo compilation process, as illustrated in Figure 1 of the PLDI'22 paper. + +## Compilation Process + +The Exo compiler consists of a frontend and a backend, with user schedules applied in between. The input to the compiler is a set of Exo source files (`*.exo`), and the output is generated C code (`*.c`). + +### Frontend + +The frontend performs the following tasks: + +1. **Type Checking**: Ensures that the program is well-typed according to Exo's type system. +2. **Bounds Checking**: Verifies that array accesses are within the specified bounds. +3. **Assert Checking**: Checks that any `assert` statements in the code are satisfied. + +If any of these checks fail, the compiler reports an error and halts the compilation process. + +### User Schedules + +After the frontend checks, user-defined schedules are applied to optimize the program for the target hardware. Schedules are written as a sequence of rewrite rules, which transform the program while preserving its semantics. + +Exo provides a set of primitive scheduling operators, such as: + +- `split`: Splits a loop into two nested loops. +- `reorder`: Reorders two nested loops. +- `unroll`: Unrolls a loop by a specified factor. +- `inline`: Inlines a function call. +- `replace`: Replaces a code fragment with a semantically equivalent implementation, often used for mapping to custom instructions. + +Users can compose these primitives to define higher-level scheduling operations using Python code. The Exo compiler applies the user-defined schedules to transform the program. + +### Backend + +After the user schedules are applied, the backend performs the following tasks: + +1. **Memory/Precision Checking**: Verifies that the program correctly uses the memories and data types specified in the hardware library. +2. **Code Generation**: Generates C code from the transformed Exo program. + +The backend checks are performed after scheduling to allow the schedules to modify the memory and precision annotations in the program. + +## Hardware Libraries + +An essential part of the Exo system is the ability to define hardware targets as user libraries. These libraries specify the details of the target accelerator, such as: + +- Custom memories +- Custom instructions +- Configuration state + +By defining these hardware details in libraries, Exo allows targeting new accelerators without modifying the core compiler. The schedules can then use these hardware-specific features to optimize the program for the target accelerator. + +## Source Code + +The source code for the Exo compiler is available on GitHub: [https://github.com/exo-lang/exo](https://github.com/exo-lang/exo) + +The repository contains the implementation of the Exo language, the compiler, and a set of hardware libraries for different accelerators. + +## Conclusion + +The Exo system provides a productive environment for developing high-performance kernel libraries targeting specialized hardware accelerators. By combining a flexible scheduling language with the ability to define hardware targets in libraries, Exo enables achieving state-of-the-art performance with significantly less engineering effort compared to traditional approaches. diff --git a/examples/cursors/README.md b/examples/cursors/README.md index 249f01f1..9c0ee85b 100644 --- a/examples/cursors/README.md +++ b/examples/cursors/README.md @@ -1 +1,210 @@ # Cursor step-by-step example! + +""" +Exo Cursor Tutorial +=================== + +This tutorial demonstrates how to use cursors in the Exo scheduling language +to navigate and transform Exo object code. + +Cursors allow you to select and refer to parts of the code such as expressions, +statements, and code blocks. They support spatial navigation within a procedure +to proximate locations. + +Key concepts covered: +- Finding cursors using patterns +- Navigating using cursors +- Applying scheduling primitives with cursors +- Forwarding cursors after transformations + +Example 1: Finding cursors +-------------------------- +""" + +# Assume we have an Exo procedure p with this loop nest: +# for i in seq(0, n): +# for j in seq(0, m): +# C[i,j] = A[i,k] * B[k,j] + +# Find a cursor to the i loop by name +i_loop = p.find_loop('i') + +# Find the same i loop by pattern +i_loop2 = p.find('for i in _: _') +assert i_loop == i_loop2 + +""" +Example 2: Navigating with cursors +---------------------------------- +""" +# Find cursors to key parts of the code +j_loop = i_loop.body()[0] # j is the only statement in i's body +C_store = j_loop.body()[0] # C[i,j] = ... is the only statement in j's body +A_load = C_store.rhs().lhs() # A[i,k] in the RHS of the C[i,j] = ... statement +i_loop_parent = i_loop.parent() # The parent scope of the i loop + +""" +Example 3: Applying scheduling primitives +----------------------------------------- +""" +# Divide the i loop by 4 +p = divide_loop(p, i_loop, 4, ['io','ii'], perfect=True) + +# Reorder the j loop to before the ii loop +p = reorder_loops(p, [j_loop, ii_loop]) + +""" +Example 4: Forwarding cursors +----------------------------- +""" +# After dividing the i loop, the original i_loop cursor is invalid +# We need to "forward" the cursor to the new procedure +with proc.undo(): + # Undo puts i_loop back in a valid state + assert i_loop.is_valid() + + # Divide the i loop again + p = divide_loop(p, i_loop, 4, ['io','ii'], perfect=True) + +assert not i_loop.is_valid() # No longer valid after divide_loop + +i_loop = p.forward(i_loop) # Forward the cursor to the new proc +assert i_loop.is_valid() # Now valid again in new proc + +# Additional navigation is done relative to the new proc +ii_loop = i_loop.body()[1] + +""" +This covers the key cursor concepts from the Exo 2 paper. Cursors +enable powerfully composable ways to refer to and transform code! +""" + + + + + + +To create an Exo Cursor tutorial in Python using the code examples from the paper, here's a Python file outline with documentation and code examples inspired by the paper's description of Exo 2. + +```python +""" +Exo Cursor Tutorial - Python Version + +This tutorial introduces the concept of Cursors in Exo 2 and demonstrates +how to use scheduling operators with them to manipulate loops and optimize code. + +Cursors in Exo allow you to refer to parts of code by their structure or name +and perform scheduling operations such as loop tiling and vectorization. + +""" + +# Example 1: Basic loop example using Exo 2 + +def gemv(M: int, N: int, A: list, x: list, y: list): + """ + GEMV kernel: y = A * x + Args: + M (int): Number of rows in matrix A + N (int): Number of columns in matrix A + A (list): M x N matrix stored in DRAM + x (list): N-dimensional vector stored in DRAM + y (list): M-dimensional vector stored in DRAM + """ + # Ensure dimensions are multiples of 8 + assert M % 8 == 0 + assert N % 8 == 0 + + for i in range(M): + for j in range(N): + y[i] += A[i][j] * x[j] + +# Now we perform some scheduling operations + +def schedule_gemv(gemv): + """ + Example scheduling of the gemv function using Exo 2-style transformations. + We will tile the loops to improve cache locality. + """ + # Divide the 'i' loop into two: io (outer loop) and ii (inner loop) + g = divide_loop(gemv, 'i', 8, ['io', 'ii'], perfect=True) + + # Divide the 'j' loop similarly + g = divide_loop(g, 'j', 8, ['jo', 'ji'], perfect=True) + + # Lift the 'jo' loop outside + g = lift_scope(g, 'jo') + + return g + +# Cursors example + +def cursor_example(): + """ + Example of how to use cursors in Exo 2 to locate loops + and apply transformations. + """ + # Define gemv kernel + g = gemv + + # Find the 'i' loop + loop_0 = g.find_loop('i') # Find by name + loop_1 = g.find('for i in _: _') # Find by pattern + + # Verify both references point to the same loop + assert(loop_0 == loop_1) + + # Now we can apply scheduling to this loop + g = divide_loop(g, loop_0, 8, ['io', 'ii'], perfect=True) + + return g + +# Helper function for tiling +def tile_2D(p, i_lp, j_lp, i_itrs, j_itrs, i_sz, j_sz): + """ + Perform a 2D tiling of the i and j loops. + Args: + p: Procedure to be tiled + i_lp: Name of the i loop + j_lp: Name of the j loop + i_itrs: New iterators for the i loop + j_itrs: New iterators for the j loop + i_sz: Tile size for the i loop + j_sz: Tile size for the j loop + """ + p = divide_loop(p, i_lp, i_sz, i_itrs, perfect=True) + p = divide_loop(p, j_lp, j_sz, j_itrs, perfect=True) + p = lift_scope(p, j_itrs[0]) + return p + +# Example of using tile_2D function +def tile_example(): + """ + Example usage of tile_2D to perform 2D tiling on the gemv kernel. + """ + g = gemv + g = tile_2D(g, 'i', 'j', ['io', 'ii'], ['jo', 'ji'], 8, 8) + return g + + +if __name__ == "__main__": + # Original GEMV kernel + gemv(8, 8, [[0.5 for _ in range(8)] for _ in range(8)], [0.5 for _ in range(8)], [0 for _ in range(8)]) + + # Apply scheduling + scheduled_gemv = schedule_gemv(gemv) + + # Run example cursor operations + cursor_example() + + # Run tiling example + tile_example() +``` + +### Key Points: +- **`gemv`**: Implements the original matrix-vector multiplication. +- **`schedule_gemv`**: Demonstrates basic loop tiling for better performance. +- **`cursor_example`**: Shows how to find loops using cursors and apply transformations. +- **`tile_2D`**: A helper function to generalize the 2D tiling operation. +- **`tile_example`**: Applies 2D tiling on the `gemv` kernel. + +This is a basic tutorial to demonstrate Exo Cursor usage and scheduling optimizations. You can expand it with more complex optimizations based on the examples from the paper【5†source】. From 30ca7382395b522ddfd5165b4fe93dd50d8bd0dd Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Wed, 23 Oct 2024 16:54:19 -0400 Subject: [PATCH 05/26] update cursor example --- examples/cursors/.gitignore | 1 + examples/cursors/README.md | 200 +----------------------------------- examples/cursors/cursors.py | 141 +++++++++++++++++++++++++ 3 files changed, 146 insertions(+), 196 deletions(-) create mode 100644 examples/cursors/.gitignore create mode 100644 examples/cursors/cursors.py diff --git a/examples/cursors/.gitignore b/examples/cursors/.gitignore new file mode 100644 index 00000000..6cbfadf7 --- /dev/null +++ b/examples/cursors/.gitignore @@ -0,0 +1 @@ +cursors/ diff --git a/examples/cursors/README.md b/examples/cursors/README.md index 9c0ee85b..f34b3288 100644 --- a/examples/cursors/README.md +++ b/examples/cursors/README.md @@ -1,14 +1,12 @@ # Cursor step-by-step example! -""" -Exo Cursor Tutorial -=================== +Very simple example using the tile2D example (in the paper!) -This tutorial demonstrates how to use cursors in the Exo scheduling language -to navigate and transform Exo object code. + +This example demonstrates how to use Cursors to navigate and transform Exo object code. Cursors allow you to select and refer to parts of the code such as expressions, -statements, and code blocks. They support spatial navigation within a procedure +statements, and code blocks. They also support spatial navigation within a procedure to proximate locations. Key concepts covered: @@ -17,194 +15,4 @@ Key concepts covered: - Applying scheduling primitives with cursors - Forwarding cursors after transformations -Example 1: Finding cursors --------------------------- -""" - -# Assume we have an Exo procedure p with this loop nest: -# for i in seq(0, n): -# for j in seq(0, m): -# C[i,j] = A[i,k] * B[k,j] - -# Find a cursor to the i loop by name -i_loop = p.find_loop('i') - -# Find the same i loop by pattern -i_loop2 = p.find('for i in _: _') -assert i_loop == i_loop2 - -""" -Example 2: Navigating with cursors ----------------------------------- -""" -# Find cursors to key parts of the code -j_loop = i_loop.body()[0] # j is the only statement in i's body -C_store = j_loop.body()[0] # C[i,j] = ... is the only statement in j's body -A_load = C_store.rhs().lhs() # A[i,k] in the RHS of the C[i,j] = ... statement -i_loop_parent = i_loop.parent() # The parent scope of the i loop - -""" -Example 3: Applying scheduling primitives ------------------------------------------ -""" -# Divide the i loop by 4 -p = divide_loop(p, i_loop, 4, ['io','ii'], perfect=True) - -# Reorder the j loop to before the ii loop -p = reorder_loops(p, [j_loop, ii_loop]) - -""" -Example 4: Forwarding cursors ------------------------------ -""" -# After dividing the i loop, the original i_loop cursor is invalid -# We need to "forward" the cursor to the new procedure -with proc.undo(): - # Undo puts i_loop back in a valid state - assert i_loop.is_valid() - - # Divide the i loop again - p = divide_loop(p, i_loop, 4, ['io','ii'], perfect=True) - -assert not i_loop.is_valid() # No longer valid after divide_loop - -i_loop = p.forward(i_loop) # Forward the cursor to the new proc -assert i_loop.is_valid() # Now valid again in new proc - -# Additional navigation is done relative to the new proc -ii_loop = i_loop.body()[1] - -""" -This covers the key cursor concepts from the Exo 2 paper. Cursors -enable powerfully composable ways to refer to and transform code! -""" - - - - - - -To create an Exo Cursor tutorial in Python using the code examples from the paper, here's a Python file outline with documentation and code examples inspired by the paper's description of Exo 2. - -```python -""" -Exo Cursor Tutorial - Python Version - -This tutorial introduces the concept of Cursors in Exo 2 and demonstrates -how to use scheduling operators with them to manipulate loops and optimize code. - -Cursors in Exo allow you to refer to parts of code by their structure or name -and perform scheduling operations such as loop tiling and vectorization. - -""" - -# Example 1: Basic loop example using Exo 2 - -def gemv(M: int, N: int, A: list, x: list, y: list): - """ - GEMV kernel: y = A * x - Args: - M (int): Number of rows in matrix A - N (int): Number of columns in matrix A - A (list): M x N matrix stored in DRAM - x (list): N-dimensional vector stored in DRAM - y (list): M-dimensional vector stored in DRAM - """ - # Ensure dimensions are multiples of 8 - assert M % 8 == 0 - assert N % 8 == 0 - - for i in range(M): - for j in range(N): - y[i] += A[i][j] * x[j] - -# Now we perform some scheduling operations - -def schedule_gemv(gemv): - """ - Example scheduling of the gemv function using Exo 2-style transformations. - We will tile the loops to improve cache locality. - """ - # Divide the 'i' loop into two: io (outer loop) and ii (inner loop) - g = divide_loop(gemv, 'i', 8, ['io', 'ii'], perfect=True) - - # Divide the 'j' loop similarly - g = divide_loop(g, 'j', 8, ['jo', 'ji'], perfect=True) - - # Lift the 'jo' loop outside - g = lift_scope(g, 'jo') - - return g - -# Cursors example - -def cursor_example(): - """ - Example of how to use cursors in Exo 2 to locate loops - and apply transformations. - """ - # Define gemv kernel - g = gemv - - # Find the 'i' loop - loop_0 = g.find_loop('i') # Find by name - loop_1 = g.find('for i in _: _') # Find by pattern - - # Verify both references point to the same loop - assert(loop_0 == loop_1) - - # Now we can apply scheduling to this loop - g = divide_loop(g, loop_0, 8, ['io', 'ii'], perfect=True) - - return g - -# Helper function for tiling -def tile_2D(p, i_lp, j_lp, i_itrs, j_itrs, i_sz, j_sz): - """ - Perform a 2D tiling of the i and j loops. - Args: - p: Procedure to be tiled - i_lp: Name of the i loop - j_lp: Name of the j loop - i_itrs: New iterators for the i loop - j_itrs: New iterators for the j loop - i_sz: Tile size for the i loop - j_sz: Tile size for the j loop - """ - p = divide_loop(p, i_lp, i_sz, i_itrs, perfect=True) - p = divide_loop(p, j_lp, j_sz, j_itrs, perfect=True) - p = lift_scope(p, j_itrs[0]) - return p - -# Example of using tile_2D function -def tile_example(): - """ - Example usage of tile_2D to perform 2D tiling on the gemv kernel. - """ - g = gemv - g = tile_2D(g, 'i', 'j', ['io', 'ii'], ['jo', 'ji'], 8, 8) - return g - - -if __name__ == "__main__": - # Original GEMV kernel - gemv(8, 8, [[0.5 for _ in range(8)] for _ in range(8)], [0.5 for _ in range(8)], [0 for _ in range(8)]) - - # Apply scheduling - scheduled_gemv = schedule_gemv(gemv) - - # Run example cursor operations - cursor_example() - - # Run tiling example - tile_example() -``` - -### Key Points: -- **`gemv`**: Implements the original matrix-vector multiplication. -- **`schedule_gemv`**: Demonstrates basic loop tiling for better performance. -- **`cursor_example`**: Shows how to find loops using cursors and apply transformations. -- **`tile_2D`**: A helper function to generalize the 2D tiling operation. -- **`tile_example`**: Applies 2D tiling on the `gemv` kernel. -This is a basic tutorial to demonstrate Exo Cursor usage and scheduling optimizations. You can expand it with more complex optimizations based on the examples from the paper【5†source】. diff --git a/examples/cursors/cursors.py b/examples/cursors/cursors.py new file mode 100644 index 00000000..5999e2bc --- /dev/null +++ b/examples/cursors/cursors.py @@ -0,0 +1,141 @@ +from __future__ import annotations +from exo import * +from exo.API_scheduling import * + +""" +Cursor Example + +This example introduces the concept of Cursors in Exo 2 paper and demonstrates +how to use scheduling operators with them to manipulate loops and optimize code. + +Cursors allow you to select and refer to parts of the code such as expressions, +statements, and code blocks. They also support spatial navigation within a procedure +to proximate locations. + +Key concepts covered: +- Finding cursors using patterns +- Navigating using cursors +- Applying scheduling primitives with cursors +- Forwarding cursors after transformations +""" + + +""" +1: Basic loop example using Exo 2 + +GEMV kernel: y = A * x +Args: + M (size): Number of rows in matrix A + N (size): Number of columns in matrix A + A (tensor): M x N matrix stored in DRAM + x (tensor): N-dimensional vector stored in DRAM + y (tensor): M-dimensional vector stored in DRAM +""" + + +@proc +def gemv(M: size, N: size, A: f32[M, N], x: f32[N], y: f32[M]): + assert M % 8 == 0 + assert N % 8 == 0 + + for i in seq(0, M): + for j in seq(0, N): + y[i] += A[i, j] * x[j] + + +print("1: Original GEMV kernel") +print(gemv) +print() + + +""" +2: Finding cursors +""" +# Find a cursor to the i loop by name +i_loop = gemv.find_loop("i") + +# Find the same i loop by pattern +i_loop2 = gemv.find("for i in _: _") + +# Check that two cursors are pointing to the same 'i' loop +assert i_loop == i_loop2 + +print("2: i_loop points to:") +print(i_loop) +print() + + +""" +3: Navigating with cursors +""" +# Find cursors to key parts of the code +j_loop = i_loop.body()[0] # j is the only statement in i's body +C_store = j_loop.body()[0] # y[i] = ... is the only statement in j's body +j_loop_parent = j_loop.parent() # The parent of the j loop + +# Check that j_loop's parent is indeed pointing to the i_loop +assert i_loop == j_loop_parent + +print("3: j_loop points to:") +print(j_loop) +print() + + +""" +4: Applying scheduling primitives & Cursor forwarding +""" +# First, rename the gemv +g = rename(gemv, "gemv_scheduled") + +# Divide the i loop by 8 +g = divide_loop(g, i_loop, 8, ["io", "ii"], perfect=True) + +# Divide the j loop by 8 +g = divide_loop(g, j_loop, 8, ["jo", "ji"], perfect=True) + +# Now, we want to reorder ii and jo loops, by lifting the scope of j_loop +# We can still use the j_loop cursor! +g1 = lift_scope(g, j_loop) +g2 = lift_scope(g, g.forward(j_loop)) + +# Assert that g1 and g2 are the same (`j_loop` is implicitly forwarded in the first line) +assert g1 == g2 + +print("4: Tiled gemv") +print(g1) +print("4: g.forward(j_loop) points to:") +print(g.forward(j_loop)) +print() + + +""" +5: Defining a new scheduling operator +""" + + +def tile_2D(p, i_lp, j_lp, i_itrs, j_itrs, i_sz, j_sz): + """ + Perform a 2D tiling of the i and j loops. + Args: + p: Procedure to be tiled + i_lp: Name of the i loop + j_lp: Name of the j loop + i_itrs: New iterators for the i loop + j_itrs: New iterators for the j loop + i_sz: Tile size for the i loop + j_sz: Tile size for the j loop + """ + p = divide_loop(p, i_lp, i_sz, i_itrs, perfect=True) + p = divide_loop(p, j_lp, j_sz, j_itrs, perfect=True) + p = lift_scope(p, j_itrs[0]) + return p + + +# Example usage of tile_2D to perform 2D tiling on the gemv kernel. +final_g = tile_2D(gemv, i_loop, j_loop, ["io", "ii"], ["jo", "ji"], 8, 8) + +print("5: tile_2D applied gemv:") +print(final_g) + + +__all__ = ["final_g"] From 3cb527e5cf93d684bd0167b539a82206c4e933fe Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Sat, 26 Oct 2024 10:56:28 -0400 Subject: [PATCH 06/26] Revise text --- docs/Design.md | 34 +++++++++----- docs/Imports.md | 81 +++++++++----------------------- docs/System.md | 6 +-- docs/images/system-overview.png | Bin 0 -> 37473 bytes 4 files changed, 45 insertions(+), 76 deletions(-) create mode 100644 docs/images/system-overview.png diff --git a/docs/Design.md b/docs/Design.md index 9f6c56c3..ca0ff13e 100644 --- a/docs/Design.md +++ b/docs/Design.md @@ -1,12 +1,13 @@ # Design document for Exo -Here is a summary of the key design decisions of the Exo language in github markdown format: +Exo is a domain-specific language designed to enable productive development of high-performance kernel libraries that target specialized hardware accelerators. -# Exo: A Language for Hardware-Accelerated Kernel Libraries +The key design principles of Exo are: +- Performance transparity: We do not do "magic optimization" that are surprising and opaque to users. +- WYSWYG: Exo IR closely models C-style code and will be trivially lowered to C code. +- Give the performance control back to users -Exo is a domain-specific language designed to enable productive development of high-performance kernel libraries that target specialized hardware accelerators. The key design principles of Exo are: - -## Exocompilation: Externalizing Hardware Targets +# Exocompilation: Externalizing Hardware Targets One of the main ideas behind Exo is **exocompilation**, which allows users to define hardware targets externally to the compiler in user-level libraries. This has several advantages: @@ -16,6 +17,8 @@ One of the main ideas behind Exo is **exocompilation**, which allows users to de Users can model custom memories, instructions, and configuration state in libraries to target a specific accelerator. These hardware abstractions can then be used to write hand-optimized code or as building blocks for higher-level scheduling transformations. +More info can be found in the [PLDI paper](https://people.csail.mit.edu/yuka/pdf/exo_pldi2022_full.pdf) and [./instructions.md] and [./memories.md]. + ## Fine-Grained Primitives for Performance Control Exo offers a set of fine-grained scheduling primitives that give users low-level control over performance-critical details. These primitives can be composed to build complex transformation schedules. Some examples of these primitives include: @@ -25,16 +28,27 @@ Exo offers a set of fine-grained scheduling primitives that give users low-level - `replace` for mapping code fragments to custom instructions Having explicit control over these low-level details enables Exo to achieve performance competitive with highly-tuned vendor libraries and hand-optimized assembly code. +Primitives can be found in [./primitives/]. + +## Rewrite-based Scheduling Language + +Unlike previos popular frameworks like Halide and TVM which uses _lowering based_ compilation process, Exo uses _rewrite based_ compilation process. -## User-Defined Scheduling Operations +This has a few advantages: +- Less magic +- Easy to print in the middle of scheduling process and see what is going on. + +# User-Defined Scheduling Operations While the flexibility of fine-grained primitives is necessary for achieving peak performance, directly using them can be verbose and laborious. To address this, Exo allows users to define new higher-level scheduling operations by composing the core primitives. These user-defined scheduling operations can encapsulate common optimization patterns and hardware-specific transformations, greatly improving productivity. They can be put together in reusable libraries, further enabling modularity and portability. -## The AIR Abstraction: Action, Inspection, Reference +More info can be found in the ASPLOS paper and Cursor.md. + +## The AIR Framework: Action, Inspection, Reference -To enable user-defined scheduling operations, Exo introduces a powerful abstraction called AIR, which stands for Action, Inspection, and Reference. +We identified that Action, Inspection, and Reference are the key scheduling language design mechanisms that enable user-defined scheduling operations. - **Actions** are the scheduling primitives that transform the code (e.g., `split`, `reorder`). - **Inspection** queries properties of the code (e.g., loop bounds, memory access patterns). @@ -49,7 +63,3 @@ A novel feature in Exo's design is the concept of cursors, which serve as relati Cursors support navigation operations such as `next`, `prev`, `parent`, enabling powerful code transformations using relative positions. Multiple cursors can coexist, allowing different parts of the code to be referenced and modified simultaneously. Using cursors, complex scheduling operations can be built using simple navigation and rewrite rules, with the cursor abstracting away the details of manual AST manipulation. - -## Evaluation - -The effectiveness of Exo's design is demonstrated through case studies targeting specialized accelerators like Gemmini and x86 CPUs with AVX-512 extensions. With Exo, state-of-the-art performance is achieved on key computational kernels like matrix multiplication and convolution, using an order of magnitude fewer lines of code compared to handwritten libraries. diff --git a/docs/Imports.md b/docs/Imports.md index 86c7b84b..ca96c5b8 100644 --- a/docs/Imports.md +++ b/docs/Imports.md @@ -1,10 +1,6 @@ -Document about how to import different modules +# Imports in Exo -# Explanation of Imports in Exo Language Script - -This document provides an overview of the import statements used in an Exo language script. [Exo](https://github.com/exo-lang/exo) is a programming system that facilitates the development of high-performance code, particularly for hardware accelerators and specialized computing platforms. - ---- +This document provides an overview of the imports used when writing Exo. ## Table of Contents @@ -25,7 +21,7 @@ This document provides an overview of the import statements used in an Exo langu from __future__ import annotations ``` -- **Purpose**: Enables postponed evaluation of type annotations, allowing you to use forward references in type hints without causing issues during runtime. +- **Purpose**: Enables postponed evaluation of type annotations, allowing you to use forward references in type hints without causing issues during runtime. This is necessary to support Exo's `x : f32` syntax. - **Context**: This is a standard Python feature that improves compatibility and performance when using type hints in your code. --- @@ -37,39 +33,40 @@ from exo import * ``` - **Purpose**: Imports all core functionalities from the Exo language. -- **Includes**: Fundamental classes and functions necessary for defining and manipulating high-performance computational kernels. +- **Includes**: Fundamental classes and functions necessary for defining and manipulating high-performance computational kernels, such as `proc`, `instr`, `config`, `Memory`, `Extern`, `DRAM`, `SchedulingError`. --- -## 3. Memory Libraries - -### 3.1 Importing `DRAM_STATIC` +## 3. Frontend Syntax Utilities ```python -from exo.libs.memories import DRAM_STATIC +from exo.frontend.syntax import * ``` -- **Purpose**: Provides access to a static DRAM memory model. -- **Usage**: Used for declaring and managing statically allocated memory regions in DRAM. +- **Purpose**: Imports utilities for parsing and manipulating Exo's frontend syntax. +- **Usage**: Used when extending or customizing the language's syntax for domain-specific applications. + +--- -### 3.2 Importing Multiple Memory Classes and Errors + +## 4. Memory Libraries + + +Even though users can define memory definitions externally to the compiler in the user code (see [./memories.md]), we provide memory definitions for some architectures as examples. +What we support can be found by looking into src/exo/libs/memories.py. ```python -from exo.libs.memories import MDRAM, MemGenError, StaticMemory, DRAM_STACK +from exo.libs.memories import DRAM_STATIC, AVX2, AVX512 ``` -- **Components**: - - `MDRAM`: Multi-dimensional DRAM memory abstraction. - - `MemGenError`: Exception class for memory generation errors. - - `StaticMemory`: Base class for statically allocated memory types. - - `DRAM_STACK`: Represents a stack allocated in DRAM. -- **Usage**: Facilitates advanced memory management and error handling in performance-critical code. +For example, you can import `DRAM_STATIC` like so. Similary you can import AVX2, AVX512 + --- -## 4. Platform-Specific Modules +## 5. Instruction Libraries -### 4.1 x86 Platform Optimizations +Similary to memories, we provide some hardware instruction definitions as a library. ```python from exo.platforms.x86 import * @@ -78,7 +75,6 @@ from exo.platforms.x86 import * - **Purpose**: Imports optimizations and definitions specific to x86 architectures. - **Usage**: Enables the generation of optimized code tailored for x86 CPUs, including SIMD instructions and cache management. -### 4.2 ARM NEON Platform Optimizations ```python from exo.platforms.neon import * @@ -89,17 +85,6 @@ from exo.platforms.neon import * --- -## 5. Frontend Syntax Utilities - -```python -from exo.frontend.syntax import * -``` - -- **Purpose**: Imports utilities for parsing and manipulating Exo's frontend syntax. -- **Usage**: Used when extending or customizing the language's syntax for domain-specific applications. - ---- - ## 6. Standard Library Modules ### 6.1 Scheduling Utilities @@ -141,27 +126,3 @@ from exo.API_cursors import * - **Purpose**: Provides cursor-based APIs for navigating and modifying code structures. - **Usage**: Enables advanced code introspection and manipulation, useful for metaprogramming and automated optimizations. - ---- - -# Conclusion - -The imports listed are essential for setting up an Exo environment tailored for high-performance computing. They collectively provide: - -- Core language functionalities. -- Advanced memory management. -- Platform-specific optimizations for x86 and ARM NEON architectures. -- Utilities for syntax manipulation and code scheduling. -- Integration capabilities with external codebases. -- Advanced APIs for code transformation. - -Understanding each import helps in leveraging Exo's full potential for developing optimized computational kernels and applications. - ---- - -# References - -- [Exo Language Repository](https://github.com/exo-lang/exo) -- [Python `__future__` Module Documentation](https://docs.python.org/3/library/__future__.html) -- [Exo Documentation (if available)](https://github.com/exo-lang/exo/wiki) - diff --git a/docs/System.md b/docs/System.md index 6b0e7fc2..d398ed5e 100644 --- a/docs/System.md +++ b/docs/System.md @@ -1,10 +1,8 @@ # System overview -Talk about compilation process. +This document provides an overview of the Exo compilation process, as illustrated in Figure 1 of the PLDI'22 paper. -# Exo System Overview - -Exo is a programming language and compiler designed for productive development of high-performance kernel libraries targeting specialized hardware accelerators. This document provides an overview of the Exo compilation process, as illustrated in Figure 1 of the PLDI'22 paper. +![System overview](images/system-overview.png) ## Compilation Process diff --git a/docs/images/system-overview.png b/docs/images/system-overview.png new file mode 100644 index 0000000000000000000000000000000000000000..bb38171432b6dd7792fb875907a48c7613529162 GIT binary patch literal 37473 zcmYhD1zc3!_V;I~A&2g6knV<|yBn15E)kUO?vflz8UaBBLAnuXDUlAPK^lAy&vWm6 z|L_^+%$a@8*}c~9yVoXGOG6P0oeUiW0%0jD$?1SV2$R6^0V)FUcN!0cA9w?M>nO^A z>L)1nfe#V3M#^^T>Y!)9F)9cdj0Zw|N&@_Xz+@oge~v*Q6)^e#j&;B+|49SfGujb^ z^q({)!28pW9`FLL{?9ui9Q^<92uJvDYJ^ER;(w1p2TzwFZ7+}kZ)hG$#@-+h=I^H$ zcza%M8U%uYl;x!L{J;l4UWK^GH%fVU&)ZLc2~!xPVS2nuI1I`YBD?vxoYG!NY2F=3 z@3$C4N>RoAW5fvw2V$(C$5)|e-zmP0PZazBezner+=?WB%>9@PZ2$5?|Hs4V-@C5A zZ{#1-5Jy6{U|uvLDHKZ31V)INKvsXc?|(isK%TxP7D)-AACmT3lM+68AN=qeB6Z!v zT9JQtF{Al+DBe=?ev|sI-&U&Pgv8a#Zc*@;mj)(-_J#gC*PF?~zp5%sD#ly?c?SFy zaHG;!clFH9kKckH&zFBbUTTOP`iyb3W}u{e`gyl{<-6iJRP+8zyPSsJntSlw^2627 zuHC{nt8b1wORM^d9xp0J(wUE&#yHnhokji53XTI5N5zgibg$cg*WZoXOM0IU5VuYU z+_tv<47yX}`|7jmznfU7H^jQ~^7{+#uL9p4D+(^Jy8d=<_NQ9_h2*BCQ0nHPb)jC7 zsk=j4!{Ge1wjNE*`5Yu1bWuvJjgoS~=*5kA!S+q^j{W-2nE9hutq2bC!v=8|$?q@) zn0Xy{k2z*4rT7cCwrz)m4NoK_KGmn&u2cAZjrdN(_Ef0^(2mQN_b-AU{ur7#mbZTS zQ^)9)fKXH4{!2%dLja1L2!Axc88sv_b!pTQ+NmRV@9>;a7b>vLGFE_be0XYtAKjry z6h9_0jH5*%K@Y#pgEz^4(pW#>!XA@%p>8*0#b-p?)V9*f2i zro4cYaOw*K>5I!+@%OLiYKUI=Xd`E`2LsKlDY{?g9{Afd&MffHT-Qk?S3jEy8Bxml zH>tnTQupi3Qa2;p+2%oCZg%1ggKx&n-P-q=kULL$F^c})?=C8$xBUG3c*R^4(1hF` z7klw*-qbzQ>-&pWfk4w6rYWCD{XLgr&i6oA;MvXhcKRafFiaQEe)i{mZ~@ichoXlA zAU$PW*7fNSwbY+k-MKBwW4|Gw6Zx`co4wwTBsLR9Re`b5fL$8b!Z6|pWc{dL?-l>9 z|9k|IJP8fQmh>x46qK1^`xWtFLNq>2xlig(m8NUg!{v+`pZa9gqr~lAsl+a5IN)>2q&{N-_)}9P-1!+mW6XZ8gVjQiGdcLZV@H6w&~M{7=;||llCVrmqc(&`qp{AT*3w;uLL=?q4ycMn z`=W;h>*UVA562aMVJZxxw*)g(ulHz^KQ->;IvXg4A>F7#86Zqih|P<@Q35$6W=oZ; zGow=MprKEtxa_#)yV+ggRfm01;3e+Oiubp`KQ-ozx+B+&Sw;x2Y3BOV3(m*+z6C#g z55Bo;Ob$FqbPu4X04e(+^NdU09k{=q$y+^%62ftBKqW>$J*Ag!vq=;6{w5(b!hhI2 zL-lf2+bondCcemXkkmfM5``37bdLtfIA9Gs*-RO;AI5z3<5dy+s+r$<|i5S zv(CF8$0(8=aXP3XRwOw7+id1lQ8u(AyNBO`F8ORzXL^IK;q0ubj4HI)u>v2+vJS9= zuMsMG!{db6DX6$R_)PAQwj)M%@zT9tfDJI;8 z3-6G1@9pHd6VVK85>miLy5p#I+QVm)4d=RmzSz|?sYe|=heMg4Z6i?ApqT{0=M!qq|om;?-3~RVYJJ=?@S_ zB=1b@U!zGVk^EFf>vRqDwT~u$F`0oOi9Zl(sHWFrd}QGTr|hX-0k; za&Vj`^a<1x%et=9L2_BdH4F^|3ZT>d{=E99{>Q7I1!|d%qh|iu#&)E!nHng4bXngS z%Ku1|KbXBaWNdtH@J;uf8ZJl}mAmP%2SS~ZgUUS3{*{p)qwaBEX;M#UBcC9GVYrR@ z@Ac5rYrAPtP8x}0qZC4+MO;T!?;nAB?|VKd;{8B;9{IO{ufVR4;M7;`{e>Yo_0 z$toHZsAU@sM&ly6k(##=j>>XgBUsCSYau;vU8rXUTIzmVnt- zyKZo}7mh84?46)cg}JB}#>;V2BV6}!@`~7WoXH2TMQ6*(55WKq`2#7=(PJU6^l)PR zZuZ8tOb&t3j}0U{JKaxzEiYA^n4?hg_bT|WkPj;6KCx0#HQE6Vb_~eh$M6}-+p5E8 z$qQOS9QZJpH3Wsy#Nn5li{b~z;yI-iyDyd_y}_u6(mN!s#K|-PcFEXCCVS&yv7sOU8W{GK0(1fCPP1_{*R}7_KVM6lUzlbb;=!1$iNOONlVI!)QlRU=y z%xDa=z^E~5i>Zgp?wQ=CM@G7!HT0e+!QU*ncRj@Dz$4Tc6H(=BEKAzU*ues{TAQgZ z3Ryem#&TpcXs7R?K^YLFiraV_e%fEvLD&K-x-r;7Auel(Ez?8;RhR_GS1j3u;ch5G z{W$9a8)qmb?e}M6b4B6tNcx7EX^){0st|GqN&;|cjG>mlDV7CUrgDD>7s9w&4^FVY z0;c#bmpm#z$i`!6Dxwctib5Uj=+hcj|IHnZ1%7QCjy96rLA_P(7wJ)b5E^?I@$UuK2=n#Ano zoc=uo4E0SUG<=@Q&Y-m@7MFqNw=4EjM4t)orNMFph4#?2CJmSZa4A{`nr(T?c=gd! zX;L7bW3JvGz6!<Q?MeuQ8530fy;4+h>fHRD6n5 z{tUXPws>Wzw|o80^=(lbU}z;m(!2iXxLM?-f$`?^2{Gct7?~D5abbDvewqGd>y|kK zz7)(nn2Rj#_%-0CSx^Y4gjiGL0{;D-)HhlHbss!+qorBM*+POxs&O+hHyci8gSU$iaZ zp}5kv)g!1uZe;}6mD++1OlsEDD$25JJqn|YOU3xACzD7SNPSq9DCkB#$8yH^9`Sp1 zsXB-H2FBY_XFJM3obL>-?bl-{NnO|EfZNP;H#JRRq`-2o%q>qf?_GY z$c&Ul!IN_veuN0<8I#O~`T3WBh(NRe#{}5&$89DVkP9*zqp|3D?m0sX`$W-%(Y<;9 z84V0;IV=;{Zf*VduYcFQS__3tx0;5eAu*(UCA);BDO}53i|oxDDIVeamZ!aI>ISpb zHiQM%5zPe9v%~!{wVoMGXb=WB44jxW6o8i+>?0;yQhT2HCko749JFdFq-4n(zDpKjOL;9A2#2uQ$bDm8>Vh`1F zJotS4#S%t~8H0ak(PpV2a2+Z-M$1mQRf*3U1AC83e%4Z3cJszd0A6x3`YcPB**Q-T z$#yi&h%r4zT-M)RN%?DcuT17f4gEclkp5PV*BlQ8xDeKPHw_BFVkrz&J#x5=n z!x6TgE>ku;edc#DrC3azx{9DV@%Dr$C@#SS7oz8oTP}rdke0TKw245f7AebZ*ymMM z{DSsCx;H35>hL%-o+|R~$QfkP^Nz))-d(ct)Z1mKlt-Y6>*oc^$4H-kdPdD zl&=lR8I&Iw<3X+|kJW>)2yO#zIp4X`7c6KC19D9r{qwXdGXd=FcSoN++0(#`dWt?$ zjM)ZoL`yl&Ulb?&uT#hrs9AkJAAaN$$GJ);akBZgQGlL`)Z2~qg5Hxh?@0_U4`#TuUby0?qvm4j7Rb4X z?}r_oP!JPa!vAah2NjP(_yFq_0XR5^t5`_CK>Bd@Eb+T zBpwdB*e5si>EZN{>FCl_X3}&(YyS7i(N9Xd4^!Rif4NhF@_W&M73np3nf{Ub|E7d` z1LvYibGQBv25{JdAtWgj+Wh~*rXf=90Oz25{3DG2c_>g3lfytwC5n&a{I8-IAfn^I zIZ8W+v?ty2pL@bnfy|Tnr}6xsy!hk*=S0UR8M6ORrs@~5kb0=Hxn9|oY@O9n?L19%XXp7raovL* ziG`(Z`NvtsPP%DkG~m;cS%<0eoj-@6Nd9f++*;G6xaHtYmMU%dGHpg%QT z*8^O>L>}VzCw=w>FXywn?zeAti@H8!0BSk-5zwR$fPy<3ob_GK-xuuU8{lyc7Bva1Qtzc9j>TMgLTIdl6#Q zI*A$eX{g`w^6F*o_m1(_CuH;6{I6&MQZYh2d;4AqU_X1G6w8xb*h1Gn1v2d%MQUf{ zzn%G1w080zD#T8fN&b&FySA34#Z1O7jRO|h?g5+R#O48~TtTPt0OW{6j)=i?H^z*N z@!u!T}ifsUpkqaP$O3M;`gOTs%!JURr z^7)Aw)Qp^%0uiOxHh>BzAPY7LJWTDmP%8{la%Wk{FOEBLG4o7WW)s%+B{3W$k)6bPtVv${z7^ zkDlUas`f!LXYt4D&3jP}Cu;vKT$*rYTL3^koF)f%0M5GjX+M6^bw4u*)z%fTKc=oN zgdk5qdiGEdjiy9+;SzXHD>LI-8{Kst2Oa~@&I-{mLqGu9SH;|;Gn%}I-O8oAKva(p zj?;-2$(<`#;~)pq4+eOPKdL@M$ER)tjDX*!KY2+KDyiH83?zVIPGSKqQ?~Uh2;R;# zoVE{v2EjtMS&gAO3y5C#r$bTz@Sx!Hw?z+!LMS{y-;&=PwLz@lCn0?InT72BM{T>8 zt8d(|*Gl3!(+b6^4VT@!I+c>e4x91&MuDzw9r|V?mO1F}-Oq|p2xZE~vL`@Irh_Ym zK&MaLQMc%Jc55WPipSu^OMMxP36OMcT9xMgJfLxb8$AFXqmg5ICV@o9yxlKaVG!f} zH`Okyn_kn9Z+<6tPi;J_tBEym`2EXe%NXb91Hk1YmJ~fHrwm$z%4tu`PvM6gm2_)AO!o-)%Epod*{XUWr9)1Ek-a5&9~;3C+1p}WBr&b_(t`J4!}fM zVo(>_;~~k*M!>yDMqv~OF(B%)XbhKe1*~I)M6#E`sT+sRN}x?RL2IaMyiY|#fojRa zf*_PWlfbOPCLsXeB5%gWEoLcTn7{IB1JNVpDEWvY50)vX69K)549g-iDem%#T0OZi zLO*OpTvO0Iz{}+ zIt})y0v_EEN(6{*vKxb|quV`vfWZoZCxxwFKjUiR)fc{L-3COgE&$lXqz_gPhh?-} zi2km{k8DaX+AqQ1mp>y872J|fw?xJ{RpI3C4D%b7mr98osA!ByASHt2$UM=*@v<+| z7#;Q}q{x<&I5@=_CII9l?P6(m_~`eV3}udPk2{2ri4P1Fs;T=(M&O*kXZ6FLM%#Sf zDvq<0h_o|O<2UjA@Bb>Sp#mP_TO*jAMy-z7)H-+JfhcevJRov1u!f1cUOdkQq&2=hYs z6gh=wwO4#V$hBs+nfjoiHIbdC$=d#eS1HHPqp-AGmWq z0YWSu2?iz(^&sXTO{5s*7Jk=n^DiU6AOdU;=KC%)f_<(B6PwZ-IC+KP?t*kHEh9Z@ z_^``|uL{Dr@*HO}Xv60W$XX2aFH=f(VbOMsyec%t7<6Wf=NpQWJhgrdtQ<*C*x;&< z`qATe!p%oyh*BUKebh6Jf_9WN_)uCQc+4DUvXPYUD}TC;M_-yM=g75;*jyMqsEj?; ztO#AH>L(WAX-nT?KP;cZ$7uV)1ta(7*+0hJhZ)GD&lhtNUKg_#_w5?`m`B7Q%x4<44H~1wZBVs z^AL4e9U+O#{;ZK)xSK#pWQYmXJ6S%EgsyI-61#0E5&5A^1PV0w6lc0@N=MlbKhFxQ z8(hBLQ$e}`m$vP!tm;dGhEsezA1d>^rSJJg(lqj~gHz)!&aw`gX(p^K9(U;)j@o~l zz}%Zgn0!l9Tsv@WQ=6`7>2ylzLrH9qDL%W69RaleBVhizt~!qk0Q1&cA%On1i+&SU z$+?qb4Sg)+j{Y%0GYs@(!%`Hdlfy}30Z5Wb)1V?H@p4X8OSO9<-8_Xm>qzw}sRlG5 z0r8}U!;4}hQ6DiAk`o)5`0a$8RN5f~Q2=togFpfLO7-Sooh28Eg8nz$Z8$Te^oJRZ zo(oUpdko>2d>d>QSdI3}kqGVpasc!r(^}rBqH#&rh6g5I&f}5G6Ho%Q^^1QFNIDly zOLowM95Y*UNYJHuN89f~7AvAiZ{vtVsqeeIm5d;PXL z&oAEkk$Pj<)%)F>0X843nVbMF+$AhBHdE^Ff+*3=K=E@4CpBQW9DNyC(fU^Ru&{**+`r}JgpHL? z&rlR~5rZyz^fvC~R8XIae1y4=t))kS%Oyxw%?(c}}d`hJp zC?$#CTl_L6AtEKPdhSBNIZzrCc@c?qWVram5KG6V(7Xy;yWg=Sc}EnC4-YVm<% zpKlTh882=q<)|HJ@#X!XCc-k`*4RE`U~kr$9}}L0HbsZsn?8>A0;)JQ)mx{^BiKz` zPX#r%ME)>p!_KGVV$DkWl0s8IjuWj3f?>+bO$LQcN3`M&%&In8?vd^0r|;x@Ye1(; zi}U@{5oJATz(ZR3j(AeD&!zOl@sRoTMp~4pM{;+Z$c!1;R+Nm@V%bAY>H}PQz`cc5 zu@5AFizR+RmY{1i?VD`oH|W8+!xLsZ{I6ff$U;TrF!i%Qfp~D7{A6~n%OTaSkm7~` zyF6FBb}4|gI`pD~DpT*E-oni0mStL>Hj~6MjvCI1^uHR8gsyvqOec}ewGYE=gUU}l zQZ#Ya0VnvYkXUz$-Ryo?hChmZ2SRYcn=vyN8_~z1cXe8mS<``BfStg;tsC`FvOe&*vGpqbD@dNOMcGv5mc2z4i%jO@?k_9t}Y*)Srd%k z`9YRVpGOIykgW4kzBlU`S=Ak4@ zPw2|k*!Eqpj25XT>9td^p+@5J{qfS!2)t&8Y)35kyu4#0jxGPI`Aci7g3uQl5`Xme zLYd!YC`}RQd^%IxfTO(k*5WICOW$Vkfw`9`D1p}^&f9Ix5B3ua`#{X8geK?$FQK5* zuKV?y-rT$p>t3TGnhUe7yIalo2gmDQ)j5(QL}td;XM`_-#hK zJ90BJMy@$jb%p#3#_TFzU=H%ePI^nWhWZ9wiqTA4{)ilAbV6UA4=>>M?yqa zTKWkwrFjdbAVwA|z2GA!kUyo**xJTC@&@(g%ZP+xRX(sc@dF!!x*^14^3v1qm4%et zhN^a)ZgMe^#lSj$k^eqKSyto=bTfY^oKQ!)IZ$xsMwFp+9WL#IjvGn6`%$>Wj{op$ z2mTl!tnkPFvUdszYjsQOeZMfVTNp&iF(apUCU8og&VFvD4j2k<2(g8U!xu51XXB!F zX7RlWqkt?V;eo6nT4RpeDi=G#r{R5@CH*1^VyV(@{5m_N5eRz3(+gQ#tKfSVM!Fv7 z9AO162dmw9C5Jv~-`Y608sFUvkfP99xRAs-RylH!LyByRG|`p#$xsDBnWUweWsEkd z?2iYnX|d4k^mNb$fTnL^Omj&;ha#%m4Z2kt%+rbhcAa~9f_IGFSMt^9P;c(H5jH!^ zbIA=C&iQNxY3x&mnk9S+xZv`upMf{5vQ$3U>|_^C1dc%oAyE9Y>~MJ_2@r-uhz5eg z8V;qXcua9~#$c~ycv8cr7ub@d9q&&2#bmsa0#^)KAk_sFMkR-#GF!{f2BZjZL;xklDy284fp0)^SFzGcgFf>E-j%Cg>bZ9fkw!|{M(?}jWEMo zt_J9cz=h`rU2fR z{PBqm4lb-9cwC>8u@=ixQnpu@-)=bQLp(=|8{1Z7b=tkc)4JRnCzXw!{i-0R{VsRD z;|eB6i~|ZL!q-wim5W;H#qz9|$&dM7dV5$RwKo(hApg88!8+Ty-VxD&>z>8QI#pUVT9}nfBWy~gAmnQT6A;KWnNh*<=*?~fOXF>GlpFW#J z?+g+X_L5}FH>gNk#s!2aKw|d*t2E)5t_c%#a?g`g5hrMIx;62niq==j)QkF@_(Zwt z6q_VRLjSIX$TJ)^CmL?=h5hxDXn`yi*mv`!6r|EqId=vkr9`VE>5ki`_NpF)q*X`c6tI!P_;~ohHcL0?8{)KqcbT;$ z`+6r0k9AT>-BS?5qs27nW}BXId_5PFmFvs(hSTal>%$|VJlntmXAJ#IZ3xqRXtFP`Y!ixm;$mZMD&>2cg4ur#oYtatRjX+L*J za=Rae+{Gfis0-Fto^AFE;q|H5(I@(5u+zz+&o4Sd_18&2$C@mjGC|?j(p&hm+m4=# zclAQK?;CaC5Xq!#ZOgm~RK<$|e*kR@cf6roQ)d_D#2mZbm$nl5NTt87lD)LE` z;Lcu+H{fo{3gxUE%ztrA;}c$N-0>LNGo=>OUNGP{Ov zB#OR6=_~@KWMR9I)PY8#=Zt%_aBvw+7DD1z7Txf<--y@6F5h!-k_T0IaEkQPb5R)1 z{+}H{Yv9i2$WsMgXv3t53XV7=;=pSe3YOM=Rq?Y7aWPHT9t~3=0vB`$0!~iluSW&! zI;tP@yK4~kV1w-$NiVCtj7fr`Se@iC zglk85VX-M#E&_ z(S8}VDjZ%nrhgD(m3W>RpNnwp-wCXhxl(I)edJsgK&7A|rYN5&)O(7GOGt4m))i(Yw*R{A1ZM%k>^$ck=i@|Zk#Ph&SdtZ3DPhCDdYPQmP`v{ zon9VyUYR6)nDW!lnoa#qh|;>^g~1krPqs2hka|N5ks<8)Y5Q@}jx9IpnY|CbX`V*o zqTDMz^ZhScdJU;l_9RMHEZlBg55DMl(25wt8H^cw_`AFWqA}0?AAh|7?V-r)UXkY} zI*Nt}W1c?~lqBoSV2dFkpKE2QL`lHAbEtlgLYweryM{F4(vJKt95KUtC!)|z6(Pye zhuU7k=^ThsoO!Mp{`Kak9nzWI!=LXJ5e?=cG_*?`uB39Pn3m*NF{Ji<6kfx$lPLF= zw)=XHLM&c|C)Fn6-O^iJ0Ei|U9e!=wG78sK=d&FpSS+T-8Cq2TXtkt`RyrG(GX8Po ztI*FC(S_=y_!8%Lv20~A3Q>89f=iVt0!boBMsQh?=VYn@C?239*Jq8Igy>tvi^KT5 zs8#^ZTgkK4{&@TQ(5RR>oS`B2nbq&Q)_2^cJ=M7UelAV= zbYWWW6gT7UlBlxrz5>ZF`|&mPh`4!xD63Lf00NHNX?^M$3ziYRp?V+qVe)O2lRo8i zPN%sw=v3j;&-aO6eg9xVZeb$XA=V_No&ay>`JTRgi43C1=NuvY6w1AZ4fZxt#M?qb z>E>~qX5j5(MV(L0peDf7qldb4JFHZogBMVY5h-SwVVJDbRyVemK!{Xcn#cT#MhJ+0 zr|^wi1^a9qD1k-Q4v5zp+ctIY%w)2wZ`+Ae&$D4>RU(idL+b(}D3?u0vvo~zu*2yQ zQ!?1U*EJJ^p-Ow#geZ8sel?#Hu~3As!xcg~BX+Sa_X3sVvw)DyTL6fksB8i8nq~NZ zrOJu_5<9`x?Iq7i_d>%PzBccrBsstD{3w|=_zh2yRgpTd8DA_!8wbL&ZW-r7%VNAw z%0%)}1di{-8btftfN|}!Nt*3{Von8~cv^TLlLIl8NgW=^jwjv^-l-@B^eFfS_t3(H zC9UeXX8bj003v=jGEMzTb>zVL3CCbXmyBo zZAAfS$|wRI#AtytfOdzm*Ow$qfGfqI>mW8u9dw48?E(P!aadzoJ)!7uayNB9ES7Lo ztrHQV^+xY2n;z~hC1&|di8wcvi!9bY6l>3RGNt1x%b<_20A%%icHi^w!&QYxb*^AH zsz=02l*wW|zMzcAFV96~h;}0o7;7z1pT#3{k2uFAb|BCK}jE>T8ak{z(k=wzC=0l`qz8uN+d!R1Qm>?m1I90Z1ipl0;&`+vREn& zBL}hs1a9%?IY;3N9TkbNu&ds-u|VP<&f%~xqt~>NsFD74sJvjC*$*T6Pr1lw;!i*GDVs9ze;L9QO`tLL5GRV-~v_&o#to^AWYuuM1 z*s{cS90DW%1`?AvKNXECx-D|4PlnB+y%khWNaKheJL#CM7#6`zgVO&E5QD(tkdx;e zLps?Ungv)60gQtdp;D}ir56ul(7V9nCn|mpr2u=<#J;mYsHJ6ECtr(2kGNB6sl_CQ zA}z%XIT2?d;ikB-l(9(9a5_lW<$i%D>XhVEn=x#VS1At~%6{`AeWzDyTzp#+<$0!P zeJLaUkF)3n-+ZFX;#o~mMr&uE%LHi-+Q0_L<3LQ3i-7WgYx!7bANrgfk|>&^AIJmj z$)%8?h-s9EhdsC9|8jWDMi#ml&HC#dqAy>(nn0y^+Gp2KzaebvdEKvGiPh(gzrPB zqjcDy{h27%+gZ+e$SKIzlPVY5zEji_ZJp=(F_hNEUX1yGbm9Vmpq4fP2^xZ-?UE;IM=e^pkgLIWfa8i3gKY1*~jfF}t06~|G)XLU? z{T)pBQ?5NtAsndV;Q0Q_I2{#~+Ym(2_5^+F?)TVand(pL+^^#iMUI<)agAcS;n;Yb zj-}@|hM(sB>3Nr2Eiu66gaYWm+>Jx7mUozZA`6z*7}U_Dd5=c%Hw2|VYb|yl8jY|xXxPKM)1m|4wPxg36?SNZkAexK(w2P@)L&dZyPWJ&;4N z=OwW|m{Ixh*tvd z$m*R`s!tOww*{H%_201yJB$-M5 zXbIi>2RBQbwBEK5MAM#)^vm{H#zzO+lBd$!i>%!u6t3BjWbS5}x+x20zY`_{9GS}x ze8cSPK_#3J3OO+$E=AftqT1$;uP&za$&RuTQRZ#uC}P_hN1R8%41_M{ZRMrg-SZ{< z<&g(`=8$G>P35>jj>-!GPN*mpFPmjMgyhTw<*!rQpSeVY<$*2iVP@*jtxbTn4cT{( zd*o|l#(eSIVMg0AwI(|6G}v^X9BYX&up*+E(KoNG_|{U3nvlLI0WxU)2E$vVA1iWR zhOX+*<-|U`N1s{iRi;=t^ei0^0Vd+Zw^wm*j1|dxi^kEv7JkYd{fGnc{8Z0YU{@}e z#0%%*pJ_@NSr&vr|8RrLw!P%8?K1lL-sC+2ccxM^jrmzC1hQ8*bUWK9HY`u0mjc4V zJki@%Gr`OOrtDwp;+1ntw~Wm zW1C8ar}(hK-4BSjD9h+>r(R1LcZrG_qtMz(>RzEC-Rh^Zq%V|_r|b1x)%4aw=}rr| zYvX&2rwrxnX&XPPLvnWc8jn#{#=P^k%)f;VD(mvI6mzXHDVNWG?yYWQ(`XOaM=2+P z@IqOe$4_3uf1s)(Yg+Vky$)mB`DdaKF%!!-NG#km%uv_+t3&=-nEtfrcCV198m`lI zFtC|o(T*hBhltq3fGUpK^?U5nUS*k=TVb{6E12s#b6Y~R$y6mzf4IQW8GDyWVMScdLVASOkVnXbg2arCfsrpfv+)X zmN^9GXepmCXP?74j1zlhw0`r_!2WcwnJ3vQe1eKB0!7Y;JLBTF)!mSWYlUSnG6%VoLhQ8Yz2>j%kl)6e$?2RLrS( zbe7+GaWXrA_~?Cdf)~gXYq(ApSd1!jXZo)dLBlB_X|XTIc~m){>)m`6TvodUw*26x zHE6?RL$J%8>HBbEs@#6j*H&bdi0J>6gr(JC%Xg+NI5iy+gVm{Ja4~lSKQphgN`t}x zi-wUx?OiB&+LvYuHCm=ns;2yxQEy?9n{^7tweB6KgBllLjBet}$OG*3#9N1|(Ms19 z-LHhQfwTQ=1l8iW%Hz7NO5^Pep+h>+sFOks;83}oQh6MdgAzg~sEK8+jSpWfAx@zo z4Y3D&Nnez%Pn)2?DL|l>cMV=s?1?4>quBn%OZe`!wDk9rxl+n0x;Rpo);@W4z1UY6 zldNkcioxNvsX+6zR0?V>NC#&)wY*`=4=HWEWpf55NvIDcTrCbLM8K)V&9~FIEHDN8c;9Gc@!@7Hrq9?azvwG&IN*-IdZAqgH{MS>cpOrj;y8 z&;4Yqcum7BuvYNV(UP0Vwj_o(hK^uU8<|L!589@J2o>n8jid7^lQA`Th~S8IxqBrz zI}H;EeE_rSo6m*}%C$0rgMvDXOpMd}3`Xk^%;|QM2<|fE(GVy*R|o-p#%Id494B5C zggx4|7{4IAUMJ)|Et81!=-1Uc4N%EGw%#u)5l*2c2tW7qs!!**yNkyrabsy-#Me*Z z)K0#79f!h*gO{_NNrKcVBO*9coi}Y8-F>eQNs%YgLR+fbXme^qB-H5U(fU!4s1CSM zqTj$FA-#VS_$Dk0*Ir2bucgGM?9KJYP{f5ckjaUk5`F5i)~CUJLAAL{SiJ3C`zRkq ziHk2sSE!cBW>sy;vxdMAJB9J{_Eb6vCxwvlQ`-pI$SNtWUbBGNKXW>Gtg8;($X6|CY9a?@;)Kw|tQ4NYs0uEDEx|5+G zuV+`-&cdmh**Kr2_TJDLR(hAc$D_TQQ};c)t}IO|IfRp;xz8)TW6Bw%k}IbzR~p2% z(7LUw{ zwX-886G+9Y?+v2FcN}}lWSDTbws^>+T}!gA8FU3~e{)}9n7*JGx9!x>{)Uz@LzgM( zX=HFOfW9|Y?zRHGwrf(9i(MfwvP2zFuvSd$)k>ZvF{wicun?eVqIJ7@oA!l*av|{3 zhm+h{8^)#YCY=>z>Z`OGS79g~*qo`^?TmW#<#Lni+h{IUdCRp1=S3ke#t$#h^SxC^ z=fT&p=J5rRFxpQLSndT^@|XpK~sS%{)^39Fe%S&1?(|%ooYiSTRNh@ zBZR>m{fAa-NGg+Qd&O_Ao+03+$WG)njWY_jp-!S3t<31iVLN%X%=Mn#X14aGf`XrR zo?0hM%uf)YB2v=*9GFGV6rR>oMn-0)*A2xv36P3aOQqnd#J_f&=FCuu3$9uS{u|pd z*QJdWOs^w72{K3DO-dPRKuAr;BvasV>3i}1_4hhVlC2C(t$Fj zaZJFeX*_C`4SwadU5~d&0ViE$3D4xQ=$lNuSE9%dWE%}pQFN#s8+!Nm?+q;|lY{i> zdaYoLYMbR_MuXLOffb3qe}DP7+AxK;!f;|q-Drx3_ui8^Wvs4?s(=SAk-sF+p`0JH z^*w&KPb{*_b+qbnrRe`YXODHzyK|lJpf{fM1%TVF7|xX{3dj&qr>)DHUBt%x4T!yr zkg}&``VXg(U!%{HX-JK|KRFlqnZM)TOm;Ew6i)LmvS{#VT`w7bq})oIK>rX{zPmSa z_LCpvKs-w-aK~CXUpMUU8HL)~+=3iV7H*RUn`*-Omg{(rqmec26lTqT+2s6E57W4B z+?r(J!w;g-CX)?2DEJiv^`^gv64qj_PUre<_era#>*p1* z3MBXmWSj9y#bE_yo5-a&$5fF+W_yas(0TDZKq{!B=)G)hWeKsJSvr~Aiz3ucs}3wY z?2dA9&bo8$nrcohwx3*I7Inj;J!t7p9cvKfP){9V*!hKSQIXZVLr-*1(KVa3&uzbZ z;3JRbjt7)N2&K<08Q^Q4{N9o8_F3<>Z!Ch2ALbwrS8UYbmOp~)eoqh)rhq+s?6KxX z{kR3A^d)`@9{YGVd#8S_xI{PUU5as+R-kw7lu;IA4+|_>U=VH+H0f>H5Yvq9?A8f& zM)2h2)sfvs_pH~9R^eW`HoDKu3udptYTB8gXCl*|7Es+M~07IO&`7G zAhEzqU7Tipr-M9Y(4G%iXC%VO7}2Xm3NXrM?ueSlpFr^Xf}ko=_r5c1E1^W^|a;n zbxjhFQNH|fro{72XtF2gKrLm+S$va$xMCgq-LpObj8_MW7WLex-p!Tw6iL=ZUH)7* z=@w4;W3WiNMzc3b`uC}e`;dr*&PxnVM42I0Z21PKXK&HhNaExMnIVITh+r{42Xx_t zxAJ`&A3WJ4Fn$R_kq)%jz~P~(&ea-WKjtH9HJ^6MSV%}|S4RNBDGSHO7u>=YV{Zdb zzKNeFSVzT@c12M3S^Vs0oU`5b_&whe^P{?MCZqKu^rra0o}1x-B_9BWaAy&2ylzX` zHOpwq9R-jf-YeI>Cab`a&N zjWkbQO|EvBtg7FyHHbK4@p^^u^Wsf*H|fcbn)g0i-0``5R3>G$Kr}2=>7xh3D=@;Y z{=zy6g)q{LrDSC&<=h25#^;aSwD_DKeP#QHSu&pgLNi?4F|mq5Cow$gr1<5^A!(_Y zNQ?NG(_q~Vfe8j@P=ExUqil6c?2@*xMSWnt(hkk_o0BuruF5QYcb6dB5{NYmMfanS1Yj?OmIlnNP3t50P+0# zwvICXQmzm-{JiW z65KGXHydNrJ*2Fw7kiFr4meOJjPEEgnMf&2T1zS!W=uAcEOe*Rh4}R%LWl1|_tzrZ z1dFBf7$g=@BS~h?e`z2pjdVYwsW`$xR$_9;UyAqw3)QtdEnlOPqixeGrZqAlsg)CE zLbf4%#SKmEG17J5+=#TVN%oSXN1PIF~8sC?5&)V=mk9hkA2$wrmyhU znu@n2I3>wVpo$77m*e-2a-O5u1XBAmDv8Mw4pciPfnAxqZ2oeY`7$*@f{74Bh(B8$ z3BpWbLgdI!O^`vTqaIIRbE^Fu)HX;35UL~4=vCf=pfSa?1%1~tEj}Q84e@qJwXb-Y zh!=*?!tyar{`+8}r02XDy{v8G_QOH?Y|sk_qoz4Uc~p?IQC<#Lff#IiKj$`N*FV${ zz0H=p*)%4Q{DDcHKTLn}3xQ{#|L&oOE!EDwIs6r}g#bm+H@|s*%v1XIc7~;2iwqf3 zl^))Kn4DmdV&MhsAHKGOBsI{%jYGrrmq=sF$y|q!zY36HQHN|4|`s3l){4>JKLXo5NVoJ z`<>qB^ZE!j#-l2P1dY&HWvLt3*YK-UA(3}-Q2m6KY+;w+nX*@*r2r(RHlNT>agogT zrS#7LBTORZqW^Aqgge6#L51<#N4OK_dqoP$kT0(*LlCi|%9@ro@X-JPEsDU9B0F;r zt{Kv_NYoPLw9~g(Zs4Xx142ckn7lcL@0W*wf({0DiU}gYe+h-X+`_vcB9&fs)T%ffnNzs5pdsc(vgSlS&&SN#7BSGIu)#WBu567M?g)8kRwf{cOpAZk=i-}S?~p!jy+Wqct2^CX0pE*A zU@Ml&fOD=jkmv=NwN+iX=q5i75L+Ti#Ts`WBdON%5=Ox#mQVGQ;@ny3z#%F+PKF<} zT&<5LNl(#bmA4UWmvp|U!VSr5bTTj#Vn~Mr!8Z&S>9Dy{vOug#iObjsNg7IkJA>^g zc{|Nu6DLo^k$%LfynGX%8v2S}Tn!M(=8ec&MYG3d)Rm+&<97(e42^RfO-V#e!6Ml5m$+ zm|pigh+(zv5PX)y_{{$`-_EJB+nYa)yjyiFN1%8uGl!OJ0ykMCSc;K1)+_hEq~K*6zud zea3Fw&XR5pTyOCtwvn}A9ac!mtjwS+>cpu?riq&3JrVmeeR zz3~d4mypC6PvXi*KAwQZPDzu0iq#+);KNAyCu^c`{)7*;jY>|O*hklKn=d(R=E@`; zU?_PYB5!1L!!`IGTNII5K*%^yi+fL$M3}ifluHBW^kFyK{j*bACBNaG zFSI_60j-)XJ7ne+AQc*$&pMmRN3q+#LKtUyDgpG&!~y{Y8s#d*CcHr%xiFc3>Z!V3 zon&vxXgcyNYi#V)PHY=XrXXc`jdKQ@P&RE)5geYW}j|>HkUqItSTFl=C^8QSfY{< z69kj=2nd%38yu9HXJztYF4g~BqGL$=)WCF_uGmP8UZY7~&!O4_q`#eTeXAVC_BKWG z;d(@t<hDw}x%HZ2v?SBUtpszr28!cL;$VoB+%4`bU<D_KVpbW)9knuK4<)l?EZeE zB3gN5mS01*HF;osi~95GPnSS$O1hUA4lhn+iipUzCln?6z`%{-&HubS15%b!pJ z0vV6^is>V#Fu-zoG)}JG2aPZR_#BlSdCY)^2-BKk*B>O*^iI5`%=0Y30!(M57=N3d zcR{0EUcIi_Tad@GU@%PL_rPgc!lf-2hI@psO=XrIzV*O11?u(o1L{kBnCeed6ylhI zjnD|Db&*JNEZae}cdU4DwzLoe-cabnM3(*>6>G@qryI9eq6NAAGw9!>t2lfNthswU zopl}9#jEf{^>)ZcmF?T4Xmo*i+q!^BvH0(ydyMJR$?8Je6^@^bY^DI{i_q+|I*mg? z_-NGfBP8`bGuB=Ckzs$Fiz^~gEz`1oE?AVZ$D)hx%ivdv+2=Zo|7yK_kNJMgF|!)= zdL$y+xr&DQ<;j0#8#03itPJ$|L{s;dQ4$rG0d)$m3#=7J%VJ6|iLUj#6dk3s1#jK} zqF?I~aOj4#tE|k2@~jbQK)<_)FO~~Gg69s{A02vv&oxIgRu5>0r3Y%$X*%O4cYJ{; zZ22K9ZL*{axpj5z(E^kfud_VGE$M|+^59m?EJRM|?%phGv1o9&M_tlyLu!8vT+*gn zLR7%s=)ZPl)O}YBjV_4 z!+W-5A7#_N?7V%sLU;GqW$vgI`d}UUKOV|z9=B@RiHUu=(O@1=-SB7;&K>9Wwvhs{ zm^C3Uw4nLTt3Tc=2&+mfU~gCu&PvptmSs!z=1JE-p*pgUK<<=|bfv*ms7V>B8>SDd2_Nkw$J>fF!ldw3CO@7#n+5Xx&f3D7nuC_+3YR9|!zS81i zcKG5adQ}JqtCz*pE3&rjTjh|^yk;NUzTb^xDZMTT+8Wnzc!?l}c2MdMVx`jeF4aUl zUm|?9x|Ouw%Q7@%2M*%trW&8%ogXFt@<=#`4vmUK1uqA*4wv8t4hm6h0v1^PYS`pR zAe1+8!YeZ6&Zi}tooJ|fu`Z;@QK`WzJ$I7}x-N+_Y>RPWFx+L!X@29>?>3@A88|3Z zw22zdtl>}UsPW4cI%fN)Nj0!gp?m-`8m!bH{0h>LRnIlO(?#2#2!bJN){tk1_H-IdoH`WZgVcj(vieszZX8vHiEOQx+NrD zeQ0y-sO#W7ZX_!9eF^C2Vk$~?AlI@e9Euig$b)8}_J&mXfZcThfV{W4?HWSAbiXjD z1#U+jy1aBcGx9%^-fp?Eiph0WN=!)x`61z?tY`MP{bFNPH_Di5RZ8N)ZlSG?Qr$(u zkx}I*1if9MGBixhiVhp&XwV>=mL-X2Tv2&|_ex9qzWSVKA}u$7+potRo@n^>r5gsk zwE?Qz=w#HS51gwVR#+Xdhx66Z;`_7K9cus7dnkqL*4s?cbXxr`Rp>zOhuM%WH+fSF zRlOD<#3I}$1yA9#@E$#s+2!Uv7Cu$2`T$My3BDaDd6{FiiH&le@DL#AnYjUkj=|Z} zMOa|a8g1hJ(1h$0IH;*wB`OaU+hR3`&(SK*f6lxZ4aDy%CCvOL`QOldPfM9jGoQsP zLXO^){7*YFqva7yuy^&dua%j|HCV8&$LZMusoa4mSUar0*ETErRc$VY4Um!%dS(dg zp&Ykwowjke%TG6`B`c4)I%{*hMhd=@`_O}4K6`L2GjWqtSJSHoZB>oA`rEgif_B}d z9fT#(=E~-CO@jpd0w3*KB4(UlbDU%Xyc(3b z?lpLO(Y%F_#W;})k=(_MNr?$$)CBiiTkUf3%JmF0D%n3&WWfnaZJYpi;V2jURW#z<`SX{is;dHqnU8lusc>G)eOK1IIE;JzSjo<8oiq zvlH|q_e|fWZfBh?TE&;nc|m1|v}PE7uk9|Lhj>FCwjETcYX@F01)ziyLqj@_gr&}W zYQSU(A$YB*DMsPkys*el&MHe_!BywiHO3grD<0OG z-p#AW=i`P&v?(K(^eiV}Et+4>OXY(6e?z1a$6`Uy$Hz8r2s9d(OwWQbYb9?lzNLM{6@aOli*|XfDKN|1^H}5P zP=d)D?vl@qNsW&dv6Z5uJdxUob2tx3W1Y~|`|gVts)OdOP$wRQUC*M@)}5bQD2*fZ z8iY8!+22xhMAI@XktlZVcgRR;508JR-l40k>Jj`-D%VttYgmY=qWi%9e#1?5mVE@D z&El3lMfYerpXLcm+$i%$!Q~~t9h|kIViAULU-4QgqO{lMm2l?xgvqy1-4b6cpM89I zkV@4Oqxwh(s&89U=TrK+VI(4wyiBaixwFrAjm zO;0LF1Y$)isi>HhJ)*#4HHp_POhCw5F!tM7WbJT<4qA&r_}x+ZWVDn$&>gH%DIzBA z@p^SR;-igobFu)R-?pcN#X+$6Z?Z)}X3TOw2_|t`P4X>-4xqwyh5(h3qT?)?J-38Z znj%9dOJ~a2!WA~pIsoX67$XWenm!Qy^3uG3m?J=(gDsfAcsU&&#t6eaPnt%)7VRq{ zmQ(S>ayS^A&Ps<7S$ASL8_bpi5_;YMq+%0+EVIXzl`KDDCYk^duSI>Bk~X-6t>|7l zurClDMTXQQfz)s5Mn%H9@W@|EhKSp>;|B&+s4OxQ|4MIq50MMH)Xiss3 z39x7~eHK!1`&qn3$H%)r$-t3ss;B0>r^ zYZ#g?b}oHMLr(&!mL}`5e!0Y93Mn)!eURuN*L}kV6)g#6$?v}5yR_3^_g&RggTYo8 zS?Xh&H^>Ax?r}EHLZYym)X#DyP`-stPt6ZBZyyg=Aa=)nsDhmmtQIdQrSd_CeU65t zUmDi9zG586j3DnEian?GgyTj!o*WY4V8@2yc+w%8r8aMDub~8^HdJGXo)Zr)w!!lG zhegk(jE#t$ybVFtF9(#Ej*W|qNDYKZs%(SWNV&t!8ue1-**YG2zl~J9EXwJ>QfA>k z&3BaEvU4)L5bA^pj8N)Uo^`b1bTf?!u9)&K0jBN zbO;;B?v$YPpLVj<(e(cr8VqrkT61ij`nLVDyP@zkupE}FaMrfq!*7BB0&M{UL~<&g zDiS~ZA)IT>G05f|5ie6$lUZN$-`JKecBmj(PQh_v7AG3HfI#Pd!PqT;e(2Pl8^0o1 zC0}9TT#t~1d7={2QL!13(w_FwCZ-NqB#n{kG7pf9P(kVNM8(#2pwEg~zZgQ6Hy>zO zFRNQCpyL2x0J;up@@y!tPNK|f>%nwkN^yf)S`1Je932HJ?dyh@sZXwgYI#HN$dg&a zL(jkBPgEIH@13IZOW&-ED4US`Q(DQr`9zi(aq%=g@_KUV(3fOTzAPu zXp{rha6?$fbo}hC|Fe$adD7Z|$qY@WNQ{mp$B+VO?1E14k+br{a;yET4^z{Yn5br6 zHKl1FKQZ8b%*vqMp@w&Bcp_=sDy|)5|jkg{crljww9mjrCJqa^XAD6f&^e;PG$>i zcYuI;9>4tiGtX0ATDmQzGp#o5VVkWoA7=$uzp`sc%Af@SG~L~qp;=Fv!eg;n4vrIn z6@@8|C5x7C!kSx%Y>fn?zmhx@w%$#6Yk40!vlb&%10mA`9B9!074XdaK8(XH2k%);1*)j$vqxm2du#9i}7ZEQ@^wqg|KZCp!${Q#$Eat#_|m|H4=0^V=Ee=wq( zkL~z^bh%@1fkCw&lD;B|n1!qTn^a43xyddDw&Tq&Ay_@U1dtAtsK=Ps0332W)|e~h zEN|i{NY&YU0OiUfQQLZ3;xdw8H&I!?`B#-lp+! zV_2Ot=`kX#>so0g_zICiuyCBHDL+x6(te}CxOJC%o!>8Ap;2Ca>nLE*sOo&3r zNLzv8qJe+ld$C=Rn1d!vShrrYY>%Gcj&EqcbxK9d@mo5NoOuaN!36uy;f3i;nodGsx1w+M{{b zeGEGerf?(DUZgXvR6Iu0=Q z(X?Z)81@;A2l8v>Zk8! zEk_rRvG^B=#ez=&DqC0%gp?twHk0uK9c$(8zq!)hV>_U#CpNNn5E@nEEcdaZ8r@<5 z$V}8iS>nq1NkL`~Y#oPL{$#RDp@WmYPWJ%%169g=mdXJsY#O7Q0Er`f7-SqNJp5$@ zCq8~O3~uP@A=nyMwe||2Sc{=HHkk7{Zb$=3J)uhpAnr+~NjbYz`a$YJ%Yn@`6(s~m z1g+Pp{mOa->5i>gINPq>=He^KFkTSJ47j~9aRI9=yXBg^(j0HLV>~j6={*^-t(`TN zmD6migXan-C2hy03X=v^iZq6FS6YtFuZ=dd*7j4Z?}cVaNH~Q691!nz9DO!2KM{m3 z%%6x9LrzDDK_q(g-z-P-@f|Bqp=F3#dRP`E9mXs2pp@v0puasT5P!L zV~2sGX=(3w^Lx&8bXVR$VXrO?P%}8w?2g)L+Y_rQ?PYLogmf|UEXFC9{KSWTuBO;f zxWgua&={D=V>CYnmCh{ubPAEMi=a*3 zXekof8peXH`jVmm4BhXX%wO_7m`E_;T9K?-aeunyfAYI5N%}YF)~<0O9&9&N>R~B7pw_87j^z?J8(;GBOZGy|;8h z@ZN?}u7qXIG8t}ozy=39@e5a_VgMp!Uh=IwU`3`&fsSS>7Ul4ETdDBims281IYF)1 zk*+ifQ{{+8`HUMZuq6Ux(cilCi`5xrvW%RW^;k0`4xsgxY30DV&WIJ`zqeYN{~}#9 zQub^iqe%Q?Myp^~8J3qb(ZT>{zhd30jQ61^JI={L@&%^(RAgcJWQX_{I!4|lBN*5Z zoEn<$@Ny+Z(Kd}TjfJ3zGTicONjcHmTeF;x)bfLIXT@626Z4TlO)4mqX?iY#K`I8}$R0U44Xy=X*SYY>K zwEcc{DnTg(fNP{H;*EBRLX-lMJ(Io-{+)Ikv%_pEK@0s``46H^>)71Qeeo)klGHQj z4sc6fP)O-ei~@u)+CdK6XN*5(DGji)GEy>76`iZs$}8+3*nh>eZ)BCih5HFom(t{s zL!FxtHsRkr^RDk=ml|K4!_D|sd*ZzNadZ(b8yHoj`e zIW`)Jel#+D35M~IWXAW+l4}qRj!N43-@pDktbY`>%~MOyQ*Zu9jI-GLj~KUSuzwi- zm}aQ3+x+GrsYozy3U9^S>1(FJ*bxijG94(HYxpd^iOex~6656erBNDV9({$9ZsB`V zL!xm_^0y^H3#^*7&LxhuF7_i;1OVG4c_tpB-hRn4_LAUqMBF-v-KRolWey3k)BN1o zsraa$Wa=TQB8oIMM`hn|{g1`<>C|V@oi4`VtkLfYqb`ffi!om16GGtL137=Mb_(gm zX23386jsOc{r$$rQEEj7jh<#Oiu;a2&&irVy0z;RhcHn=65rcDLhN&@?|wO%x=J%p zg-s*&+I@~|8-eUW>_9HckRqD;OB`jhjPcCfq{+N-f-C+bDc|MTQ>tXqqRp|BJ$Of1v?-Cw28yIT2HnK7I4`|%F)JF3W!e)~_+R7jK? z8v{UEiYZj)gBl+oDvEJ**m6e-JWlan4WVgCb3pYDiXj8Y09Kf6?_LpON#Hi`hlc;V zG`;iZ|JJ2MD0*<#>rQ)ppHKOXKL|5Ut`IF1TifT!`eYX?L9uTM=r(0Le^}_o`T9}0 z??s3@)6iOrDFw4?eJ8e>K&P5+uW~GGIGBd}ee%SjRUPenhj0(HZwRXwr%`Oh* zH@hVVnVt|>W;PC8&umAH~!DV@$54Zi1bYNY7Bj z0Ak9H7;tdF<`sV3S_Gg}h}9(=md5%@1fE^g@jV#xR4H-J~aeV8;fk(wu#yk7ztoIT_ zkF-hF+;|9~w+DbKA|HjvhIDwigo!%6vu?34xZsEOAXe@tA zi+>{pAQ8?4b}{y>jwGYSz#h*jL17IkCdlR`FJR3o1|*(PB4}$~I6>%QL|C-wWSaI6 z{jD=XKhqkM(VMSU%PVs{CTpHVc2x-v!*m#8s834D75_+bX=sOuR{HPNz@Sd_AFQrOU-(q#?oZ&rdLhU>#j=pdv3>iGG!)X!k zz0+5w8GHHE4Mk79x{h{y?_Bc9$Xd{)*)|u?;m}6c+Gl*F`#4=ytaQPZ(~0iO1}74Y z@zpf->H>n0;uFt#JVV)pWK`sBPl!p`Tqp!>N@JVTdks8|#5g|RV_
yV&-rdy5UYm(zZTyR+uup;#bE@MNN5~7l8jNdfc|3(ZCW8A7!8g! z#QV9XCA4{zZ-#hwNb7X=n~pm;U&>BjCibUpO>pNp61afly{+p*B5HP%eLfT;x`5@u zDo2z(8n6K=2s>e1Mu1cv|H>IAV z=L4yDIZh0NF`{fKj=8i&7{(OXTC8gSAwR5=PGAzl0BxPH(L~+2T?VLbkYCBU<@tV$ zVTNI-iD#SrCXKx+v5t;h!0^=0IO97VCXkN7++ee3-a=qh7ZnvH`g;~ayBqk`Jsgi1 zeyDcXi6c;a5G1zPcV9lo0LFXjCRu33XU{K>HmU{{B4t$(nA?mQ?2yXg?>k2e`9{5Z zPOy|(3fF5Um8zW+1M}hyQ}+?n)TGXuTH{^yFSg9WgI-@kRX_f&GR#5`ahSIS`Yz`o z0^47_T#d%yj8L^#T#Wec16#dldY6yaNJ~Hs6N+YJ8!+RAb;+PR4SorrKL6SFh7OOq zz_~5%D<@H&=4*D5DU7<%z3y#0iZ3e`mHbGy;0sb&Igwi2Rnq22$;8VqH%U*>x=k>& zobtUDCIov4$%tSt!s+dHF`?p+Nv#_n@O(* z`TT(U9wH-?JjaK1fd{K0SdX zaNk8#f-+&mqdh+fcqxZ1vlASrICZM1ykZSmVP;EVPj%jpVkNR6GE)}#4+12HsLMcY zc64sxZ7s%vo{!dv{5?pu4G6jhA=>DPWP7pl?$3J83&Gh}&9nT?lafZn?UZ6h$64}r zAH(cE81>o(bw*h~&KO2GCKwq8VDcsmlV#3YV9KjD#-20eTCqRqtMmc~T;9Q>u7-Qq zM~ZM*4*XFg=LiU%Pc)G#&tIogyOy8}z_AsVil9_zuYL^NH87lg#P)lOJH&0Q?6+l! z9y(@j&KXDTU&uY87?rsg>hP=biB|GiX!h2x?#rY`#Kwp5&_bp%T?E~ z){iV`P5Hr&aSJ? zti>k+MBHI&jtK!;WAhoh^huUi$km~|n=H$^1_^5gaLecf09>_ksvS3=T2kLy>Hru3rlLy`Fc#0Da_vzL5=I zL*RgXN}JGrntV0MEx;~J+?z{n!JCV3$r9Pz)QKvtk4-aSZ>p48+R)eKrV@#7gZ?ev zk2?-|Cvx4SY7==EJa5Hn<6(#D0X1`%>LY9WFecWmg8hFezjclwMwXlTWXSC*1_wW(qE727^ zy3^;?_UkqIKjYt6C}xeejK&%W=Fbo*j#Q1ibnkakOsb^q6sETxl}~grWXT zTAdysm4K9|Ndf0wW=cObY3PkpLSua-oh2+%0p3Tq-3 zIBcspH*abRnyq%oh!0?q*b@Wxc2$GzqymPme!L2+Hknyko!$hTE*%(2*+%4($$579 z7&d8GR;hfUsy2M{(OHF$-MJOHmXgAG1;5~5Z`oQYJFnZH`7`uYonIwloLCZ^A-rU` zUvmrjW}gLMi7Hy4IIwWe!_0%jY|eynijYO-x=`Y2Pbi9A8ass9FUj*XZ2Ft*?B9b10R_7p}zh|#utA)PaQr&Q6Z`tc5aQj=)JDsbdh+wPgh^fwGemS(nA z#*qf&7YA=KL%PayS@2*(iWCcCXu#eHnldjIE-d}~+UY?0C_9QoyXKrk7~n z9&q!>oW&t^Oa8njbZDinI4V%mDOB0Pgcxod!^}O0KSDBOJDSLe2~z&1Kl5E})hvDv zZ(6&YtzNmDQrANMFj@@nQIXrKfQyxFcY=a*(J_eE*FTIP>E!!7YS-SV%LZ+SI4eK(&O z4scnaf?3WZX^>-c2$Io)bm^inFT<1LxA~Kqe@mlk)p3!@4m~cV`J-ygf!;iZ(WYv% z9)<+%$p2D$S3U}t|A4Mk3-)8bABTJneLz!F9@c#3K%776FQMC<%-Ie-0h7RKtdV2p z=%g{Lj~xUmdpjN0L+q#g z>0CLbqA7XVn1U#Ql7R(hoJH1+N^{fqzI?^LLjes(5F{(};j#Gkjv-QLqT!}@G32!y zG=_-HIlFe>_zoNXaQ~t!3wTxjss0Y;pU9k$mz8$mS;!~frb(V)$47KS)XrbFB;mT- zp4_MPx}|PS8z|&=;Sq=cs*Vp(=k00e({vc@(yP2XZ^E`UBC*PG$0SuN%jO(~PIan$ zbIPH-saMb0*J}*)^_0!D3(y6&s~4H3F%+RMPA0y;(FyI71)Xj=6TNzd3J$S4Vi(BI&jYJP^Wer7|Qlc_>A|yei1{bVXa(~XAI8D{}ePo77kMx^sI-Ry0 zlPFZg9zSS)qkZq;H&N>;vHO#t^Q=#L?bzU3P-3^ZkGx05uc7HOX*_!Lp9()>i&kMe ztv=S}z6rNI#H&dskLKPI_YEAL8^AaAYF-Sl2v#MaxNqN7&O6JQhr9cq8F)D;!YA%O zfesCRycEul8AjW4Dy}dpI5Y`qZyyhL1s~3Ijl*6yun#eRotco3BLioqT1i{TLrPvBB#kI( z?m^Wag5?LZpt(-Q&u9D9-h5U{Nweuzf=hhF^q9C0BnUk&&MEtw%h_Q%1@@r<|AC{c zI^_CF=AM13><4?ai!6oFut!WfU@?$i)sRr3H1Bx72A7&DpDM*eK^vgbFZx|Od>3u$KSq!#@m2g&f zNy~YM25h=HN*YE*%ZTV5a#!Gr<0Gl)jI8*h%{OB4=4*&9_20UdZ$hiwAsHdt$LdaX zfD^uBF^r$RRehy!i!Az$Uj*5*yAriux%XFP`Rk+eh{)`Zx=iZCZ4EM=qqzem#6Hpa zG1}_UAn}wlqH^rM*_Vye>k3l6FaS-hl)RR`hz7j2@clJo6)v2ErEXOOBN3@wrR2q( z3+7ZUV4yEb6p`TFG8EG|pp~(eSw6Q0t(SObRr%KC^)ktEAivLz=@q)bm5u1_H)X zsY6$pj2lYcz5VMAE=%KD9&eRBp=Sj6v>Xsn@x;WPTz-;CdK+FI%^8`C^jTO1=XdDb z;Cf@6DvMJ8Cn1^8 zZh)>836ii3JGm~U-=Q1M4I6WvH{&%l<=VljCs?(2f5=orO@f6SM~qPSRBv~ZLZBF! zG9SV*lwTB*jXL zQaiiQdqo-1n-krh3o+NEm`?hq9F#OIsUJ4wU9Y<~&x1Ht+7>q8Mxi^Ym@uj$rW+Yj zGO}?~K|Csw*TAq5+c&1WROabdOqc}9=u(t&74PiC4u2TNd$6X}o0Tbe7FNy-TO7AH zt6R`@P5gRR;l@b;jT;)?6rAOV? z8SBM5yq|T2oXUqG>q{4g z8v8o?_iOHq!gp2F;ZMX5K2qjkyBzZt4nQYrTF#{1VZ`M5Q zcZiZI#)#Myu8A!@VB=3A>7YaL^zzB8?$6wW*`G#*3uy`Qb2SS)1G8;=P!~W^6O6&@ zGq$csYGg6q&GI7Pp&E?(_@)P;_sG%~_(-^PB#xBbi6{;o#nt5BCcPy+>03@ooog<> zvTF~qWN41b*0=a2ISJfHlPJnlM2-71E7QgRr*w7d9i%D8Jz04#vjqh{RkamqrZqIR zD+OrtFL3p!Wg5;9RlVv{Z;*o2WXzN3(G=2~7n#3m%*0{?^`!1lvUq$@w($faUnq;% zge>f1BYQ~?gSl4Z%#!|V-Okj0&4yx|wzpRlI1ZT2wj9doZ|CeY)rwTV%BA~wi%D(j z(ISGLqM1<5TAQUOJ8!CPAerk{3%(GE=hZ)hajtNsMgNXoAb;~8^CslwL3vr4#PeVn z$*hJrw*{XWiZ*IVD-@i=eM8QtWsKtu$Mxa`{iGj$QEW5yx5t$8wokoXzSPX!ROk=_)SriK_uS?f0_MA))tWt z2y{bGKihN)xIJj^vTvg8rlfQD&}J^SRJK@e&|KHZeI9n4nq8Ftp1pY&_&Ma)%-py8 zH5wDecMbZm@KB!DdlDmjbip$EgSIrk574`s17-V4s@U_;}HU=UTC!%rgd6QVIzxH_7SbC2b2t;g*;yJO|$?mfnrEw#)? zF9cH(&oF9u#ZDk_f$`%ObVIxDI5Z^_VPou{qV)}hBiA#NHQ#RW!gBV1qleAY!$$0g zyE*f!=NM+a@`mOG-y9x(>uDz>sxE%p7qDM``dJGQm25#r=@pLif*d4)V5L$*57ly`h+cq;rxdkmEa*_AL?oT?U#Va=703ZHKj6=?y-j&a6Cf zx9RbBR>|W#gS2|Dq>pSdAM(Q754c%zK1JMU5YuaQ4HlhSU9l$cix3z!lfG#<(I2tn zGx>HqS@Ju4SP62Cnj-1^V1>C&r(xa-g*_MiJ;R>;Wl|l*W4q~J8H!2C_jLvz+5Tvmh}!sjw|4Ot007TSSx#D~ zgl@fs^_8NiOiyQXuoJ(}Ccp6(uZe4^UY#=!GV?<1+^SobKC@kWvk2ROp{Ws)&HJGw zw9$;Xm!aqNW})VCT0**pFzQzyhpm;fPBHYqB1+*kqT1yhcM{tDewe9<Vm+6iU3IEDXX}^$j47J!-26Q7%MK0f zh4P#Vu#gEcbVzh{pfC}uQCGCJ?e~J8^GC7YVN$rW@eX{{)!ElbLz@8Dmy@L!br4n> zXt%X4HtX?FD8N2!jImKuOL2V?^|Q~Ha4=c!Y~#+R}zqy zT85e$gl>2FP4t_DW!J;N%<#ow_2h01kxuBRCOPk*QX%#eks}$$ZzB(kyOmgM6jZu} zGNG(o!AZRN{-5v3LZm&mB7(U}*k&z={zHj>NpRVq`N53;rdhfAXn(x8P^OX9lW^LN z9XU=b4^tHfEliXF_HMY#>QW$H{V!?6f6Q`G;cb?MX{9^oQyx6-p?~02SsCkR;J09i zKQ)rJXX_tPoAd_`@*%{eC%to2dWf9=izc0horL+-0T*Mr^7!0L?cV#pvL!wC1$;La zWTrvzWzOfOn&iXh8UMQaMDa@IQJvo81R4GsVds_>_q#HXC?Yv)OI9`KF;L%%SM2nT zF^M$ZBoJaii#%!aSy_Hwd^@rI>xIkhnoNoY!yVWCQwdBjD>ychgt#KKHJ^sEA@aadMjf2q{VBvdGXcAq1#Grei@B1PO-e)m`Z&F8*d&6 zRdAY`8#fC2RaRWCpAJO-E~mU*V|!b*XDTa_&$@mENyd~H3Wq^JhzD5_h)`Ef)k>7r zSyzWpSS>^H4GtZ<{OV*4U3R?qGNOIbu?|@wUYQQ_uJxdIraSXU10dQ?J6XaQKP(zo z&Z;_pGexr~D*K2rfiWsg)S6~9C7Pegu%oKPrHg(NRu#*yjYX4x8Ij-#vG=6o&xT`3 z9=)skXQFGAQ@9@v3l_O$M7&8sfZ@ZzkP^EC%vqF7TwvU^oPb7Q>lm8?=0uMaudxXt zlO1lojC_uCLZn;61e{;7G(b;F4l@exX69b;-kW64wf1*dQNH~A;oWAFX)gK-tiej6 ztRY*idqm3%0W?mqXUm&AV;4$qu%^`ib-8UVYlF>Ro{?h~$aa;$O-DT52ieI)CU6}Y z%UsML79HdtGT=_TgC<*focv!?XC4pb)(3EgG9rd#$zUW2l}48AV`(HVE@7@cB#CB7 z_GE};nHHjMNFim*C}hh{_K+6wo66aZ*ciJwo$1C%!9p_DrV*Q*2c{^EBwp0 zCH0Ieh$eAp>`lVeMn$&%_cXH?r8$%5l8oY2?Q|@pFx@V^MfefX-tTUOv5^@E^CDV_ zuUzV17?`?W+y6M>{o=D)>0K)~Lo)e08j0xoj6xoH6;alI?E{By+e^jJO0|Ww88dB$ zs)*>D6px!mG&Y9j5Ief6i_^qj4VMoKBkW_Izoaw$HV=Pt`S#^h>(9kR!%rW?|De1U zGdp^L7#8}5zk*NSo31u4e?e%FXA3*hjG>K@GF>o&(V@2;GLjn(?P|T9s|L9)lIm1W z4a6w%DGUsC71p04)A?gE+Vhb!nwH0@W7#=UJT7X$_$70;G(uDbTxA*f`D%F6@TK}zmIx&=&2A zDpap`z3Ew&+xIwtODS7Us{13bCLVMyGCFDwI;xF!9!z{k0;fF&$WJx?xLT!_u3D9M zfHJj74SAZZd%xacdnrk`TSg0VPIv3f6z@X1HSspcDc@ym^F5ZSco9A~A)ko)XwQb+ z>}5I@=l!aFH42k$6=vL&{{F$J>#R7YmTp!ekuvxa?NbmJXpoq~IrGBJMFN&8bodWn ztd{ycMGK`nK)j@w?7Tgz?*xidV+HMsc&B%_TbxeY@H8sQF#6#=hBX18=ez5_rss84 zKV15#lhmG>^?dClTR5WY<d02;mDBXsoXrq`bIt zto;%`e|$Nv5D57i*84UADl3xJWuE|gVx|5(2+}yHOL*#`&y+60Wrtxn!3ZGARvY_g zl}AiZ%#Bn6fOdNwa!jwT^q8zCnlS9F+E^bd&Kj**xe)|4ACWpVP%y&$tsuhq3l>g0 zj^*2xA7bN}89;@;PzndX1h7U_4}JumFl+_>47ns{|0j!TCWbTe?r2Y4M*Vq z^B8XVM?R^izhYXETkH2fL!;YkT+InW{>?xj)r~2;#y3_5Km5s^^^w|<>0f@8-|dE-9t89}9`UJVmm*nZmQT8Nj+>GF>x9c- z9c@a+X~eLI%Z~ewh9&Psr6VTtrX(a*V7tZY-Fw>%?Bx3h0&ndV%E4lP7q9*v2;ClDS{W^{` zxi^RPv0i$9`!iU@RKZ8fW2QXdC2hxf#Jxqj(O5+kr);uHptv9@*UI=EJA9n?HH?bx z`(Fh-P{C1D?|ZB?>Y%CpKVbq7`^&dZy3+${GD_ie|Hr?N4x_Fa`4*alEX5Bucq@Sd zaRq3YbnsK%U1)sFEi7IKgQd&1h=yrr$?C1o2`*+rKIYZ1_XTa&0om-& z5REbZT+AWq@^NcrUJqA~l^z_OwKeVe+-v19{2kvy)kUrP$9C*mMzcf1Un2tmKRtmP zWrybUnK(j~S@xdT0ILoF{M~B)voH#&P08R?*}Ovs1haw@h(xa#f2-|maph;6HDCQo+9|ll}_Lz=YgjGkP`INbGI{kjdb@a!vcnOVH zFV4O0g}W5s^kp6hu&;^lq|qJ^KWOOj#HWqtgW zJsS;Ss>0U6B+#UbuUt|QMm&~;ff6`SN={8N)r6cqWNyKBL2N5EeC8|S;d_u?2Vc-YneWXQ(8+X3i{x!t?ej z%gr)#jG&vvBfMqi+nA7zIER;`qm3a#$z5pu?K3=rP8Q7VN9i)a z-5q_m^xzfgc)sv>)y~!e#EFu4YPxVHBDiB0VuJ_9-it}2a%Hko1G67Em+$iikpOYl z)H&}IkMkDT>P~v6a46-co!@YSxr><&4eap$hA;`8P>ZQ!4;Edvl6$^%{lZqle+P%t5|k)ruP#r$5R9M{s_$GmKu*a!qj$c#NMJTd>Hw|| zTA#0k%2lI_vUiYufFG9~3`4&AUtR}h`0L!6Jtee@$_z(vVTHvPvhA#g=96R_YP#F| z?Z!&F7;?ynrK@>UnuU6n4q{reQQb~i&e&&2U5;XMr2ugz1d5FJ%b8f3)zMsj)JdXF zm5**+(WeJSm(R-DFGkzTa{6^m09ggWhUYzuV~oGP;+VaRiG@Cu>4|AbiWWfH%Zdbi z#JfXf4!#C{B3tFEey>;UTrasF z&Nd5~=yCsK*fdi;ZG8tX@VkGtBs&TNIeZJQ42!Hb`ah|#4Zx7_NE_+G_vdp(P~*;> zY76SbyBa}jKDJoDKVLR=(XL3carfwp24q-OJ{>?1%P*8wnL1x0+KGj#4R@=z$MG<` z-$s!1Vn5&OuPuy@($m|_0*O`T|YC)c;Nat8Tjs6NdH0h6RNb~JK zqc%GD2bwPk*w%iKvZ~jKhNd8@ERT~;e7XbEko?=$Ugh};KVT~y!)gs9VHfI9xwaz} zT>-rNV%pzs35lrm-VCQ;O)l)IG?Lfc>;+*TVHlK62Z$Wa$PRjeD%g5%N97q4BDLC8 zrAac$mbVxmEhQl!#M&nU&Ae7sSNg73muaTJBbqP>-1A(U-kO<-?410h;f9-@lfhq=DEQE{=2YBm@A0{ZQ6q^waqV*(z|6V)?Y3eXZlGRlGrb?car8s3@^ij9y}oKF;tQ}<0XcHQdqY+zQ1yGm8YCsV1l2) MX=A;79mlZ$0H*((%K!iX literal 0 HcmV?d00001 From c3e37f21905c12b81c4fb18a1a7ebdd4481c4290 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Sat, 26 Oct 2024 12:27:32 -0400 Subject: [PATCH 07/26] update --- docs/Design.md | 30 ++++++++------- docs/Imports.md | 43 ++++++++++----------- docs/System.md | 76 +++++++++++++++++++++----------------- examples/cursors/README.md | 33 +++++++++++------ 4 files changed, 101 insertions(+), 81 deletions(-) diff --git a/docs/Design.md b/docs/Design.md index ca0ff13e..5c4db703 100644 --- a/docs/Design.md +++ b/docs/Design.md @@ -1,23 +1,25 @@ -# Design document for Exo +# Design Document for Exo Exo is a domain-specific language designed to enable productive development of high-performance kernel libraries that target specialized hardware accelerators. The key design principles of Exo are: -- Performance transparity: We do not do "magic optimization" that are surprising and opaque to users. -- WYSWYG: Exo IR closely models C-style code and will be trivially lowered to C code. -- Give the performance control back to users +- **Performance Transparency**: We do not do "magic optimizations" that are surprising and opaque to users. +- **WYSIWYG**: Exo IR closely models C-style code and will be trivially lowered to C code. +- **User Control**: Give the performance control back to users. + +--- # Exocompilation: Externalizing Hardware Targets One of the main ideas behind Exo is **exocompilation**, which allows users to define hardware targets externally to the compiler in user-level libraries. This has several advantages: -- Hardware vendors can support new accelerators without maintaining compiler forks -- The cost of adding support for new hardware is significantly reduced -- Proprietary details of hardware can be protected +- Hardware vendors can support new accelerators without maintaining compiler forks. +- The cost of adding support for new hardware is significantly reduced. +- Proprietary details of hardware can be protected. Users can model custom memories, instructions, and configuration state in libraries to target a specific accelerator. These hardware abstractions can then be used to write hand-optimized code or as building blocks for higher-level scheduling transformations. -More info can be found in the [PLDI paper](https://people.csail.mit.edu/yuka/pdf/exo_pldi2022_full.pdf) and [./instructions.md] and [./memories.md]. +More info can be found in the [PLDI paper](https://people.csail.mit.edu/yuka/pdf/exo_pldi2022_full.pdf) and [./instructions.md](./instructions.md) and [./memories.md](./memories.md). ## Fine-Grained Primitives for Performance Control @@ -28,15 +30,17 @@ Exo offers a set of fine-grained scheduling primitives that give users low-level - `replace` for mapping code fragments to custom instructions Having explicit control over these low-level details enables Exo to achieve performance competitive with highly-tuned vendor libraries and hand-optimized assembly code. -Primitives can be found in [./primitives/]. +Primitives can be found in [./primitives/](./primitives/). ## Rewrite-based Scheduling Language -Unlike previos popular frameworks like Halide and TVM which uses _lowering based_ compilation process, Exo uses _rewrite based_ compilation process. +Unlike previous popular frameworks like Halide and TVM which use a _lowering-based_ compilation process, Exo uses a _rewrite-based_ compilation process. This has a few advantages: - Less magic -- Easy to print in the middle of scheduling process and see what is going on. +- Easy to print in the middle of the scheduling process and see what is going on. + +--- # User-Defined Scheduling Operations @@ -44,14 +48,14 @@ While the flexibility of fine-grained primitives is necessary for achieving peak These user-defined scheduling operations can encapsulate common optimization patterns and hardware-specific transformations, greatly improving productivity. They can be put together in reusable libraries, further enabling modularity and portability. -More info can be found in the ASPLOS paper and Cursor.md. +More info can be found in the ASPLOS paper and [./Cursor.md](./Cursor.md). ## The AIR Framework: Action, Inspection, Reference We identified that Action, Inspection, and Reference are the key scheduling language design mechanisms that enable user-defined scheduling operations. - **Actions** are the scheduling primitives that transform the code (e.g., `split`, `reorder`). -- **Inspection** queries properties of the code (e.g., loop bounds, memory access patterns). +- **Inspections** query properties of the code (e.g., loop bounds, memory access patterns). - **References** point to specific parts of the code to apply actions to. Together, AIR allows scheduling operations to be defined as composable rewrites on the code. The language implementation guarantees the correctness of these rewrites with a set of effect analyses. diff --git a/docs/Imports.md b/docs/Imports.md index ca96c5b8..6faa7c69 100644 --- a/docs/Imports.md +++ b/docs/Imports.md @@ -7,9 +7,11 @@ This document provides an overview of the imports used when writing Exo. 1. [Standard Python Future Import](#1-standard-python-future-import) 2. [Core Exo Module](#2-core-exo-module) 3. [Memory Libraries](#3-memory-libraries) -4. [Platform-Specific Modules](#4-platform-specific-modules) +4. [Instruction Libraries](#4-instruction-libraries) 5. [Frontend Syntax Utilities](#5-frontend-syntax-utilities) 6. [Standard Library Modules](#6-standard-library-modules) + - [6.1 Scheduling Utilities](#61-scheduling-utilities) + - [6.2 Standard Library Functions](#62-standard-library-functions) 7. [External Interfaces](#7-external-interfaces) 8. [API Cursors](#8-api-cursors) @@ -33,40 +35,25 @@ from exo import * ``` - **Purpose**: Imports all core functionalities from the Exo language. -- **Includes**: Fundamental classes and functions necessary for defining and manipulating high-performance computational kernels, such as `proc`, `instr`, `config`, `Memory`, `Extern`, `DRAM`, `SchedulingError`. +- **Includes**: Fundamental classes and functions necessary for defining and manipulating high-performance computational kernels, such as `proc`, `instr`, `config`, `Memory`, `Extern`, `DRAM`, and `SchedulingError`. --- -## 3. Frontend Syntax Utilities +## 3. Memory Libraries -```python -from exo.frontend.syntax import * -``` - -- **Purpose**: Imports utilities for parsing and manipulating Exo's frontend syntax. -- **Usage**: Used when extending or customizing the language's syntax for domain-specific applications. - ---- - - -## 4. Memory Libraries - - -Even though users can define memory definitions externally to the compiler in the user code (see [./memories.md]), we provide memory definitions for some architectures as examples. -What we support can be found by looking into src/exo/libs/memories.py. +Even though users can define memory definitions externally to the compiler in the user code (see [./memories.md]), we provide memory definitions for some architectures as examples. The supported memories can be found by looking into `src/exo/libs/memories.py`. ```python from exo.libs.memories import DRAM_STATIC, AVX2, AVX512 ``` -For example, you can import `DRAM_STATIC` like so. Similary you can import AVX2, AVX512 - +For example, you can import `DRAM_STATIC`, `AVX2`, or `AVX512` as shown above. --- -## 5. Instruction Libraries +## 4. Instruction Libraries -Similary to memories, we provide some hardware instruction definitions as a library. +Similar to memories, we provide some hardware instruction definitions as a library. ```python from exo.platforms.x86 import * @@ -75,7 +62,6 @@ from exo.platforms.x86 import * - **Purpose**: Imports optimizations and definitions specific to x86 architectures. - **Usage**: Enables the generation of optimized code tailored for x86 CPUs, including SIMD instructions and cache management. - ```python from exo.platforms.neon import * ``` @@ -85,6 +71,17 @@ from exo.platforms.neon import * --- +## 5. Frontend Syntax Utilities + +```python +from exo.frontend.syntax import * +``` + +- **Purpose**: Imports utilities for parsing and manipulating Exo's frontend syntax. +- **Usage**: Used when extending or customizing the language's syntax for domain-specific applications. + +--- + ## 6. Standard Library Modules ### 6.1 Scheduling Utilities diff --git a/docs/System.md b/docs/System.md index d398ed5e..5bcbcdc5 100644 --- a/docs/System.md +++ b/docs/System.md @@ -1,62 +1,70 @@ -# System overview +# System Overview This document provides an overview of the Exo compilation process, as illustrated in Figure 1 of the PLDI'22 paper. ![System overview](images/system-overview.png) -## Compilation Process +The Exo compiler consists of a frontend and a backend, with user schedules applied in between. The input to the compiler is a set of Exo source files (`*.py`), and the output is generated C code (`*.c`). -The Exo compiler consists of a frontend and a backend, with user schedules applied in between. The input to the compiler is a set of Exo source files (`*.exo`), and the output is generated C code (`*.c`). +--- -### Frontend +## Core -The frontend performs the following tasks: +`src/exo/core` defines IRs used in Exo and other core implementations. +- `LoopIR.py` is the main file that defines IRs (LoopIR, UAST, PAST), and their visitor functions (LoopIR_Do, LoopIR_Rewrite). +- `LoopIR_pprint.py` implements a printing procedure for the IRs defined in `LoopIR.py`. +- `prelude.py` defines `Sym` and `Srcinfo` used in `LoopIR` and everywhere. -1. **Type Checking**: Ensures that the program is well-typed according to Exo's type system. -2. **Bounds Checking**: Verifies that array accesses are within the specified bounds. -3. **Assert Checking**: Checks that any `assert` statements in the code are satisfied. +User-defined features like config, externs, and Memory's parent class implementations are in `configs.py`, `extern.py`, and `memory.py`, respectively. -If any of these checks fail, the compiler reports an error and halts the compilation process. +`internal_cursors` defines cursor movements that are used internally by `LoopIR_scheduling` implementations of scheduling primitives. +`proc_eqv.py` defines a union-find tree which we use to track the equivalence of procedures. -### User Schedules +--- -After the frontend checks, user-defined schedules are applied to optimize the program for the target hardware. Schedules are written as a sequence of rewrite rules, which transform the program while preserving its semantics. +## Frontend -Exo provides a set of primitive scheduling operators, such as: +`API.py` provides various user-facing entry points to Exo. +There are three types of parsing passes in the frontend. All the frontend code is in `src/exo/frontend`. -- `split`: Splits a loop into two nested loops. -- `reorder`: Reorders two nested loops. -- `unroll`: Unrolls a loop by a specified factor. -- `inline`: Inlines a function call. -- `replace`: Replaces a code fragment with a semantically equivalent implementation, often used for mapping to custom instructions. +### Procedures -Users can compose these primitives to define higher-level scheduling operations using Python code. The Exo compiler applies the user-defined schedules to transform the program. +`@proc` and `@instr` decorators are defined here, which call into `Pyparser`. +The frontend works like: API -> Parser -> TypeCheck -> BoundsCheck/AssertCheck -### Backend +`frontend/pyparser.py` defines a parser from Python AST to UAST/PAST. We don't implement our own lexer, but rely on the Python lexer to build a Python AST, and hijack it to translate it to Exo's internal ASTs. UAST is an "untyped AST" which is an untyped version of LoopIR (LoopIR is the "Exo IR" in the paper terminology). UAST is used when parsing the full procedure definitions (`@proc` or `@instr`). PAST is a pattern AST with holes, which is used to parse fragments from the user code not in the procedure, to parse arguments to the scheduling primitives (e.g., `n + m`). -After the user schedules are applied, the backend performs the following tasks: +`typecheck.py` literally typechecks but also converts UAST to LoopIR. +`boundscheck.py` checks any out-of-bounds errors in the frontend code. It also checks that all assertions in the code are satisfiable. It invokes an SMT solver. -1. **Memory/Precision Checking**: Verifies that the program correctly uses the memories and data types specified in the hardware library. -2. **Code Generation**: Generates C code from the transformed Exo program. +### New LoopIR Expressions -The backend checks are performed after scheduling to allow the schedules to modify the memory and precision annotations in the program. +Some scheduling primitives (such as `expand_dim`, all primitives that take `NewExprA` type as an argument) require the construction of new LoopIR expressions. +`parse_fragment.py` implements this pass. It calls into `pyparser.pattern` which invokes the parser with `is_fragment=True`. +It's not possible to use holes `_` when parsing new expressions. Holes are for pattern matching for reference. -## Hardware Libraries +### Pattern Match for Reference -An essential part of the Exo system is the ability to define hardware targets as user libraries. These libraries specify the details of the target accelerator, such as: +Cursors can be obtained by pattern matching. The pattern gets parsed into PAST and then matched against the LoopIR to obtain a reference. +`frontend/pattern_match.py` implements this functionality. -- Custom memories -- Custom instructions -- Configuration state +--- -By defining these hardware details in libraries, Exo allows targeting new accelerators without modifying the core compiler. The schedules can then use these hardware-specific features to optimize the program for the target accelerator. +## Rewrites (User-Scheduling) -## Source Code +After the frontend pass, we obtain LoopIR. Files in `src/exo/rewrite` implement Exo's rewrite-based user-scheduling process. +- `LoopIR_scheduling.py` is the main file that implements all the scheduling primitives. Many implementations of primitives call into `Check_...` functions, which are the safety checks implemented in `new_eff.py`. +- How we handle analysis to preserve functional equivalence of rewrites is a whole other topic we don't go into details here. `new_eff.py`, `new_analysis_core.py`, and `analysis_simplify.py` are all files related to the analysis. +- `LoopIR_unification.py` implements a unification process for supporting the `replace(p, ...)` rewrite primitive. -The source code for the Exo compiler is available on GitHub: [https://github.com/exo-lang/exo](https://github.com/exo-lang/exo) +--- -The repository contains the implementation of the Exo language, the compiler, and a set of hardware libraries for different accelerators. +## Backend -## Conclusion +The backend is responsible for lowering LoopIR to C code and performs backend checks like precision analysis, window analysis, and parallelism analysis. -The Exo system provides a productive environment for developing high-performance kernel libraries targeting specialized hardware accelerators. By combining a flexible scheduling language with the ability to define hardware targets in libraries, Exo enables achieving state-of-the-art performance with significantly less engineering effort compared to traditional approaches. +- `LoopIR_compiler.py` is the main file in the backend, which compiles LoopIR to C code. +- `mem_analysis.py` implements a memory consistency check. For example, if a callee expects an `AVX2` annotation but the caller passes `DRAM` memory, it raises an error. +- `parallel_analysis.py` implements a parallel analysis. +- `prec_analysis.py` implements a precision consistency check, but also coerces the precision where possible. +- `win_analysis.py` implements a window analysis to check if callee and caller window annotations (tensor or window) match with each other. diff --git a/examples/cursors/README.md b/examples/cursors/README.md index f34b3288..d0d1a799 100644 --- a/examples/cursors/README.md +++ b/examples/cursors/README.md @@ -1,18 +1,29 @@ -# Cursor step-by-step example! +# Cursor Step-by-Step Tutorial -Very simple example using the tile2D example (in the paper!) +This tutorial demonstrates a simple application of Cursors using the tile2D example (as shown in our ASPLOS '25 paper). +## Overview -This example demonstrates how to use Cursors to navigate and transform Exo object code. +Learn how to use Cursors to navigate and transform Exo object code. Cursors are powerful tools that allow you to: +- Select and reference specific code elements (expressions, statements, blocks) +- Navigate spatially within procedures +- Apply transformations to your code -Cursors allow you to select and refer to parts of the code such as expressions, -statements, and code blocks. They also support spatial navigation within a procedure -to proximate locations. +## Key Concepts -Key concepts covered: -- Finding cursors using patterns -- Navigating using cursors -- Applying scheduling primitives with cursors -- Forwarding cursors after transformations +This tutorial covers: +- Pattern-based cursor identification +- Cursor navigation techniques +- Applying scheduling primitives using cursors +- Cursor forwarding after code transformations +## Getting Started +To run this example: +```bash +exocc cursors.py +``` + +Notes: +- The tutorial uses the tile2D example for demonstration +- Focus is placed on fundamental cursor operations From 7ecc2e7e519155e0cfa91a57dad8253acb2e91c9 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Sat, 26 Oct 2024 12:43:20 -0400 Subject: [PATCH 08/26] . --- README.md | 28 +++------------------------- docs/System.md | 18 ++++++++++++++++++ examples/README.md | 10 ++++++++++ 3 files changed, 31 insertions(+), 25 deletions(-) create mode 100644 examples/README.md diff --git a/README.md b/README.md index 6d080c41..34d7d6f8 100644 --- a/README.md +++ b/README.md @@ -30,11 +30,6 @@ You can use optional arguments to customize the output: - The `--stem` argument allows you to specify custom names for the C file and header file. -# Examples - -Take a look at [examples](examples/avx2_matmul/README.md) for scheduling examples, and [API documentation](docs/API.md) for scheduling interface documentation. - - # Build Exo from source We make active use of newer Python 3.x features. Please use Python 3.9 or 3.10 if you're getting errors about unsupported features. @@ -63,7 +58,6 @@ Finally, you can build and install Exo. (exo) $ pip install dist/*.whl ``` - ## PySMT Depending on your setup, getting PySMT to work correctly may be difficult. You @@ -119,25 +113,9 @@ pytest --cov=./ --cov-report=html Then, if you want to see annotated source files, open `./htmlcov/index.html`. -# Repository structure - -In this repository, folders are structured as follows: - -1. `src/exo` is where the core Exo implementation resides. - - **APIs.** Documentation for the APIs can be found in the [API documentation](docs/API.md). - - `API.py` defines a stable API for top-level decorators (`proc`, `instr`, and `config`). - - `API_scheduling.py` defines a API for scheduling primitives. - - `API_cursors.py` defines a API for Cursors. - - **Standard libraries.** These could be user-defined, but we provide them for convenience. - - `libs/` contains some common memory definitions (`memories.py`) and custom malloc implementations. - - `platforms/` contains instruction definitions that are part of the release. - - `stdlib/` contains user-level scheduling functions such as `vectorize`. - - Other files are implementation details of Exo (e.g., `typecheck.py` implements typecheck), are not exposed to users. -2. `apps/` contains some sample applications written in Exo. -3. `dependencies/` contains submodules that Exo's apps and testing depends on. -4. `examples/` contains a step-by-step example of scheduling basic matrix multiplication on AVX2. -5. `tests/` contains the Exo test suite. -6. `docs/` contains additional Exo documentation. +# Learn about Exo + +Take a look at [examples](examples/README.md) for scheduling examples, and [documentation](docs/README.md) for various documentation about Exo. # Contact diff --git a/docs/System.md b/docs/System.md index 5bcbcdc5..d6ca20c4 100644 --- a/docs/System.md +++ b/docs/System.md @@ -6,6 +6,24 @@ This document provides an overview of the Exo compilation process, as illustrate The Exo compiler consists of a frontend and a backend, with user schedules applied in between. The input to the compiler is a set of Exo source files (`*.py`), and the output is generated C code (`*.c`). +In this repository, folders are structured as follows: + +1. `src/exo` is where the core Exo implementation resides. + - **APIs.** Documentation for the APIs can be found in the [API documentation](docs/API.md). + - `API.py` defines a stable API for top-level decorators (`proc`, `instr`, and `config`). + - `API_scheduling.py` defines a API for scheduling primitives. + - `API_cursors.py` defines a API for Cursors. + - **Standard libraries.** These could be user-defined, but we provide them for convenience. + - `libs/` contains some common memory definitions (`memories.py`) and custom malloc implementations. + - `platforms/` contains instruction definitions that are part of the release. + - `stdlib/` contains user-level scheduling functions such as `vectorize`. + - Other files are implementation details of Exo (e.g., `typecheck.py` implements typecheck), are not exposed to users. +2. `apps/` contains some sample applications written in Exo. +3. `dependencies/` contains submodules that Exo's apps and testing depends on. +4. `examples/` contains a step-by-step example of scheduling basic matrix multiplication on AVX2. +5. `tests/` contains the Exo test suite. +6. `docs/` contains additional Exo documentation. + --- ## Core diff --git a/examples/README.md b/examples/README.md new file mode 100644 index 00000000..5c1b69d4 --- /dev/null +++ b/examples/README.md @@ -0,0 +1,10 @@ +# Scheduling Examples + +This directory contains several examples, along with documentation and code. +If you are new to Exo, we recommend going through the examples in the following order: + +1. [AVX2 Matmul](./avx2_matmul/README.md): This example demonstrates how to take a simple matrix multiplication kernel and transform it into an implementation that can make use of AVX2 instructions. It provides an overview of Exo and its scheduling system. + +2. [Cursor](./cursors/README.md): This example shows how to use Cursors to efficiently write schedules and define a new scheduling operator. + +3. [RVM](./rvm_conv1d/README.md): This example illustrates how to use Exo to optimize a new hardware accelerator completely in the user code. From 535a61fafa178df5c734e64a9fb9910cf8e19d46 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Sat, 26 Oct 2024 12:52:16 -0400 Subject: [PATCH 09/26] . --- README.md | 2 +- examples/README.md | 2 +- examples/avx2_matmul/README.md | 9 ++++----- 3 files changed, 6 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index 34d7d6f8..2b832c5c 100644 --- a/README.md +++ b/README.md @@ -115,7 +115,7 @@ Then, if you want to see annotated source files, open `./htmlcov/index.html`. # Learn about Exo -Take a look at [examples](examples/README.md) for scheduling examples, and [documentation](docs/README.md) for various documentation about Exo. +Take a look at the [examples](examples/README.md) directory for scheduling examples and the [documentation](docs/README.md) directory for various documentation about Exo. # Contact diff --git a/examples/README.md b/examples/README.md index 5c1b69d4..caf1c065 100644 --- a/examples/README.md +++ b/examples/README.md @@ -7,4 +7,4 @@ If you are new to Exo, we recommend going through the examples in the following 2. [Cursor](./cursors/README.md): This example shows how to use Cursors to efficiently write schedules and define a new scheduling operator. -3. [RVM](./rvm_conv1d/README.md): This example illustrates how to use Exo to optimize a new hardware accelerator completely in the user code. +3. [RVM](./rvm_conv1d/README.md): This example illustrates how to use Exo to define and target a new hardware accelerator entirely in the user code. diff --git a/examples/avx2_matmul/README.md b/examples/avx2_matmul/README.md index 4328da35..4db6742c 100644 --- a/examples/avx2_matmul/README.md +++ b/examples/avx2_matmul/README.md @@ -259,10 +259,9 @@ This will print out the results of running kernel with and without the AVX instr [blas]: https://www.netlib.org/blas/ [blis]: https://github.com/flame/blis -## Stay tuned for more automation! +## More Automation? Congratulations on completing this example! -You might have felt that the scheduling operations in this example were very low-level and might be laborious to write. -We felt the same! We have a pre-release version of Exo that provides scheduling automation _external_ to the compiler implementation. -By sharing the repeated pattern of schedules and using our novel reference mechanism called Cursors, we achieve fewer lines of code than what we've shown here in the upcoming release. Please contact Exo developers at exo@mit.edu if you want to learn more or wish to collaborate! - +You might have felt that the scheduling operations in this example were very low-level and could be laborious to write. +We felt the same! We implemented a new feature called Cursors that provides scheduling automation *external* to the compiler implementation. +To learn more, please take a look at the [cursors example](cursors/README.md) and our ASPLOS '25 paper. From cfcf96f38fb39fb26b590ba27501bd8ef77098ab Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Sat, 26 Oct 2024 13:11:17 -0400 Subject: [PATCH 10/26] docs readme --- docs/README.md | 32 +++++++++++++++++++++++++------- 1 file changed, 25 insertions(+), 7 deletions(-) diff --git a/docs/README.md b/docs/README.md index ef566ad6..ade39702 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,11 +1,29 @@ -write where is what documentaiton +# Documentation -Learn about primitives, learn about external hardware definition, learn about Cursors (with link to all of them) in the main README +This directory provides detailed documentation about the Exo interface and internal system. +- To learn about the design principles of Exo, read [Design.md](Design.md). +- To understand how the Exo system is implemented, read [System.md](System.md). +- For information on writing Exo object code, APIs, and imports, refer to [Procedures.md](Procedures.md), [object_code.md](object_code.md), and [Imports.md](Imports.md). +- To learn how to define memory, instructions, and externs externally to the compiler in the user code, explore [externs.md](externs.md), [instructions.md](instructions.md), and [memories.md](memories.md). +- To understand the available scheduling primitives and how to use them, look into the primitives/ directory. +The scheduling primitives are classified into six categories: -# Further read -thesis -papers -tutorial -examples +1. [Buffer Transformations](primitives/buffer_ops.md) +2. [Loop and Scope Transformations](primitives/loop_ops.md) +3. [Configuration States](primitives/config_ops.md) +4. [Subprocedure Operations](primitives/subproc_ops.md) +5. [Memory, Precision, and Parallelism Transformations](primitives/backend_ops.md) +6. [Other Operations](primitives/other_ops.md) + +# Further Reading + +The following papers provide a high-level and holistic view of Exo as a project: + +- [PLDI '22 paper](https://people.csail.mit.edu/yuka/pdf/exo_pldi2022_full.pdf) +- [ASPLOS '25 paper](.) +- [Kevin Qian's MEng thesis](https://dspace.mit.edu/handle/1721.1/157187) +- [Samir Droubi's MEng thesis](https://dspace.mit.edu/handle/1721.1/156752) + +For more documentation and actual Exo code, refer to the [Examples](../examples/README.md) directory. From 819cc7c0e271b434f08ac3cbf47b6adfbac18a55 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Sat, 26 Oct 2024 13:13:13 -0400 Subject: [PATCH 11/26] - --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 2b832c5c..46eeb232 100644 --- a/README.md +++ b/README.md @@ -112,6 +112,7 @@ pytest --cov=./ --cov-report=html Then, if you want to see annotated source files, open `./htmlcov/index.html`. +--- # Learn about Exo From 9cd78282460f724bc12022301ace6c7c854eb2bb Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Sat, 26 Oct 2024 13:38:58 -0400 Subject: [PATCH 12/26] TODO: polish import --- docs/Imports.md | 45 +++++++++------------------------------------ 1 file changed, 9 insertions(+), 36 deletions(-) diff --git a/docs/Imports.md b/docs/Imports.md index 6faa7c69..bddc1d28 100644 --- a/docs/Imports.md +++ b/docs/Imports.md @@ -15,7 +15,6 @@ This document provides an overview of the imports used when writing Exo. 7. [External Interfaces](#7-external-interfaces) 8. [API Cursors](#8-api-cursors) ---- ## 1. Standard Python Future Import @@ -26,7 +25,6 @@ from __future__ import annotations - **Purpose**: Enables postponed evaluation of type annotations, allowing you to use forward references in type hints without causing issues during runtime. This is necessary to support Exo's `x : f32` syntax. - **Context**: This is a standard Python feature that improves compatibility and performance when using type hints in your code. ---- ## 2. Core Exo Module @@ -37,7 +35,6 @@ from exo import * - **Purpose**: Imports all core functionalities from the Exo language. - **Includes**: Fundamental classes and functions necessary for defining and manipulating high-performance computational kernels, such as `proc`, `instr`, `config`, `Memory`, `Extern`, `DRAM`, and `SchedulingError`. ---- ## 3. Memory Libraries @@ -49,27 +46,15 @@ from exo.libs.memories import DRAM_STATIC, AVX2, AVX512 For example, you can import `DRAM_STATIC`, `AVX2`, or `AVX512` as shown above. ---- ## 4. Instruction Libraries Similar to memories, we provide some hardware instruction definitions as a library. ```python -from exo.platforms.x86 import * +from exo.platforms.x86 import mm256_loadu_ps, mm256_setzero_ps, mm256_broadcast_ss ``` -- **Purpose**: Imports optimizations and definitions specific to x86 architectures. -- **Usage**: Enables the generation of optimized code tailored for x86 CPUs, including SIMD instructions and cache management. - -```python -from exo.platforms.neon import * -``` - -- **Purpose**: Provides ARM NEON-specific functionalities. -- **Usage**: Allows for optimization of code on ARM architectures that support NEON instructions, enhancing performance on mobile and embedded devices. - ---- ## 5. Frontend Syntax Utilities @@ -77,48 +62,36 @@ from exo.platforms.neon import * from exo.frontend.syntax import * ``` -- **Purpose**: Imports utilities for parsing and manipulating Exo's frontend syntax. -- **Usage**: Used when extending or customizing the language's syntax for domain-specific applications. +This module defines special symbols that are used inside Exo code. You may +import this module via `from exo.syntax import *` to suppress warnings and +see documentation inside an IDE (like PyCharm). ---- -## 6. Standard Library Modules +## 6. Standard Library Scheduling Functions -### 6.1 Scheduling Utilities ```python -from exo.stdlib.scheduling import * +from exo.stdlib.scheduling import repeat, replace_all +from exo.stdlib.stdlib import vectorize, tile_loops ``` -- **Purpose**: Provides functions for scheduling and transforming computational kernels. -- **Includes**: Loop transformations, tiling, unrolling, and other optimization techniques. - -### 6.2 Standard Library Functions - -```python -from exo.stdlib.stdlib import * -``` -- **Purpose**: Imports standard library functions and classes. -- **Usage**: Offers a collection of common utilities and helpers used across various Exo programs. ---- ## 7. External Interfaces ```python -from exo.libs.externs import * +from exo.libs.externs import sin, relu ``` - **Purpose**: Facilitates interaction with external libraries and functions not defined within Exo. - **Usage**: Allows for the integration of external code, such as C functions or hardware-specific routines, into Exo programs. ---- ## 8. API Cursors ```python -from exo.API_cursors import * +from exo.API_cursors import ForCursor, AssignCursor, InvalidCursor ``` - **Purpose**: Provides cursor-based APIs for navigating and modifying code structures. From 0e85ce7670d255451b8633c1abdbc4fe166b9800 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Sun, 27 Oct 2024 15:13:08 -0400 Subject: [PATCH 13/26] update cursors example --- examples/cursors/README.md | 16 ++++++++-------- examples/cursors/cursors.py | 9 +++++---- 2 files changed, 13 insertions(+), 12 deletions(-) diff --git a/examples/cursors/README.md b/examples/cursors/README.md index d0d1a799..1141ec6f 100644 --- a/examples/cursors/README.md +++ b/examples/cursors/README.md @@ -4,18 +4,19 @@ This tutorial demonstrates a simple application of Cursors using the tile2D exam ## Overview -Learn how to use Cursors to navigate and transform Exo object code. Cursors are powerful tools that allow you to: +Learn how to use Cursors to navigate and transform Exo object code. Cursors allow you to: - Select and reference specific code elements (expressions, statements, blocks) - Navigate spatially within procedures -- Apply transformations to your code +- Apply optimization ## Key Concepts -This tutorial covers: -- Pattern-based cursor identification -- Cursor navigation techniques +This example covers the key concepts presented in the paper: +- Finding Cursors with pattern-matching +- Cursor navigation - Applying scheduling primitives using cursors - Cursor forwarding after code transformations +- Defining a new scheduling operation ## Getting Started @@ -23,7 +24,6 @@ To run this example: ```bash exocc cursors.py ``` +Running `exocc` on `cursors.py` will generate the C code in the `cursors/cursors.c` file. +It will also print out the intermediate steps of the example. -Notes: -- The tutorial uses the tile2D example for demonstration -- Focus is placed on fundamental cursor operations diff --git a/examples/cursors/cursors.py b/examples/cursors/cursors.py index 5999e2bc..8cc58dda 100644 --- a/examples/cursors/cursors.py +++ b/examples/cursors/cursors.py @@ -13,10 +13,11 @@ to proximate locations. Key concepts covered: -- Finding cursors using patterns -- Navigating using cursors -- Applying scheduling primitives with cursors -- Forwarding cursors after transformations +- Finding Cursors with pattern-matching +- Cursor navigation +- Applying scheduling primitives using cursors +- Cursor forwarding after code transformations +- Defining a new scheduling operation """ From ea10d25d1b7ca7355ea512aa8e617fab9343e308 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Sun, 27 Oct 2024 17:50:21 -0400 Subject: [PATCH 14/26] update --- docs/Cursors.md | 4 ++-- docs/Design.md | 34 ++++++++++++++-------------- docs/Imports.md | 44 ++++++++++++++++++------------------- docs/Procedures.md | 11 ---------- docs/README.md | 6 ++--- docs/System.md | 45 +++++++++++++++++++++----------------- examples/cursors/README.md | 9 +------- 7 files changed, 69 insertions(+), 84 deletions(-) diff --git a/docs/Cursors.md b/docs/Cursors.md index 6f4bc568..dfbaaf53 100644 --- a/docs/Cursors.md +++ b/docs/Cursors.md @@ -370,7 +370,7 @@ p2 = reorder_scope(p1, p1.forward(c).next(), ...) In this code, the navigation `.next()` is applied to the forwarded cursor `p1.forward(c)`. Attempting to change `p1.forward(c).next()` to `p1.forward(c.next())` will result in incorrect behavior. This is because navigation and forwarding are *not commutative*. -## further -More details can be found in our ASPLOS '25 paper or Kevin Qian's master thesis +## Further Reading +More details of the design principles of Cursors can be found in our [ASPLOS '25 paper](.) or in [Kevin Qian's MEng thesis](https://dspace.mit.edu/handle/1721.1/157187). diff --git a/docs/Design.md b/docs/Design.md index 5c4db703..9ac288d6 100644 --- a/docs/Design.md +++ b/docs/Design.md @@ -19,26 +19,25 @@ One of the main ideas behind Exo is **exocompilation**, which allows users to de Users can model custom memories, instructions, and configuration state in libraries to target a specific accelerator. These hardware abstractions can then be used to write hand-optimized code or as building blocks for higher-level scheduling transformations. -More info can be found in the [PLDI paper](https://people.csail.mit.edu/yuka/pdf/exo_pldi2022_full.pdf) and [./instructions.md](./instructions.md) and [./memories.md](./memories.md). +More info can be found in the [PLDI paper](https://people.csail.mit.edu/yuka/pdf/exo_pldi2022_full.pdf) and [instructions.md](./instructions.md) and [memories.md](./memories.md). ## Fine-Grained Primitives for Performance Control -Exo offers a set of fine-grained scheduling primitives that give users low-level control over performance-critical details. These primitives can be composed to build complex transformation schedules. Some examples of these primitives include: +Exo provides a set of fine-grained scheduling primitives that offer users low-level control over performance-critical aspects. These primitives can be combined to create complex transformation schedules. Some examples of these primitives include: -- `split` and `reorder` for loop transformations -- `stage_mem` for explicit data movement between memories -- `replace` for mapping code fragments to custom instructions +- `replace`: Maps code fragments to custom instructions +- `delete_config`: Removes redundant configuration statements -Having explicit control over these low-level details enables Exo to achieve performance competitive with highly-tuned vendor libraries and hand-optimized assembly code. -Primitives can be found in [./primitives/](./primitives/). +The key research contributions of Exo were supporting `replace` through unification and the ability to reason about configuration states. Explicit control over these low-level details allows Exo to achieve performance comparable to highly-tuned vendor libraries and hand-optimized assembly code. All the primitives can be found in the [primitives/](./primitives/) directory. ## Rewrite-based Scheduling Language -Unlike previous popular frameworks like Halide and TVM which use a _lowering-based_ compilation process, Exo uses a _rewrite-based_ compilation process. +Exo employs a *rewrite-based* compilation process, which differs from the *lowering-based* approach used by popular frameworks like Halide and TVM. -This has a few advantages: -- Less magic -- Easy to print in the middle of the scheduling process and see what is going on. +The rewrite-based approach offers several advantages: + +- Reduced complexity and less "magic" involved +- Easier to print and inspect the state of the scheduling process at any point --- @@ -48,22 +47,23 @@ While the flexibility of fine-grained primitives is necessary for achieving peak These user-defined scheduling operations can encapsulate common optimization patterns and hardware-specific transformations, greatly improving productivity. They can be put together in reusable libraries, further enabling modularity and portability. -More info can be found in the ASPLOS paper and [./Cursor.md](./Cursor.md). +More infomation can be found in the [ASPLOS paper](.) and [Cursor.md](./Cursor.md). ## The AIR Framework: Action, Inspection, Reference We identified that Action, Inspection, and Reference are the key scheduling language design mechanisms that enable user-defined scheduling operations. -- **Actions** are the scheduling primitives that transform the code (e.g., `split`, `reorder`). +- **Actions** are the scheduling primitives that transform the code (e.g., `divide_loop`, `reorder`). - **Inspections** query properties of the code (e.g., loop bounds, memory access patterns). - **References** point to specific parts of the code to apply actions to. -Together, AIR allows scheduling operations to be defined as composable rewrites on the code. The language implementation guarantees the correctness of these rewrites with a set of effect analyses. +Together, AIR allows scheduling operations to be defined as composable rewrites on the code. The language implementation guarantees the correctness of these primitive rewrites with a set of effect analyses. ## Cursors: Enabling Relative References -A novel feature in Exo's design is the concept of cursors, which serve as relative references into the code. Similar to a text editing cursor, an Exo cursor identifies a specific location in the program AST, such as a statement, loop nest, or even the gap between statements. +A novel feature in Exo's design is the concept of cursors, which serve as relative references into the code. Similar to a text editing cursor, an Exo cursor can refer to a specific location in the program AST, such as a statement, loop nest, or even the gap between statements. -Cursors support navigation operations such as `next`, `prev`, `parent`, enabling powerful code transformations using relative positions. Multiple cursors can coexist, allowing different parts of the code to be referenced and modified simultaneously. +Cursors support navigation operations such as `next`, `prev`, `parent`, enabling powerful code transformations using relative positions. +Furthermore, Cursor _forwarding_ let users reuse the cursor from the previous procedure in the current procedure. +Multiple cursors can coexist, allowing different parts of the code to be referenced and modified simultaneously. -Using cursors, complex scheduling operations can be built using simple navigation and rewrite rules, with the cursor abstracting away the details of manual AST manipulation. diff --git a/docs/Imports.md b/docs/Imports.md index bddc1d28..81cbd65c 100644 --- a/docs/Imports.md +++ b/docs/Imports.md @@ -22,8 +22,7 @@ This document provides an overview of the imports used when writing Exo. from __future__ import annotations ``` -- **Purpose**: Enables postponed evaluation of type annotations, allowing you to use forward references in type hints without causing issues during runtime. This is necessary to support Exo's `x : f32` syntax. -- **Context**: This is a standard Python feature that improves compatibility and performance when using type hints in your code. +Enables postponed evaluation of type annotations, allowing you to use forward references in type hints without causing issues during runtime. This is necessary to support Exo's `x : f32` syntax. ## 2. Core Exo Module @@ -32,13 +31,13 @@ from __future__ import annotations from exo import * ``` -- **Purpose**: Imports all core functionalities from the Exo language. -- **Includes**: Fundamental classes and functions necessary for defining and manipulating high-performance computational kernels, such as `proc`, `instr`, `config`, `Memory`, `Extern`, `DRAM`, and `SchedulingError`. +Imports basic classes and functions necessary for defining and manipulating high-performance computational kernels, such as `proc`, `instr`, `config`, `Memory`, `Extern`, `DRAM`, and `SchedulingError`. ## 3. Memory Libraries -Even though users can define memory definitions externally to the compiler in the user code (see [./memories.md]), we provide memory definitions for some architectures as examples. The supported memories can be found by looking into `src/exo/libs/memories.py`. +Even though users can define memory definitions externally to the compiler in the user code (see [memories.md](./memories.md)), we provide memory definitions for some architectures for convinience. +The supported memory definitions can be found by looking into `src/exo/libs/memories.py`. ```python from exo.libs.memories import DRAM_STATIC, AVX2, AVX512 @@ -49,50 +48,49 @@ For example, you can import `DRAM_STATIC`, `AVX2`, or `AVX512` as shown above. ## 4. Instruction Libraries -Similar to memories, we provide some hardware instruction definitions as a library. +Similar to memories, we provide some hardware instruction definitions for convinience (see [instructions.md](./instructions.md) to learn how to define your own accelerator instructions). ```python from exo.platforms.x86 import mm256_loadu_ps, mm256_setzero_ps, mm256_broadcast_ss ``` +## 5. Extern Libraries -## 5. Frontend Syntax Utilities +Similary, convinience extern libraries can be imported as follows. See [externs.md](./externs.md) to learn how to define your own externs. ```python -from exo.frontend.syntax import * +from exo.libs.externs import sin, relu ``` -This module defines special symbols that are used inside Exo code. You may -import this module via `from exo.syntax import *` to suppress warnings and -see documentation inside an IDE (like PyCharm). - - -## 6. Standard Library Scheduling Functions +## 6. Frontend Syntax Utilities ```python -from exo.stdlib.scheduling import repeat, replace_all -from exo.stdlib.stdlib import vectorize, tile_loops +from exo.frontend.syntax import * ``` +This module defines special symbols that are used inside Exo code. +Importing this can suppress warnings inside an IDE (like PyCharm). +## 7. Standard Library Scheduling Functions -## 7. External Interfaces +Exo provides users with the ability to define new scheduling operations using Cursors. For convenience, we have implemented scheduling libraries (standard library) that contain common scheduling operations users may want to use, such as vectorization and tiling. Users can import the standard library as follows: ```python -from exo.libs.externs import sin, relu +from exo.stdlib.scheduling import repeat, replace_all +from exo.stdlib.stdlib import vectorize, tile_loops ``` -- **Purpose**: Facilitates interaction with external libraries and functions not defined within Exo. -- **Usage**: Allows for the integration of external code, such as C functions or hardware-specific routines, into Exo programs. - +Alternatively, users can define their own scheduling operations by composing scheduling primitives directly in their code. ## 8. API Cursors +Cursors (see [Cursors.md](./Cursors.md)) are Exo's reference mechanism that allows users to navigate and inspect object code. When users define new scheduling operators using Cursors, they may wish to write their own inspection pass. API Cursors define types that will be useful for user inspection. + ```python from exo.API_cursors import ForCursor, AssignCursor, InvalidCursor ``` -- **Purpose**: Provides cursor-based APIs for navigating and modifying code structures. -- **Usage**: Enables advanced code introspection and manipulation, useful for metaprogramming and automated optimizations. +These API Cursors provide specific types, such as `ForCursor` for for-loops, `AssignCursor` for assignments, and `InvalidCursor` for invalid cursors. Users can leverage these types when inspecting and manipulating code using Cursors. + diff --git a/docs/Procedures.md b/docs/Procedures.md index 62c9467b..6516a36a 100644 --- a/docs/Procedures.md +++ b/docs/Procedures.md @@ -32,14 +32,3 @@ Cursors can be obtained by querying patterns on a procedure. All the Cursor rela - `.transpose(arg_cursor)`: Transposes a 2D buffer argument in the signature and the body. Returns a new procedure and is non-equivalence preserving because the signature has changed. - `.add_assertion(assertion)`: Adds an assertion to the procedure. - `.is_eq(other_proc)`: Checks the equivalence of this procedure with another procedure. - -## Scheduling Primitives - -We have classified scheduling primitives into six categories. Here are the links to each: - -- [Buffer Transformations](primitives/buffer_ops.md) -- [Loop and Scope Transformations](primitives/loop_ops.md) -- [Configuration States](primitives/config_ops.md) -- [Subprocedure Operations](primitives/subproc_ops.md) -- [Memory, Precision, and Parallelism Transformations](primitives/backend_ops.md) -- [Other Operations](primitives/other_ops.md) diff --git a/docs/README.md b/docs/README.md index ade39702..64e4fa60 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,11 +1,11 @@ # Documentation -This directory provides detailed documentation about the Exo interface and internal system. +This directory provides detailed documentation about Exo's interface and internal systems. - To learn about the design principles of Exo, read [Design.md](Design.md). - To understand how the Exo system is implemented, read [System.md](System.md). - For information on writing Exo object code, APIs, and imports, refer to [Procedures.md](Procedures.md), [object_code.md](object_code.md), and [Imports.md](Imports.md). -- To learn how to define memory, instructions, and externs externally to the compiler in the user code, explore [externs.md](externs.md), [instructions.md](instructions.md), and [memories.md](memories.md). +- To learn how to define memory, instructions, and externs externally to the compiler in the user code, refer to [externs.md](externs.md), [instructions.md](instructions.md), and [memories.md](memories.md). - To understand the available scheduling primitives and how to use them, look into the primitives/ directory. The scheduling primitives are classified into six categories: @@ -26,4 +26,4 @@ The following papers provide a high-level and holistic view of Exo as a project: - [Kevin Qian's MEng thesis](https://dspace.mit.edu/handle/1721.1/157187) - [Samir Droubi's MEng thesis](https://dspace.mit.edu/handle/1721.1/156752) -For more documentation and actual Exo code, refer to the [Examples](../examples/README.md) directory. +For more documentation with running Exo code, refer to the [Examples](../examples/README.md) directory. diff --git a/docs/System.md b/docs/System.md index d6ca20c4..3106287b 100644 --- a/docs/System.md +++ b/docs/System.md @@ -9,7 +9,7 @@ The Exo compiler consists of a frontend and a backend, with user schedules appli In this repository, folders are structured as follows: 1. `src/exo` is where the core Exo implementation resides. - - **APIs.** Documentation for the APIs can be found in the [API documentation](docs/API.md). + - **APIs.** - `API.py` defines a stable API for top-level decorators (`proc`, `instr`, and `config`). - `API_scheduling.py` defines a API for scheduling primitives. - `API_cursors.py` defines a API for Cursors. @@ -17,10 +17,10 @@ In this repository, folders are structured as follows: - `libs/` contains some common memory definitions (`memories.py`) and custom malloc implementations. - `platforms/` contains instruction definitions that are part of the release. - `stdlib/` contains user-level scheduling functions such as `vectorize`. - - Other files are implementation details of Exo (e.g., `typecheck.py` implements typecheck), are not exposed to users. + - Other files are implementation of Exo (e.g., `typecheck.py` implements typecheck), are not exposed to users. 2. `apps/` contains some sample applications written in Exo. 3. `dependencies/` contains submodules that Exo's apps and testing depends on. -4. `examples/` contains a step-by-step example of scheduling basic matrix multiplication on AVX2. +4. `examples/` contains examples of scheduling with Exo. 5. `tests/` contains the Exo test suite. 6. `docs/` contains additional Exo documentation. @@ -31,7 +31,7 @@ In this repository, folders are structured as follows: `src/exo/core` defines IRs used in Exo and other core implementations. - `LoopIR.py` is the main file that defines IRs (LoopIR, UAST, PAST), and their visitor functions (LoopIR_Do, LoopIR_Rewrite). - `LoopIR_pprint.py` implements a printing procedure for the IRs defined in `LoopIR.py`. -- `prelude.py` defines `Sym` and `Srcinfo` used in `LoopIR` and everywhere. +- `prelude.py` defines `Sym` and `Srcinfo`. User-defined features like config, externs, and Memory's parent class implementations are in `configs.py`, `extern.py`, and `memory.py`, respectively. @@ -42,47 +42,52 @@ User-defined features like config, externs, and Memory's parent class implementa ## Frontend -`API.py` provides various user-facing entry points to Exo. -There are three types of parsing passes in the frontend. All the frontend code is in `src/exo/frontend`. +`API.py` provides various user-facing entry points to Exo. The frontend consists of three types of parsing passes, all of which are located in the `src/exo/frontend` directory. ### Procedures -`@proc` and `@instr` decorators are defined here, which call into `Pyparser`. -The frontend works like: API -> Parser -> TypeCheck -> BoundsCheck/AssertCheck +The `@proc` and `@instr` decorators are defined in this section and call into the `Pyparser`. The frontend workflow is as follows: +``` +API -> Parser -> TypeCheck -> BoundsCheck/AssertCheck +``` -`frontend/pyparser.py` defines a parser from Python AST to UAST/PAST. We don't implement our own lexer, but rely on the Python lexer to build a Python AST, and hijack it to translate it to Exo's internal ASTs. UAST is an "untyped AST" which is an untyped version of LoopIR (LoopIR is the "Exo IR" in the paper terminology). UAST is used when parsing the full procedure definitions (`@proc` or `@instr`). PAST is a pattern AST with holes, which is used to parse fragments from the user code not in the procedure, to parse arguments to the scheduling primitives (e.g., `n + m`). +`frontend/pyparser.py` defines a parser that translates the Python AST to UAST/PAST. Instead of implementing a custom lexer, Exo relies on the Python lexer to build the Python AST and hijacks it to translate it into Exo's internal ASTs. UAST (Untyped AST) is an untyped version of LoopIR (LoopIR is the "Exo IR" in the paper terminology). UAST is used when parsing full procedure definitions (`@proc` or `@instr`). PAST (Pattern AST) is an AST with holes, used to parse fragments from the user code outside the procedure (see next two sections). -`typecheck.py` literally typechecks but also converts UAST to LoopIR. -`boundscheck.py` checks any out-of-bounds errors in the frontend code. It also checks that all assertions in the code are satisfiable. It invokes an SMT solver. +`typecheck.py` performs type checking and converts UAST to LoopIR. +`boundscheck.py` checks for any out-of-bounds errors in the frontend code and ensures that all assertions in the code are satisfiable by invoking an SMT solver. ### New LoopIR Expressions -Some scheduling primitives (such as `expand_dim`, all primitives that take `NewExprA` type as an argument) require the construction of new LoopIR expressions. -`parse_fragment.py` implements this pass. It calls into `pyparser.pattern` which invokes the parser with `is_fragment=True`. -It's not possible to use holes `_` when parsing new expressions. Holes are for pattern matching for reference. +Some scheduling primitives (such as `expand_dim` and all primitives that take `NewExprA` as an argument) require the construction of new LoopIR expressions. +`parse_fragment.py` implements this pass by calling into `pyparser.pattern`, which invokes the parser with `is_fragment=True`. +When parsing new expressions, it is not possible to use holes `_`. Holes are used for pattern matching for obtaining a cursor referene. ### Pattern Match for Reference Cursors can be obtained by pattern matching. The pattern gets parsed into PAST and then matched against the LoopIR to obtain a reference. `frontend/pattern_match.py` implements this functionality. + --- ## Rewrites (User-Scheduling) -After the frontend pass, we obtain LoopIR. Files in `src/exo/rewrite` implement Exo's rewrite-based user-scheduling process. +After the frontend pass, we obtain LoopIR. The files in `src/exo/rewrite` implement Exo's rewrite-based user-scheduling process. + - `LoopIR_scheduling.py` is the main file that implements all the scheduling primitives. Many implementations of primitives call into `Check_...` functions, which are the safety checks implemented in `new_eff.py`. -- How we handle analysis to preserve functional equivalence of rewrites is a whole other topic we don't go into details here. `new_eff.py`, `new_analysis_core.py`, and `analysis_simplify.py` are all files related to the analysis. -- `LoopIR_unification.py` implements a unification process for supporting the `replace(p, ...)` rewrite primitive. +- The handling of analysis to preserve functional equivalence of rewrites is a separate topic not covered in detail here. `new_eff.py`, `new_analysis_core.py`, and `analysis_simplify.py` are all files related to the analysis. +- `LoopIR_unification.py` implements a unification process to support the `replace(p, ...)` rewrite primitive. --- ## Backend -The backend is responsible for lowering LoopIR to C code and performs backend checks like precision analysis, window analysis, and parallelism analysis. +The backend is responsible for lowering LoopIR to C code and performing backend checks, including precision analysis, window analysis, and parallelism analysis. - `LoopIR_compiler.py` is the main file in the backend, which compiles LoopIR to C code. - `mem_analysis.py` implements a memory consistency check. For example, if a callee expects an `AVX2` annotation but the caller passes `DRAM` memory, it raises an error. -- `parallel_analysis.py` implements a parallel analysis. -- `prec_analysis.py` implements a precision consistency check, but also coerces the precision where possible. +- `parallel_analysis.py` implements a parallel analysis. +- `prec_analysis.py` implements a precision consistency check and coerces the precision where possible. - `win_analysis.py` implements a window analysis to check if callee and caller window annotations (tensor or window) match with each other. + + diff --git a/examples/cursors/README.md b/examples/cursors/README.md index 1141ec6f..90a3cef5 100644 --- a/examples/cursors/README.md +++ b/examples/cursors/README.md @@ -1,16 +1,9 @@ # Cursor Step-by-Step Tutorial -This tutorial demonstrates a simple application of Cursors using the tile2D example (as shown in our ASPLOS '25 paper). +This example demonstrates Cursors using the tile2D example (as shown in our ASPLOS '25 paper). ## Overview -Learn how to use Cursors to navigate and transform Exo object code. Cursors allow you to: -- Select and reference specific code elements (expressions, statements, blocks) -- Navigate spatially within procedures -- Apply optimization - -## Key Concepts - This example covers the key concepts presented in the paper: - Finding Cursors with pattern-matching - Cursor navigation From 672a647dcac55dd2e9d9a33fa0f66d2d9369cfcd Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Sun, 27 Oct 2024 19:12:54 -0400 Subject: [PATCH 15/26] update --- docs/externs.md | 12 ----------- docs/instructions.md | 49 +++++++++++++++----------------------------- 2 files changed, 17 insertions(+), 44 deletions(-) diff --git a/docs/externs.md b/docs/externs.md index cc4dd933..28f85bfe 100644 --- a/docs/externs.md +++ b/docs/externs.md @@ -86,18 +86,6 @@ def globl(self, prim_type): - Includes necessary headers required for the external function (e.g., `` for mathematical functions). -##### `interpret(self, args)` (Optional) - -Define how the function behaves during interpretation (useful for testing or interactive sessions). - -```python -def interpret(self, args): - import math - return math.sin(args[0]) -``` - -- Allows the extern function to be executed in environments that support interpretation. - #### 4. Instantiate the Extern Function Create an instance of your extern class to make it usable in your code. diff --git a/docs/instructions.md b/docs/instructions.md index f91d3e05..3696ddb6 100644 --- a/docs/instructions.md +++ b/docs/instructions.md @@ -1,47 +1,40 @@ # External Instruction Definitions -Exo allows users to define custom hardware instructions within their code using the `@proc` annotation. These user-defined instructions can be leveraged during the scheduling process to replace specific code fragments with calls to hardware-optimized instructions. This feature enables fine-grained control over code optimization and hardware acceleration, making it easier to target specific architectures like SIMD units or custom accelerators. +Exo allows users to define custom hardware instructions within their code using the `@instr` annotation. +These user-defined instructions can be leveraged during the scheduling process to replace specific code fragments with calls to hardware-optimized instructions. +This feature enables fine-grained control over code optimization, making it easier to target specific architectures like SIMD units or custom accelerators. ## Overview -- **Custom Instructions**: Define hardware-specific instructions as procedures using the `@proc` decorator. -- **Replacement**: Use the `replace` primitive to substitute code fragments with calls to these instructions. -- **Pattern Matching**: Exo uses pattern matching to unify code fragments with instruction definitions. +- **Custom Instructions**: Define hardware-specific instructions as procedures using the `@instr` decorator. +- **Replace**: Use the `replace` primitive to substitute code fragments with calls to these instructions. - **Code Generation**: Custom instructions can emit arbitrary C code, including inline assembly, with placeholders for arguments. ## Defining Custom Instructions -Custom instructions are defined as procedures annotated with `@proc` and further decorated with `@instr`. The `@instr` decorator allows you to specify the C code to be emitted when the instruction is called, including placeholders for arguments. +Custom instructions are defined as procedures annotated with `@instr`. +The `@instr` decorator allows you to specify the C code to be emitted when the instruction is called. ### Syntax ```python -@instr("C code with placeholders") -@proc +@instr("C code") def instruction_name(args): # Specification of the instruction's behavior ``` - -- **`@instr`**: Decorator that specifies the C code to emit. -- **`@proc`**: Indicates that the function is an Exo procedure. +- **`@instr`**: Decorator that specifies the C code to emit. In the string provided to `@instr`, you can include placeholders wrapped in `{}`. These placeholders will be replaced with the names of the arguments when the code is compiled. - **`instruction_name`**: The name of your custom instruction. - **`args`**: Arguments to the instruction. -- **Specification**: A high-level description of what the instruction does, used for pattern matching. - -### Placeholders in C Code +- **semantics**: Semantics of the hardware instruction, written as Exo object code. -In the string provided to `@instr`, you can include placeholders wrapped in `{}`. These placeholders will be replaced with the names of the arguments when the code is compiled. +### Example: Defining a Neon Load Instruction -### Example: Defining a NEON Load Instruction - -Below is an example of defining a NEON load instruction that loads four `f32` values into NEON memory. +Below is an example of defining a NEON load instruction that loads four `f32` values into Neon memory. ```python from exo import * -from exo.core.proc import instr @instr("{dst_data} = vld1q_f32(&{src_data});") -@proc def neon_vld_4xf32(dst: [f32][4] @ Neon, src: [f32][4] @ DRAM): assert stride(src, 0) == 1 assert stride(dst, 0) == 1 @@ -50,10 +43,8 @@ def neon_vld_4xf32(dst: [f32][4] @ Neon, src: [f32][4] @ DRAM): dst[i] = src[i] ``` -#### Explanation - - **`@instr("{dst_data} = vld1q_f32(&{src_data});")`**: Specifies the C code to emit when this instruction is called. - - `{dst_data}` and `{src_data}` are placeholders that will be replaced with the actual argument names. + - `{dst_data}` and `{src_data}` are format strings that will be replaced with the actual arguments during codegen. - **`dst: [f32][4] @ Neon`**: Declares `dst` as a 4-element array of `f32` in `Neon` memory. - **`src: [f32][4] @ DRAM`**: Declares `src` as a 4-element array of `f32` in `DRAM`. - **Assertions**: Ensure that the strides of `src` and `dst` are 1 for correct memory access. @@ -92,25 +83,20 @@ def foo(src: [f32][4] @ DRAM, dst: [f32][4] @ Neon): Use the `replace` primitive to substitute the loop with the custom instruction. ```python -# Instantiate the procedure -p = foo - # Replace the loop with the custom instruction -p = replace(p, "for i in _:_", neon_vld_4xf32) +foo = replace(foo, "for i in _:_", neon_vld_4xf32) ``` #### Explanation -- **`replace(p, "for i in _:_", neon_vld_4xf32)`**: - - **`p`**: The procedure in which to perform the replacement. +- **`replace(foo, "for i in _:_", neon_vld_4xf32)`**: + - **`foo`**: The procedure in which to perform the replacement. - **`"for i in _:_"`**: A cursor pointing to the loop to replace. - **`neon_vld_4xf32`**: The instruction to replace the loop with. ### How `replace` Works -- **Pattern Matching**: Exo attempts to unify the code fragment (the loop) with the body of `neon_vld_4xf32`. -- **Automatic Argument Determination**: If successful, Exo replaces the fragment with a call to `neon_vld_4xf32`, automatically determining the correct arguments. -- **Semantics Preservation**: The specification in the instruction's body ensures that the replacement is semantically correct. +Unification... ### Step 3: Compile and Generate Code @@ -162,7 +148,6 @@ The `replace` primitive is documented in [primitives/subproc_ops.md](primitives/ ```python @instr("{dst} = asm_rvm_macc({src_a}, {src_b}, {dst});") -@proc def rvm_macc(dst: f32 @ RVM, src_a: f32 @ RVM, src_b: f32 @ RVM): dst += src_a * src_b ``` From 02a73f3efdfd1dd7316fd357642eb411c6fcebc9 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Sun, 27 Oct 2024 19:33:42 -0400 Subject: [PATCH 16/26] update --- docs/instructions.md | 23 ------------ docs/object_code.md | 83 ++++++++++++++++++++------------------------ 2 files changed, 38 insertions(+), 68 deletions(-) diff --git a/docs/instructions.md b/docs/instructions.md index 3696ddb6..af38295f 100644 --- a/docs/instructions.md +++ b/docs/instructions.md @@ -200,26 +200,3 @@ void matmul_rvm(float A[M][K], float B[K][N], float C[M][N]) { - **RVM Tutorial**: [https://exo-lang.dev/tutorial.html](https://exo-lang.dev/tutorial.html) - **Running Code Examples**: [examples/rvm_conv1d/exo/conv1d.py](https://github.com/exo-lang/exo/blob/main/examples/rvm_conv1d/exo/conv1d.py) - -## Tips and Best Practices - -- **Define Clear Specifications**: Ensure that the body of your instruction accurately represents its semantics. -- **Use Assertions**: Include assertions in your instruction definitions to enforce constraints and ensure correctness. -- **Leverage Memory Annotations**: Use custom memory annotations to model hardware-specific memory behaviors (e.g., `Neon`, `RVM`). -- **Pattern Matching**: Structure your code to facilitate pattern matching with instruction definitions. -- **Test Thoroughly**: Verify that replacements are correct and that the generated code behaves as expected. - -## Conclusion - -By defining custom instructions and using the `replace` primitive, Exo provides a powerful mechanism to optimize code for specific hardware architectures directly within the user code. This approach offers flexibility and control, enabling developers to harness hardware acceleration without the need for extensive compiler support. - -**Key Takeaways**: - -- **Custom Instructions**: Define hardware-specific instructions with precise semantics. -- **Pattern Matching**: Use Exo's pattern matching to replace code fragments safely. -- **Code Generation**: Emit custom C code, including inline assembly, tailored to your hardware. -- **Optimization**: Optimize existing code by replacing computational patterns with hardware-accelerated instructions. - ---- - -**Note**: The examples provided are illustrative and may need adjustments to fit your specific hardware and use cases. Ensure that any external functions or assembly code used in the `@instr` decorator are properly defined and compatible with your target architecture. diff --git a/docs/object_code.md b/docs/object_code.md index a3ada455..38028674 100644 --- a/docs/object_code.md +++ b/docs/object_code.md @@ -1,6 +1,6 @@ # Exo Object Code Syntax -Exo is a programming language designed for performance-critical code, providing fine-grained control over code generation and optimization. In Exo, object code can be defined using Python-like syntax with specific annotations and constructs to model low-level programming concepts. +In Exo, object code can be defined using Python-like syntax with specific annotations and constructs to model low-level programming concepts. This documentation explains Exo's object code syntax using the following example of a 1D convolution operation: @@ -45,7 +45,7 @@ def generic_conv1d( ### `@proc` Decorator -The `@proc` decorator is used to define an Exo procedure (analogous to a function in other programming languages). It indicates that the following function definition should be treated as Exo object code, which can be further optimized and transformed. +The `@proc` decorator is used to define an Exo procedure (analogous to a function in other programming languages). It indicates that the following function definition should be treated as Exo object code (not Python), which can be further optimized and transformed. ```python @proc @@ -55,7 +55,7 @@ def function_name(arguments): ### Type and Memory Annotations -In Exo, types and memory spaces are explicitly annotated to provide precise control over data representation and placement. The syntax for annotations is: +In Exo, types and memory spaces are explicitly annotated. The syntax is: ```python name: type[size] @ memory @@ -66,7 +66,7 @@ name: type[size] @ memory - **`[size]`**: The dimensions of the array (optional for scalars). - **`@ memory`**: The memory space where the variable resides. -## Procedure Arguments +#### Procedure Arguments Procedure arguments are declared with their types, sizes, and memory spaces. They can have dependent sizes based on other arguments. @@ -81,9 +81,9 @@ data: i32[IC, N] @ DRAM - **`[IC, N]`**: A 2D array with dimensions `IC` and `N`. - **`@ DRAM`**: Specifies that `data` resides in DRAM memory. -## Variable Declarations +#### Allocations -Variables within the procedure are declared similarly to arguments but without the `@` annotation if they reside in default memory. +Variables within the procedure are declared similarly to arguments. Example: @@ -93,26 +93,13 @@ y: i32 - **`y`**: The variable name. - **`i32`**: The data type (32-bit integer). -- **No memory annotation**: Defaults to a standard memory space (e.g., registers). +- **No memory annotation**: Defaults to `DRAM` if memory is unspecified. -## Memory Spaces +#### Memories -Memory spaces in Exo are used to model different hardware memory regions, such as DRAM, caches, or specialized memories. The `@` symbol is used to specify the memory space. - -Common memory spaces: - -- **`@ DRAM`**: Main memory. -- **`@ SRAM`**: Static RAM or cache. -- **`@ Registers`**: CPU registers. - -Example: - -```python -out: i32[OC, N] @ DRAM -``` - -- **`out`**: Output array. -- **Resides in DRAM memory.** +Memory annotations in Exo are used to model different hardware memory regions, such as DRAM, caches, or specialized memories. The `@` symbol is used to specify the memory space, for example: `@DRAM`, `@AVX2`, or `@Neon`. +Memory annotations for your custom hardware accelerators can be defined externally to Exo and can be used as annotations in the same way. +While Exo provides default memory (`DRAM`) and some library memory definitions for convenience (`AVX2`, `AVX512`, `Neon`, `GEMM_SCRATCH`, etc.), it is recommended and encouraged that users define their own memory annotations for their specific hardware. For more information on defining custom memory annotations, refer to [memories.md](./memories.md). ## Loops @@ -126,7 +113,7 @@ for loop_variable in seq(start, end): ``` - **`loop_variable`**: The loop counter variable. -- **`seq(start, end)`**: Generates a sequence from `start` to `end - 1`. +- **`seq(start, end)`**: Iterates from `start` to `end - 1`. Example from the code: @@ -160,7 +147,7 @@ else: - Checks if `j + r` is less than `N`. - Assigns `y` accordingly. -## Operations and Assignments +## Assignments - **Assignment (`=`)**: Assigns a value to a variable. @@ -168,7 +155,7 @@ else: y = data[c, j + r] ``` -- **In-place Addition (`+=`)**: Adds a value to a variable and stores the result back. +- **Reduction (`+=`)**: Adds a value to a variable and stores the result back. ```python out[i, j] += kernels[i, c, r] * y @@ -180,6 +167,30 @@ else: data[c, j + r] ``` +## Limitations + +Exo has a few limitations that users should be aware of: + +1. **Non-affine indexing**: Exo does not support non-affine indexing. This means that any indexing operation must be a linear combination of loop variables and constants. For example, the following expressions are not allowed: + + ```python + data[i * j + r] = 0.0 # i * j is non-affine + if n * m < 30: # n * m is non-affine + pass + ``` + + To work around this limitation, you may need to restructure your code or use additional variables to represent the non-affine expressions. + +2. **Value-dependent control flow**: Exo separates control values from buffer values, which means that it is not possible to write value-dependent control flow. For instance, the following code is not allowed: + + ```python + if data[i] < 3.0: + pass + ``` + + If you need to express such operations, consider using externs (see [externs documentation](./externs.md)). + + ## Understanding the Example Let's break down the example code step by step. @@ -259,21 +270,3 @@ out[i, j] += kernels[i, c, r] * y - **Operation**: Accumulates the product of the kernel weight and the input data into the output. - **`kernels[i, c, r]`**: Kernel weight for output channel `i`, input channel `c`, at position `r`. - **`y`**: The input data value or zero. - -## Conclusion - -This example demonstrates how Exo's object code syntax allows for precise and expressive definitions of computations, particularly for performance-critical operations like convolutions. By understanding the annotations, loops, and operations, you can write efficient Exo procedures that can be further optimized and transformed for specific hardware targets. - -### Key Points - -- **Annotations**: Use `name: type[size] @ memory` to declare variables with explicit types and memory spaces. -- **Loops**: Utilize `for` loops with `seq(start, end)` for controlled iteration. -- **Conditionals**: Implement boundary checks and other logic using `if` and `else`. -- **Operations**: Perform computations using standard arithmetic operators, with support for in-place updates. - -### Further Reading - -- **Exo Documentation**: Explore more about Exo's syntax and capabilities in the official documentation. -- **Optimizations**: Learn how to apply scheduling primitives and transformations to optimize Exo procedures. - -By leveraging Exo's powerful syntax and features, you can develop high-performance code tailored to specific hardware architectures, enabling efficient execution of complex algorithms. From cb9c03a56f6f6473e5e22e8dd0daa0e45a0d0aae Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Sun, 27 Oct 2024 20:49:57 -0400 Subject: [PATCH 17/26] update instrutions and memories --- docs/instructions.md | 130 +++++++++++++++---------------------------- docs/memories.md | 11 ++-- 2 files changed, 53 insertions(+), 88 deletions(-) diff --git a/docs/instructions.md b/docs/instructions.md index af38295f..46561376 100644 --- a/docs/instructions.md +++ b/docs/instructions.md @@ -2,7 +2,6 @@ Exo allows users to define custom hardware instructions within their code using the `@instr` annotation. These user-defined instructions can be leveraged during the scheduling process to replace specific code fragments with calls to hardware-optimized instructions. -This feature enables fine-grained control over code optimization, making it easier to target specific architectures like SIMD units or custom accelerators. ## Overview @@ -67,134 +66,97 @@ class Neon(Memory): Once you've defined a custom instruction, you can use it to replace code fragments in your procedures. -### Step 1: Define Your Procedure +### Define Your Procedure Define your Exo procedure as usual. ```python @proc def foo(src: [f32][4] @ DRAM, dst: [f32][4] @ Neon): - for i in seq(0, 4): - dst[i] = src[i] + ... + for i in seq(0, ...): + ... + for j in seq(0, 4): + dst[j] = src[j] + ... ``` -### Step 2: Use `replace` to Substitute the Instruction +### Use `replace` to Substitute the Instruction Use the `replace` primitive to substitute the loop with the custom instruction. ```python # Replace the loop with the custom instruction -foo = replace(foo, "for i in _:_", neon_vld_4xf32) +foo = replace(foo, "for j in _:_", neon_vld_4xf32) ``` -#### Explanation - - **`replace(foo, "for i in _:_", neon_vld_4xf32)`**: - **`foo`**: The procedure in which to perform the replacement. - **`"for i in _:_"`**: A cursor pointing to the loop to replace. - **`neon_vld_4xf32`**: The instruction to replace the loop with. -### How `replace` Works - -Unification... - -### Step 3: Compile and Generate Code - -Compile your procedure to generate the optimized C code. - +After `replace`, the procedure `foo` will look like: ```python -print(p) -``` - -### Generated C Code - -```c -void foo(float src[4], float32x4_t dst) { - dst = vld1q_f32(&src[0]); -} +@proc +def foo(M: size, src: [f32][4] @ DRAM, dst: [f32][4] @ Neon): + ... + for i in seq(0, M/4): + ... + neon_vld_4xf32(dst, src) + ... ``` -- **`dst = vld1q_f32(&src[0]);`**: The custom instruction is emitted as specified in the `@instr` decorator, with placeholders replaced. - -## Understanding the Magic +#### How `replace` Works -By defining the behavior of hardware instructions in Python using Exo procedures, you can express the semantics of your accelerator or specialized hardware. The `replace` primitive allows Exo to reason about whether it's safe to offload certain computations to hardware instructions based on their specifications. - -- **No Compiler Backend Needed**: The heavy lifting is done within Exo, eliminating the need for a separate compiler backend. -- **Semantics Encoding**: The instruction's body acts as a specification, encoding its semantics for Exo's pattern matching. -- **Flexible and Extensible**: Users can define any instruction and specify how it should be matched and replaced. - -## The `replace` Primitive - -The `replace` primitive is used to substitute a fragment of code within a procedure with a call to another procedure (e.g., a custom instruction). - -### Syntax +The `replace` primitive is used to substitute a fragment of code within a procedure with a call to another procedure (e.g., a custom instruction). The syntax for `replace` is as follows: ```python replace(proc, cursor_path, subproc) ``` - **`proc`**: The procedure containing the code to be replaced. -- **`cursor_path`**: A string or cursor pointing to the code fragment. +- **`cursor`**: A cursor pointing to the code fragment to be replaced. - **`subproc`**: The procedure whose body will replace the code fragment. -### Documentation +The `replace` primitive works by performing an unification modulo linear equalities. The process can be broken down into two main steps: -The `replace` primitive is documented in [primitives/subproc_ops.md](primitives/subproc_ops.md). +1. **Pattern Matching**: The body of the sub-procedure `subproc` is unified (pattern matched) with the designated statement block `s` in the original procedure `proc`. During this process: + - The arguments of `subproc` are treated as unknowns. + - The free variables of `s` are treated as known symbols. + - Any symbols introduced or bound within the body of `subproc` or within `s` are unified. -## Practical Example: RISC-V Matrix Multiply + The ASTs (Abstract Syntax Trees) of `subproc` and `s` are required to match exactly with respect to statements and all expressions that are not simply integer-typed control. -### Step 1: Define the Instruction +2. **Solving Linear Equations**: Any equivalences between integer-typed control expressions are recorded as a system of linear equations. These equations are then solved to determine the values of the unknowns and ensure a consistent substitution. -```python -@instr("{dst} = asm_rvm_macc({src_a}, {src_b}, {dst});") -def rvm_macc(dst: f32 @ RVM, src_a: f32 @ RVM, src_b: f32 @ RVM): - dst += src_a * src_b -``` - -- **`asm_rvm_macc`**: Hypothetical assembly function for RISC-V multiply-accumulate. -- **Specification**: The procedure specifies that `dst += src_a * src_b`. - -### Step 2: Use the Instruction in a Procedure - -```python -@proc -def matmul_rvm(A: f32[M, K], B: f32[K, N], C: f32[M, N]): - for i in seq(0, M): - for j in seq(0, N): - for k in seq(0, K): - C[i, j] += A[i, k] * B[k, j] -``` +By following this process, the `replace` primitive effectively replaces the designated code fragment with a call to the sub-procedure, while ensuring that the substitution is valid and consistent. -### Step 3: Optimize Using `replace` -```python -p = matmul_rvm +### Generated C Code -# Apply transformations to expose the computation pattern -... +`exocc` can be used to compile Exo code into C. -# Replace the innermost loop with the custom instruction -p = replace(p, "for k in _:_", rvm_macc) +```c +void foo(float src[4], float32x4_t dst) { + ... + for (int_fast32_t i = 0; i < ...; i++) { + ... + dst = vld1q_f32(&src[0]); + } + ... +} ``` -### Step 4: Compile and Generate Code +- **`dst = vld1q_f32(&src[0]);`**: The custom instruction is emitted as specified in the `@instr` decorator, with arguments replaced. -```python -print(p) -``` +## Understanding the Magic -### Generated C Code +By defining the behavior of hardware instructions in Python using Exo procedures, you can express the semantics of your accelerator or specialized hardware. The `replace` primitive allows Exo to reason about whether it's safe to offload certain computations to hardware instructions based on their specifications. + +- **No Compiler Backend Needed**: The heavy lifting is done within Exo, eliminating the need for a separate compiler backend. +- **Semantics Encoding**: The instruction's body acts as a specification, encoding its semantics for Exo's pattern matching. +- **Flexible and Extensible**: Users can define any instruction and specify how it should be matched and replaced. -```c -void matmul_rvm(float A[M][K], float B[K][N], float C[M][N]) { - for (int i = 0; i < M; i++) { - for (int j = 0; j < N; j++) { - C[i][j] = asm_rvm_macc(A[i][k], B[k][j], C[i][j]); - } - } -} -``` ## Further Reading and Examples diff --git a/docs/memories.md b/docs/memories.md index a48fe9a2..1e9a4a73 100644 --- a/docs/memories.md +++ b/docs/memories.md @@ -1,15 +1,18 @@ # External Memory Definitions -Exo allows users to define custom memory types external to the compiler. This feature enables modeling of specialized memory systems, such as vector machines and hardware accelerator memories, directly within your Exo code. By defining custom memories, you can optimize your programs for specific hardware architectures and achieve better performance. +Exo allows users to define custom memory types external to the compiler. +This feature enables modeling of specialized memory systems, such as vector machines and hardware accelerator memories, directly within your Exo code. +By defining custom memories, you can optimize your programs to target specific hardware architectures. ## Overview - **Custom Memories**: Define your own memory types by subclassing the `Memory` class. -- **Usage**: Use custom memories as annotations in your Exo code or apply them during scheduling. +- **Usage**: Use custom memories as annotations in your Exo code or set them during scheduling. ## Defining Custom Memories -To define a custom memory, you need to create a class that inherits from `Memory` and implement the required methods. Below is an example of defining an `AVX512` memory, which models the AVX-512 vector registers. +To define a custom memory, you need to create a class that inherits from `Memory` and implement the required methods. +Below is an example of defining an `AVX512` memory, which models the AVX-512 vector registers. ### Example: Defining AVX512 Memory @@ -144,7 +147,7 @@ def foo(x: f32[16] @ AVX512): Use the `set_memory` primitive to change the memory annotation of a variable during scheduling. - **`set_memory(p, "C", AVX512)`**: Changes the memory of variable `C` in procedure `p` to `AVX512`. -- This is common when optimizing existing code for specific hardware. +- This is common when optimizing simple object code (e.g., GEMM) for specific hardware. #### Documentation for `set_memory` From 078fbcc6e3d826243db21490dd5d363cde1c9585 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Tue, 29 Oct 2024 14:31:56 -0400 Subject: [PATCH 18/26] add inspection.md and address some of David's comments --- docs/Design.md | 11 +++++----- docs/Imports.md | 3 +++ docs/README.md | 3 ++- docs/inspection.md | 50 ++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 61 insertions(+), 6 deletions(-) create mode 100644 docs/inspection.md diff --git a/docs/Design.md b/docs/Design.md index 9ac288d6..153aa7d2 100644 --- a/docs/Design.md +++ b/docs/Design.md @@ -17,9 +17,9 @@ One of the main ideas behind Exo is **exocompilation**, which allows users to de - The cost of adding support for new hardware is significantly reduced. - Proprietary details of hardware can be protected. -Users can model custom memories, instructions, and configuration state in libraries to target a specific accelerator. These hardware abstractions can then be used to write hand-optimized code or as building blocks for higher-level scheduling transformations. +Users can model custom [memories](./memories.md), [instructions](./instructions.md), and configuration state in libraries to target a specific accelerator. These hardware abstractions can then be used to write hand-optimized code or as building blocks for higher-level scheduling transformations. -More info can be found in the [PLDI paper](https://people.csail.mit.edu/yuka/pdf/exo_pldi2022_full.pdf) and [instructions.md](./instructions.md) and [memories.md](./memories.md). +More info can be found in the [PLDI paper](https://people.csail.mit.edu/yuka/pdf/exo_pldi2022_full.pdf), [instructions.md](./instructions.md), and [memories.md](./memories.md). ## Fine-Grained Primitives for Performance Control @@ -45,7 +45,8 @@ The rewrite-based approach offers several advantages: While the flexibility of fine-grained primitives is necessary for achieving peak performance, directly using them can be verbose and laborious. To address this, Exo allows users to define new higher-level scheduling operations by composing the core primitives. -These user-defined scheduling operations can encapsulate common optimization patterns and hardware-specific transformations, greatly improving productivity. They can be put together in reusable libraries, further enabling modularity and portability. +These user-defined scheduling operations can encapsulate common optimization patterns and hardware-specific transformations such as auto-vectorize, tiling, and even simulate scheduling operations from other USLs (like Halide's `compute_at`). +They can be put together in reusable libraries, further enabling modularity and portability. More infomation can be found in the [ASPLOS paper](.) and [Cursor.md](./Cursor.md). @@ -53,8 +54,8 @@ More infomation can be found in the [ASPLOS paper](.) and [Cursor.md](./Cursor.m We identified that Action, Inspection, and Reference are the key scheduling language design mechanisms that enable user-defined scheduling operations. -- **Actions** are the scheduling primitives that transform the code (e.g., `divide_loop`, `reorder`). -- **Inspections** query properties of the code (e.g., loop bounds, memory access patterns). +- **[Actions](./primitives)** are scheduling operations that transform the code. This could be compiler-provided *primitive actions* (e.g., `divide_loop`, `reorder`), or *user-defined* (e.g., tile2D in the ASPLOS paper). +- **[Inspections](./inspection.md)** query properties of the code (e.g., loop bounds, memory access patterns). - **References** point to specific parts of the code to apply actions to. Together, AIR allows scheduling operations to be defined as composable rewrites on the code. The language implementation guarantees the correctness of these primitive rewrites with a set of effect analyses. diff --git a/docs/Imports.md b/docs/Imports.md index 81cbd65c..f5385c8f 100644 --- a/docs/Imports.md +++ b/docs/Imports.md @@ -2,6 +2,9 @@ This document provides an overview of the imports used when writing Exo. +Exo's parser only resolves names in the local and global namespaces, and Exo reserves the attribute syntax (foo.bar) for configurations. +Therefore, if users wish to utilize Exo constructs, they must import them into their local namespace. + ## Table of Contents 1. [Standard Python Future Import](#1-standard-python-future-import) diff --git a/docs/README.md b/docs/README.md index 64e4fa60..b7d4c537 100644 --- a/docs/README.md +++ b/docs/README.md @@ -5,7 +5,8 @@ This directory provides detailed documentation about Exo's interface and interna - To learn about the design principles of Exo, read [Design.md](Design.md). - To understand how the Exo system is implemented, read [System.md](System.md). - For information on writing Exo object code, APIs, and imports, refer to [Procedures.md](Procedures.md), [object_code.md](object_code.md), and [Imports.md](Imports.md). -- To learn how to define memory, instructions, and externs externally to the compiler in the user code, refer to [externs.md](externs.md), [instructions.md](instructions.md), and [memories.md](memories.md). +- To learn how to define **hardware targets externally to the compiler**, refer to [externs.md](externs.md), [instructions.md](instructions.md), and [memories.md](memories.md). +- To learn how to define **new scheduling operations externally to the compiler**, refer to [Cursors.md](./Cursors.md) and [inspection.md](./inspection.md). - To understand the available scheduling primitives and how to use them, look into the primitives/ directory. The scheduling primitives are classified into six categories: diff --git a/docs/inspection.md b/docs/inspection.md new file mode 100644 index 00000000..8760ede4 --- /dev/null +++ b/docs/inspection.md @@ -0,0 +1,50 @@ +# External Inspection Functions + +Inspection is a metaprogramming feature that enables metaprograms (like schedules) to dynamically examine the properties of object code. Exo provides inspection through [Cursors](./Cursors.md), allowing users to examine standard AST properties such as variable names, literal expression values, and annotations (e.g., memory spaces and precisions) at scheduling time. Cursors also support local AST navigation, for example, accessing loop bounds (`loop.hi()`) and bodies (`loop.body()`). Inspection functions can be written externally from the Exo compiler, giving users the ability to customize them according to their needs. +For convinience, standard library inspection functions are provided as `exo.stdlib.inspection` module. + +Cursor types (such as `ForCursor` and `IfCursor`) are defined in `exo.API_cursors`, so you should import it when writing inspection functions: + +```python +from exo.API_cursors import * +``` + +Here are some simple inspection functions: + +```python +def is_loop(proc, loop): + loop = proc.forward(loop) + return isinstance(loop, ForCursor) + +def get_top_level_stmt(proc, c): + c = proc.forward(c) + + while not isinstance(c.parent(), InvalidCursor): + c = c.parent() + return c +``` + +Explanation: +- The `is_loop` function takes a `proc` object and a `loop` cursor as input. It forwards the `loop` cursor using `proc.forward(loop)` and checks if the resulting cursor is an instance of `ForCursor`. This function determines whether the given cursor points to a loop statement. +- The `get_top_level_stmt` function takes a `proc` object and a cursor `c` as input. It forwards the cursor `c` using `proc.forward(c)` and then iteratively moves the cursor to its parent using `c.parent()` until it reaches an `InvalidCursor`, which means the cursor reached the outer-most level of the procedure. This function finds the top-level statement that wraps the given cursor. + +Exo also exposes `ExoType` for expression types (defined in `src/exo/API_types.py`), which users can access using constructs like `ExoType.F16` and branch on it. + +```python +class ExoType(Enum): + F16 = auto() + F32 = auto() + F64 = auto() + UI8 = auto() + I8 = auto() + UI16 = auto() + I32 = auto() + R = auto() + Index = auto() + Bool = auto() + Size = auto() + Int = auto() + Stride = auto() +``` + +All the Cursor types and the kind of navigation you can perform on them are documented in [Cursors.md](./Cursors.md). From 78a8ae5f3ab79f6e8f244c90df937de496bd6d6e Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Tue, 29 Oct 2024 15:23:16 -0400 Subject: [PATCH 19/26] Add more stuff to object_code.md --- docs/Imports.md | 2 +- docs/README.md | 2 +- docs/object_code.md | 65 +++++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 65 insertions(+), 4 deletions(-) diff --git a/docs/Imports.md b/docs/Imports.md index f5385c8f..6bf3ad8d 100644 --- a/docs/Imports.md +++ b/docs/Imports.md @@ -89,7 +89,7 @@ Alternatively, users can define their own scheduling operations by composing sch ## 8. API Cursors -Cursors (see [Cursors.md](./Cursors.md)) are Exo's reference mechanism that allows users to navigate and inspect object code. When users define new scheduling operators using Cursors, they may wish to write their own inspection pass. API Cursors define types that will be useful for user inspection. +Cursors (see [Cursors.md](./Cursors.md)) are Exo's reference mechanism that allows users to navigate and inspect object code. When users define new scheduling operators using Cursors, they may wish to write their own inspection pass (see [inspection.md](./inspection.md). API Cursors define types that will be useful for user inspection. ```python from exo.API_cursors import ForCursor, AssignCursor, InvalidCursor diff --git a/docs/README.md b/docs/README.md index b7d4c537..32288783 100644 --- a/docs/README.md +++ b/docs/README.md @@ -7,7 +7,7 @@ This directory provides detailed documentation about Exo's interface and interna - For information on writing Exo object code, APIs, and imports, refer to [Procedures.md](Procedures.md), [object_code.md](object_code.md), and [Imports.md](Imports.md). - To learn how to define **hardware targets externally to the compiler**, refer to [externs.md](externs.md), [instructions.md](instructions.md), and [memories.md](memories.md). - To learn how to define **new scheduling operations externally to the compiler**, refer to [Cursors.md](./Cursors.md) and [inspection.md](./inspection.md). -- To understand the available scheduling primitives and how to use them, look into the primitives/ directory. +- To understand the available scheduling primitives and how to use them, look into the [primitives/](./primitives) directory. The scheduling primitives are classified into six categories: diff --git a/docs/object_code.md b/docs/object_code.md index 38028674..b4bd24ab 100644 --- a/docs/object_code.md +++ b/docs/object_code.md @@ -62,13 +62,14 @@ name: type[size] @ memory ``` - **`name`**: The variable name. -- **`type`**: The data type (e.g., `i32`, `f32`). +- **`type`**: The data type. Supported precision types are: `f16`, `f32`, `f64`, `i8`, `i32`, `ui8`, and `ui16`. - **`[size]`**: The dimensions of the array (optional for scalars). - **`@ memory`**: The memory space where the variable resides. + #### Procedure Arguments -Procedure arguments are declared with their types, sizes, and memory spaces. They can have dependent sizes based on other arguments. +Procedure arguments are declared with their types, sizes, and memory spaces. They can have sizes that depend on other arguments. Example from the code: @@ -81,6 +82,59 @@ data: i32[IC, N] @ DRAM - **`[IC, N]`**: A 2D array with dimensions `IC` and `N`. - **`@ DRAM`**: Specifies that `data` resides in DRAM memory. +The `data` buffer above represents **tensor** types, which means the stride of the innermost dimension is 1, and the strides of other dimensions are simple multiples of the shapes of the inner dimensions. + +Exo allows **window expressions** as well, which are similar to array slicing in Python. Instead of accessing the buffer point-wise (e.g., `x[i]`), users can *window* the array as `x[i:i+2]`. This will create a windowed array of size 2. +Exo procedures take tensor expressions when annotated with `x:f32[3]` syntax and take window expressions when annotated with `x:[f32][3]`, with square brackets around the types. + +```python +@proc +def foo(x: [f32][3]): + for i in seq(0, 3): + x[i] = 0.0 + +@proc +def bar(y: f32[10], z: f32[20, 20]): + foo(y[2:5]) + foo(z[1, 10:13]) +``` + +In this example, `foo` takes a window array of size 3, and `bar` calls `foo` by slicing `y` and `z`, respectively. Running `exocc` on this will generate the following C code: + +```c +#include "tmp.h" + +#include +#include + +// bar( +// y : f32[10] @DRAM, +// z : f32[20, 20] @DRAM +// ) +void bar(void *ctxt, float* y, float* z) { + foo(ctxt, (struct exo_win_1f32){ &y[2], { 1 } }); + foo(ctxt, (struct exo_win_1f32){ &z[20 + 10], { 1 } }); +} + +// foo( +// x : [f32][3] @DRAM +// ) +void foo(void *ctxt, struct exo_win_1f32 x) { + for (int_fast32_t i = 0; i < 3; i++) { + x.data[i * x.strides[0]] = 0.0f; + } +} +``` + +Moreover, Exo checks the consistency of tensor and window bounds in the frontend. If you modify `foo(y[2:5])` to `foo(y[2:6])` in the code above, the bounds check will fail and emit the following error: + +``` +TypeError: Errors occurred during effect checking: +/private/tmp/tmp.py:12:8: type-shape of calling argument may not equal the required type-shape: [Effects.BinOp(op='-', lhs=Effects.Const(val=6, type=LoopIR.Int(), srcinfo=), rhs=Effects.Const(val=2, type=LoopIR.Int(), srcinfo=), type=LoopIR.Index(), srcinfo=)] vs. [Effects.Const(val=3, type=LoopIR.Int(), srcinfo=)]. It could be non equal when: + y_stride_0 = 1, z_stride_0 = 20, z_stride_1 = 1 +``` + + #### Allocations Variables within the procedure are declared similarly to arguments. @@ -167,6 +221,11 @@ else: data[c, j + r] ``` +- **Window Statements**: Creates a slice (in other words, _window_) of the buffer and assign a new name. + ```python + y = x[0:3] + ``` + ## Limitations Exo has a few limitations that users should be aware of: @@ -179,6 +238,8 @@ Exo has a few limitations that users should be aware of: pass ``` + Exo allows quasi-affine indexing by division (e.g., `i/3`) and modulo (e.g., `i%3`) by constants. + To work around this limitation, you may need to restructure your code or use additional variables to represent the non-affine expressions. 2. **Value-dependent control flow**: Exo separates control values from buffer values, which means that it is not possible to write value-dependent control flow. For instance, the following code is not allowed: From bd7771349c845bb103341d8c5fdf89ad401595e9 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Tue, 29 Oct 2024 17:19:23 -0400 Subject: [PATCH 20/26] add more text to memories --- docs/memories.md | 54 +++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 46 insertions(+), 8 deletions(-) diff --git a/docs/memories.md b/docs/memories.md index 1e9a4a73..262e6f6f 100644 --- a/docs/memories.md +++ b/docs/memories.md @@ -72,13 +72,21 @@ class AVX512(Memory): return False ``` -- **`alloc(cls, new_name, prim_type, shape, srcinfo)`**: Defines how memory allocation is handled. For `AVX512`, it ensures that the allocated memory represents 16-wide vectors of `float` type. +- **`alloc(cls, new_name, prim_type, shape, srcinfo)`**: Defines how to lower `LoopIR.Alloc` into C code. + Allocation in Exo is expressed as `x : f32[N, M]`. + - `new_name`: A C string representing the allocated variable name. In this example, it would be `"x"`. + - `prim_type`: A C string representing the primitive data type. In this example, it would be `"float"`. The mapping from LoopIR types to C types is as follows: + - `f16` -> `"_Float16"` + - `f32` -> `"float"` + - `f64` -> `"double"` + - `i8` -> `"int8_t"` + - `ui8` -> `"uint8_t"` + - `ui16`-> `"uint16_t"` + - `i32` -> `"int32_t"` + - `shape`: A list of C strings representing the shape of each dimension. In the example above, it would be `["N", "M"]`. + + For `AVX512` memory, the `alloc` method ensures that the allocated memory represents 16-wide vectors of the `float` type. - ```python - @classmethod - def alloc(cls, new_name, prim_type, shape, srcinfo): - # Validation checks and allocation code - ``` - **`free(cls, new_name, prim_type, shape, srcinfo)`**: Handles memory deallocation. For `AVX512`, no action is needed. @@ -88,14 +96,44 @@ class AVX512(Memory): return "" ``` -- **`window(cls, basetyp, baseptr, indices, strides, srcinfo)`**: Defines how to access elements in the memory. +- **`window(cls, basetyp, baseptr, indices, strides, srcinfo)`**: Defines how array accesses are lowered into C code. + + Usually, you cannot access your specialized hardware accelerator memory from C code, and you will need to use your accelerator instructions to access it, like the following: + + ```python + x : f32[16,16] @ your_memory + your_instr(x[0, 0:16]) + ``` + + The `window` member defines how `x[0, 0:16]` should be lowered to C code, as different accelerator instructions and memory have different addressing schemes. + + For example, the Gemmini accelerator's scratchpad memory is 2D and has a fixed column width of 16. The Gemmini instruction expects accessing the scratchpad by *number of rows* only, and accessing columns is not permitted. Therefore, the window definition will look like: ```python @classmethod def window(cls, basetyp, baseptr, indices, strides, srcinfo): - # Windowing logic for memory access + # Assume that strides[-1] == 1 + # and that strides[-2] == 16 (if there is a strides[-2]) + assert len(indices) == len(strides) and len(strides) >= 2 + prim_type = basetyp.basetype().ctype() + offset = generate_offset(indices, strides) + return ( + f"*({prim_type}*)((uint64_t)( " + f"((uint32_t)((uint64_t){baseptr})) + " + f"({offset})/16))" + ) ``` + Explanation of arguments: + - `basetyp`: type of the buffer in `LoopIR.type` + - `baseptr`: C pointer string to the buffer (e.g., `x`) + - `indices`: List of C strings for index accesses for each dimension + - `strides`: List of C strings for strides for each dimension + - `srcinfo`: Source location information, Can be used for error messages + + Both tensor and window expressions will be resolved to vanilla indices and strides. + + ## Understanding `can_read` The `can_read` method controls whether direct array access is allowed for the memory type. When `can_read` is set to `False`, you cannot read or write to the memory using standard array indexing in Exo or the generated C code. This models hardware that requires special instructions for memory access, such as vector registers. From ce38be9fd476c119d11fff2330fb9bd3f8df91b0 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Tue, 29 Oct 2024 22:25:28 -0400 Subject: [PATCH 21/26] Address David's comments --- docs/System.md | 10 ++++++++-- docs/externs.md | 11 +++++++++++ docs/object_code.md | 10 ++++++++++ 3 files changed, 29 insertions(+), 2 deletions(-) diff --git a/docs/System.md b/docs/System.md index 3106287b..c6869f8b 100644 --- a/docs/System.md +++ b/docs/System.md @@ -4,7 +4,13 @@ This document provides an overview of the Exo compilation process, as illustrate ![System overview](images/system-overview.png) -The Exo compiler consists of a frontend and a backend, with user schedules applied in between. The input to the compiler is a set of Exo source files (`*.py`), and the output is generated C code (`*.c`). +The Exo compiler's frontend starts by parsing the Python AST and constructing the Untyped Exo AST (UAST). +It then runs various frontend checks before converting the UAST into LoopIR, which serves as Exo's primary IR. +Exo supports rewrite-based user-scheduling, where scheduling primitives take a LoopIR and returns another (transformed) LoopIR. +These primitives take the immutable LoopIR and rewrite it into a new LoopIR. +Finally, in the backend, the optimized LoopIR is code-generated into C code. + +The input to the compiler is a set of Exo source files (`*.py`), and the output is generated C code (`*.c`). In this repository, folders are structured as follows: @@ -35,7 +41,7 @@ In this repository, folders are structured as follows: User-defined features like config, externs, and Memory's parent class implementations are in `configs.py`, `extern.py`, and `memory.py`, respectively. -`internal_cursors` defines cursor movements that are used internally by `LoopIR_scheduling` implementations of scheduling primitives. +`internal_cursors` defines primitive cursor movements (see Section 5.2 "Cursor implementation" of our ASPLOS paper) that are used internally by `LoopIR_scheduling` implementations of scheduling primitives. `proc_eqv.py` defines a union-find tree which we use to track the equivalence of procedures. --- diff --git a/docs/externs.md b/docs/externs.md index 28f85bfe..294c0be8 100644 --- a/docs/externs.md +++ b/docs/externs.md @@ -67,6 +67,15 @@ def typecheck(self, args): ##### `compile(self, args, prim_type)` Define how the function is compiled into target code. +- `args`: list of arguments as C strings +- `prim_type`: A C string representing the primitive data type. It could be one of the following C strings, mapping from LoopIR types to C strings: + - `f16` -> `"_Float16"` + - `f32` -> `"float"` + - `f64` -> `"double"` + - `i8` -> `"int8_t"` + - `ui8` -> `"uint8_t"` + - `ui16`-> `"uint16_t"` + - `i32` -> `"int32_t"` ```python def compile(self, args, prim_type): @@ -100,6 +109,8 @@ sin = _Sin() Externs can be used as expressions on the RHS of assignment and reduction statements. This allows you to incorporate external functions seamlessly into your Exo computations. +Note that externs (and Exo procedures) do not allow aliasing in their arguments. This restriction is in place to prevent externs from having side effects on the input arguments. + ### Example: Using `sin` in an Expression Here's a complete example demonstrating how to define and use the `sin` extern function within an expression. diff --git a/docs/object_code.md b/docs/object_code.md index b4bd24ab..086f9198 100644 --- a/docs/object_code.md +++ b/docs/object_code.md @@ -134,6 +134,16 @@ TypeError: Errors occurred during effect checking: y_stride_0 = 1, z_stride_0 = 20, z_stride_1 = 1 ``` +#### Aliasing Limitations + +When passing buffers to procedure arguments, aliasing is not allowed. Concretely, you cannot write something like: + +```python +foo(y, y) +foo(y[0:5], y[2:7]) +``` + +This limitation exists because the analysis would be imprecise if we allowed such aliasing. This is similar to how C++ compilers can perform more optimization when you use the `__restrict__` keyword to explicitly indicate that you're not aliasing your buffers. #### Allocations From 57116ac7f433d762f1d4a9cc1685a151139d84ed Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Tue, 5 Nov 2024 12:40:31 -0500 Subject: [PATCH 22/26] fix --- docs/Imports.md | 10 ++++------ docs/object_code.md | 40 ++++++++++++++++++++++++++++++++-------- 2 files changed, 36 insertions(+), 14 deletions(-) diff --git a/docs/Imports.md b/docs/Imports.md index 6bf3ad8d..61eed7f1 100644 --- a/docs/Imports.md +++ b/docs/Imports.md @@ -11,11 +11,9 @@ Therefore, if users wish to utilize Exo constructs, they must import them into t 2. [Core Exo Module](#2-core-exo-module) 3. [Memory Libraries](#3-memory-libraries) 4. [Instruction Libraries](#4-instruction-libraries) -5. [Frontend Syntax Utilities](#5-frontend-syntax-utilities) -6. [Standard Library Modules](#6-standard-library-modules) - - [6.1 Scheduling Utilities](#61-scheduling-utilities) - - [6.2 Standard Library Functions](#62-standard-library-functions) -7. [External Interfaces](#7-external-interfaces) +5. [Extern Libraries](#5-extern-libraries) +6. [Frontend Syntax Utilities](#6-frontend-syntax-utilities) +7. [Standard Library Scheduling Functions](#7-standard-library-scheduling-functions) 8. [API Cursors](#8-api-cursors) @@ -89,7 +87,7 @@ Alternatively, users can define their own scheduling operations by composing sch ## 8. API Cursors -Cursors (see [Cursors.md](./Cursors.md)) are Exo's reference mechanism that allows users to navigate and inspect object code. When users define new scheduling operators using Cursors, they may wish to write their own inspection pass (see [inspection.md](./inspection.md). API Cursors define types that will be useful for user inspection. +Cursors (see [Cursors.md](./Cursors.md)) are Exo's reference mechanism that allows users to navigate and inspect object code. When users define new scheduling operators using Cursors, they may wish to write their own inspection pass (see [inspection.md](./inspection.md)). API Cursors define types that will be useful for user inspection. ```python from exo.API_cursors import ForCursor, AssignCursor, InvalidCursor diff --git a/docs/object_code.md b/docs/object_code.md index 086f9198..44822069 100644 --- a/docs/object_code.md +++ b/docs/object_code.md @@ -31,15 +31,14 @@ def generic_conv1d( - [Annotations and Decorators](#annotations-and-decorators) - [`@proc` Decorator](#proc-decorator) - [Type and Memory Annotations](#type-and-memory-annotations) -- [Procedure Arguments](#procedure-arguments) -- [Variable Declarations](#variable-declarations) -- [Memory Spaces](#memory-spaces) + - [Procedure Arguments](#procedure-arguments) + - [Allocations](#allocations) + - [Memories](#memories) - [Loops](#loops) - [`for` Loop Syntax](#for-loop-syntax) - [Conditional Statements](#conditional-statements) -- [Operations and Assignments](#operations-and-assignments) +- [Assignments](#assignments) - [Understanding the Example](#understanding-the-example) -- [Conclusion](#conclusion) ## Annotations and Decorators @@ -67,7 +66,7 @@ name: type[size] @ memory - **`@ memory`**: The memory space where the variable resides. -#### Procedure Arguments +### Procedure Arguments Procedure arguments are declared with their types, sizes, and memory spaces. They can have sizes that depend on other arguments. @@ -145,7 +144,30 @@ foo(y[0:5], y[2:7]) This limitation exists because the analysis would be imprecise if we allowed such aliasing. This is similar to how C++ compilers can perform more optimization when you use the `__restrict__` keyword to explicitly indicate that you're not aliasing your buffers. -#### Allocations + +#### Passing Tensor Window Slices to Functions Expecting Non-Window Tensors + +It is not allowed to pass _window_ to a function that expects a non-window tensor as an argument. Consider the following example: + +```python +@proc +def callee(x: f32[10]): + pass + +@proc +def caller(x: f32[2, 10]): + callee(x[0]) # Error: Passing a window slice to a function expecting a non-window tensor + callee(x[1, :]) # Error: Passing a window slice to a function expecting a non-window tensor +``` + +In this code snippet, the `callee` function expects a non-window tensor `x` of shape `f32[10]`. However, in the `caller` function, we attempt to pass slices of the `x` tensor (`x[0]` and `x[1]`) to the `callee` function. These slices are windows of the original tensor, and passing them to a function expecting a non-window tensor is not allowed. + +To resolve this issue, you can either: +1. Modify the `callee` function to accept a window tensor as an argument, or +2. Create a new non-window tensor from the slice before passing it to the `callee` function. + + +### Allocations Variables within the procedure are declared similarly to arguments. @@ -159,12 +181,14 @@ y: i32 - **`i32`**: The data type (32-bit integer). - **No memory annotation**: Defaults to `DRAM` if memory is unspecified. -#### Memories +### Memories Memory annotations in Exo are used to model different hardware memory regions, such as DRAM, caches, or specialized memories. The `@` symbol is used to specify the memory space, for example: `@DRAM`, `@AVX2`, or `@Neon`. Memory annotations for your custom hardware accelerators can be defined externally to Exo and can be used as annotations in the same way. While Exo provides default memory (`DRAM`) and some library memory definitions for convenience (`AVX2`, `AVX512`, `Neon`, `GEMM_SCRATCH`, etc.), it is recommended and encouraged that users define their own memory annotations for their specific hardware. For more information on defining custom memory annotations, refer to [memories.md](./memories.md). + + ## Loops ### `for` Loop Syntax From 2aaac54b71d60187f898f83c29c08e05dc919746 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Tue, 5 Nov 2024 13:05:55 -0500 Subject: [PATCH 23/26] update memories --- docs/memories.md | 6 +++--- docs/object_code.md | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/memories.md b/docs/memories.md index 262e6f6f..2cc946df 100644 --- a/docs/memories.md +++ b/docs/memories.md @@ -32,7 +32,7 @@ class AVX512(Memory): raise MemGenError(f"{srcinfo}: AVX512 vectors are not scalar values") if not prim_type == "float": raise MemGenError(f"{srcinfo}: AVX512 vectors must be f32 (for now)") - if not _is_const_size(shape[-1], 16): + if not shape[-1].isdecimal() and int(shape[-1]) == 16: raise MemGenError(f"{srcinfo}: AVX512 vectors must be 16-wide") shape = shape[:-1] if shape: @@ -85,7 +85,7 @@ class AVX512(Memory): - `i32` -> `"int32_t"` - `shape`: A list of C strings representing the shape of each dimension. In the example above, it would be `["N", "M"]`. - For `AVX512` memory, the `alloc` method ensures that the allocated memory represents 16-wide vectors of the `float` type. + For `AVX512` memory, the `alloc` method ensures that the allocated memory represents 16-wide vectors (`shape[-1].isdecimal() and int(shape[-1]) == 16`) of the `float` type (`prim_type == "float"`). - **`free(cls, new_name, prim_type, shape, srcinfo)`**: Handles memory deallocation. For `AVX512`, no action is needed. @@ -156,7 +156,7 @@ To interact with the memory, you must use specific instructions or operations de x: f32[16] @ AVX512 mm512_loadu_ps(x, inp[16*i : 16*i+16]) ``` -- **Instructions Documentation**: [instructions.md](instructions.md) +To learn more about how to define and use instructions in Exo, see [instructions.md](./instructions.md). ## Using Custom Memories diff --git a/docs/object_code.md b/docs/object_code.md index 44822069..86328826 100644 --- a/docs/object_code.md +++ b/docs/object_code.md @@ -147,7 +147,7 @@ This limitation exists because the analysis would be imprecise if we allowed suc #### Passing Tensor Window Slices to Functions Expecting Non-Window Tensors -It is not allowed to pass _window_ to a function that expects a non-window tensor as an argument. Consider the following example: +It is not allowed to pass a _window_ to a function that expects a non-window tensor as an argument. Consider the following example: ```python @proc From a5a2193509c666436f721d6f5195555ab6d0f9bb Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Tue, 5 Nov 2024 13:31:27 -0500 Subject: [PATCH 24/26] update instructions --- docs/instructions.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/instructions.md b/docs/instructions.md index 46561376..15c54f8a 100644 --- a/docs/instructions.md +++ b/docs/instructions.md @@ -42,12 +42,13 @@ def neon_vld_4xf32(dst: [f32][4] @ Neon, src: [f32][4] @ DRAM): dst[i] = src[i] ``` -- **`@instr("{dst_data} = vld1q_f32(&{src_data});")`**: Specifies the C code to emit when this instruction is called. - - `{dst_data}` and `{src_data}` are format strings that will be replaced with the actual arguments during codegen. +- **`@instr` decorators**: Specifies the semantics of the hardware instruction and the C code to emit. + - `{dst_data}` and `{src_data}` are format strings that will be replaced with the actual arguments during codegen. You can put `_data` after the function argument names and surround them with curly braces (`{dst_data}`). + - `"{dst_data} = vld1q_f32(&{src_data});"`: The argument to `@instr` decorators specifies the C code to emit for this instruction. - **`dst: [f32][4] @ Neon`**: Declares `dst` as a 4-element array of `f32` in `Neon` memory. - **`src: [f32][4] @ DRAM`**: Declares `src` as a 4-element array of `f32` in `DRAM`. - **Assertions**: Ensure that the strides of `src` and `dst` are 1 for correct memory access. -- **Loop**: The loop specifies the semantics of the instruction, copying elements from `src` to `dst`. +- **Body**: The function body specifies the semantics of the instruction (written in Exo object code), copying elements from `src` to `dst`. ### Defining the Memory Annotation `Neon` From 2d970494717fd206ec3fd06296eb7c1963d94239 Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Tue, 5 Nov 2024 13:56:49 -0500 Subject: [PATCH 25/26] Update extern doc --- docs/externs.md | 6 ++++-- docs/instructions.md | 2 +- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/externs.md b/docs/externs.md index 294c0be8..4fa436f7 100644 --- a/docs/externs.md +++ b/docs/externs.md @@ -41,7 +41,7 @@ def __init__(self): super().__init__("sin") ``` -- `"sin"`: The name of the external function as it will appear in the generated code. +- `"sin"`: The name of the external function as it will appear in the Exo object code. ##### `typecheck(self, args)` @@ -94,6 +94,7 @@ def globl(self, prim_type): ``` - Includes necessary headers required for the external function (e.g., `` for mathematical functions). +- `globl` is called and is instantiated for every `prim_type`s. #### 4. Instantiate the Extern Function @@ -109,7 +110,8 @@ sin = _Sin() Externs can be used as expressions on the RHS of assignment and reduction statements. This allows you to incorporate external functions seamlessly into your Exo computations. -Note that externs (and Exo procedures) do not allow aliasing in their arguments. This restriction is in place to prevent externs from having side effects on the input arguments. +Unlike Exo procedures that do not allow aliasing in their arguments, you _can_ pass the same buffer to external arguments (e.g., `select(xi, xi, xi, xi)`). +This is because there is no concern about aliasing since all external arguments are read-only, as opposed to Exo procedure arguments which can have write effects on the input arguments. ### Example: Using `sin` in an Expression diff --git a/docs/instructions.md b/docs/instructions.md index 15c54f8a..ca9fc228 100644 --- a/docs/instructions.md +++ b/docs/instructions.md @@ -42,7 +42,7 @@ def neon_vld_4xf32(dst: [f32][4] @ Neon, src: [f32][4] @ DRAM): dst[i] = src[i] ``` -- **`@instr` decorators**: Specifies the semantics of the hardware instruction and the C code to emit. +- **`@instr(...)`**: Specifies the semantics of the hardware instruction and the C code to emit. - `{dst_data}` and `{src_data}` are format strings that will be replaced with the actual arguments during codegen. You can put `_data` after the function argument names and surround them with curly braces (`{dst_data}`). - `"{dst_data} = vld1q_f32(&{src_data});"`: The argument to `@instr` decorators specifies the C code to emit for this instruction. - **`dst: [f32][4] @ Neon`**: Declares `dst` as a 4-element array of `f32` in `Neon` memory. From 4c99d64c893ef253a460de5b52492963eec1fcea Mon Sep 17 00:00:00 2001 From: Yuka Ikarashi Date: Tue, 5 Nov 2024 14:00:00 -0500 Subject: [PATCH 26/26] . --- docs/externs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/externs.md b/docs/externs.md index 4fa436f7..69adc849 100644 --- a/docs/externs.md +++ b/docs/externs.md @@ -41,7 +41,7 @@ def __init__(self): super().__init__("sin") ``` -- `"sin"`: The name of the external function as it will appear in the Exo object code. +- `"sin"`: The name of the external function as it will appear in the printed Exo object code. ##### `typecheck(self, args)`