Commit 8afd653

Initial custom ops introductory recipe.

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

1 parent: 18d4767

Showing 11 changed files with 650 additions and 0 deletions.

@@ -0,0 +1,8 @@

```gitignore
# pixi environments
.pixi
*.egg-info
# magic environments
.magic
.env
# build products
operations.mojopkg
```

@@ -0,0 +1,115 @@

# Custom Operations: An Introduction to Programming GPUs and CPUs with Mojo

In this recipe, we will cover:

* How to extend a MAX Graph using custom operations.
* Using Mojo to write high-performance calculations that run on GPUs and CPUs.
* The basics of GPU programming in MAX.

We'll walk through running three examples that show:

* Adding one to every number in an input tensor.
* Performing hardware-specific addition of two vectors.
* Calculating the Mandelbrot set on CPU and GPU.

Let's get started.

## Requirements

Please make sure your system meets our
[system requirements](https://docs.modular.com/max/get-started).

To proceed, ensure you have the `magic` CLI installed:

```bash
curl -ssL https://magic.modular.com/ | bash
```

or update it via:

```bash
magic self-update
```

### GPU requirements

These examples can all be run on either a CPU or GPU. To run them on a GPU,
ensure your system meets
[these GPU requirements](https://docs.modular.com/max/faq/#gpu-requirements):

* Officially supported GPUs: NVIDIA Ampere A-series (A100/A10) or Ada
  L4-series (L4/L40) data center GPUs. Unofficially, RTX 30XX and 40XX series
  GPUs have been reported to work well with MAX.
* NVIDIA GPU driver version 555 or higher. [Installation guide here](https://www.nvidia.com/download/index.aspx).

## Quick start

1. Download the code for this recipe using git:

   ```bash
   git clone https://github.com/modular/max-recipes.git
   cd max-recipes/custom-ops-introduction
   ```

2. Run each of the examples:

   ```bash
   magic run add_one
   magic run vector_addition
   magic run mandelbrot
   ```

3. Browse through the commented source code to see how they work.

## Custom operation examples

Graphs in MAX can be extended to use custom operations written in Mojo. This
recipe includes the following examples:

* **add_one**: Adding 1 to every element of an input tensor.
* **vector_addition**: Performing vector addition using a manual GPU function.
* **mandelbrot**: Calculating the Mandelbrot set.

Each calculation is implemented as a custom operation in Mojo. For each
example, a simple graph containing that single operation is constructed in
Python. The graph is compiled and dispatched onto a supported GPU if one is
available, or onto the CPU if not. Any input tensors are moved from the host to
the device on which the graph is running. The graph then runs, and the results
are copied back to the host for display.

Note that the same Mojo code runs on both CPU and GPU. The graph runs on a
supported accelerator if one is available and falls back to the CPU otherwise;
neither path requires code changes. The `vector_addition` example shows how
this works under the hood for common MAX abstractions: compile-time
specialization lets MAX choose the optimal code path for a given hardware
architecture.

The `operations/` directory contains the custom kernel implementations, and
graph construction happens in the Python files in the base directory. These
examples are designed to stand on their own so that they can be used as
templates for experimentation.

Execution has two phases: first, the custom Mojo kernels are compiled into an
`operations.mojopkg` package, and then the graph is constructed and run in
Python. The inference session is pointed at `operations.mojopkg` (through its
`custom_extensions` argument) to load the custom operations.
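
Because the session loads the kernels from the compiled package, running the Python phase before the package exists is an easy mistake. A minimal sketch of a guard you could place before creating the `InferenceSession`, assuming the package sits next to the script as it does in these examples. `require_ops_package` is a hypothetical helper, not part of MAX; it only checks that phase one has produced the file.

```python
from pathlib import Path


def require_ops_package(script_dir: Path, name: str = "operations.mojopkg") -> Path:
    """Return the path to the compiled custom-ops package, or fail early.

    Hypothetical helper (not a MAX API): it only verifies that the Mojo
    kernels in operations/ have already been compiled into the package.
    """
    path = script_dir / name
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; compile the Mojo kernels in operations/ first"
        )
    return path
```

In a script, `path = require_ops_package(Path(__file__).parent)` would stand in for the direct path construction, so a missing package produces a clear error before any graph work starts.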

## Conclusion

In this recipe, we've introduced the basics of how to write custom MAX Graph
operations using Mojo, place them in a one-operation graph in Python, and run
them on an available CPU or GPU.

## Next steps

* Follow [our tutorial for building a custom operation from scratch](https://docs.modular.com/max/tutorials/build-custom-ops).

* Explore MAX's [documentation](https://docs.modular.com/max/) for additional
  features. The [`gpu`](https://docs.modular.com/mojo/stdlib/gpu/) module has
  details on Mojo's GPU programming functions and types, and the documentation
  on [`@compiler.register`](https://docs.modular.com/max/api/mojo-decorators/compiler-register/)
  shows how to register custom graph operations.

* Join our [Modular Forum](https://forum.modular.com/) and [Discord community](https://discord.gg/modular) to share your experiences and get support.

We're excited to see what you'll build with MAX! Share your projects and experiences with us using `#ModularAI` on social media.

@@ -0,0 +1,74 @@

```python
# ===----------------------------------------------------------------------=== #
# Copyright (c) 2025, Modular Inc. All rights reserved.
#
# Licensed under the Apache License v2.0 with LLVM Exceptions:
# https://llvm.org/LICENSE.txt
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ===----------------------------------------------------------------------=== #

from pathlib import Path

import numpy as np
from max.driver import CPU, Accelerator, Tensor, accelerator_count
from max.dtype import DType
from max.engine import InferenceSession
from max.graph import Graph, TensorType, ops

if __name__ == "__main__":
    path = Path(__file__).parent / "operations.mojopkg"

    rows = 5
    columns = 10
    dtype = DType.float32

    # Configure our simple one-operation graph.
    graph = Graph(
        "addition",
        # The custom Mojo operation is referenced by its string name, and we
        # need to provide inputs as a list as well as expected output types.
        forward=lambda x: ops.custom(
            name="add_one",
            values=[x],
            out_types=[TensorType(dtype=x.dtype, shape=x.tensor.shape)],
        )[0].tensor,
        input_types=[
            TensorType(dtype, shape=[rows, columns]),
        ],
    )

    # Place the graph on a GPU, if available. Fall back to CPU if not.
    device = CPU() if accelerator_count() == 0 else Accelerator()

    # Set up an inference session for running the graph.
    session = InferenceSession(
        devices=[device],
        custom_extensions=path,
    )

    # Compile the graph.
    model = session.load(graph)

    # Fill an input matrix with random values.
    x_values = np.random.uniform(size=(rows, columns)).astype(np.float32)

    # Create a driver tensor from this, and move it to the target device.
    x = Tensor.from_numpy(x_values).to(device)

    # Perform the calculation on the target device.
    result = model.execute(x)[0]

    # Copy values back to the CPU to be read.
    assert isinstance(result, Tensor)
    result = result.to(CPU())

    print("Graph result:")
    print(result.to_numpy())
    print()

    print("Expected result:")
    print(x_values + 1)
```
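
The script prints the graph result and the expected values for a visual comparison. If you would rather have the run fail loudly on a mismatch, a small NumPy check along these lines could follow the copy back to the CPU. `check_add_one` is a hypothetical helper, and the `rtol=1e-6` tolerance is an assumption chosen for float32 data.

```python
import numpy as np


def check_add_one(graph_output: np.ndarray, inputs: np.ndarray) -> None:
    """Raise AssertionError unless graph_output equals inputs + 1 (float32 tolerance)."""
    expected = inputs + 1
    # Catch shape mistakes (e.g. a wrong out_types shape) with a clear message.
    if graph_output.shape != expected.shape:
        raise AssertionError(f"shape {graph_output.shape} != {expected.shape}")
    np.testing.assert_allclose(graph_output, expected, rtol=1e-6)
```

In the script it would be called as `check_add_one(result.to_numpy(), x_values)` once `result` has been moved to the CPU.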

@@ -0,0 +1,94 @@

```python
# ===----------------------------------------------------------------------=== #
# Copyright (c) 2025, Modular Inc. All rights reserved.
#
# Licensed under the Apache License v2.0 with LLVM Exceptions:
# https://llvm.org/LICENSE.txt
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ===----------------------------------------------------------------------=== #

from pathlib import Path

from max.driver import CPU, Accelerator, Tensor, accelerator_count
from max.dtype import DType
from max.engine import InferenceSession
from max.graph import Graph, TensorType, ops


def create_mandelbrot_graph(
    width: int,
    height: int,
    min_x: float,
    min_y: float,
    scale_x: float,
    scale_y: float,
    max_iterations: int,
) -> Graph:
    """Configure a graph to run a Mandelbrot kernel."""
    output_dtype = DType.int32
    with Graph(
        "mandelbrot",
    ) as graph:
        # The custom Mojo operation is referenced by its string name, and we
        # need to provide inputs as a list as well as expected output types.
        result = ops.custom(
            name="mandelbrot",
            values=[
                ops.constant(min_x, dtype=DType.float32),
                ops.constant(min_y, dtype=DType.float32),
                ops.constant(scale_x, dtype=DType.float32),
                ops.constant(scale_y, dtype=DType.float32),
                ops.constant(max_iterations, dtype=DType.int32),
            ],
            out_types=[TensorType(dtype=output_dtype, shape=[height, width])],
        )[0].tensor

        # Return the result of the custom operation as the output of the graph.
        graph.output(result)
        return graph


if __name__ == "__main__":
    path = Path(__file__).parent / "operations.mojopkg"

    # Establish Mandelbrot set ranges.
    WIDTH = 15
    HEIGHT = 15
    MAX_ITERATIONS = 100
    MIN_X = -1.5
    MAX_X = 0.7
    MIN_Y = -1.12
    MAX_Y = 1.12

    # Configure our simple graph.
    scale_x = (MAX_X - MIN_X) / WIDTH
    scale_y = (MAX_Y - MIN_Y) / HEIGHT
    graph = create_mandelbrot_graph(
        WIDTH, HEIGHT, MIN_X, MIN_Y, scale_x, scale_y, MAX_ITERATIONS
    )

    # Place the graph on a GPU, if available. Fall back to CPU if not.
    device = CPU() if accelerator_count() == 0 else Accelerator()

    # Set up an inference session that runs the graph on a GPU, if available.
    session = InferenceSession(
        devices=[device],
        custom_extensions=path,
    )

    # Compile the graph.
    model = session.load(graph)

    # Perform the calculation on the target device.
    result = model.execute()[0]

    # Copy values back to the CPU to be read.
    assert isinstance(result, Tensor)
    result = result.to(CPU())

    print("Iterations to escape:")
    print(result.to_numpy())
    print()
```
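
The Mojo `mandelbrot` kernel is not part of this excerpt, so its exact conventions can't be confirmed here. As a hedged cross-check for the printed grid, here is a minimal pure-Python reference, assuming the kernel maps pixel (row, column) to c = (min_x + column * scale_x) + i(min_y + row * scale_y), iterates z = z*z + c starting from z = 0, and reports how many iterations ran before |z| exceeded 2, capped at `max_iterations`. `mandelbrot_reference` is a hypothetical helper, not part of the recipe.

```python
def mandelbrot_reference(
    width: int,
    height: int,
    min_x: float,
    min_y: float,
    scale_x: float,
    scale_y: float,
    max_iterations: int,
) -> list[list[int]]:
    """Escape-iteration counts for a height x width grid, row-major.

    Assumed conventions (not taken from the Mojo kernel): pixel (row, col)
    maps to c = (min_x + col * scale_x) + 1j * (min_y + row * scale_y), and
    points that never leave |z| <= 2 report max_iterations.
    """
    grid = []
    for row in range(height):
        cy = min_y + row * scale_y
        counts = []
        for col in range(width):
            c = complex(min_x + col * scale_x, cy)
            z = 0j
            iterations = 0
            # Iterate until the orbit escapes the radius-2 disk or hits the cap.
            while iterations < max_iterations and abs(z) <= 2.0:
                z = z * z + c
                iterations += 1
            counts.append(iterations)
        grid.append(counts)
    return grid
```

Calling it with the script's constants, `mandelbrot_reference(WIDTH, HEIGHT, MIN_X, MIN_Y, scale_x, scale_y, MAX_ITERATIONS)`, gives a 15 x 15 grid to set beside `result.to_numpy()`. Exact agreement is not guaranteed: the kernel's inputs are float32 while Python uses float64, and the mapping and counting conventions above are assumptions, so pixels near the set's boundary may differ.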

@@ -0,0 +1,17 @@

```yaml
version: 1.0
long_title: "Custom Operations: An Introduction to Programming GPUs and CPUs with Mojo"
short_title: "Custom Operations: An Introduction"
author: "Brad Larson"
author_image: "author/bradlarson.jpg"
author_url: "https://www.linkedin.com/in/brad-larson-3549a5291/"
github_repo: "https://github.com/modular/max-recipes/tree/main/custom-ops-introduction"
date: "23-02-2025"
difficulty: "beginner"
tags:
  - max-graph
  - gpu-programming

tasks:
  - magic run add_one
  - magic run vector_addition
  - magic run mandelbrot
```

@@ -0,0 +1,12 @@

```mojo
# ===----------------------------------------------------------------------=== #
# Copyright (c) 2025, Modular Inc. All rights reserved.
#
# Licensed under the Apache License v2.0 with LLVM Exceptions:
# https://llvm.org/LICENSE.txt
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ===----------------------------------------------------------------------=== #
```

@@ -0,0 +1,42 @@

```mojo
# ===----------------------------------------------------------------------=== #
# Copyright (c) 2025, Modular Inc. All rights reserved.
#
# Licensed under the Apache License v2.0 with LLVM Exceptions:
# https://llvm.org/LICENSE.txt
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ===----------------------------------------------------------------------=== #

import compiler
from max.tensor import ManagedTensorSlice, foreach
from runtime.asyncrt import DeviceContextPtr

from utils.index import IndexList


@compiler.register("add_one", num_dps_outputs=1)
struct AddOne:
    @staticmethod
    fn execute[
        # The kind of device this will be run on: "cpu" or "gpu".
        target: StringLiteral,
    ](
        # Because num_dps_outputs=1, the first argument is the output.
        out: ManagedTensorSlice,
        # The inputs start here.
        x: ManagedTensorSlice[type = out.type, rank = out.rank],
        # The device context is needed for some GPU calls.
        ctx: DeviceContextPtr,
    ):
        @parameter
        @always_inline
        fn elementwise_add_one[
            width: Int
        ](idx: IndexList[x.rank]) -> SIMD[x.type, width]:
            return x.load[width](idx) + 1

        foreach[elementwise_add_one, target=target](out, ctx)
```
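
As used here, `foreach` calls `elementwise_add_one` with a SIMD `width` and a starting index and stores the returned vector into `out` at that index. The pure-Python model below illustrates only that contract. It assumes, purely for illustration, that each row is split into `simd_width`-wide chunks with width-1 calls for any remainder; the real vector widths, tiling, and GPU thread mapping are chosen by MAX and are not shown in this commit. `foreach_model` and `make_add_one` are hypothetical names.

```python
from typing import Callable, Sequence

# An element function receives (width, (row, col)) and returns `width` values,
# mirroring elementwise_add_one[width](idx) for a rank-2 tensor.
ElementFn = Callable[[int, tuple], Sequence[float]]


def foreach_model(fn: ElementFn, out: list[list[float]], simd_width: int = 4) -> None:
    """Fill `out` in place by calling `fn` on chunks of each row (illustrative only)."""
    for row in range(len(out)):
        columns = len(out[row])
        col = 0
        while col < columns:
            # Full vector width where it fits, otherwise width 1 for the tail.
            width = simd_width if columns - col >= simd_width else 1
            values = fn(width, (row, col))
            out[row][col:col + width] = values
            col += width


def make_add_one(x: list[list[float]]) -> ElementFn:
    """Python analogue of elementwise_add_one: load `width` values, add 1."""

    def element_fn(width: int, idx: tuple) -> list[float]:
        row, col = idx
        return [v + 1 for v in x[row][col:col + width]]

    return element_fn


x = [[0.5, 1.5, 2.5, 3.5, 4.5, 5.5]]
out = [[0.0] * 6]
foreach_model(make_add_one(x), out)
print(out)  # [[1.5, 2.5, 3.5, 4.5, 5.5, 6.5]]
```

The point of the model is the division of labor: the element function only knows how to produce `width` values at an index, while `foreach` decides which indices and widths to request, which is what lets the same Mojo function be scheduled differently for the `"cpu"` and `"gpu"` targets.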