Add recipe: an introduction to custom ops #7
Conversation
…lean up two function names.
Added a few comments. In the text, some of the initial experimentation could involve playing with larger data sizes to really show the benefits of moving operations to GPUs, scaling well beyond the small examples that are being shown (Mandelbrot is probably a good example for this).
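As a rough illustration of that suggestion, here is a minimal sketch that sweeps a few input sizes and times a graph run against the equivalent NumPy computation. `run_addition_graph` is a hypothetical placeholder for whatever graph the recipe builds, and the imports assume the `max.driver` API used elsewhere in these snippets:

```python
# A hedged sketch of the scaling experiment suggested above, not the recipe's code.
import time

import numpy as np
from max.driver import CPU, Accelerator, accelerator_count

device = CPU() if accelerator_count() == 0 else Accelerator()

for size in (1_024, 1_048_576, 16_777_216):
    x = np.random.rand(size).astype(np.float32)

    start = time.perf_counter()
    _ = run_addition_graph(x, device)  # hypothetical wrapper around the recipe's graph
    max_seconds = time.perf_counter() - start

    start = time.perf_counter()
    _ = x + 1.0  # the equivalent element-wise computation in plain NumPy
    numpy_seconds = time.perf_counter() - start

    print(f"size={size:>10,}: MAX {max_seconds:.4f}s  NumPy {numpy_seconds:.4f}s")
```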
result = result.to(CPU())

print("Iterations to escape:")
print(result.to_numpy())
Can we use the NumPy isclose method to demonstrate equality in this and other examples?
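A minimal sketch of what that check could look like, assuming a NumPy reference implementation exists alongside the graph (`compute_reference_numpy` is a placeholder for it):

```python
# A hedged sketch of the suggested equality check, not the recipe's actual code.
import numpy as np

max_output = result.to_numpy()        # the MAX result tensor, copied back to the host
expected = compute_reference_numpy()  # hypothetical NumPy reference implementation

# Element-wise comparison within tolerance; .all() collapses it to a single bool.
matches = np.isclose(max_output, expected, rtol=1e-5, atol=1e-8)
assert matches.all(), "MAX output diverges from the NumPy reference"
print("MAX output matches the NumPy reference within tolerance.")
```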
For the Mandelbrot set, I've replaced the tensor output with an ASCII art display that should allow better scaling of output, and I've added a call-to-action in the text for people to experiment with different resolutions of the grid and different parts of the complex number space. The new output looks something like:
...................................,,,,c@8cc,,,.............
...............................,,,,,,cc8M @Mjc,,,,..........
............................,,,,,,,ccccM@aQaM8c,,,,,........
..........................,,,,,,,ccc88g.o. Owg8ccc,,,,......
.......................,,,,,,,,c8888M@j, ,wMM8cccc,,.....
.....................,,,,,,cccMQOPjjPrgg, OrwrwMMMjjc,....
..................,,,,cccccc88MaP @ ,pGa.g8c,...
...............,,cccccccc888MjQp. o@8cc,..
..........,,,,c8jjMMMMMMMMM@@w. aj8c,,.
.....,,,,,,ccc88@QEJwr.wPjjjwG w8c,,.
..,,,,,,,cccccMMjwQ EpQ .8c,,,
.,,,,,,cc888MrajwJ MMcc,,,
.cc88jMMM@@jaG. oM8cc,,,
.cc88jMMM@@jaG. oM8cc,,,
.,,,,,,cc888MrajwJ MMcc,,,
..,,,,,,,cccccMMjwQ EpQ .8c,,,
.....,,,,,,ccc88@QEJwr.wPjjjwG w8c,,.
..........,,,,c8jjMMMMMMMMM@@w. aj8c,,.
...............,,cccccccc888MjQp. o@8cc,..
..................,,,,cccccc88MaP @ ,pGa.g8c,...
.....................,,,,,,cccMQOEjjPrgg, OrwrwMMMjjc,....
.......................,,,,,,,,c8888M@j, ,wMM8cccc,,.....
..........................,,,,,,,ccc88g.o. Owg8ccc,,,,......
............................,,,,,,,ccccM@aQaM8c,,,,,........
...............................,,,,,,cc8M @Mjc,,,,..........
I'll work on better results for the others.
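For reference, a rough sketch of how escape-iteration counts might be mapped to characters like the ones above; the palette and helper name are illustrative only, not the recipe's actual implementation:

```python
# A hedged sketch of an ASCII renderer for escape-iteration counts.
import numpy as np

def ascii_mandelbrot(iterations: np.ndarray, max_iters: int) -> str:
    palette = ".,c8M@jawrpgOQEPGJ"  # roughly: escaped quickly -> escaped slowly
    rows = []
    for row in iterations:
        chars = []
        for count in row:
            if count >= max_iters:
                chars.append(" ")  # points that never escape (inside the set)
            else:
                # Scale the escape count onto the palette.
                chars.append(palette[int(count) * (len(palette) - 1) // max_iters])
        rows.append("".join(chars))
    return "\n".join(rows)

# Usage, assuming `result` holds per-pixel iteration counts from the graph:
# print(ascii_mandelbrot(result.to_numpy(), max_iters=100))
```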
)

# Place the graph on a GPU, if available. Fall back to CPU if not.
device = CPU() if accelerator_count() == 0 else Accelerator()
In all of the examples, it would be nice to output some status text, including which device the code is actually being run on, along with any other information that might be interesting (like tick-tock timing compared to NumPy calculations, if the custom ops actually result in better performance). Being a bit more verbose will make the magic run commands feel a bit more interactive.
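A sketch of the kind of status output this suggests, built around the device-selection line above; `run_graph` is a placeholder for the recipe's actual compile-and-execute step:

```python
# A hedged sketch of more verbose status output, not the recipe's actual code.
import time

from max.driver import CPU, Accelerator, accelerator_count

gpu_count = accelerator_count()
device = CPU() if gpu_count == 0 else Accelerator()
print(f"Detected {gpu_count} accelerator(s); running on: {device}")

start = time.perf_counter()
result = run_graph(device)  # hypothetical: compile and execute the graph on `device`
print(f"Graph executed in {time.perf_counter() - start:.4f} seconds")
```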
"https://conda.modular.com/max", | ||
"https://repo.prefix.dev/modular-community", | ||
] | ||
platforms = ["linux-64", "osx-arm64", "linux-aarch64"] |
Please remove osx-arm64
Per our offline discussion yesterday, I'm going to merge this to place an initial version of the recipes in the repository, which will make it easier for others to follow on with enhancements like those suggested above.
Adds the first of a series of three planned recipes on custom operations (an introduction, optimizing matrix multiplications, and advanced AI operations: top-K sampling and FlashAttention 2).
The examples here are drawn from the public MAX custom operation examples, with some cleanups that will be contributed back to those original examples.