
Add recipe: an introduction to custom ops #7

Merged
merged 3 commits on Mar 4, 2025

Conversation

BradLarson (Collaborator)

Adds the first of a series of three planned recipes on custom operations (an introduction, optimizing matrix multiplications, and advanced AI operations: top-K sampling and FlashAttention 2).

The examples here are drawn from the public MAX custom operation examples with some cleanups that will be brought back to those original examples.


@hogepodge hogepodge left a comment


Added a few comments. In the text, some of the initial experimentation could involve playing with larger data sizes to really show the benefits of moving operations to GPUs, scaling well beyond the small examples being shown (Mandelbrot is probably a good candidate for this).

result = result.to(CPU())

print("Iterations to escape:")
print(result.to_numpy())


Can we use NumPy's `isclose` method to demonstrate equality in this and other examples?
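Something along these lines could work; a minimal sketch, assuming we compare the custom op's host-side result against a NumPy reference (the array values here are illustrative stand-ins for the real graph outputs):

```python
import numpy as np

# Hypothetical comparison: validate a custom op's output against a NumPy
# reference instead of printing raw tensors. These arrays stand in for
# the real graph output (e.g. result.to_numpy()) and its reference.
expected = np.array([1.0, 2.0, 3.0])      # NumPy reference result
actual = np.array([1.0, 2.0000001, 3.0])  # e.g. result.to_numpy()

# Element-wise check with floating-point tolerance...
print(np.isclose(actual, expected))
# ...or a single pass/fail verdict for the whole tensor.
assert np.allclose(actual, expected, rtol=1e-5)
print("Custom op output matches the NumPy reference")
```

`allclose` collapses the element-wise comparison into one boolean, which reads better in recipe output than a dump of the full tensor.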

BradLarson (Collaborator, Author)


For the Mandelbrot set, I've replaced the tensor output with an ASCII art display that should allow better scaling of output, and I've added a call-to-action in the text for people to experiment with different resolutions of the grid and different parts of the complex number space. The new output looks something like:

...................................,,,,c@8cc,,,.............
...............................,,,,,,cc8M @Mjc,,,,..........
............................,,,,,,,ccccM@aQaM8c,,,,,........
..........................,,,,,,,ccc88g.o. Owg8ccc,,,,......
.......................,,,,,,,,c8888M@j,    ,wMM8cccc,,.....
.....................,,,,,,cccMQOPjjPrgg,   OrwrwMMMjjc,....
..................,,,,cccccc88MaP  @            ,pGa.g8c,...
...............,,cccccccc888MjQp.                   o@8cc,..
..........,,,,c8jjMMMMMMMMM@@w.                      aj8c,,.
.....,,,,,,ccc88@QEJwr.wPjjjwG                        w8c,,.
..,,,,,,,cccccMMjwQ       EpQ                         .8c,,,
.,,,,,,cc888MrajwJ                                   MMcc,,,
.cc88jMMM@@jaG.                                     oM8cc,,,
.cc88jMMM@@jaG.                                     oM8cc,,,
.,,,,,,cc888MrajwJ                                   MMcc,,,
..,,,,,,,cccccMMjwQ       EpQ                         .8c,,,
.....,,,,,,ccc88@QEJwr.wPjjjwG                        w8c,,.
..........,,,,c8jjMMMMMMMMM@@w.                      aj8c,,.
...............,,cccccccc888MjQp.                   o@8cc,..
..................,,,,cccccc88MaP  @            ,pGa.g8c,...
.....................,,,,,,cccMQOEjjPrgg,   OrwrwMMMjjc,....
.......................,,,,,,,,c8888M@j,    ,wMM8cccc,,.....
..........................,,,,,,,ccc88g.o. Owg8ccc,,,,......
............................,,,,,,,ccccM@aQaM8c,,,,,........
...............................,,,,,,cc8M @Mjc,,,,..........

I'll work on better results for the others.
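The conversion from per-pixel escape-iteration counts to an ASCII display could look something like the sketch below. The character ramp, function name, and scaling are illustrative assumptions, not the recipe's actual code:

```python
import numpy as np

# Illustrative sketch: map escape-iteration counts to an ASCII intensity
# ramp. The ramp string and linear scaling here are assumptions, not the
# recipe's actual implementation.
CHARS = " .,c8M@jawrpogOQEPGJ"

def to_ascii(iterations: np.ndarray, max_iters: int) -> str:
    # Scale each count into an index into the character ramp.
    idx = (iterations.astype(float) / max_iters * (len(CHARS) - 1)).astype(int)
    idx = np.clip(idx, 0, len(CHARS) - 1)
    return "\n".join("".join(CHARS[i] for i in row) for row in idx)

# Tiny demo grid standing in for the real Mandelbrot output tensor.
demo = np.array([[0, 5, 10], [5, 20, 5], [0, 5, 10]])
print(to_ascii(demo, max_iters=20))
```

Because the display resolution is just the shape of the iteration tensor, readers can rerun at larger grid sizes and the ASCII output scales with it.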


# Place the graph on a GPU, if available. Fall back to CPU if not.
device = CPU() if accelerator_count() == 0 else Accelerator()


In all of the examples, it would be nice to output some status text, including output that states which device the code is actually running on, and any other information that might be interesting (like timing comparisons against NumPy calculations, if they actually result in better performance).

Being a bit more verbose will make the magic run commands feel a bit more interactive.
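A minimal sketch of what that status output could look like, assuming a simple wall-clock comparison against a NumPy baseline. The `device_name` string here is a placeholder so the sketch runs without MAX installed; in the recipe it would come from the `CPU()` / `Accelerator()` selection shown above:

```python
import time
import numpy as np

# Sketch of the suggested status output: report which device was selected
# and time a NumPy baseline for comparison. "gpu:0" is a placeholder for
# the device chosen by the recipe's accelerator_count() logic.
device_name = "gpu:0"
print(f"Running graph on device: {device_name}")

a = np.random.rand(512, 512).astype(np.float32)
b = np.random.rand(512, 512).astype(np.float32)

start = time.perf_counter()
reference = a @ b  # NumPy baseline for the same computation
elapsed = time.perf_counter() - start
print(f"NumPy baseline: {elapsed * 1000:.2f} ms")
# Wrapping the graph execution in a matching timer would show whether the
# custom op actually wins at this problem size.
```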

"https://conda.modular.com/max",
"https://repo.prefix.dev/modular-community",
]
platforms = ["linux-64", "osx-arm64", "linux-aarch64"]
Collaborator


Please remove osx-arm64

@BradLarson (Collaborator, Author)

Per our offline discussion yesterday, I'm going to merge to place an initial version of the recipes in the repository, which will make it easier for others to follow on with enhancements like those suggested above.

@BradLarson BradLarson merged commit 3835e6a into modular:main Mar 4, 2025
1 check passed