Very slow training w/ standardized trainer - epochs take over a minute #517

Open
TobykoMusic opened this issue Dec 12, 2024 · 8 comments
Labels
bug:mps (Bug related to Metal Performance Shaders), bug (Something isn't working), priority:low (Low-priority issues)

Comments

@TobykoMusic

Describe the bug
When running NAM on my Mac (M1 Pro MacBook Pro), I get the error:

NAM encountered a bug in PyTorch's MPS backend and will switch to a fallback.
Your version of PyTorch is 2.5.1.
Please report this in an Issue at:
https://github.com/sdatkinson/neural-amp-modeler/issues/new/choose
so that NAM's dependencies can avoid buggy versions of PyTorch and the associated performance hit.
MRSTFT failed on device; falling back to CPU
[Screenshot of the terminal output]

I read somewhere that downgrading PyTorch to <2.0 can help, but it seemed to do nothing, and NAM is still running with PyTorch 2.5.1.
I'm sure it's an easy fix, but I'm technologically illiterate when it comes to any of this stuff.

Cheers!

Desktop (please complete the following information):

  • OS: macOS 14.0
  • Local?
  • Version: 0.11.0
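On the point above that downgrading PyTorch "seemed to do nothing": one possible explanation (an assumption, not something confirmed in this thread) is that the downgrade went into a different Python environment than the one NAM runs in. A minimal sketch for checking which interpreter is active and which `torch` version it sees, using only the standard library; the helper names `installed_version` and `describe_environment` are made up for illustration:

```python
import sys
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_environment(package: str = "torch") -> str:
    """Summarize which interpreter is running and which version it sees."""
    version = installed_version(package)
    found = version if version is not None else "not installed"
    return f"python={sys.executable} {package}={found}"


if __name__ == "__main__":
    # Run this with the same interpreter that launches the NAM trainer.
    print(describe_environment())
```

If the reported version differs from the one that was just installed, the install most likely targeted another environment.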
TobykoMusic added the bug (Something isn't working), priority:low (Low-priority issues), and unread (This issue is new and hasn't been seen by the maintainers yet) labels on Dec 12, 2024
@sdatkinson
Owner

Hi!

Did it crash, or did it just continue to run? The progress bar in your snapshot suggests that it's doing fine.

This may just be an old error message I need to clean up.

sdatkinson removed the unread (This issue is new and hasn't been seen by the maintainers yet) label on Dec 12, 2024
@sdatkinson
Owner

print(
    "===WARNING===\n"
    "NAM encountered a bug in PyTorch's MPS backend and will "
    "switch to a fallback.\n"
    f"Your version of PyTorch is {torch.__version__}.\n"
    "Please report this in an Issue at:\n"
    "https://github.com/sdatkinson/neural-amp-modeler/issues/new/choose"
    "\n"
    "so that NAM's dependencies can avoid buggy versions of "
    "PyTorch and the associated performance hit."
)
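
For readers unfamiliar with the "falling back to CPU" behavior that this warning announces, the general pattern can be sketched as follows. This is not NAM's actual implementation: `run_with_cpu_fallback` and `flaky_loss` are hypothetical names, and the stand-in loss simply raises on `"mps"` to imitate a backend failure.

```python
import warnings


def run_with_cpu_fallback(compute, device: str):
    """Try `compute(device)`; on failure, warn and retry on "cpu".

    `compute` is a hypothetical callable standing in for a loss such as MRSTFT.
    Returns a tuple of (result, device_actually_used).
    """
    try:
        return compute(device), device
    except Exception as exc:  # broad on purpose: backend errors vary by version
        if device == "cpu":
            raise  # nothing left to fall back to
        warnings.warn(
            f"Computation failed on {device!r} ({exc}); falling back to CPU"
        )
        return compute("cpu"), "cpu"


def flaky_loss(device: str) -> float:
    """Stand-in that fails on "mps" to imitate a backend bug."""
    if device == "mps":
        raise RuntimeError("simulated MPS failure")
    return 0.25


if __name__ == "__main__":
    print(run_with_cpu_fallback(flaky_loss, "mps"))
```

The training run continues after such a fallback, which is consistent with the progress bar still advancing; the cost is that the affected computation runs on the CPU.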

@TobykoMusic
Author

TobykoMusic commented Dec 12, 2024 via email

@binky100

binky100 commented Dec 18, 2024

M3 Pro NAM not working
I have had the same issue on my 2023 MacBook Pro running macOS Sonoma with an M3 Pro chip. The local trainer doesn't do anything after plotting the latency graph. It won't start until I save or close the latency graph, and then the terminal gives me the same error as the person who started this thread, where my Mac says it's falling back to the CPU.
Screenshot shows the terminal output

@sdatkinson
Owner

Regarding the original issue here: what we can do is modify the warning message.

It would be interesting to warn people that the version of PyTorch that they're using has this MPS conv false positive, and what versions have this Issue.

As I said in #507 (comment), I don't want to constrain this package not to use those versions because overly-constrained packages aren't fun to work with, but I'm happy to keep a list of versions that folks could use as a stronger requirement to have better functionality.

I'm going to make a new Issue to improve the warning message: #520.
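
A rough sketch of what a version-aware warning along these lines could look like. The list contents are an assumption: only 2.5.1 comes from the report in this thread, and which versions are actually affected is not stated here. The names `KNOWN_AFFECTED`, `base_version`, and `mps_warning` are hypothetical.

```python
# Hypothetical list for illustration only. "2.5.1" is the version reported in
# this thread; the real set of affected PyTorch versions is not given here.
KNOWN_AFFECTED = frozenset({"2.5.1"})


def base_version(version: str) -> str:
    """Strip a local build suffix such as "+cpu" from a version string."""
    return version.split("+", 1)[0]


def mps_warning(version: str, affected=KNOWN_AFFECTED):
    """Return a warning string if `version` is on the list, else None."""
    if base_version(version) not in affected:
        return None
    listed = ", ".join(sorted(affected))
    return (
        f"PyTorch {version} is on the list of versions that trigger the MPS "
        f"fallback. Versions on record: {listed}."
    )


if __name__ == "__main__":
    print(mps_warning("2.5.1"))
```

Keeping the list as data rather than as a hard dependency constraint matches the preference stated above: the package stays installable with any PyTorch version, while users who want the better-behaved path can pick a version that is not on the list.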

Still, over a minute per epoch is slow, even for a Mac. Let me think about that for a bit.

@binky100

This comment was marked as off-topic.

sdatkinson changed the title from [BUG] MacOS 14.0 "NAM encountered a bug in PyTorch's MPS backend" to "Very slow training w/ standardized trainer - epochs take over a minute" on Dec 19, 2024
@sdatkinson
Owner

sdatkinson commented Dec 19, 2024

@binky100 Please make a separate issue with your problem including the details that the template asks for.

@sdatkinson
Owner

Renamed this Issue to focus on OP's ">1 minute epoch" behavior.

sdatkinson added the bug:mps (Bug related to Metal Performance Shaders) label on Dec 19, 2024