-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(closes #2105) Use maxval2loop and atomics to parallelise NEMO stpctl.f90 #2439
Conversation
@arporter @rupertford This is ready to review. (if the reviewer doesn't do it before I will launch the integration tests at night to try to get GLaDos emptier as this PR should give a new baseline for perf comparisons) |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2439 +/- ##
=======================================
Coverage 99.85% 99.85%
=======================================
Files 352 352
Lines 47335 47337 +2
=======================================
+ Hits 47266 47268 +2
Misses 69 69 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks OK, always good to go faster! I'd prefer a Transformation but that can be a separate Issue :-)
We also need to see about moving code out of examples
and into e.g. domain/nemo
so that we can add tests. Again, a separate Issue.
I note that the integration tests failed last time but I think that's because of #2440 so will have another go once that's on.
Unfortunately it was not/only this, it is because a maxval2loop in the normalisation step fails for the esmwf version of nemo.
To fix it I modified the generic implementation to return False, with matches the specialised implementation in that if doesn't have enough information (e.g. DeferedType, no partial datatype) it also returns false. |
There is also this issue:
|
The one that fails is |
Also fails in ecmwf's where zcflnow is: |
@arporter This is ready for another review. It was more complex to fix than I thought. I could have disabled the maxval2loop on the ecmwf version since the CPU will not parallelise them, but I think its good to have them on to expose the potential issues. |
This shouldn't matter. The use of a variable or array should result in the same value of HUGE as they have the same datatype and it is the precision of the datatype that determines the value of HUGE. |
These were 2 different problems:
|
Hi @sergisiso, can you remember why this was? It seems like something we should fix (in a separate PR)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good now, thanks.
I'll re-run the integration tests, just to make sure.
Integration tests were fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All fine now. Will proceed to merge once CI complete.
I noticed that CodeCov was reporting newly-missed lines. This was because Symbol.is_array no longer raises an exception so I've removed the try...except around the call to it. I'll re-run the integration tests to be on the safe side. |
Happily, integration tests were all fine. |
Updates the nemo scripts with the new maxval2loop and atomic psyclone functionality, this makes it ~10% faster (for my 10its elapsed time test) - although there was a lot of variability today in Glados.
If I look with the profiler:
![image](https://private-user-images.githubusercontent.com/1449555/290582005-e7228fa5-e734-4c42-b914-369d524495dd.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDU2NzQsIm5iZiI6MTczOTA0NTM3NCwicGF0aCI6Ii8xNDQ5NTU1LzI5MDU4MjAwNS1lNzIyOGZhNS1lNzM0LTRjNDItYjkxNC0zNjlkNTI0NDk1ZGQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIwOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMDhUMjAwOTM0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZjZkNWVmYmFlZmJiM2IyNjQ2MDkyMDY1ZjU5MWE3ODQ5MDMxMmMwNTBjZmQwNTczOGRmNTU5YjY2OTdjZTI5YSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.a8LKh-Y2gWQj3H98p0-yKQ1BA3OJIR7-Cbf7HdT6ND4)
![image](https://private-user-images.githubusercontent.com/1449555/290582832-24a54b18-bc96-401c-928f-df989fbc3e50.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDU2NzQsIm5iZiI6MTczOTA0NTM3NCwicGF0aCI6Ii8xNDQ5NTU1LzI5MDU4MjgzMi0yNGE1NGIxOC1iYzk2LTQwMWMtOTI4Zi1kZjk4OWZiYzNlNTAucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIwOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMDhUMjAwOTM0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MjE1OTgxYjQ4ZDQ1NmNjYWY3Mzk2YzRhMTIxYTBlZjMzMGQzN2YzYzU0MTY5OTQ1ODA0NTBkN2M5YjhiYjhlZiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.jQdrtZxVw21ZUCM_d8jfVpvJnoEvZLwcWN2sFA9f0LY)
Before:
After:
The benefit comes from keeping the arrays in the GPU, not so much from stpctl itself as it is dominated by a syscall (IO) and sending this new data to the GPU.