Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(closes #2105) Use maxval2loop and atomics to parallelise NEMO stpctl.f90 #2439

Merged
merged 12 commits into from
Jan 26, 2024

Conversation

sergisiso
Copy link
Collaborator

Updates the nemo scripts with the new maxval2loop and atomic psyclone functionality, this makes it ~10% faster (for my 10its elapsed time test) - although there was a lot of variability today in Glados.

If I look with the profiler:
Before:
image
After:
image
The benefit comes from keeping the arrays in the GPU, not so much from stpctl itself as it is dominated by a syscall (IO) and sending this new data to the GPU.

@sergisiso
Copy link
Collaborator Author

@arporter @rupertford This is ready to review. (if the reviewer doesn't do it before I will launch the integration tests at night to try to get GLaDos emptier as this PR should give a new baseline for perf comparisons)

Copy link

codecov bot commented Dec 14, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (106543d) 99.85% compared to head (8bd137c) 99.85%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2439   +/-   ##
=======================================
  Coverage   99.85%   99.85%           
=======================================
  Files         352      352           
  Lines       47335    47337    +2     
=======================================
+ Hits        47266    47268    +2     
  Misses         69       69           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@arporter arporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks OK, always good to go faster! I'd prefer a Transformation but that can be a separate Issue :-)
We also need to see about moving code out of examples and into e.g. domain/nemo so that we can add tests. Again, a separate Issue.
I note that the integration tests failed last time but I think that's because of #2440 so will have another go once that's on.

examples/nemo/scripts/omp_gpu_trans.py Show resolved Hide resolved
examples/nemo/scripts/utils.py Show resolved Hide resolved
examples/nemo/scripts/omp_gpu_trans.py Show resolved Hide resolved
@sergisiso
Copy link
Collaborator Author

I note that the integration tests failed last time but I think that's because of #2440 so will have another go once that's on.

Unfortunately it was not/only this, it is because a maxval2loop in the normalisation step fails for the esmwf version of nemo.

Invokes found in psy_bdyini_psy:
bdy_def
raised the following exception during execution...
{
      File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/examples/nemo/scripts/omp_cpu_trans.py", line 92, in trans
    normalise_loops(
      File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/examples/nemo/scripts/utils.py", line 179, in normalise_loops
    Maxval2LoopTrans().apply(intr)
      File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.12/site-packages/psyclone/psyir/transformations/intrinsics/array_reduction_base_trans.py", line 224, in apply
    reference2arrayrange.apply(reference)
      File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.12/site-packages/psyclone/psyir/transformations/reference2arrayrange_trans.py", line 177, in apply
    self.validate(node, options=None)
      File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.12/site-packages/psyclone/psyir/transformations/reference2arrayrange_trans.py", line 149, in validate
    if not node.symbol.is_array:
           ^^^^^^^^^^^^^^^^^^^^
      File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.12/site-packages/psyclone/psyir/symbols/symbol.py", line 440, in is_array
    raise ValueError(f"No array information is available for the "
    ValueError: No array information is available for the symbol 'nb_bdy'.

To fix it I modified the generic implementation to return False, with matches the specialised implementation in that if doesn't have enough information (e.g. DeferedType, no partial datatype) it also returns false.

@sergisiso
Copy link
Collaborator Author

There is also this issue:

    Maxval2LoopTrans().apply(intr)
      File "/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.12/site-packages/psyclone/psyir/transformations/intrinsics/array_reduction_base_trans.py", line 349, in apply
    outer_loop.parent.children.insert(outer_loop.position, assignment)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
    AttributeError: 'NoneType' object has no attribute 'children'

@sergisiso
Copy link
Collaborator Author

The one that fails is jpbdta = MAXVAL(nblendta(:,:nb_bdy))
Also when doing: x(3) = maxval(a) it initialises with x(3) = -HUGE(x) but should be -HUGE(x(3))?

@sergisiso
Copy link
Collaborator Author

Also fails in ecmwf's zcflnow(1) = MAX( zcflnow(1), MAXVAL( ABS( pv_ice(:,:) ) * rdt_ice * r1_e2v(:,:) ) ) with the compile time error The shapes of the array expressions do not conform. [ZCFLNOW] zcflnow(1) = MAX(zcflnow(1), tmp_var)

where zcflnow is:
REAL(wp), DIMENSION(1) :: zcflprv, zcflnow

@sergisiso
Copy link
Collaborator Author

@arporter This is ready for another review. It was more complex to fix than I thought. I could have disabled the maxval2loop on the ecmwf version since the CPU will not parallelise them, but I think its good to have them on to expose the potential issues.

@rupertford
Copy link
Collaborator

The one that fails is jpbdta = MAXVAL(nblendta(:,:nb_bdy)) Also when doing: x(3) = maxval(a) it initialises with x(3) = -HUGE(x) but should be -HUGE(x(3))?

This shouldn't matter. The use of a variable or array should result in the same value of HUGE as they have the same datatype and it is the precision of the datatype that determines the value of HUGE.

@sergisiso
Copy link
Collaborator Author

The one that fails is jpbdta = MAXVAL(nblendta(:,:nb_bdy)) Also when doing: x(3) = maxval(a) it initialises with x(3) = -HUGE(x) but should be -HUGE(x(3))?

This shouldn't matter. The use of a variable or array should result in the same value of HUGE as they have the same datatype and it is the precision of the datatype that determines the value of HUGE.

These were 2 different problems:

  • The first is that nblendta(:,:nb_bdy) was silently not converted to a loop by the arrayrange2loop transformation and the apply continued with the wrong nodes. This is now catch and an error is raised.
  • The second is it true that for an array HUGE does the right thing, although it is not documented in the standard, but I still think it had to be fixed because the lhs can be a complex expression e.g. "mystryct(3)%myvar" so the huge needs to be initialised by type of the resulting access. And the change was quite simple.

@arporter
Copy link
Member

  • nblendta(:,:nb_bdy) was silently not converted to a loop by the arrayrange2loop transformation

Hi @sergisiso, can you remember why this was? It seems like something we should fix (in a separate PR)?

Copy link
Member

@arporter arporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good now, thanks.
I'll re-run the integration tests, just to make sure.

@arporter
Copy link
Member

Integration tests were fine.

Copy link
Member

@arporter arporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All fine now. Will proceed to merge once CI complete.

@arporter
Copy link
Member

I noticed that CodeCov was reporting newly-missed lines. This was because Symbol.is_array no longer raises an exception so I've removed the try...except around the call to it. I'll re-run the integration tests to be on the safe side.

@arporter
Copy link
Member

Happily, integration tests were all fine.

@arporter arporter merged commit 77b28ff into master Jan 26, 2024
12 checks passed
@arporter arporter deleted the 2105_nemo_maxval branch January 26, 2024 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants