-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix non bfb issues with latent heat in p3 on PM-GPU #2998
base: master
Are you sure you want to change the base?
Fix non bfb issues with latent heat in p3 on PM-GPU #2998
Conversation
This reverts commit 9438a98.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with this temporary fix, but I would encourage to chase down the bug. Honestly, I'm not sure that the issue is with view-vs-scalar itself. I'm more concerned about some deeper bugs, such as scratch views actually aliasing each other, or stuff like that. Maybe you could keep the extra views in the buffers as well as the fcn signatures, but use C::latice and C::latvap in the whole code, and see if you are bfb.
Being BFB with what we had earlier is not necessarily a good thing.
I've narrowed down to the offending usage of |
@@ -78,9 +76,9 @@ ::update_prognostic_ice( | |||
qv.set(context, qv + (-qv2qi_vapdep_tend+qi2qv_sublim_tend-qv2qi_nucleat_tend)*dt); | |||
|
|||
constexpr Scalar INV_CP = C::INV_CP; | |||
th_atm.set(context, th_atm + inv_exner * ((qv2qi_vapdep_tend - qi2qv_sublim_tend + qv2qi_nucleat_tend) * (latvap+latice) * INV_CP + | |||
(qr2qi_collect_tend + qc2qi_collect_tend + qc2qi_hetero_freeze_tend + qr2qi_immers_freeze_tend - | |||
qi2qr_melt_tend + qc2qi_berg_tend) * latice * INV_CP) * dt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This usage of the constant C::LatIce
is what is causing non-bfb to latent_heat_fusion
. Investigating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: LatIce
is exactly BFB with latent_heat_fusion
for the entire run. Tested with:
for (int p=0; p<Spack::n; ++p) {
uint64_t latice_bits = *reinterpret_cast<const uint64_t*>(&latice);
uint64_t fusion_bits = *reinterpret_cast<const uint64_t*>(&latent_heat_fusion[p]);
if (latice_bits != fusion_bits)
printf("latice:%.17f (%" PRIu64 "):"
"latent_heat_fusion:%.17f (%" PRIu64 ")\n",
latice,latice_bits,latent_heat_fusion[p],fusion_bits);
}
directly after the call to th_atm.set()
.
Not really sure what could be happening here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You said on weaver we are bfb right? If you haven't already, you could try to run a SMS_Ln1 CIME case, forcing mac_aero_mic::subcycling=1, save a bunch of output fields; do this for both master and master+PR, and see which ones actually diff.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: tcclevenger |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
SCREAM_PullRequest_Autotester_Weaver # 6060 FAILED (click to see last 100 lines of console output)
|
@tcclevenger did you add WIP to this PR because the AT wasn't working? If so, I think we can unWIP it and add the RETEST label. |
No, I put this WIP since we don't want to merge it, I was just using it to track the issue. I could close it, but I didn't in case it turns out we actually need it. But I don't think that will be the case. |
7114e81
to
579850c
Compare
6318f41
to
db5b35d
Compare
This PR...
I don't know why the previous version was non-BFB, but investigating may take some time and PM-GPU is needed for PR testing, so I suggest merging this PR. I've added "TODO" statements to track that eventually these should just be constants and can create an issue once this is merged (if others agree). The only downside is we keep the 3 temp views.
Testing
I ran the following
and passed all baselines for CPU and GPU.