
Test configuration for the "merge" of NESP2pt9 and CABLE-POP_TRENDY #526

Open
ccarouge opened this issue Jan 14, 2025 · 10 comments
Labels: priority:medium (Medium priority issues to become high priority issues after a release.)

@ccarouge
Member

ccarouge commented Jan 14, 2025

At the end of the "BIOS merge", we will need to test the resulting code with a BIOS configuration. We will need to set up the configuration based on the pseudo-parallel TRENDY config.

BIOS config:

  • ACT9
  • 1000pts
  • 0.25° Australia-wide
  • 0.05° Australia-wide
  • need some historical and future tests
  • no LUC, no BLAZE.

Success criterion: bitwise-comparable results would be ideal, but that is very likely not achievable.

@ccarouge
Member Author

First step, keep the inputs for TRENDY and change the science configs to mirror the BIOS config.

@ccarouge ccarouge added this to the BIOS3 milestone Jan 14, 2025
@ccarouge ccarouge added the priority:medium Medium priority issues to become high priority issues after a release. label Jan 14, 2025
@har917
Collaborator

har917 commented Jan 14, 2025

Some more detail/thoughts around this:

The three (four) test cases should be differentiated simply by the specification of a landmask:

  • BIOS (AGCD) meteorology data is provided at 0.05 degrees - aligned with the revised gridinfo
  • ACT9 case is then defined uniquely via the specification of a landmask file (serial only needed)
  • 0.05 degree case needs no additional landmask file.
  • 1000pts and 0.25 degree cases are potentially more complicated as they require the combination of an externally provided landmask and the TRENDY partitioning/recombination.

So the problematic cases involve two elements: 1) Can the partitioning part of the process work on top of an externally specified landmask file? i.e. the landmasks created/used by the serial jobs need to be the overlay of the external landmask and the act of splitting it up. 2) Can the recombination step provide output on an externally specified landmask (or does it default to the grid of the input meteorology)?

Another aspect to think through: the TRENDY partitioning process used some kind of randomisation of grid cells to ensure a reasonable mix of fast and slow (to compute) grid cells in each job. This allows for efficient kSU usage and throughput during the multi-stage process. Do we need the same randomisation of cells for BIOS, or can we use a geographically defined split (lat-lon boxes) as the means to divide things up? How would this impact the recombination script?
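The randomisation idea above could be sketched as follows; the function name and the fixed seed are illustrative, not taken from the TRENDY scripts.

```python
import random

# Shuffle land-point IDs before a round-robin split so each job gets a
# mix of fast and slow cells. A fixed seed keeps the partition
# reproducible across the stages of a multi-stage run.
def randomised_partition(point_ids, n_jobs, seed=42):
    ids = list(point_ids)
    random.Random(seed).shuffle(ids)
    return [ids[j::n_jobs] for j in range(n_jobs)]

parts = randomised_partition(range(1, 101), n_jobs=4)
print([len(p) for p in parts])   # four jobs of 25 points each
```

Recombination would then need the inverse map (job, slot) -> point ID, which the fixed seed makes reconstructible.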

@Whyborn

Whyborn commented Jan 30, 2025

I have got the ACT9 test case running with the TRENDY pseudo-parallel configuration. Some minor adjustments of the original BIOS landmask were required:

  1. Change the mask variable name to 'land'
  2. Change the type of the mask variable to Int8
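A minimal sketch of those two adjustments, assuming xarray is available and using 'landmask' as a stand-in for whatever the original BIOS file calls the variable:

```python
import numpy as np
import xarray as xr

# Toy stand-in for the original BIOS landmask file; 'landmask' is an
# assumed original variable name, not necessarily what the file uses.
ds = xr.Dataset({"landmask": (("lat", "lon"),
                              np.ones((2, 3), dtype=np.int32))})

ds = ds.rename({"landmask": "land"})       # 1. rename the mask variable
ds["land"] = ds["land"].astype(np.int8)    # 2. cast the mask to Int8
# ds.to_netcdf("bios_landmask_fixed.nc")   # write the adjusted file back out

print(ds["land"].dtype)                    # int8
```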

The TRENDY partitioning process uses a "tiling" approach: it effectively walks through the points in the order they appear in memory and assigns them cyclically to each process. Say we wanted to run the ACT9 test case (9 land points) with 4 parallel jobs; it would assign:

Job   Points
1     1, 5, 9
2     2, 6
3     3, 7
4     4, 8
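The cyclic assignment above amounts to dealing points out one per job, as in this sketch:

```python
# The "tiling" assignment: walk the land points in memory order and
# deal them out cyclically, one per job. Point IDs are 1-based.
def tile_points(n_points, n_jobs):
    jobs = [[] for _ in range(n_jobs)]
    for point in range(1, n_points + 1):
        jobs[(point - 1) % n_jobs].append(point)
    return jobs

print(tile_points(9, 4))   # [[1, 5, 9], [2, 6], [3, 7], [4, 8]]
```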

I've also managed to at least begin the Australia-wide 0.05 degree case without them all crashing on start-up, but I figured I would do some more testing with the cheaper cases before burning compute resources on this.

@har917
Collaborator

har917 commented Jan 30, 2025

Good news - I think the same process should work for the 0.25 and 1000pts cases as well. What's perhaps less clear is whether the recombination script will work without modification.

> I've also managed to at least begin the Australia-wide 0.05 degree case.

An aside: we haven't run a 0.05 degree case in living memory, and we certainly don't have a test case that we could compare against. Indeed I'm not sure that the MPI code could actually run this, given the need to bring everything back onto one processor (for output). The last I heard on this, we were running a reduced science configuration and still running out of memory and walltime (on raijin).

@Whyborn

Whyborn commented Jan 30, 2025

We don't currently have netCDF meteorology for the 0.25 degree case, only at 0.05 degree resolution, which is the only barrier there I think.

Is the intention to eventually run that 0.05 degree case? I'm surprised that it's so expensive. Getting to real performance and memory improvements for CABLE is a fair way down the track I think.

@har917
Collaborator

har917 commented Jan 30, 2025

We shouldn't need the 0.25 degree meteorology in netCDF: we define a 0.25 degree landmask, then partition that across the TRENDY poor man's processors, which should be able to pick up the met information from the 0.05 degree netCDF files.

The question that's left is the recombination - I suspect that TRENDY would try to recombine back to 0.05 degrees and we really only want the recombined version at 0.25 degree.

@Whyborn

Whyborn commented Jan 30, 2025

Unfortunately not; the TRENDY configuration doesn't do any interpolation. It effectively requires the landmask to be at the same resolution as the meteorology, since it uses the IDs of the land points to extract meteorology rather than the actual (lon, lat) coordinates.

@har917
Collaborator

har917 commented Jan 30, 2025

> the TRENDY configuration doesn't do any interpolation.

True - but similarly the 0.25 degree run is a subsample of the 0.05 degree. So if we create a land mask (at 0.05 degrees) that selects the 0.25 degree land points we should get the same answer.

We don't have that land mask at hand but it should be relatively easy to create. The key tasks would be

  • during the TRENDY decomposition - overlaying the 0.25 degree land mask with the processor land mask to get the actual land mask to be used on that processor (should be a simple array operation)
  • recombining things at the end.
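The overlay step described in the first bullet could look something like this; the array names and toy values are illustrative, not from the CABLE scripts.

```python
import numpy as np

# The landmask a given serial job actually uses is the elementwise AND of
# the external landmask (the 0.25-degree selection projected onto the
# 0.05-degree grid) and that job's partition mask.
external_mask = np.array([[1, 0, 1],
                          [1, 1, 0]], dtype=np.int8)
job_mask = np.array([[1, 1, 0],
                     [0, 1, 1]], dtype=np.int8)

job_landmask = external_mask & job_mask   # points this job actually runs
print(job_landmask.tolist())              # [[1, 0, 0], [0, 1, 0]]
```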

@Whyborn

Whyborn commented Jan 30, 2025

Ah yep, you could select points at 0.25 degree resolution from the 0.05 landmask.
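That selection could be sketched as striding through the fine grid; the alignment offset for the 0.25-degree cell centres is an assumption here.

```python
import numpy as np

# Select 0.25-degree points from a 0.05-degree landmask: 0.25 / 0.05 = 5,
# so keep every 5th cell in each direction. The offset that lines up with
# the 0.25-degree cell centres is assumed, not taken from the real grids.
def subsample_mask(fine_mask, factor=5, offset=2):
    coarse = np.zeros_like(fine_mask)
    sl = (slice(offset, None, factor),) * 2
    coarse[sl] = fine_mask[sl]
    return coarse

fine = np.ones((10, 10), dtype=np.int8)   # toy all-land 0.05-degree mask
print(int(subsample_mask(fine).sum()))    # 4 points kept on this 10x10 grid
```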

@ccarouge
Member Author

Test simulations done with the CABLE-POP_TRENDY branch using a file mask to select the points:

  • ACT9 S0 and S3 configurations: run, pseudo-parallel with 4 CPUs and recombination. Evaluation: quick sanity check
  • 1000 pts, S0 config: run, 50 processors. Evaluation: none
  • 0.25° , S0 config: to run with full meteorology only
  • ACT9, S2 config: to run
  • 1000 pts, S2 config: to run
  • 0.25°, S2 config: to run with full meteorology only

Note:

  • outputs are written as a grid for the 1000 pts case; this is a change from the NESP2pt9_BLAZE branch.

To do: need to convert the full meteorology to netCDF.

Evaluation of the simulations:

  • using monthly outputs, max/min/mean of surface fluxes and carbon pools, at grid points and global average.
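A minimal sketch of that evaluation on a toy monthly flux array; the unweighted global average and the random data are simplifications, not the actual diagnostic script.

```python
import numpy as np

# Max/min/mean of a monthly surface-flux field, per grid point and as a
# global (here unweighted) average, as described above.
rng = np.random.default_rng(0)
flux = rng.normal(size=(12, 4, 5))      # toy (month, lat, lon) data

per_point = {
    "max": flux.max(axis=0),            # (lat, lon) fields
    "min": flux.min(axis=0),
    "mean": flux.mean(axis=0),
}
global_mean = flux.mean(axis=(1, 2))    # one value per month

print(per_point["mean"].shape, global_mean.shape)
```

Comparing these summaries between two runs gives a quick statistical check when bitwise agreement is not expected.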

Spin up:
How many spin-up cycles are needed? With the analytic spin-up we are doing 4 + 4 in both sections, but that could be over the top.

Next steps:

  • look in more detail at the 3 simulations we have so far.
  • run the conversion of the meteorology (@har917 look for conversion script)
