(closes #2787) Add NEMOv5 GPU tests in the integration test and guarantee full reproducibility #2859

Open
wants to merge 42 commits into
base: master
Commits
98b873b
FIX: added omp declare target to sbc_phy.f90
addy419 Oct 9, 2024
9796cbc
Merge latest changes
sergisiso Oct 18, 2024
44a2efa
Merge remote-tracking branch 'origin/2671_array_privatisation' into n…
sergisiso Oct 18, 2024
74e4a22
Bring to head of array_privatisation branch
sergisiso Nov 27, 2024
f8560d3
#2671 Update NEMO OpenMP GPU script with array privatisation
sergisiso Nov 27, 2024
7ed6313
Add more NEMO GPU exclusions and NEMO_FUNCTIONS values
sergisiso Dec 13, 2024
fb235fb
Uncomment NEMOv5 for GPU test
sergisiso Dec 13, 2024
336e814
Merge remote-tracking branch 'origin/master' into nemo_v5
sergisiso Dec 13, 2024
ee43ac9
Fix flake8 issues
sergisiso Dec 13, 2024
d12df3d
Fix NEMO_FUNCTIONS
sergisiso Dec 13, 2024
ccf8bf1
Fix NEMO_FUNCTIONS
sergisiso Dec 13, 2024
4743af7
More fixes in the NEMOv5 scripts
sergisiso Dec 17, 2024
77a9c96
Do not run the integration tests passtrough in the first attempt
sergisiso Dec 17, 2024
b479973
Merge the import_during_frontend branch
sergisiso Dec 17, 2024
4f7776a
Reduce NEMOv5 integration test timesteps
sergisiso Dec 17, 2024
0f44e28
Add NEMOv5 10-timestep KGOs
sergisiso Dec 17, 2024
873be3b
Some more updates for NEMOv5
sergisiso Dec 17, 2024
a64e1da
add OMP DECLARE TARGET to sbc_phy and solfrac_mod
addy419 Dec 18, 2024
28ca0df
Add NEMO GPU exclusions
sergisiso Jan 13, 2025
8dfc709
Merge remote-tracking branch 'refs/remotes/origin/nemo_v5' into nemo_v5
sergisiso Jan 13, 2025
80ccb50
Exlcude more files for increase NEMOv5 accuracy in ORCA1
sergisiso Jan 15, 2025
f781831
removed exclusions by moving to omp teams loop
addy419 Jan 15, 2025
1eaf4c1
added OMPTeamsLoop Directive
addy419 Jan 15, 2025
10a55d9
removed lib_mpp from exclusions
addy419 Jan 15, 2025
39f84be
removed zdftke from exclusion list
addy419 Jan 15, 2025
92c82ce
Add MERGE intrinsic as available for GPU offloading
sergisiso Jan 15, 2025
2d12fa4
added zdftke to exclusion list - Incorrect results
addy419 Jan 15, 2025
9e983b1
removed dynzdf from offloading issues and moved to performance issues
addy419 Jan 16, 2025
d8072d6
removed dynspg_ts.f90 and geo2ocean.f90 from exclude list
addy419 Jan 16, 2025
fa53724
removed excluded files affected by no -Kieee flag
addy419 Jan 17, 2025
5ffaea8
removed zdftke from direct exclusions (SQRT function)
addy419 Jan 17, 2025
94b70c4
Re-introduce dynspg_ts with math issues for full reproducibility
sergisiso Jan 17, 2025
4a10d0f
Mark the files that have parenthesis that matter for full reproducibi…
sergisiso Jan 20, 2025
7179f0e
Merge remote-tracking branch 'origin/master' into nemo_v5
sergisiso Jan 20, 2025
450d154
Bring to master
sergisiso Jan 24, 2025
ef2345d
Add NEMOv5 ORCA1 test to the integration test
sergisiso Jan 24, 2025
953d115
Fix some test issues
sergisiso Jan 24, 2025
68b117e
Add more intrinsics to GPUs
sergisiso Jan 28, 2025
71db378
Merge remote-tracking branch 'refs/remotes/origin/nemo_v5' into nemo_v5
sergisiso Jan 28, 2025
46703fe
Merge remote-tracking branch 'origin/master' into nemo_v5
sergisiso Jan 28, 2025
0bfa065
Update NEMO CI tests
sergisiso Jan 28, 2025
cad20f3
Fix test for intrinsics available on GPU
sergisiso Jan 28, 2025
1 change: 1 addition & 0 deletions .github/workflows/nemo_tests.yml
@@ -84,6 +84,7 @@ jobs:

# PSyclone passthrough for MetOffice NEMO
- name: NEMO MetOffice Passthrough
+ if: ${{ github.run_attempt != '1' }}
run: |
. .runner_venv/bin/activate
export PSYCLONE_NEMO_DIR=${GITHUB_WORKSPACE}/examples/nemo/scripts
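Review note: the added `if:` condition appears intended to skip the passthrough steps on the first attempt of a workflow run and execute them only on re-runs. A minimal shell sketch of that gating follows. `should_run_passthrough` is a hypothetical helper, not part of the workflow; inside a job the attempt number is also exposed as the `GITHUB_RUN_ATTEMPT` environment variable.

```shell
#!/usr/bin/env bash
# Sketch of the gating expressed by `if: ${{ github.run_attempt != '1' }}`.
# should_run_passthrough is a hypothetical helper used only for illustration.
should_run_passthrough() {
    local attempt="${1:-1}"
    # Succeeds (exit status 0) for every attempt except the first one.
    [ "$attempt" != "1" ]
}

for attempt in 1 2 3; do
    if should_run_passthrough "$attempt"; then
        echo "attempt ${attempt}: run passthrough"
    else
        echo "attempt ${attempt}: skip passthrough"
    fi
done
```

With this condition the passthrough builds run only when a workflow run is re-run, which matches the commit message "Do not run the integration tests passtrough in the first attempt".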
88 changes: 65 additions & 23 deletions .github/workflows/nemo_v5_tests.yml
@@ -45,6 +45,8 @@ jobs:
run_if_on_mirror:
if: ${{ github.repository == 'stfc/PSyclone-mirror' }}
runs-on: self-hosted
+ env:
+ NEMODIR_NAME: NEMOv5_Jan25

steps:
- uses: actions/checkout@v3
@@ -72,14 +74,15 @@ jobs:

# PSyclone passthrough for 5.0-beta of NEMO.
- name: NEMO 5.0 gfortran passthrough
+ if: ${{ github.run_attempt != '1' }}
run: |
# Set up environment
source /apps/spack/psyclone-spack/spack-repo/share/spack/setup-env.sh
spack unload && spack load nemo-build-environment%gcc@14
source .runner_venv/bin/activate
export PSYCLONE_NEMO_DIR=${GITHUB_WORKSPACE}/examples/nemo/scripts
export PSYCLONE_HOME=${PWD}/.runner_venv
- export NEMO_DIR=${HOME}/NEMOv5
+ export NEMO_DIR=${HOME}/${NEMODIR_NAME}
export TEST_DIR=BENCH_PASSTHROUGH_GCC

# Set up FCM: PATHs are loaded from SPACK, we only need to set the FCFLAGS
@@ -101,20 +104,21 @@ jobs:
diff $PSYCLONE_NEMO_DIR/KGOs/run.stat.bench.gfortran.small.100steps run.stat

- name: NEMO 5.0 nvidia passthrough
+ if: ${{ github.run_attempt != '1' }}
run: |
# Set up environment
source /apps/spack/psyclone-spack/spack-repo/share/spack/setup-env.sh
spack unload && spack load nemo-build-environment%nvhpc@24.5
source .runner_venv/bin/activate
export PSYCLONE_NEMO_DIR=${GITHUB_WORKSPACE}/examples/nemo/scripts
export PSYCLONE_HOME=${PWD}/.runner_venv
- export NEMO_DIR=${HOME}/NEMOv5
+ export NEMO_DIR=${HOME}/${NEMODIR_NAME}
export TEST_DIR=BENCH_PASSTHROUGH_NVHPC

# Set up FCM: PATHs are loaded from SPACK, we only need to set the FCFLAGS
cd $NEMO_DIR
cp $PSYCLONE_NEMO_DIR/KGOs/arch-linux_spack.fcm arch/arch-linux_spack.fcm
- export FCFLAGS="-i4 -Mr8 -O1 -Kieee -nofma -Mnovect"
+ export FCFLAGS="-i4 -Mr8 -O1 -nofma -Mnovect"

# Clean up and compile
# Without key_mpi_off it fails to compile (even without psyclone)
@@ -132,14 +136,15 @@ jobs:
echo "Time-stepping duration = " $VAR_TIME

- name: NEMO 5.0 Intel passthrough
+ if: ${{ github.run_attempt != '1' }}
run: |
# Set up environment
source /apps/spack/psyclone-spack/spack-repo/share/spack/setup-env.sh
spack unload && spack load nemo-build-environment%oneapi
source .runner_venv/bin/activate
export PSYCLONE_NEMO_DIR=${GITHUB_WORKSPACE}/examples/nemo/scripts
export PSYCLONE_HOME=${PWD}/.runner_venv
- export NEMO_DIR=${HOME}/NEMOv5
+ export NEMO_DIR=${HOME}/${NEMODIR_NAME}
export TEST_DIR=BENCH_PASSTHROUGH_ONEAPI

# Set up FCM: PATHs are loaded from SPACK, we only need to set the FCFLAGS
@@ -167,7 +172,7 @@ jobs:
source .runner_venv/bin/activate
export PSYCLONE_NEMO_DIR=${GITHUB_WORKSPACE}/examples/nemo/scripts
export PSYCLONE_HOME=${PWD}/.runner_venv
- export NEMO_DIR=${HOME}/NEMOv5
+ export NEMO_DIR=${HOME}/${NEMODIR_NAME}
export TEST_DIR=BENCH_OMP_THREADING_GCC

# Set up FCM: PATHs are loaded from SPACK, we only need to set the FCFLAGS
@@ -183,10 +188,10 @@ jobs:

# Run test
cd $NEMO_DIR/tests/${TEST_DIR}/EXP00
- cp $PSYCLONE_NEMO_DIR/KGOs/namelist_cfg_bench_small namelist_cfg
+ cp $PSYCLONE_NEMO_DIR/KGOs/namelist_cfg_bench_small_10 namelist_cfg
OMP_NUM_THREADS=4 mpirun -np 1 ./nemo
tail run.stat
- diff $PSYCLONE_NEMO_DIR/KGOs/run.stat.bench.gfortran.small.100steps run.stat
+ diff $PSYCLONE_NEMO_DIR/KGOs/run.stat.bench.gfortran.small.10steps run.stat
export TIME_sec=$(grep "local proces" timing.output | head -n 1 | awk '{print $4}' | tr -d s)
${HOME}/mongosh-2.1.1-linux-x64/bin/mongosh \
"mongodb+srv://cluster0.x8ncpxi.mongodb.net/PerformanceMonitoring" \
@@ -197,22 +202,22 @@ jobs:
ci_test: "NEMOv5 OpenMP for CPU", nemo_version: "NEMOv5", system: "GlaDos",
compiler:"gfortran-14" , date: new Date(), elapsed_time: '"${TIME_sec}"'})'

- - name: NEMO 5.0 nvidia OpenMP for GPUs (managed memory)
+ - name: NEMO 5.0 nvidia OpenMP for GPUs (BENCH - managed memory)
run: |
# Set up environment
source /apps/spack/psyclone-spack/spack-repo/share/spack/setup-env.sh
spack unload && spack load nemo-build-environment%nvhpc@24.5
source .runner_venv/bin/activate
export PSYCLONE_NEMO_DIR=${GITHUB_WORKSPACE}/examples/nemo/scripts
export PSYCLONE_HOME=${PWD}/.runner_venv
- export NEMO_DIR=${HOME}/NEMOv5
+ export NEMO_DIR=${HOME}/${NEMODIR_NAME}
export TEST_DIR=BENCH_OMP_OFFLOAD_NVHPC

# Set up FCM: PATHs are loaded from SPACK, we only need to set the FCFLAGS
# We compile at -O1 to permit comparison of the results.
cd $NEMO_DIR
cp $PSYCLONE_NEMO_DIR/KGOs/arch-linux_spack.fcm arch/arch-linux_spack.fcm
- export FCFLAGS="-i4 -Mr8 -O1 -Kieee -nofma -Mnovect -g -mp=gpu -gpu=managed"
+ export FCFLAGS="-i4 -Mr8 -O1 -nofma -Mnovect -g -mp=gpu -gpu=managed,math_uniform"

# Clean up and compile
# Without key_mpi_off it fails to compile (even without psyclone)
@@ -221,17 +226,54 @@ jobs:
add_key "key_mpi_off key_nosignedzero" -j 4 -v 1

# Run test (disabled because it is currently too slow)
- # cd $NEMO_DIR/tests/${TEST_DIR}/EXP00
- # cp $PSYCLONE_NEMO_DIR/KGOs/namelist_cfg_bench_small namelist_cfg
- # ./nemo
+ cd $NEMO_DIR/tests/${TEST_DIR}/EXP00
+ cp $PSYCLONE_NEMO_DIR/KGOs/namelist_cfg_bench_small_10 namelist_cfg
+ ./nemo
# tail run.stat
- # diff $PSYCLONE_NEMO_DIR/KGOs/run.stat.bench.nvhpc.small.100steps run.stat
- # export TIME_sec=$(grep "local proces" timing.output | head -n 1 | awk '{print $4}' | tr -d s)
- # ${HOME}/mongosh-2.1.1-linux-x64/bin/mongosh \
- # "mongodb+srv://cluster0.x8ncpxi.mongodb.net/PerformanceMonitoring" \
- # --quiet --apiVersion 1 --username ${{ secrets.MONGODB_USERNAME }} \
- # --password ${{ secrets.MONGODB_PASSWORD }} \
- # --eval 'db.GitHub_CI.insertOne({branch_name: "'"$GITHUB_REF_NAME"'", commit: "'"$GITHUB_SHA"'",
- # github_job: "'"$GITHUB_RUN_ID"'"-"'"$GITHUB_RUN_ATTEMPT"'",
- # ci_test: "NEMOv5 OpenMP for GPU", nemo_version: "NEMOv5", system: "GlaDos",
- # compiler:"nvhpc-24.5" , date: new Date(), elapsed_time: '"${TIME_sec}"'})'
+ diff $PSYCLONE_NEMO_DIR/KGOs/run.stat.bench.nvhpc.small.10steps run.stat
+ export TIME_sec=$(grep "local proces" timing.output | head -n 1 | awk '{print $4}' | tr -d s)
+ ${HOME}/mongosh-2.1.1-linux-x64/bin/mongosh \
+ "mongodb+srv://cluster0.x8ncpxi.mongodb.net/PerformanceMonitoring" \
+ --quiet --apiVersion 1 --username ${{ secrets.MONGODB_USERNAME }} \
+ --password ${{ secrets.MONGODB_PASSWORD }} \
+ --eval 'db.GitHub_CI.insertOne({branch_name: "'"$GITHUB_REF_NAME"'", commit: "'"$GITHUB_SHA"'",
+ github_job: "'"$GITHUB_RUN_ID"'"-"'"$GITHUB_RUN_ATTEMPT"'",
+ ci_test: "NEMOv5 OpenMP for GPU (BENCH)", nemo_version: "NEMOv5", system: "GlaDos",
+ compiler:"nvhpc-24.5" , date: new Date(), elapsed_time: '"${TIME_sec}"'})'
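Review note: the reproducibility guarantee in these steps rests on `diff` returning a non-zero exit status when `run.stat` differs from the stored KGO in any byte, which fails the step provided the script runs with errexit enabled (I believe this is the default for `run:` steps, but treat that as an assumption). A minimal sketch of the check follows. `check_kgo` and the file contents are hypothetical placeholders and do not reproduce the real `run.stat` layout.

```shell
#!/usr/bin/env bash
# Sketch of a known-good-output (KGO) comparison based on diff's exit status.
# check_kgo and the sample file contents are hypothetical, for illustration only.
check_kgo() {
    local kgo="$1" actual="$2"
    if diff "$kgo" "$actual" > /dev/null; then
        echo "match"
    else
        echo "mismatch"
        return 1
    fi
}

tmpdir=$(mktemp -d)
printf '1 0.1234567890123456\n2 0.2345678901234567\n' > "$tmpdir/kgo"
cp "$tmpdir/kgo" "$tmpdir/same"
# Same values except for the last printed digit of the second line.
printf '1 0.1234567890123456\n2 0.2345678901234568\n' > "$tmpdir/drift"

same_result=$(check_kgo "$tmpdir/kgo" "$tmpdir/same")
drift_result=$(check_kgo "$tmpdir/kgo" "$tmpdir/drift" || true)
echo "identical output: $same_result"
echo "last-digit drift: $drift_result"
rm -rf "$tmpdir"
```

Because the comparison is textual, a change in the last printed digit is enough to fail it, which is consistent with the PR compiling at `-O1 -nofma -Mnovect` with `-gpu=math_uniform` to keep results comparable.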
+
+ - name: NEMO 5.0 nvidia OpenMP for GPUs (UKMO ORCA1 - managed memory)
+ run: |
+ # Set up environment
+ source /apps/spack/psyclone-spack/spack-repo/share/spack/setup-env.sh
+ spack unload && spack load nemo-build-environment%nvhpc@24.5
+ source .runner_venv/bin/activate
+ export PSYCLONE_NEMO_DIR=${GITHUB_WORKSPACE}/examples/nemo/scripts
+ export PSYCLONE_HOME=${PWD}/.runner_venv
+ export NEMO_DIR=${HOME}/${NEMODIR_NAME}
+ export TEST_DIR=ORCA1_OMP_OFFLOAD_NVHPC
+
+ # Set up FCM: PATHs are loaded from SPACK, we only need to set the FCFLAGS
+ # We compile at "-O1 -nofma -Mnovect -gpu=math_uniform" to permit comparison of the results.
+ cd $NEMO_DIR
+ cp $PSYCLONE_NEMO_DIR/KGOs/arch-linux_spack.fcm arch/arch-linux_spack.fcm
+ export FCFLAGS="-i4 -Mr8 -O1 -nofma -Mnovect -g -mp=gpu -gpu=managed,math_uniform"
+
+ # Clean up and compile
+ # Without key_mpi_off it fails to compile (even without psyclone)
+ ./makenemo -r GOSI10p0.0_like_eORCA1 -m linux_spack -n ${TEST_DIR} clean -y
+ ./makenemo -r GOSI10p0.0_like_eORCA1 -m linux_spack -n ${TEST_DIR} -p ${PSYCLONE_NEMO_DIR}/omp_gpu_trans.py \
+ add_key "key_mpi_off key_nosignedzero" -j 4 -v 1
+
+ # Run test (disabled because it is currently too slow)
+ cd $NEMO_DIR/tests/${TEST_DIR}/EXP00
+ ./nemo
+ diff $PSYCLONE_NEMO_DIR/KGOs/run.stat.orca1.nvhpc.10steps run.stat
+ export TIME_sec=$(grep "local proces" timing.output | head -n 1 | awk '{print $4}' | tr -d s)
+ ${HOME}/mongosh-2.1.1-linux-x64/bin/mongosh \
+ "mongodb+srv://cluster0.x8ncpxi.mongodb.net/PerformanceMonitoring" \
+ --quiet --apiVersion 1 --username ${{ secrets.MONGODB_USERNAME }} \
+ --password ${{ secrets.MONGODB_PASSWORD }} \
+ --eval 'db.GitHub_CI.insertOne({branch_name: "'"$GITHUB_REF_NAME"'", commit: "'"$GITHUB_SHA"'",
+ github_job: "'"$GITHUB_RUN_ID"'"-"'"$GITHUB_RUN_ATTEMPT"'",
+ ci_test: "NEMOv5 OpenMP for GPU (ORCA1)", nemo_version: "NEMOv5", system: "GlaDos",
+ compiler:"nvhpc-24.5" , date: new Date(), elapsed_time: '"${TIME_sec}"'})'
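Review note: the `--eval` argument is assembled by alternating single-quoted JavaScript with double-quoted shell expansions, which is easy to misread. The sketch below rebuilds a shortened version of that string so the result can be inspected. The variable values are sample placeholders (in CI they come from GitHub Actions and from `timing.output`), and the document is trimmed to three fields.

```shell
#!/usr/bin/env bash
# Sketch of the quoting used to build the mongosh --eval document.
# All values below are sample placeholders, not real CI data.
GITHUB_REF_NAME="nemo_v5"
GITHUB_RUN_ID="123"
GITHUB_RUN_ATTEMPT="2"
TIME_sec="12.5"

# Single-quoted pieces are literal JavaScript. Each "'"$VAR"'" sequence closes
# the single quote, expands the variable inside double quotes, then reopens it.
doc='{branch_name: "'"$GITHUB_REF_NAME"'", github_job: "'"$GITHUB_RUN_ID"'"-"'"$GITHUB_RUN_ATTEMPT"'", elapsed_time: '"${TIME_sec}"'}'
echo "$doc"
```

Two things are visible in the expanded text. `elapsed_time` is inserted unquoted, so an empty `TIME_sec` would leave invalid JavaScript. The `github_job` value expands to two string literals separated by `-`, which JavaScript evaluates as a subtraction; if a `RUN_ID-ATTEMPT` string is what is intended, a single pair of quotes around the whole value would be needed.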