diff --git a/Docs/source/MPMD_Tutorials.rst b/Docs/source/MPMD_Tutorials.rst
new file mode 100644
index 00000000..28d11e3d
--- /dev/null
+++ b/Docs/source/MPMD_Tutorials.rst
@@ -0,0 +1,180 @@
+.. _tutorials_mpmd:
+
+AMReX-MPMD
+==========
+
+AMReX-MPMD utilizes the Multiple Program Multiple Data (MPMD) feature of MPI to provide cross-functionality for AMReX-based codes.
+The framework enables data transfer between two different applications through the **MPMD::Copier** class, which typically takes the **BoxArray** of its application as an argument.
+The **Copier** instances created in the two applications together identify the overlapping cells for which data transfer must occur.
+The **Copier::send** and **Copier::recv** functions, which take a **MultiFab** as an argument, transfer the desired data in the overlapping regions.
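+
+The following condensed sketch, assembled from ``main_1.cpp`` of Case-1 below (the grid setup details and the data fill are elided), shows the typical calling sequence for an application participating in an AMReX-MPMD run:
+
+.. code-block:: cpp
+
+    #include <mpi.h>
+    #include <AMReX.H>
+    #include <AMReX_MultiFab.H>
+    #include <AMReX_MPMD.H>
+
+    int main (int argc, char* argv[])
+    {
+        // Split MPI_COMM_WORLD across the two applications and initialize AMReX
+        MPI_Comm comm = amrex::MPMD::Initialize(argc, argv);
+        amrex::Initialize(argc, argv, true, comm);
+        {
+            // Each application builds its own BoxArray & DistributionMapping
+            amrex::Box domain(amrex::IntVect(0,0,0), amrex::IntVect(31,31,31));
+            amrex::BoxArray ba(domain);
+            ba.maxSize(16);
+            amrex::DistributionMapping dm(ba);
+            // false: each application defines its own BoxArray
+            // (Case-2 passes true to send this BoxArray to the partner application)
+            auto copr = amrex::MPMD::Copier(ba, dm, false);
+            amrex::MultiFab mf(ba, dm, 2, 0);
+            // Arguments: (MultiFab, starting component, number of components)
+            copr.send(mf, 0, 1);
+            copr.recv(mf, 1, 1);
+        }
+        amrex::Finalize();
+        amrex::MPMD::Finalize();
+    }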
+
+Case-1
+------
+
+This case demonstrates the MPMD capability across two C++ applications.
+
+Contents
+^^^^^^^^
+
+The **Source_1** subfolder contains ``main_1.cpp``, which will be treated as the first application.
+Similarly, the **Source_2** subfolder contains ``main_2.cpp``, which will be treated as the second application.
+
+Overview
+^^^^^^^^
+
+The domain in ``main_1.cpp`` is set to ``lo = {0, 0, 0}`` and ``hi = {31, 31, 31}``, while the domain in ``main_2.cpp`` is set to ``lo = {16, 16, 16}`` and ``hi = {31, 31, 31}``.
+Hence, data transfer occurs for the overlapping region from ``lo = {16, 16, 16}`` to ``hi = {31, 31, 31}``.
+Furthermore, the domain in ``main_1.cpp`` is split into boxes using ``max_grid_size=16``, while the domain in ``main_2.cpp`` is split using ``max_grid_size=8``.
+Therefore, the **BoxArray**, and consequently the number of boxes covering the overlapping region, differ between the two applications.
+The data transfer demonstration is performed using a two-component *MultiFab*.
+The first component is populated in ``main_1.cpp`` before it is transferred to ``main_2.cpp``.
+The second component is populated in ``main_2.cpp`` based on the received first component.
+Finally, the second component is transferred from ``main_2.cpp`` to ``main_1.cpp``.
+It can be seen from the plotfile generated by ``main_1.cpp`` that the second component is non-zero only in the overlapping region.
+
+Compile
+^^^^^^^
+
+The compile process here assumes that the current working directory is ``ExampleCodes/MPMD/Case-1/``.
+
+.. code-block:: bash
+
+    # cd into Source_1 to compile the first application
+    cd Source_1/
+    # Include USE_CUDA=TRUE for CUDA GPUs
+    make USE_MPI=TRUE
+
+    # cd into Source_2 to compile the second application
+    cd ../Source_2/
+    # Include USE_CUDA=TRUE for CUDA GPUs
+    make USE_MPI=TRUE
+
+Run
+^^^
+
+Here, the case is run using a total of 12 MPI ranks, with 8 allocated to ``main_1.cpp`` and the remaining 4 to ``main_2.cpp``.
+Please note that the MPI ranks assigned to each application need to be contiguous, i.e., ranks 0-7 are for ``main_1.cpp`` and ranks 8-11 are for ``main_2.cpp``.
+This is usually the default behaviour on most systems.
+Furthermore, the run process here assumes that the current working directory is ``ExampleCodes/MPMD/Case-1/``.
+
+.. code-block:: bash
+
+    # Running the MPMD process with 12 ranks
+    mpirun -np 8 Source_1/main3d.gnu.DEBUG.MPI.ex : -np 4 Source_2/main3d.gnu.DEBUG.MPI.ex
+
+Running on Perlmutter (NERSC)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This section describes sample scripts that can be used to run this case on `Perlmutter (NERSC) `_.
+The scripts ``mpmd_cpu.sh`` and ``mpmd_cpu.conf`` can be used to run the CPU version.
+Similarly, ``mpmd_gpu.sh`` and ``mpmd_gpu.conf`` can be used to run the GPU version.
+Please note that ``perlmutter_gpu.profile`` must be sourced to compile the GPU version of the applications.
+
+The content presented here is based on the following references:
+
+ * `NERSC documentation `_
+ * `WarpX documentation `_
+
+Case-2
+------
+
+This case demonstrates the MPMD capability across a C++ and a Python application.
+This language interoperability is achieved through the Python bindings of AMReX, `pyAMReX `_.
+
+Contents
+^^^^^^^^
+
+``main.cpp`` will be the C++ application and ``main.py`` will be the Python application.
+
+Overview
+^^^^^^^^
+
+In the previous case (Case-1), each application has its own domain and therefore a different **BoxArray**.
+However, there exist scenarios where both applications deal with the same **BoxArray**.
+This case presents such a scenario: the **BoxArray** is defined only in the ``main.cpp`` application, and this information is relayed to the ``main.py`` application through the **MPMD::Copier**.
+
+**Please ensure that the same AMReX source code is used to compile both applications.**
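+
+For orientation, a condensed sketch of the pattern used by ``main.py`` of this case follows; the Python application constructs its ``Copier`` without a ``BoxArray`` and obtains the grid layout from the C++ partner:
+
+.. code-block:: python
+
+    from mpi4py import MPI
+    import amrex.space3d as amr
+
+    # Initialize amrex::MPMD and split MPI_COMM_WORLD using mpi4py
+    amr.MPMD_Initialize_without_split([])
+    app_comm = MPI.COMM_WORLD.Split(amr.MPMD_AppNum(), amr.MPMD_MyProc())
+    amr.initialize_when_MPMD([], app_comm)
+
+    # True: the BoxArray and DistributionMapping are received from the (C++) partner
+    copr = amr.MPMD_Copier(True)
+    mf = amr.MultiFab(copr.box_array(), copr.distribution_map(), 2, 0)
+
+    copr.recv(mf, 0, 1)   # receive component 0
+    # ... operate on mf ...
+    copr.send(mf, 1, 1)   # send component 1
+
+    amr.finalize()
+    amr.MPMD_Finalize()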
+
+Compiling and Running on a local system
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The compile process for pyAMReX is only briefly described here; please refer to the `pyAMReX documentation `_ for further details.
+**Please note that mpi4py is a required dependency and must be available in the environment being used.**
+In a pyAMReX conda environment, it can be installed using ``conda install -c conda-forge mpi4py``, `resource `_.
+
+.. code-block:: bash
+
+    # find dependencies & configure
+    # Include -DAMReX_GPU_BACKEND=CUDA for the GPU version
+    cmake -S . -B build -DAMReX_SPACEDIM="1;2;3" -DAMReX_MPI=ON -DpyAMReX_amrex_src=/path/to/amrex
+
+    # compile & install, here we use four threads
+    cmake --build build -j 4 --target pip_install
+
+main.cpp compile
+^^^^^^^^^^^^^^^^
+
+The compile process here assumes that the current working directory is ``ExampleCodes/MPMD/Case-2/``.
+
+.. code-block:: bash
+
+    # Include USE_CUDA=TRUE for CUDA GPUs
+    make USE_MPI=TRUE
+
+Run
+^^^
+
+Here, the case is run using a total of 12 MPI ranks, with 8 allocated to ``main.cpp`` and the remaining 4 to ``main.py``.
+As mentioned earlier, the MPI ranks assigned to each application need to be contiguous, i.e., ranks 0-7 are for ``main.cpp`` and ranks 8-11 are for ``main.py``.
+This is usually the default behaviour on most systems.
+Furthermore, the run process here assumes that the current working directory is ``ExampleCodes/MPMD/Case-2/``.
+
+.. code-block:: bash
+
+    # Running the MPMD process with 12 ranks
+    mpirun -np 8 ./main3d.gnu.DEBUG.MPI.ex : -np 4 python main.py
+
+Compiling and Running on Perlmutter (NERSC)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Running this case on Perlmutter involves creating a Python virtual environment.
+pyAMReX must be compiled and installed into this virtual environment after its creation.
+Similar to the previous case, this case also has supporting scripts to run on CPUs and GPUs.
+
+The process detailed below assumes that the current working directory is ``ExampleCodes/MPMD/Case-2/``.
+
+Creating a virtual environment
+""""""""""""""""""""""""""""""
+
+.. code-block:: bash
+
+    # Setup the required environment variables
+    source perlmutter_gpu.profile
+
+    # BEFORE PERFORMING THE FOLLOWING COMMANDS
+    # MOVE TO A DIRECTORY WHERE THE PYTHON VIRTUAL ENVIRONMENT MUST EXIST
+
+    python3 -m pip install --upgrade pip
+    python3 -m pip install --upgrade virtualenv
+    python3 -m pip cache purge
+    python3 -m venv pyamrex-gpu
+    source pyamrex-gpu/bin/activate
+    python3 -m pip install --upgrade pip
+    python3 -m pip install --upgrade build
+    python3 -m pip install --upgrade packaging
+    python3 -m pip install --upgrade wheel
+    python3 -m pip install --upgrade setuptools
+    python3 -m pip install --upgrade cython
+    python3 -m pip install --upgrade numpy
+    python3 -m pip install --upgrade pandas
+    python3 -m pip install --upgrade scipy
+    MPICC="cc -target-accel=nvidia80 -shared" python3 -m pip install --upgrade mpi4py --no-cache-dir --no-build-isolation --no-binary mpi4py
+    python3 -m pip install --upgrade openpmd-api
+    python3 -m pip install --upgrade matplotlib
+    python3 -m pip install --upgrade yt
+    python3 -m pip install --upgrade cupy-cuda12x  # CUDA 12 compatible wheel
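+
+After pyAMReX has also been built and installed into this virtual environment (see the local-system instructions above), a quick sanity check (a suggestion, not part of the original scripts) is to confirm that the key dependencies import cleanly inside the activated environment:
+
+.. code-block:: python
+
+    # Run inside the activated pyamrex-gpu environment
+    from mpi4py import MPI
+    import amrex.space3d as amr
+
+    print("pyAMReX version:", amr.__version__)
+    print("MPI vendor:", MPI.get_vendor())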
+
+The content presented here is based on the following reference:
+
+ * `WarpX documentation `_
diff --git a/Docs/source/index.rst b/Docs/source/index.rst
index 2674a601..374e1d04 100644
--- a/Docs/source/index.rst
+++ b/Docs/source/index.rst
@@ -51,6 +51,7 @@ sorted by the following categories:
 - :ref:`heFFTe` -- heFFTe distributed tutorials.
 - :ref:`Linear Solvers` -- Examples of several linear solvers.
 - :ref:`ML/PYTORCH` -- Use of pytorch models to replace point-wise computational kernels.
+- :ref:`MPMD` -- Usage of the AMReX-MPMD (Multiple Program Multiple Data) framework.
 - :ref:`MUI` -- Incorporates the MxUI/MUI (Multiscale Universal interface) frame into AMReX.
 - :ref:`Particles` -- Basic usage of AMReX's particle data structures.
 - :ref:`Python` -- Using AMReX and interfacing with AMReX applications form Python - via `pyAMReX `__
@@ -75,6 +76,7 @@ sorted by the following categories:
    heFFTe_Tutorial
    LinearSolvers_Tutorial
    ML_Tutorial
+   MPMD_Tutorials
    MUI_Tutorial
    Particles_Tutorial
    Python_Tutorial
@@ -102,6 +104,8 @@ sorted by the following categories:
 
 .. _`Linear Solvers`: LinearSolvers_Tutorial.html
 
+.. _`MPMD`: MPMD_Tutorials.html
+
 .. _`MUI`: MUI_Tutorial.html
 
 .. _`Particles`: Particles_Tutorial.html
diff --git a/ExampleCodes/MPMD/Case-1/Source_1/GNUmakefile b/ExampleCodes/MPMD/Case-1/Source_1/GNUmakefile
new file mode 100644
index 00000000..b6b95e7b
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/Source_1/GNUmakefile
@@ -0,0 +1,20 @@
+AMREX_HOME ?= ../../../../../amrex
+
+DEBUG = TRUE
+
+DIM = 3
+
+COMP = gcc
+
+USE_MPI = TRUE
+
+USE_OMP = FALSE
+USE_CUDA = FALSE
+USE_HIP = FALSE
+
+include $(AMREX_HOME)/Tools/GNUMake/Make.defs
+
+include ./Make.package
+include $(AMREX_HOME)/Src/Base/Make.package
+
+include $(AMREX_HOME)/Tools/GNUMake/Make.rules
diff --git a/ExampleCodes/MPMD/Case-1/Source_1/Make.package b/ExampleCodes/MPMD/Case-1/Source_1/Make.package
new file mode 100644
index 00000000..16bde740
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/Source_1/Make.package
@@ -0,0 +1 @@
+CEXE_sources += main_1.cpp
diff --git a/ExampleCodes/MPMD/Case-1/Source_1/main_1.cpp b/ExampleCodes/MPMD/Case-1/Source_1/main_1.cpp
new file mode 100644
index 00000000..8097faef
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/Source_1/main_1.cpp
@@ -0,0 +1,69 @@
+
+#include <mpi.h>
+#include <AMReX.H>
+#include <AMReX_Print.H>
+#include <AMReX_MultiFab.H>
+#include <AMReX_PlotFileUtil.H>
+#include <AMReX_MPMD.H>
+
+int main(int argc, char* argv[])
+{
+    // Initialize amrex::MPMD to establish communication across the two apps
+    MPI_Comm comm = amrex::MPMD::Initialize(argc, argv);
+    amrex::Initialize(argc,argv,true,comm);
+    {
+        amrex::Print() << "Hello world from AMReX version " << amrex::Version() << "\n";
+        // Number of data components at each grid point in the MultiFab
+        int ncomp = 2;
+        // How many grid cells in each direction over the problem domain
+        int n_cell = 32;
+        // How many grid cells are allowed in each direction over each box
+        int max_grid_size = 16;
+        // BoxArray -- Abstract Domain Setup
+        // integer vector indicating the lower coordinate bounds
+        amrex::IntVect dom_lo(0,0,0);
+        // integer vector indicating the upper coordinate bounds
+        amrex::IntVect dom_hi(n_cell-1, n_cell-1, n_cell-1);
+        // box containing the coordinates of this domain
+        amrex::Box domain(dom_lo, dom_hi);
+        // will contain a list of boxes describing the problem domain
+        amrex::BoxArray ba(domain);
+        // chop the single grid into many small boxes
+        ba.maxSize(max_grid_size);
+        // Distribution Mapping
+        amrex::DistributionMapping dm(ba);
+        // Create an MPMD Copier based on the current ba & dm
+        auto copr = amrex::MPMD::Copier(ba,dm,false);
+        // Define MultiFab
+        amrex::MultiFab mf(ba, dm, ncomp, 0);
+        // Geometry -- Physical Properties for data on our domain
+        amrex::RealBox real_box ({0., 0., 0.}, {1., 1., 1.});
+        amrex::Geometry geom(domain, &real_box);
+        // Calculate Cell Sizes
+        amrex::GpuArray<amrex::Real,3> dx = geom.CellSizeArray(); // dx[0] = dx, dx[1] = dy, dx[2] = dz
+        // Fill only the first component of the MultiFab
+        for (amrex::MFIter mfi(mf); mfi.isValid(); ++mfi) {
+            const amrex::Box& bx = mfi.validbox();
+            const amrex::Array4<amrex::Real>& mf_array = mf.array(mfi);
+
+            amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE(int i, int j, int k){
+
+                amrex::Real x = (i+0.5) * dx[0];
+                amrex::Real y = (j+0.5) * dx[1];
+                amrex::Real z = (k+0.5) * dx[2];
+                amrex::Real r_squared = ((x-0.5)*(x-0.5)+(y-0.5)*(y-0.5)+(z-0.5)*(z-0.5))/0.01;
+
+                mf_array(i,j,k,0) = 1.0 + std::exp(-r_squared);
+
+            });
+        }
+        // Send ONLY the first (populated) MultiFab component to main_2.cpp
+        copr.send(mf,0,1);
+        // Receive ONLY the second MultiFab component from main_2.cpp
+        copr.recv(mf,1,1);
+        // Plot MultiFab Data
+        WriteSingleLevelPlotfile("plt_cpp_1", mf, {"comp0","comp1"}, geom, 0., 0);
+    }
+    amrex::Finalize();
+    amrex::MPMD::Finalize();
+}
diff --git a/ExampleCodes/MPMD/Case-1/Source_2/GNUmakefile b/ExampleCodes/MPMD/Case-1/Source_2/GNUmakefile
new file mode 100644
index 00000000..b6b95e7b
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/Source_2/GNUmakefile
@@ -0,0 +1,20 @@
+AMREX_HOME ?= ../../../../../amrex
+
+DEBUG = TRUE
+
+DIM = 3
+
+COMP = gcc
+
+USE_MPI = TRUE
+
+USE_OMP = FALSE
+USE_CUDA = FALSE
+USE_HIP = FALSE
+
+include $(AMREX_HOME)/Tools/GNUMake/Make.defs
+
+include ./Make.package
+include $(AMREX_HOME)/Src/Base/Make.package
+
+include $(AMREX_HOME)/Tools/GNUMake/Make.rules
diff --git a/ExampleCodes/MPMD/Case-1/Source_2/Make.package b/ExampleCodes/MPMD/Case-1/Source_2/Make.package
new file mode 100644
index 00000000..b4266b6f
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/Source_2/Make.package
@@ -0,0 +1 @@
+CEXE_sources += main_2.cpp
diff --git a/ExampleCodes/MPMD/Case-1/Source_2/main_2.cpp b/ExampleCodes/MPMD/Case-1/Source_2/main_2.cpp
new file mode 100644
index 00000000..847421fe
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/Source_2/main_2.cpp
@@ -0,0 +1,62 @@
+
+#include <mpi.h>
+#include <AMReX.H>
+#include <AMReX_Print.H>
+#include <AMReX_MultiFab.H>
+#include <AMReX_PlotFileUtil.H>
+#include <AMReX_MPMD.H>
+
+int main(int argc, char* argv[])
+{
+    // Initialize amrex::MPMD to establish communication across the two apps
+    MPI_Comm comm = amrex::MPMD::Initialize(argc, argv);
+    amrex::Initialize(argc,argv,true,comm);
+    {
+        amrex::Print() << "Hello world from AMReX version " << amrex::Version() << "\n";
+        // Number of data components at each grid point in the MultiFab
+        int ncomp = 2;
+        // How many grid cells in each direction over the problem domain
+        int n_cell = 32;
+        // How many grid cells are allowed in each direction over each box
+        int max_grid_size = 8;
+        // BoxArray -- Abstract Domain Setup
+        // integer vector indicating the lower coordinate bounds
+        amrex::IntVect dom_lo(n_cell/2, n_cell/2, n_cell/2);
+        // integer vector indicating the upper coordinate bounds
+        amrex::IntVect dom_hi(n_cell-1, n_cell-1, n_cell-1);
+        // box containing the coordinates of this domain
+        amrex::Box domain(dom_lo, dom_hi);
+        // will contain a list of boxes describing the problem domain
+        amrex::BoxArray ba(domain);
+        // chop the single grid into many small boxes
+        ba.maxSize(max_grid_size);
+        // Distribution Mapping
+        amrex::DistributionMapping dm(ba);
+        // Create an MPMD Copier based on the current ba & dm
+        auto copr = amrex::MPMD::Copier(ba,dm,false);
+        // Define MultiFab
+        amrex::MultiFab mf(ba, dm, ncomp, 0);
+        // Geometry -- Physical Properties for data on our domain
+        amrex::RealBox real_box ({0.5, 0.5, 0.5}, {1., 1., 1.});
+        amrex::Geometry geom(domain, &real_box);
+        // Receive ONLY the first (populated) MultiFab component from main_1.cpp
+        copr.recv(mf,0,1);
+        // Fill the second component of the MultiFab
+        for (amrex::MFIter mfi(mf); mfi.isValid(); ++mfi) {
+            const amrex::Box& bx = mfi.validbox();
+            const amrex::Array4<amrex::Real>& mf_array = mf.array(mfi);
+
+            amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE(int i, int j, int k){
+
+                mf_array(i,j,k,1) = amrex::Real(10.)*mf_array(i,j,k,0);
+
+            });
+        }
+        // Send ONLY the second MultiFab component (populated here) to main_1.cpp
+        copr.send(mf,1,1);
+        // Plot MultiFab Data
+        WriteSingleLevelPlotfile("plt_cpp_2", mf, {"comp0","comp1"}, geom, 0., 0);
+    }
+    amrex::Finalize();
+    amrex::MPMD::Finalize();
+}
diff --git a/ExampleCodes/MPMD/Case-1/mpmd_cpu.conf b/ExampleCodes/MPMD/Case-1/mpmd_cpu.conf
new file mode 100644
index 00000000..cdb07f28
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/mpmd_cpu.conf
@@ -0,0 +1,2 @@
+0-7 ./Source_1/main3d.gnu.x86-milan.DEBUG.MPI.ex
+8-11 ./Source_2/main3d.gnu.x86-milan.DEBUG.MPI.ex
diff --git a/ExampleCodes/MPMD/Case-1/mpmd_cpu.sh b/ExampleCodes/MPMD/Case-1/mpmd_cpu.sh
new file mode 100644
index 00000000..d7ed7832
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/mpmd_cpu.sh
@@ -0,0 +1,17 @@
+#!/bin/bash
+#SBATCH -N 1    # Total number of nodes
+#SBATCH -n 12   # Total number of tasks
+#SBATCH -c 4    # number of processors per MPI task
+#SBATCH -C cpu
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+
+#run the application:
+srun --multi-prog --cpu_bind=cores ./mpmd_cpu.conf
diff --git a/ExampleCodes/MPMD/Case-1/mpmd_gpu.conf b/ExampleCodes/MPMD/Case-1/mpmd_gpu.conf
new file mode 100644
index 00000000..6cc3f336
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/mpmd_gpu.conf
@@ -0,0 +1,2 @@
+0-7 ./Source_1/main3d.gnu.DEBUG.MPI.CUDA.ex
+8-11 ./Source_2/main3d.gnu.DEBUG.MPI.CUDA.ex
diff --git a/ExampleCodes/MPMD/Case-1/mpmd_gpu.sh b/ExampleCodes/MPMD/Case-1/mpmd_gpu.sh
new file mode 100644
index 00000000..f45611a3
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/mpmd_gpu.sh
@@ -0,0 +1,23 @@
+#!/bin/bash
+#SBATCH -N 3    # Total number of nodes
+#SBATCH -n 12   # Total number of tasks
+#SBATCH -c 4    # number of processors per MPI task
+#SBATCH -C gpu
+#SBATCH -G 12   # Total number of GPUs
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+source ./perlmutter_gpu.profile
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+# Taken from WarpX
+export MPICH_OFI_NIC_POLICY=GPU
+GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1"
+
+#run the application:
+srun --multi-prog --cpu_bind=cores --gpu-bind=single:1 ./mpmd_gpu.conf
diff --git a/ExampleCodes/MPMD/Case-1/perlmutter_gpu.profile b/ExampleCodes/MPMD/Case-1/perlmutter_gpu.profile
new file mode 100644
index 00000000..e371e849
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/perlmutter_gpu.profile
@@ -0,0 +1,26 @@
+# required dependencies
+module load gpu
+module load PrgEnv-gnu
+module load craype
+module load craype-x86-milan
+module load craype-accel-nvidia80
+module load cudatoolkit
+module load cmake/3.24.3
+
+# necessary to use CUDA-Aware MPI and run a job
+export CRAY_ACCEL_TARGET=nvidia80
+
+# optimize CUDA compilation for A100
+export AMREX_CUDA_ARCH=8.0
+
+# optimize CPU microarchitecture for AMD EPYC 3rd Gen (Milan/Zen3)
+# note: the cc/CC/ftn wrappers below add those
+export CXXFLAGS="-march=znver3"
+export CFLAGS="-march=znver3"
+
+# compiler environment hints
+export CC=cc
+export CXX=CC
+export FC=ftn
+export CUDACXX=$(which nvcc)
+export CUDAHOSTCXX=CC
diff --git a/ExampleCodes/MPMD/Case-2/GNUmakefile b/ExampleCodes/MPMD/Case-2/GNUmakefile
new file mode 100644
index 00000000..a68d0980
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/GNUmakefile
@@ -0,0 +1,20 @@
+AMREX_HOME ?= ../../../../amrex
+
+DEBUG = TRUE
+
+DIM = 3
+
+COMP = gcc
+
+USE_MPI = TRUE
+
+USE_OMP = FALSE
+USE_CUDA = FALSE
+USE_HIP = FALSE
+
+include $(AMREX_HOME)/Tools/GNUMake/Make.defs
+
+include ./Make.package
+include $(AMREX_HOME)/Src/Base/Make.package
+
+include $(AMREX_HOME)/Tools/GNUMake/Make.rules
diff --git a/ExampleCodes/MPMD/Case-2/Make.package b/ExampleCodes/MPMD/Case-2/Make.package
new file mode 100644
index 00000000..6b4b865e
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/Make.package
@@ -0,0 +1 @@
+CEXE_sources += main.cpp
diff --git a/ExampleCodes/MPMD/Case-2/main.cpp b/ExampleCodes/MPMD/Case-2/main.cpp
new file mode 100644
index 00000000..20ecda64
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/main.cpp
@@ -0,0 +1,70 @@
+
+#include <mpi.h>
+#include <AMReX.H>
+#include <AMReX_Print.H>
+#include <AMReX_MultiFab.H>
+#include <AMReX_PlotFileUtil.H>
+#include <AMReX_MPMD.H>
+
+int main(int argc, char* argv[])
+{
+    // Initialize amrex::MPMD to establish communication across the two apps
+    MPI_Comm comm = amrex::MPMD::Initialize(argc, argv);
+    amrex::Initialize(argc,argv,true,comm);
+    {
+        amrex::Print() << "Hello world from AMReX version " << amrex::Version() << "\n";
+        // Number of data components at each grid point in the MultiFab
+        int ncomp = 2;
+        // How many grid cells in each direction over the problem domain
+        int n_cell = 32;
+        // How many grid cells are allowed in each direction over each box
+        int max_grid_size = 16;
+        // BoxArray -- Abstract Domain Setup
+        // integer vector indicating the lower coordinate bounds
+        amrex::IntVect dom_lo(0,0,0);
+        // integer vector indicating the upper coordinate bounds
+        amrex::IntVect dom_hi(n_cell-1, n_cell-1, n_cell-1);
+        // box containing the coordinates of this domain
+        amrex::Box domain(dom_lo, dom_hi);
+        // will contain a list of boxes describing the problem domain
+        amrex::BoxArray ba(domain);
+        // chop the single grid into many small boxes
+        ba.maxSize(max_grid_size);
+        // Distribution Mapping
+        amrex::DistributionMapping dm(ba);
+        // Create an MPMD Copier that
+        // sends the BoxArray information to the other (Python) application
+        auto copr = amrex::MPMD::Copier(ba,dm,true);
+        // Define MultiFab
+        amrex::MultiFab mf(ba, dm, ncomp, 0);
+        // Geometry -- Physical Properties for data on our domain
+        amrex::RealBox real_box ({0., 0., 0.}, {1., 1., 1.});
+        amrex::Geometry geom(domain, &real_box);
+        // Calculate Cell Sizes
+        amrex::GpuArray<amrex::Real,3> dx = geom.CellSizeArray(); // dx[0] = dx, dx[1] = dy, dx[2] = dz
+        // Fill only the first component of the MultiFab
+        for (amrex::MFIter mfi(mf); mfi.isValid(); ++mfi) {
+            const amrex::Box& bx = mfi.validbox();
+            const amrex::Array4<amrex::Real>& mf_array = mf.array(mfi);
+
+            amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE(int i, int j, int k){
+
+                amrex::Real x = (i+0.5) * dx[0];
+                amrex::Real y = (j+0.5) * dx[1];
+                amrex::Real z = (k+0.5) * dx[2];
+                amrex::Real r_squared = ((x-0.5)*(x-0.5)+(y-0.5)*(y-0.5)+(z-0.5)*(z-0.5))/0.01;
+
+                mf_array(i,j,k,0) = 1.0 + std::exp(-r_squared);
+
+            });
+        }
+        // Send ONLY the first (populated) MultiFab component to the other app
+        copr.send(mf,0,1);
+        // Receive ONLY the second MultiFab component from the other app
+        copr.recv(mf,1,1);
+        // Plot MultiFab Data
+        WriteSingleLevelPlotfile("plt_cpp_001", mf, {"comp0","comp1"}, geom, 0., 0);
+    }
+    amrex::Finalize();
+    amrex::MPMD::Finalize();
+}
diff --git a/ExampleCodes/MPMD/Case-2/main.py b/ExampleCodes/MPMD/Case-2/main.py
new file mode 100644
index 00000000..542761f6
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/main.py
@@ -0,0 +1,56 @@
+from mpi4py import MPI
+
+import amrex.space3d as amr
+
+# Initialize amrex::MPMD to establish communication across the two apps
+# However, leverage MPMD_Initialize_without_split
+# so that the communication split can be performed using mpi4py.MPI
+amr.MPMD_Initialize_without_split([])
+# Leverage MPI from mpi4py to perform the communication split
+app_comm_py = MPI.COMM_WORLD.Split(amr.MPMD_AppNum(), amr.MPMD_MyProc())
+# Initialize AMReX
+amr.initialize_when_MPMD([], app_comm_py)
+
+amr.Print(f"Hello world from pyAMReX version {amr.__version__}\n")
+# Create an MPMD Copier that gets the BoxArray information from the other (C++) app
+copr = amr.MPMD_Copier(True)
+# Number of data components at each grid point in the MultiFab
+ncomp = 2
+# Define a MultiFab using the created MPMD_Copier
+mf = amr.MultiFab(copr.box_array(), copr.distribution_map(), ncomp, 0)
+mf.set_val(0.0)
+
+# Receive ONLY the FIRST MultiFab component populated in the other (C++) app
+copr.recv(mf, 0, 1)
+
+# Fill the second MultiFab component based on the first component
+for mfi in mf:
+    # Convert Array4 to a numpy/cupy array
+    mf_array = mf.array(mfi).to_xp(copy=False, order="F")
+    mf_array[:, :, :, 1] = 10.0 * mf_array[:, :, :, 0]
+
+# Send ONLY the second MultiFab component to the other (C++) app
+copr.send(mf, 1, 1)
+# Plot MultiFab data
+# HERE THE DOMAIN INFORMATION IS ONLY BEING UTILIZED TO SAVE A PLOTFILE
+# How many grid cells in each direction over the problem domain
+n_cell = 32
+# Integer vector indicating the lower coordinate bounds
+dom_lo = amr.IntVect(0, 0, 0)
+# Integer vector indicating the upper coordinate bounds
+dom_hi = amr.IntVect(n_cell - 1, n_cell - 1, n_cell - 1)
+# Box containing the coordinates of this domain
+domain = amr.Box(dom_lo, dom_hi)
+# Geometry: physical properties for data on our domain
+real_box = amr.RealBox([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
+coord = 0  # Cartesian
+is_per = [0, 0, 0]  # periodicity
+geom = amr.Geometry(domain, real_box, coord, is_per)
+plotfile = amr.concatenate(root="plt_py_", num=1, mindigits=3)
+varnames = amr.Vector_string(["comp0", "comp1"])
+amr.write_single_level_plotfile(plotfile, mf, varnames, geom, time=0.0, level_step=0)
+
+# Finalize AMReX
+amr.finalize()
+# Finalize AMReX::MPMD
+amr.MPMD_Finalize()
diff --git a/ExampleCodes/MPMD/Case-2/mpmd_cpu.conf b/ExampleCodes/MPMD/Case-2/mpmd_cpu.conf
new file mode 100644
index 00000000..0a4d6f0f
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/mpmd_cpu.conf
@@ -0,0 +1,2 @@
+0-7 main3d.gnu.x86-milan.DEBUG.MPI.ex
+8-11 python main.py
diff --git a/ExampleCodes/MPMD/Case-2/mpmd_cpu.sh b/ExampleCodes/MPMD/Case-2/mpmd_cpu.sh
new file mode 100644
index 00000000..236e1fa2
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/mpmd_cpu.sh
@@ -0,0 +1,22 @@
+#!/bin/bash
+#SBATCH -N 1    # Total number of nodes
+#SBATCH -n 12   # Total number of tasks
+#SBATCH -c 4    # number of processors per MPI task
+#SBATCH -C cpu
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+# Activate the virtual environment
+source /path/to/pyamrex-gpu/bin/activate
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+
+#run the application:
+srun --multi-prog --cpu_bind=cores ./mpmd_cpu.conf
+
+deactivate
diff --git a/ExampleCodes/MPMD/Case-2/mpmd_gpu.conf b/ExampleCodes/MPMD/Case-2/mpmd_gpu.conf
new file mode 100644
index 00000000..d748c3ca
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/mpmd_gpu.conf
@@ -0,0 +1,2 @@
+0-7 main3d.gnu.DEBUG.MPI.CUDA.ex
+8-11 python main.py
diff --git a/ExampleCodes/MPMD/Case-2/mpmd_gpu.sh b/ExampleCodes/MPMD/Case-2/mpmd_gpu.sh
new file mode 100644
index 00000000..571d6cf1
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/mpmd_gpu.sh
@@ -0,0 +1,27 @@
+#!/bin/bash
+#SBATCH -N 3    # Total number of nodes
+#SBATCH -n 12   # Total number of tasks
+#SBATCH -c 4    # number of processors per MPI task
+#SBATCH -C gpu
+#SBATCH -G 12   # Total number of GPUs
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+source ./perlmutter_gpu.profile
+# Activate the virtual environment
+source /path/to/pyamrex-gpu/bin/activate
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+# Taken from WarpX
+export MPICH_OFI_NIC_POLICY=GPU
+GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1"
+
+#run the application:
+srun --multi-prog --cpu_bind=cores --gpu-bind=single:1 ./mpmd_gpu.conf
+
+deactivate
diff --git a/ExampleCodes/MPMD/Case-2/perlmutter_gpu.profile b/ExampleCodes/MPMD/Case-2/perlmutter_gpu.profile
new file mode 100644
index 00000000..b720c1fc
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/perlmutter_gpu.profile
@@ -0,0 +1,28 @@
+# required dependencies
+module load gpu
+module load PrgEnv-gnu
+module load craype
+module load craype-x86-milan
+module load craype-accel-nvidia80
+module load cudatoolkit
+module load cmake/3.24.3
+# Required for pyAMReX
+module load cray-python/3.11.5
+
+# necessary to use CUDA-Aware MPI and run a job
+export CRAY_ACCEL_TARGET=nvidia80
+
+# optimize CUDA compilation for A100
+export AMREX_CUDA_ARCH=8.0
+
+# optimize CPU microarchitecture for AMD EPYC 3rd Gen (Milan/Zen3)
+# note: the cc/CC/ftn wrappers below add those
+export CXXFLAGS="-march=znver3"
+export CFLAGS="-march=znver3"
+
+# compiler environment hints
+export CC=cc
+export CXX=CC
+export FC=ftn
+export CUDACXX=$(which nvcc)
+export CUDAHOSTCXX=CC