From aa792b152256e3ac7222ec987ec7ea1e416af37a Mon Sep 17 00:00:00 2001
From: Bhargav Siddani
Date: Thu, 9 May 2024 15:44:55 -0700
Subject: [PATCH 1/8] MPMD cases - first commit

---
 Docs/source/MPMD_Tutorials.rst                | 117 ++++++++++++++++++
 ExampleCodes/MPMD/Case-1/Source_1/GNUmakefile |  20 +++
 .../MPMD/Case-1/Source_1/Make.package         |   1 +
 ExampleCodes/MPMD/Case-1/Source_1/main_1.cpp  |  69 +++++++++++
 ExampleCodes/MPMD/Case-1/Source_2/GNUmakefile |  20 +++
 .../MPMD/Case-1/Source_2/Make.package         |   1 +
 ExampleCodes/MPMD/Case-1/Source_2/main_2.cpp  |  62 ++++++++++
 ExampleCodes/MPMD/Case-2/GNUmakefile          |  20 +++
 ExampleCodes/MPMD/Case-2/Make.package         |   1 +
 ExampleCodes/MPMD/Case-2/main.cpp             |  70 +++++++++++
 ExampleCodes/MPMD/Case-2/main.py              |  56 +++++++++
 11 files changed, 437 insertions(+)
 create mode 100644 Docs/source/MPMD_Tutorials.rst
 create mode 100644 ExampleCodes/MPMD/Case-1/Source_1/GNUmakefile
 create mode 100644 ExampleCodes/MPMD/Case-1/Source_1/Make.package
 create mode 100644 ExampleCodes/MPMD/Case-1/Source_1/main_1.cpp
 create mode 100644 ExampleCodes/MPMD/Case-1/Source_2/GNUmakefile
 create mode 100644 ExampleCodes/MPMD/Case-1/Source_2/Make.package
 create mode 100644 ExampleCodes/MPMD/Case-1/Source_2/main_2.cpp
 create mode 100644 ExampleCodes/MPMD/Case-2/GNUmakefile
 create mode 100644 ExampleCodes/MPMD/Case-2/Make.package
 create mode 100644 ExampleCodes/MPMD/Case-2/main.cpp
 create mode 100644 ExampleCodes/MPMD/Case-2/main.py

diff --git a/Docs/source/MPMD_Tutorials.rst b/Docs/source/MPMD_Tutorials.rst
new file mode 100644
index 00000000..464211ad
--- /dev/null
+++ b/Docs/source/MPMD_Tutorials.rst
@@ -0,0 +1,117 @@
+AMReX-MPMD
+==========
+
+AMReX-MPMD utilizes the Multiple Program Multiple Data (MPMD) feature of MPI to provide cross-functionality for AMReX-based codes.
+The framework enables data transfer across two different applications through the **MPMD::Copier** class, which takes the **BoxArray** of its application as an argument.
+**Copier** instances created in both applications together identify the overlapping cells for which the data transfer must occur.
+The **Copier::send** & **Copier::recv** functions, which take a **MultiFab** as an argument, are used to transfer the desired data of overlapping regions.
+
+Case-1
+------
+
+The current case demonstrates the MPMD capability across two C++ applications.
+
+Contents
+^^^^^^^^
+
+The **Source_1** subfolder contains ``main_1.cpp``, which will be treated as the first application.
+Similarly, the **Source_2** subfolder contains ``main_2.cpp``, which will be treated as the second application.
+
+Overview
+^^^^^^^^
+
+The domain in ``main_1.cpp`` is set to ``lo = {0, 0, 0}`` and ``hi = {31, 31, 31}``, while the domain in ``main_2.cpp`` is set to ``lo = {16, 16, 16}`` and ``hi = {31, 31, 31}``.
+Hence, the data transfer will occur for the region ``lo = {16, 16, 16}`` and ``hi = {31, 31, 31}``.
+The data transfer demonstration is performed using a two-component *MultiFab*.
+The first component is populated in ``main_1.cpp`` before it is transferred to ``main_2.cpp``.
+The second component is populated in ``main_2.cpp`` based on the received first component.
+Finally, the second component is transferred from ``main_2.cpp`` to ``main_1.cpp``.
+It can be seen from the plotfile generated by ``main_1.cpp`` that the second component is non-zero only for the overlapping region.
+
+Compile
+^^^^^^^
+
+The compile process here assumes that the current working directory is ``ExampleCodes/MPMD/Case-1/``.
+
+.. code-block:: bash
+
+    # cd into Source_1 to compile the first application
+    cd Source_1/
+    # Include USE_CUDA=TRUE for CUDA GPUs
+    make USE_MPI=TRUE
+
+    # cd into Source_2 to compile the second application
+    cd ../Source_2/
+    # Include USE_CUDA=TRUE for CUDA GPUs
+    make USE_MPI=TRUE
+
+Run
+^^^
+
+Here, the current case is run using a total of 12 MPI ranks, with 8 allocated to ``main_1.cpp`` and the rest for ``main_2.cpp``.
+Please note that the MPI ranks attributed to each application/code need to be contiguous, i.e., MPI ranks 0-7 are for ``main_1.cpp`` and 8-11 are for ``main_2.cpp``.
+This may be the default behaviour on several systems.
+Furthermore, the run process here assumes that the current working directory is ``ExampleCodes/MPMD/Case-1/``.
+
+.. code-block:: bash
+
+    # Running the MPMD process with 12 ranks
+    mpirun -np 8 Source_1/main3d.gnu.DEBUG.MPI.ex : -np 4 Source_2/main3d.gnu.DEBUG.MPI.ex
+
+Case-2
+------
+
+The current case demonstrates the MPMD capability across C++ and Python applications.
+This language interoperability is achieved through the Python bindings of AMReX, `pyAMReX `_.
+
+Contents
+^^^^^^^^
+
+``main.cpp`` will be the C++ application and ``main.py`` will be the Python application.
+
+Overview
+^^^^^^^^
+
+In the previous case (Case-1) of MPMD each application has its own domain, and, therefore, different **BoxArray**.
+However, there exist scenarios where both applications deal with the same **BoxArray**.
+The current case presents such a scenario where the **BoxArray** is defined only in the ``main.cpp`` application, but this information is relayed to ``main.py`` application through the **MPMD::Copier**.
+
+**Please ensure that the same AMReX source code is used to compile both the applications.**
+
+pyAMReX compile
+^^^^^^^^^^^^^^^
+The compile process for pyAMReX is only briefly described here.
+Please refer to the `pyAMReX documentation `_ for more details.
+It must be mentioned that **mpi4py** is an important dependency.
+
+.. code-block:: bash
+
+    # find dependencies & configure
+    # Include -DAMReX_GPU_BACKEND=CUDA for gpu version
+    cmake -S . -B build -DAMReX_SPACEDIM="1;2;3" -DAMReX_MPI=ON -DpyAMReX_amrex_src=/path/to/amrex
+
+    # compile & install, here we use four threads
+    cmake --build build -j 4 --target pip_install
+
+main.cpp compile
+^^^^^^^^^^^^^^^^
+
+The compile process here assumes that the current working directory is ``ExampleCodes/MPMD/Case-2/``.
+
+.. code-block:: bash
+
+    # Include USE_CUDA=TRUE for CUDA GPUs
+    make USE_MPI=TRUE
+
+Run
+^^^
+
+Here, the current case is run using a total of 12 MPI ranks, with 8 allocated to ``main.cpp`` and the rest for ``main.py``.
+As mentioned earlier, the MPI ranks attributed to each application/code need to be contiguous, i.e., MPI ranks 0-7 are for ``main.cpp`` and 8-11 are for ``main.py``.
+This may be the default behaviour on several systems.
+Furthermore, the run process here assumes that the current working directory is ``ExampleCodes/MPMD/Case-2/``.
+
+..
code-block:: bash + + # Running the MPMD process with 12 ranks + mpirun -np 8 ./main3d.gnu.DEBUG.MPI.ex : -np 4 python main.py diff --git a/ExampleCodes/MPMD/Case-1/Source_1/GNUmakefile b/ExampleCodes/MPMD/Case-1/Source_1/GNUmakefile new file mode 100644 index 00000000..b6b95e7b --- /dev/null +++ b/ExampleCodes/MPMD/Case-1/Source_1/GNUmakefile @@ -0,0 +1,20 @@ +AMREX_HOME ?= ../../../../../amrex + +DEBUG = TRUE + +DIM = 3 + +COMP = gcc + +USE_MPI = TRUE + +USE_OMP = FALSE +USE_CUDA = FALSE +USE_HIP = FALSE + +include $(AMREX_HOME)/Tools/GNUMake/Make.defs + +include ./Make.package +include $(AMREX_HOME)/Src/Base/Make.package + +include $(AMREX_HOME)/Tools/GNUMake/Make.rules diff --git a/ExampleCodes/MPMD/Case-1/Source_1/Make.package b/ExampleCodes/MPMD/Case-1/Source_1/Make.package new file mode 100644 index 00000000..16bde740 --- /dev/null +++ b/ExampleCodes/MPMD/Case-1/Source_1/Make.package @@ -0,0 +1 @@ +CEXE_sources += main_1.cpp diff --git a/ExampleCodes/MPMD/Case-1/Source_1/main_1.cpp b/ExampleCodes/MPMD/Case-1/Source_1/main_1.cpp new file mode 100644 index 00000000..8097faef --- /dev/null +++ b/ExampleCodes/MPMD/Case-1/Source_1/main_1.cpp @@ -0,0 +1,69 @@ + +#include +#include +#include +#include +#include +#include + +int main(int argc, char* argv[]) +{ + // Initialize amrex::MPMD to establish communication across the two apps + MPI_Comm comm = amrex::MPMD::Initialize(argc, argv); + amrex::Initialize(argc,argv,true,comm); + { + amrex::Print() << "Hello world from AMReX version " << amrex::Version() << "\n"; + // Number of data components at each grid point in the MultiFab + int ncomp = 2; + // how many grid cells in each direction over the problem domain + int n_cell = 32; + // how many grid cells are allowed in each direction over each box + int max_grid_size = 16; + //BoxArray -- Abstract Domain Setup + // integer vector indicating the lower coordindate bounds + amrex::IntVect dom_lo(0,0,0); + // integer vector indicating the upper coordindate bounds + amrex::IntVect dom_hi(n_cell-1, n_cell-1, n_cell-1); + // box containing the coordinates of this domain + amrex::Box domain(dom_lo, dom_hi); + // will contain a list of boxes describing the problem domain + amrex::BoxArray ba(domain); + // chop the single grid into many small boxes + ba.maxSize(max_grid_size); + // Distribution Mapping + amrex::DistributionMapping dm(ba); + // Create an MPMD Copier based on current ba & dm + auto copr = amrex::MPMD::Copier(ba,dm,false); + //Define MuliFab + amrex::MultiFab mf(ba, dm, ncomp, 0); + //Geometry -- Physical Properties for data on our domain + amrex::RealBox real_box ({0., 0., 0.}, {1. 
, 1., 1.}); + amrex::Geometry geom(domain, &real_box); + //Calculate Cell Sizes + amrex::GpuArray dx = geom.CellSizeArray(); //dx[0] = dx dx[1] = dy dx[2] = dz + //Fill only the first component of the MultiFab + for(amrex::MFIter mfi(mf); mfi.isValid(); ++mfi){ + const amrex::Box& bx = mfi.validbox(); + const amrex::Array4& mf_array = mf.array(mfi); + + amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE(int i, int j, int k){ + + amrex::Real x = (i+0.5) * dx[0]; + amrex::Real y = (j+0.5) * dx[1]; + amrex::Real z = (k+0.5) * dx[2]; + amrex::Real r_squared = ((x-0.5)*(x-0.5)+(y-0.5)*(y-0.5)+(z-0.5)*(z-0.5))/0.01; + + mf_array(i,j,k,0) = 1.0 + std::exp(-r_squared); + + }); + } + // Send ONLY the first populated MultiFab component to main_2.cpp + copr.send(mf,0,1); + // Receive ONLY the second MultiFab component from main_2.cpp + copr.recv(mf,1,1); + //Plot MultiFab Data + WriteSingleLevelPlotfile("plt_cpp_1", mf, {"comp0","comp1"}, geom, 0., 0); + } + amrex::Finalize(); + amrex::MPMD::Finalize(); +} diff --git a/ExampleCodes/MPMD/Case-1/Source_2/GNUmakefile b/ExampleCodes/MPMD/Case-1/Source_2/GNUmakefile new file mode 100644 index 00000000..b6b95e7b --- /dev/null +++ b/ExampleCodes/MPMD/Case-1/Source_2/GNUmakefile @@ -0,0 +1,20 @@ +AMREX_HOME ?= ../../../../../amrex + +DEBUG = TRUE + +DIM = 3 + +COMP = gcc + +USE_MPI = TRUE + +USE_OMP = FALSE +USE_CUDA = FALSE +USE_HIP = FALSE + +include $(AMREX_HOME)/Tools/GNUMake/Make.defs + +include ./Make.package +include $(AMREX_HOME)/Src/Base/Make.package + +include $(AMREX_HOME)/Tools/GNUMake/Make.rules diff --git a/ExampleCodes/MPMD/Case-1/Source_2/Make.package b/ExampleCodes/MPMD/Case-1/Source_2/Make.package new file mode 100644 index 00000000..b4266b6f --- /dev/null +++ b/ExampleCodes/MPMD/Case-1/Source_2/Make.package @@ -0,0 +1 @@ +CEXE_sources += main_2.cpp diff --git a/ExampleCodes/MPMD/Case-1/Source_2/main_2.cpp b/ExampleCodes/MPMD/Case-1/Source_2/main_2.cpp new file mode 100644 index 00000000..847421fe --- /dev/null +++ b/ExampleCodes/MPMD/Case-1/Source_2/main_2.cpp @@ -0,0 +1,62 @@ + +#include +#include +#include +#include +#include +#include + +int main(int argc, char* argv[]) +{ + // Initialize amrex::MPMD to establish communication across the two apps + MPI_Comm comm = amrex::MPMD::Initialize(argc, argv); + amrex::Initialize(argc,argv,true,comm); + { + amrex::Print() << "Hello world from AMReX version " << amrex::Version() << "\n"; + // Number of data components at each grid point in the MultiFab + int ncomp = 2; + // how many grid cells in each direction over the problem domain + int n_cell = 32; + // how many grid cells are allowed in each direction over each box + int max_grid_size = 8; + //BoxArray -- Abstract Domain Setup + // integer vector indicating the lower coordindate bounds + amrex::IntVect dom_lo(n_cell/2, n_cell/2, n_cell/2); + // integer vector indicating the upper coordindate bounds + amrex::IntVect dom_hi(n_cell-1, n_cell-1, n_cell-1); + // box containing the coordinates of this domain + amrex::Box domain(dom_lo, dom_hi); + // will contain a list of boxes describing the problem domain + amrex::BoxArray ba(domain); + // chop the single grid into many small boxes + ba.maxSize(max_grid_size); + // Distribution Mapping + amrex::DistributionMapping dm(ba); + // Create an MPMD Copier based on current ba & dm + auto copr = amrex::MPMD::Copier(ba,dm,false); + //Define MuliFab + amrex::MultiFab mf(ba, dm, ncomp, 0); + //Geometry -- Physical Properties for data on our domain + amrex::RealBox real_box ({0.5, 0.5, 0.5}, {1. 
, 1., 1.}); + amrex::Geometry geom(domain, &real_box); + // Receive ONLY the first populated MultiFab component from main_1.cpp + copr.recv(mf,0,1); + //Fill the second component of the MultiFab + for(amrex::MFIter mfi(mf); mfi.isValid(); ++mfi){ + const amrex::Box& bx = mfi.validbox(); + const amrex::Array4& mf_array = mf.array(mfi); + + amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE(int i, int j, int k){ + + mf_array(i,j,k,1) = amrex::Real(10.)*mf_array(i,j,k,0); + + }); + } + // Send ONLY the second MultiFab component (populated here) to main_1.cpp + copr.send(mf,1,1); + //Plot MultiFab Data + WriteSingleLevelPlotfile("plt_cpp_2", mf, {"comp0","comp1"}, geom, 0., 0); + } + amrex::Finalize(); + amrex::MPMD::Finalize(); +} diff --git a/ExampleCodes/MPMD/Case-2/GNUmakefile b/ExampleCodes/MPMD/Case-2/GNUmakefile new file mode 100644 index 00000000..a68d0980 --- /dev/null +++ b/ExampleCodes/MPMD/Case-2/GNUmakefile @@ -0,0 +1,20 @@ +AMREX_HOME ?= ../../../../amrex + +DEBUG = TRUE + +DIM = 3 + +COMP = gcc + +USE_MPI = TRUE + +USE_OMP = FALSE +USE_CUDA = FALSE +USE_HIP = FALSE + +include $(AMREX_HOME)/Tools/GNUMake/Make.defs + +include ./Make.package +include $(AMREX_HOME)/Src/Base/Make.package + +include $(AMREX_HOME)/Tools/GNUMake/Make.rules diff --git a/ExampleCodes/MPMD/Case-2/Make.package b/ExampleCodes/MPMD/Case-2/Make.package new file mode 100644 index 00000000..6b4b865e --- /dev/null +++ b/ExampleCodes/MPMD/Case-2/Make.package @@ -0,0 +1 @@ +CEXE_sources += main.cpp diff --git a/ExampleCodes/MPMD/Case-2/main.cpp b/ExampleCodes/MPMD/Case-2/main.cpp new file mode 100644 index 00000000..20ecda64 --- /dev/null +++ b/ExampleCodes/MPMD/Case-2/main.cpp @@ -0,0 +1,70 @@ + +#include +#include +#include +#include +#include +#include + +int main(int argc, char* argv[]) +{ + // Initialize amrex::MPMD to establish communication across the two apps + MPI_Comm comm = amrex::MPMD::Initialize(argc, argv); + amrex::Initialize(argc,argv,true,comm); + { + amrex::Print() << "Hello world from AMReX version " << amrex::Version() << "\n"; + // Number of data components at each grid point in the MultiFab + int ncomp = 2; + // how many grid cells in each direction over the problem domain + int n_cell = 32; + // how many grid cells are allowed in each direction over each box + int max_grid_size = 16; + //BoxArray -- Abstract Domain Setup + // integer vector indicating the lower coordindate bounds + amrex::IntVect dom_lo(0,0,0); + // integer vector indicating the upper coordindate bounds + amrex::IntVect dom_hi(n_cell-1, n_cell-1, n_cell-1); + // box containing the coordinates of this domain + amrex::Box domain(dom_lo, dom_hi); + // will contain a list of boxes describing the problem domain + amrex::BoxArray ba(domain); + // chop the single grid into many small boxes + ba.maxSize(max_grid_size); + // Distribution Mapping + amrex::DistributionMapping dm(ba); + // Create an MPMD Copier that + // sends the BoxArray information to the other (python) application + auto copr = amrex::MPMD::Copier(ba,dm,true); + //Define MuliFab + amrex::MultiFab mf(ba, dm, ncomp, 0); + //Geometry -- Physical Properties for data on our domain + amrex::RealBox real_box ({0., 0., 0.}, {1. 
, 1., 1.}); + amrex::Geometry geom(domain, &real_box); + //Calculate Cell Sizes + amrex::GpuArray dx = geom.CellSizeArray(); //dx[0] = dx dx[1] = dy dx[2] = dz + //Fill only the first component of the MultiFab + for(amrex::MFIter mfi(mf); mfi.isValid(); ++mfi){ + const amrex::Box& bx = mfi.validbox(); + const amrex::Array4& mf_array = mf.array(mfi); + + amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE(int i, int j, int k){ + + amrex::Real x = (i+0.5) * dx[0]; + amrex::Real y = (j+0.5) * dx[1]; + amrex::Real z = (k+0.5) * dx[2]; + amrex::Real r_squared = ((x-0.5)*(x-0.5)+(y-0.5)*(y-0.5)+(z-0.5)*(z-0.5))/0.01; + + mf_array(i,j,k,0) = 1.0 + std::exp(-r_squared); + + }); + } + // Send ONLY the first populated MultiFab component to the other app + copr.send(mf,0,1); + // Receive ONLY the second MultiFab component from the other app + copr.recv(mf,1,1); + //Plot MultiFab Data + WriteSingleLevelPlotfile("plt_cpp_001", mf, {"comp0","comp1"}, geom, 0., 0); + } + amrex::Finalize(); + amrex::MPMD::Finalize(); +} diff --git a/ExampleCodes/MPMD/Case-2/main.py b/ExampleCodes/MPMD/Case-2/main.py new file mode 100644 index 00000000..542761f6 --- /dev/null +++ b/ExampleCodes/MPMD/Case-2/main.py @@ -0,0 +1,56 @@ +from mpi4py import MPI + +import amrex.space3d as amr + +# Initialize amrex::MPMD to establish communication across the two apps +# However, leverage MPMD_Initialize_without_split +# so that communication split can be performed using mpi4py.MPI +amr.MPMD_Initialize_without_split([]) +# Leverage MPI from mpi4py to perform communication split +app_comm_py = MPI.COMM_WORLD.Split(amr.MPMD_AppNum(), amr.MPMD_MyProc()) +# Initialize AMReX +amr.initialize_when_MPMD([], app_comm_py) + +amr.Print(f"Hello world from pyAMReX version {amr.__version__}\n") +# Create a MPMD Copier that gets the BoxArray information from the other (C++) app +copr = amr.MPMD_Copier(True) +# Number of data components at each grid point in the MultiFab +ncomp = 2 +# Define a MultiFab using the created MPMD_Copier +mf = amr.MultiFab(copr.box_array(), copr.distribution_map(), ncomp, 0) +mf.set_val(0.0) + +# Receive ONLY the FIRST MultiFab component populated in the other (C++) app +copr.recv(mf, 0, 1) + +# Fill the second MultiFab component based on the first component +for mfi in mf: + # Convert Array4 to numpy/cupy array + mf_array = mf.array(mfi).to_xp(copy=False, order="F") + mf_array[:, :, :, 1] = 10.0 * mf_array[:, :, :, 0] + +# Send ONLY the second MultiFab component to the other (C++) app +copr.send(mf, 1, 1) +# Plot MultiFab data +# HERE THE DOMAIN INFORMATION IS ONLY BEING UTILIZED TO SAVE A PLOTFILE +# How many grid cells in each direction over the problem domain +n_cell = 32 +# Integer vector indicating the lower coordinate bounds +dom_lo = amr.IntVect(0, 0, 0) +# Integer vector indicating the upper coordinate bounds +dom_hi = amr.IntVect(n_cell - 1, n_cell - 1, n_cell - 1) +# Box containing the coordinates of this domain +domain = amr.Box(dom_lo, dom_hi) +# Geometry: physical properties for data on our domain +real_box = amr.RealBox([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]) +coord = 0 # Cartesian +is_per = [0, 0, 0] # periodicity +geom = amr.Geometry(domain, real_box, coord, is_per) +plotfile = amr.concatenate(root="plt_py_", num=1, mindigits=3) +varnames = amr.Vector_string(["comp0", "comp1"]) +amr.write_single_level_plotfile(plotfile, mf, varnames, geom, time=0.0, level_step=0) + +# Finalize AMReX +amr.finalize() +# Finalize AMReX::MPMD +amr.MPMD_Finalize() From 1fde1ab5985ebecc2c68a51a58cb3439b884d5ca Mon Sep 17 00:00:00 2001 From: 
Bhargav Sriram Siddani
Date: Tue, 14 May 2024 17:11:48 -0700
Subject: [PATCH 2/8] AMReX-MPMD - scripts for Perlmutter

---
 Docs/source/MPMD_Tutorials.rst                | 63 ++++++++++++++++++-
 Docs/source/index.rst                         |  4 ++
 ExampleCodes/MPMD/Case-1/mpmd_cpu.conf        |  2 +
 ExampleCodes/MPMD/Case-1/mpmd_cpu.sh          | 17 +++++
 ExampleCodes/MPMD/Case-1/mpmd_gpu.conf        |  2 +
 ExampleCodes/MPMD/Case-1/mpmd_gpu.sh          | 23 +++++++
 .../MPMD/Case-1/perlmutter_gpu.profile        | 26 ++++++++
 ExampleCodes/MPMD/Case-2/mpmd_cpu.conf        |  2 +
 ExampleCodes/MPMD/Case-2/mpmd_cpu.sh          | 22 +++++++
 ExampleCodes/MPMD/Case-2/mpmd_gpu.conf        |  2 +
 ExampleCodes/MPMD/Case-2/mpmd_gpu.sh          | 27 ++++++++
 .../MPMD/Case-2/perlmutter_gpu.profile        | 28 +++++++
 12 files changed, 216 insertions(+), 2 deletions(-)
 create mode 100644 ExampleCodes/MPMD/Case-1/mpmd_cpu.conf
 create mode 100644 ExampleCodes/MPMD/Case-1/mpmd_cpu.sh
 create mode 100644 ExampleCodes/MPMD/Case-1/mpmd_gpu.conf
 create mode 100644 ExampleCodes/MPMD/Case-1/mpmd_gpu.sh
 create mode 100644 ExampleCodes/MPMD/Case-1/perlmutter_gpu.profile
 create mode 100644 ExampleCodes/MPMD/Case-2/mpmd_cpu.conf
 create mode 100644 ExampleCodes/MPMD/Case-2/mpmd_cpu.sh
 create mode 100644 ExampleCodes/MPMD/Case-2/mpmd_gpu.conf
 create mode 100644 ExampleCodes/MPMD/Case-2/mpmd_gpu.sh
 create mode 100644 ExampleCodes/MPMD/Case-2/perlmutter_gpu.profile

diff --git a/Docs/source/MPMD_Tutorials.rst b/Docs/source/MPMD_Tutorials.rst
index 464211ad..d6be28f3 100644
--- a/Docs/source/MPMD_Tutorials.rst
+++ b/Docs/source/MPMD_Tutorials.rst
@@ -1,8 +1,10 @@
+.. _tutorials_mpmd:
+
 AMReX-MPMD
 ==========
 
 AMReX-MPMD utilizes the Multiple Program Multiple Data (MPMD) feature of MPI to provide cross-functionality for AMReX-based codes.
-The framework enables data transfer across two different applications through the **MPMD::Copier** class, which takes the **BoxArray** of its application as an argument.
+The framework enables data transfer across two different applications through the **MPMD::Copier** class, which typically takes the **BoxArray** of its application as an argument.
 **Copier** instances created in both applications together identify the overlapping cells for which the data transfer must occur.
 The **Copier::send** & **Copier::recv** functions, which take a **MultiFab** as an argument, are used to transfer the desired data of overlapping regions.
 
@@ -22,6 +24,8 @@ Overview
 
 The domain in ``main_1.cpp`` is set to ``lo = {0, 0, 0}`` and ``hi = {31, 31, 31}``, while the domain in ``main_2.cpp`` is set to ``lo = {16, 16, 16}`` and ``hi = {31, 31, 31}``.
 Hence, the data transfer will occur for the region ``lo = {16, 16, 16}`` and ``hi = {31, 31, 31}``.
+Furthermore, the domain in ``main_1.cpp`` is split into boxes using ``max_grid_size=16``, while the domain in ``main_2.cpp`` is split using ``max_grid_size=8``.
+Therefore, the **BoxArray**, and moreover the number of boxes for the overlapping region, are different across the two applications.
 The data transfer demonstration is performed using a two-component *MultiFab*.
 The first component is populated in ``main_1.cpp`` before it is transferred to ``main_2.cpp``.
 The second component is populated in ``main_2.cpp`` based on the received first component.
@@ -58,6 +62,19 @@ Furthermore, the run process here assumes that the current working directory is
     # Running the MPMD process with 12 ranks
     mpirun -np 8 Source_1/main3d.gnu.DEBUG.MPI.ex : -np 4 Source_2/main3d.gnu.DEBUG.MPI.ex
 
+Running on Perlmutter (NERSC)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This section presents information regarding sample scripts that can be used to run this case on `Perlmutter (NERSC) `_.
+The scripts ``mpmd_cpu.sh`` and ``mpmd_cpu.conf`` can be used to run the CPU version.
+Similarly, ``mpmd_gpu.sh`` and ``mpmd_gpu.conf`` can be used to run the GPU version.
+Please note that ``perlmutter_gpu.profile`` must be leveraged to compile the GPU version of the applications.
+
+The content presented here is based on the following references:
+
+ * `NERSC documentation `_
+ * `WarpX documentation `_
+
 Case-2
 ------
 
@@ -72,7 +89,7 @@ Contents
 Overview
 ^^^^^^^^
 
-In the previous case (Case-1) of MPMD each application has its own domain, and, therefore, different **BoxArray**.
+In the previous case (Case-1) of MPMD each application has its own domain, and therefore, different **BoxArray**.
 However, there exist scenarios where both applications deal with the same **BoxArray**.
 The current case presents such a scenario where the **BoxArray** is defined only in the ``main.cpp`` application, but this information is relayed to ``main.py`` application through the **MPMD::Copier**.
 
@@ -115,3 +132,45 @@ Furthermore, the run process here assumes that the current working directory is
 
     # Running the MPMD process with 12 ranks
     mpirun -np 8 ./main3d.gnu.DEBUG.MPI.ex : -np 4 python main.py
+
+Running on Perlmutter (NERSC)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Running this case on perlmutter involves creating a python virtual environment.
+pyAMReX must be compiled and installed into this virtual environment after its creation.
+Similar to the previous case, this case also has supporting scripts to run on CPUs and GPUs.
+
+Creating a virtual environment
+""""""""""""""""""""""""""""""
+
+.. code-block:: bash
+
+    # Setup the required environment variables
+    source perlmutter_gpu.profile
+
+    # BEFORE PERFORMING THE FOLLOWING COMMANDS
+    # MOVE TO A DIRECTORY WHERE THE PYTHON VIRTUAL ENVIRONMENT MUST EXIST
+
+    python3 -m pip install --upgrade pip
+    python3 -m pip install --upgrade virtualenv
+    python3 -m pip cache purge
+    python3 -m venv pyamrex-gpu
+    source pyamrex-gpu/bin/activate
+    python3 -m pip install --upgrade pip
+    python3 -m pip install --upgrade build
+    python3 -m pip install --upgrade packaging
+    python3 -m pip install --upgrade wheel
+    python3 -m pip install --upgrade setuptools
+    python3 -m pip install --upgrade cython
+    python3 -m pip install --upgrade numpy
+    python3 -m pip install --upgrade pandas
+    python3 -m pip install --upgrade scipy
+    MPICC="cc -target-accel=nvidia80 -shared" python3 -m pip install --upgrade mpi4py --no-cache-dir --no-build-isolation --no-binary mpi4py
+    python3 -m pip install --upgrade openpmd-api
+    python3 -m pip install --upgrade matplotlib
+    python3 -m pip install --upgrade yt
+    python3 -m pip install --upgrade cupy-cuda12x # CUDA 12 compatible wheel
+
+The content presented here is based on the following reference:
+
+ * `WarpX documentation `_
diff --git a/Docs/source/index.rst b/Docs/source/index.rst
index 2674a601..374e1d04 100644
--- a/Docs/source/index.rst
+++ b/Docs/source/index.rst
@@ -51,6 +51,7 @@ sorted by the following categories:
 - :ref:`heFFTe` -- heFFTe distributed tutorials.
 - :ref:`Linear Solvers` -- Examples of several linear solvers.
- :ref:`ML/PYTORCH` -- Use of pytorch models to replace point-wise computational kernels. +- :ref:`MPMD` -- Usage of AMReX-MPMD (Multiple Program Multiple Data) framework. - :ref:`MUI` -- Incorporates the MxUI/MUI (Multiscale Universal interface) frame into AMReX. - :ref:`Particles` -- Basic usage of AMReX's particle data structures. - :ref:`Python` -- Using AMReX and interfacing with AMReX applications form Python - via `pyAMReX `__ @@ -75,6 +76,7 @@ sorted by the following categories: heFFTe_Tutorial LinearSolvers_Tutorial ML_Tutorial + MPMD_Tutorials MUI_Tutorial Particles_Tutorial Python_Tutorial @@ -102,6 +104,8 @@ sorted by the following categories: .. _`Linear Solvers`: LinearSolvers_Tutorial.html +.. _`MPMD`: MPMD_Tutorials.html + .. _`MUI`: MUI_Tutorial.html .. _`Particles`: Particles_Tutorial.html diff --git a/ExampleCodes/MPMD/Case-1/mpmd_cpu.conf b/ExampleCodes/MPMD/Case-1/mpmd_cpu.conf new file mode 100644 index 00000000..cdb07f28 --- /dev/null +++ b/ExampleCodes/MPMD/Case-1/mpmd_cpu.conf @@ -0,0 +1,2 @@ +0-7 ./Source_1/main3d.gnu.x86-milan.DEBUG.MPI.ex +8-11 ./Source_2/main3d.gnu.x86-milan.DEBUG.MPI.ex diff --git a/ExampleCodes/MPMD/Case-1/mpmd_cpu.sh b/ExampleCodes/MPMD/Case-1/mpmd_cpu.sh new file mode 100644 index 00000000..d7ed7832 --- /dev/null +++ b/ExampleCodes/MPMD/Case-1/mpmd_cpu.sh @@ -0,0 +1,17 @@ +#!/bin/bash +#SBATCH -N 1 # Total number of nodes +#SBATCH -n 12 # Total number of tasks +#SBATCH -c 4 # number of processors per MPI task +#SBATCH -C cpu +#SBATCH -q debug +#SBATCH -J mpmd_test +#SBATCH -t 00:05:00 +#SBATCH -A mpxxx + +#OpenMP settings: +export OMP_NUM_THREADS=1 +export OMP_PLACES=threads +export OMP_PROC_BIND=spread + +#run the application: +srun --multi-prog --cpu_bind=cores ./mpmd_cpu.conf diff --git a/ExampleCodes/MPMD/Case-1/mpmd_gpu.conf b/ExampleCodes/MPMD/Case-1/mpmd_gpu.conf new file mode 100644 index 00000000..6cc3f336 --- /dev/null +++ b/ExampleCodes/MPMD/Case-1/mpmd_gpu.conf @@ -0,0 +1,2 @@ +0-7 ./Source_1/main3d.gnu.DEBUG.MPI.CUDA.ex +8-11 ./Source_2/main3d.gnu.DEBUG.MPI.CUDA.ex diff --git a/ExampleCodes/MPMD/Case-1/mpmd_gpu.sh b/ExampleCodes/MPMD/Case-1/mpmd_gpu.sh new file mode 100644 index 00000000..89b88a99 --- /dev/null +++ b/ExampleCodes/MPMD/Case-1/mpmd_gpu.sh @@ -0,0 +1,23 @@ +#!/bin/bash +#SBATCH -N 3 # Total number of nodes +#SBATCH -n 12 # Total number of tasks +#SBATCH -c 4 # number of processors per MPI task +#SBATCH -C gpu +#SBATCH -G 12 # Total number of GPUs +#SBATCH -q debug +#SBATCH -J mpmd_test +#SBATCH -t 00:05:00 +#SBATCH -A mpxxx + +source ./perlmutter_gpu.profile + +#OpenMP settings: +export OMP_NUM_THREADS=1 +export OMP_PLACES=threads +export OMP_PROC_BIND=spread +# Taken from WarpX +export MPICH_OFI_NIC_POLICY=GPU +GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1" + +#run the application: +srun --multi-prog --cpu_bind=cores --gpu-bind=single:1 ./mpmd_gpu.conf diff --git a/ExampleCodes/MPMD/Case-1/perlmutter_gpu.profile b/ExampleCodes/MPMD/Case-1/perlmutter_gpu.profile new file mode 100644 index 00000000..e371e849 --- /dev/null +++ b/ExampleCodes/MPMD/Case-1/perlmutter_gpu.profile @@ -0,0 +1,26 @@ +# required dependencies +module load gpu +module load PrgEnv-gnu +module load craype +module load craype-x86-milan +module load craype-accel-nvidia80 +module load cudatoolkit +module load cmake/3.24.3 + +# necessary to use CUDA-Aware MPI and run a job +export CRAY_ACCEL_TARGET=nvidia80 + +# optimize CUDA compilation for A100 +export AMREX_CUDA_ARCH=8.0 + +# optimize CPU microarchitecture for AMD EPYC 3rd Gen (Milan/Zen3) +# note: 
the cc/CC/ftn wrappers below add those +export CXXFLAGS="-march=znver3" +export CFLAGS="-march=znver3" + +# compiler environment hints +export CC=cc +export CXX=CC +export FC=ftn +export CUDACXX=$(which nvcc) +export CUDAHOSTCXX=CC diff --git a/ExampleCodes/MPMD/Case-2/mpmd_cpu.conf b/ExampleCodes/MPMD/Case-2/mpmd_cpu.conf new file mode 100644 index 00000000..0a4d6f0f --- /dev/null +++ b/ExampleCodes/MPMD/Case-2/mpmd_cpu.conf @@ -0,0 +1,2 @@ +0-7 main3d.gnu.x86-milan.DEBUG.MPI.ex +8-11 python main.py diff --git a/ExampleCodes/MPMD/Case-2/mpmd_cpu.sh b/ExampleCodes/MPMD/Case-2/mpmd_cpu.sh new file mode 100644 index 00000000..236e1fa2 --- /dev/null +++ b/ExampleCodes/MPMD/Case-2/mpmd_cpu.sh @@ -0,0 +1,22 @@ +#!/bin/bash +#SBATCH -N 1 # Total number of nodes +#SBATCH -n 12 # Total number of tasks +#SBATCH -c 4 # number of processors per MPI task +#SBATCH -C cpu +#SBATCH -q debug +#SBATCH -J mpmd_test +#SBATCH -t 00:05:00 +#SBATCH -A mpxxx + +# Activate the virtual environment +source /path/to/pyamrex-gpu/bin/activate + +#OpenMP settings: +export OMP_NUM_THREADS=1 +export OMP_PLACES=threads +export OMP_PROC_BIND=spread + +#run the application: +srun --multi-prog --cpu_bind=cores ./mpmd_cpu.conf + +deactivate diff --git a/ExampleCodes/MPMD/Case-2/mpmd_gpu.conf b/ExampleCodes/MPMD/Case-2/mpmd_gpu.conf new file mode 100644 index 00000000..d748c3ca --- /dev/null +++ b/ExampleCodes/MPMD/Case-2/mpmd_gpu.conf @@ -0,0 +1,2 @@ +0-7 main3d.gnu.DEBUG.MPI.CUDA.ex +8-11 python main.py diff --git a/ExampleCodes/MPMD/Case-2/mpmd_gpu.sh b/ExampleCodes/MPMD/Case-2/mpmd_gpu.sh new file mode 100644 index 00000000..571d6cf1 --- /dev/null +++ b/ExampleCodes/MPMD/Case-2/mpmd_gpu.sh @@ -0,0 +1,27 @@ +#!/bin/bash +#SBATCH -N 3 # Total number of nodes +#SBATCH -n 12 # Total number of tasks +#SBATCH -c 4 # number of processors per MPI task +#SBATCH -C gpu +#SBATCH -G 12 # Total number of GPUs +#SBATCH -q debug +#SBATCH -J mpmd_test +#SBATCH -t 00:05:00 +#SBATCH -A mpxxx + +source ./perlmutter_gpu.profile +# Activate the virtual environment +source /path/to/pyamrex-gpu/bin/activate + +#OpenMP settings: +export OMP_NUM_THREADS=1 +export OMP_PLACES=threads +export OMP_PROC_BIND=spread +# Taken from WarpX +export MPICH_OFI_NIC_POLICY=GPU +GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1" + +#run the application: +srun --multi-prog --cpu_bind=cores --gpu-bind=single:1 ./mpmd_gpu.conf + +deactivate diff --git a/ExampleCodes/MPMD/Case-2/perlmutter_gpu.profile b/ExampleCodes/MPMD/Case-2/perlmutter_gpu.profile new file mode 100644 index 00000000..b720c1fc --- /dev/null +++ b/ExampleCodes/MPMD/Case-2/perlmutter_gpu.profile @@ -0,0 +1,28 @@ +# required dependencies +module load gpu +module load PrgEnv-gnu +module load craype +module load craype-x86-milan +module load craype-accel-nvidia80 +module load cudatoolkit +module load cmake/3.24.3 +# Required for pyAMReX +module load cray-python/3.11.5 + +# necessary to use CUDA-Aware MPI and run a job +export CRAY_ACCEL_TARGET=nvidia80 + +# optimize CUDA compilation for A100 +export AMREX_CUDA_ARCH=8.0 + +# optimize CPU microarchitecture for AMD EPYC 3rd Gen (Milan/Zen3) +# note: the cc/CC/ftn wrappers below add those +export CXXFLAGS="-march=znver3" +export CFLAGS="-march=znver3" + +# compiler environment hints +export CC=cc +export CXX=CC +export FC=ftn +export CUDACXX=$(which nvcc) +export CUDAHOSTCXX=CC From fdeb926f1b012a7987b364a346f929bc2cf8c68d Mon Sep 17 00:00:00 2001 From: Bhargav Sriram Siddani Date: Wed, 15 May 2024 12:12:57 -0700 Subject: [PATCH 3/8] MPMD - mpi4py install info in 
doc --- Docs/source/MPMD_Tutorials.rst | 3 ++- ExampleCodes/MPMD/Case-1/mpmd_gpu.sh | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/Docs/source/MPMD_Tutorials.rst b/Docs/source/MPMD_Tutorials.rst index d6be28f3..54d6473f 100644 --- a/Docs/source/MPMD_Tutorials.rst +++ b/Docs/source/MPMD_Tutorials.rst @@ -99,7 +99,8 @@ pyAMReX compile ^^^^^^^^^^^^^^^ The compile process for pyAMReX is only briefly described here. Please refer to the `pyAMReX documentation `_ for more details. -It must be mentioned that **mpi4py** is an important dependency. +**It must be mentioned that mpi4py is an important dependency that should exist in the utilized environment**. +For pyAMReX conda environment, it can be installed using ``conda install -c conda-forge mpi4py``, `resource `_. .. code-block:: bash diff --git a/ExampleCodes/MPMD/Case-1/mpmd_gpu.sh b/ExampleCodes/MPMD/Case-1/mpmd_gpu.sh index 89b88a99..f45611a3 100644 --- a/ExampleCodes/MPMD/Case-1/mpmd_gpu.sh +++ b/ExampleCodes/MPMD/Case-1/mpmd_gpu.sh @@ -9,7 +9,7 @@ #SBATCH -t 00:05:00 #SBATCH -A mpxxx -source ./perlmutter_gpu.profile +source ./perlmutter_gpu.profile #OpenMP settings: export OMP_NUM_THREADS=1 From bc8773157d80f12b73b4a0d81dec4edb0af34e2f Mon Sep 17 00:00:00 2001 From: Andy Nonaka Date: Thu, 16 May 2024 14:48:41 -0700 Subject: [PATCH 4/8] Update Docs/source/MPMD_Tutorials.rst --- Docs/source/MPMD_Tutorials.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/Docs/source/MPMD_Tutorials.rst b/Docs/source/MPMD_Tutorials.rst index 54d6473f..1f179c41 100644 --- a/Docs/source/MPMD_Tutorials.rst +++ b/Docs/source/MPMD_Tutorials.rst @@ -95,6 +95,7 @@ The current case presents such a scenario where the **BoxArray** is defined only **Please ensure that the same AMReX source code is used to compile both the applications.** +Compiling and Running on a local system pyAMReX compile ^^^^^^^^^^^^^^^ The compile process for pyAMReX is only briefly described here. From 5e84ceb2c74443a42f4985676c8139be4ca40403 Mon Sep 17 00:00:00 2001 From: Andy Nonaka Date: Thu, 16 May 2024 14:48:49 -0700 Subject: [PATCH 5/8] Update Docs/source/MPMD_Tutorials.rst --- Docs/source/MPMD_Tutorials.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Docs/source/MPMD_Tutorials.rst b/Docs/source/MPMD_Tutorials.rst index 1f179c41..2ec2e467 100644 --- a/Docs/source/MPMD_Tutorials.rst +++ b/Docs/source/MPMD_Tutorials.rst @@ -135,7 +135,7 @@ Furthermore, the run process here assumes that the current working directory is # Running the MPMD process with 12 ranks mpirun -np 8 ./main3d.gnu.DEBUG.MPI.ex : -np 4 python main.py -Running on Perlmutter (NERSC) +Compiling and Running on Perlmutter (NERSC) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Running this case on perlmutter involves creating a python virtual environment. 
From 73d722bd1aec0ca03badf98f71969c5b9363c31e Mon Sep 17 00:00:00 2001 From: Andy Nonaka Date: Thu, 16 May 2024 14:50:49 -0700 Subject: [PATCH 6/8] Update Docs/source/MPMD_Tutorials.rst --- Docs/source/MPMD_Tutorials.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/Docs/source/MPMD_Tutorials.rst b/Docs/source/MPMD_Tutorials.rst index 2ec2e467..5a76ef97 100644 --- a/Docs/source/MPMD_Tutorials.rst +++ b/Docs/source/MPMD_Tutorials.rst @@ -96,8 +96,7 @@ The current case presents such a scenario where the **BoxArray** is defined only **Please ensure that the same AMReX source code is used to compile both the applications.** Compiling and Running on a local system -pyAMReX compile -^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The compile process for pyAMReX is only briefly described here. Please refer to the `pyAMReX documentation `_ for more details. **It must be mentioned that mpi4py is an important dependency that should exist in the utilized environment**. From 23f64b3098c3c31f9e61672c98c70e220d14fc93 Mon Sep 17 00:00:00 2001 From: Andy Nonaka Date: Thu, 16 May 2024 14:51:20 -0700 Subject: [PATCH 7/8] Update Docs/source/MPMD_Tutorials.rst --- Docs/source/MPMD_Tutorials.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Docs/source/MPMD_Tutorials.rst b/Docs/source/MPMD_Tutorials.rst index 5a76ef97..259f5289 100644 --- a/Docs/source/MPMD_Tutorials.rst +++ b/Docs/source/MPMD_Tutorials.rst @@ -135,7 +135,7 @@ Furthermore, the run process here assumes that the current working directory is mpirun -np 8 ./main3d.gnu.DEBUG.MPI.ex : -np 4 python main.py Compiling and Running on Perlmutter (NERSC) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Running this case on perlmutter involves creating a python virtual environment. pyAMReX must be compiled and installed into this virtual environment after its creation. From 7aaa8b8ca351c3b004f20db8f5aea0b051577bfd Mon Sep 17 00:00:00 2001 From: Bhargav Sriram Siddani Date: Thu, 16 May 2024 16:07:32 -0700 Subject: [PATCH 8/8] MPMD - Doc update --- Docs/source/MPMD_Tutorials.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Docs/source/MPMD_Tutorials.rst b/Docs/source/MPMD_Tutorials.rst index 259f5289..28d11e3d 100644 --- a/Docs/source/MPMD_Tutorials.rst +++ b/Docs/source/MPMD_Tutorials.rst @@ -97,6 +97,7 @@ The current case presents such a scenario where the **BoxArray** is defined only Compiling and Running on a local system ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + The compile process for pyAMReX is only briefly described here. Please refer to the `pyAMReX documentation `_ for more details. **It must be mentioned that mpi4py is an important dependency that should exist in the utilized environment**. @@ -141,6 +142,8 @@ Running this case on perlmutter involves creating a python virtual environment. pyAMReX must be compiled and installed into this virtual environment after its creation. Similar to the previous case, this case also has supporting scripts to run on CPUs and GPUs. +The process detailed below assumes that the current working directory is ``ExampleCodes/MPMD/Case-2/``. + Creating a virtual environment """"""""""""""""""""""""""""""