diff --git a/Docs/source/MPMD_Tutorials.rst b/Docs/source/MPMD_Tutorials.rst
new file mode 100644
index 00000000..28d11e3d
--- /dev/null
+++ b/Docs/source/MPMD_Tutorials.rst
@@ -0,0 +1,180 @@
+.. _tutorials_mpmd:
+
+AMReX-MPMD
+==========
+
+AMReX-MPMD utilizes the Multiple Program Multiple Data (MPMD) feature of MPI to provide cross-functionality for AMReX-based codes.
+The framework enables data transfer between two different applications through the **MPMD::Copier** class, which typically takes the **BoxArray** of its application as an argument.
+The **Copier** instances created in the two applications together identify the overlapping cells for which the data transfer must occur.
+The **Copier::send** and **Copier::recv** functions, which take a **MultiFab** as an argument, are used to transfer the desired data for the overlapping regions.
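+
+The following is a minimal sketch of the calling pattern on one side of the coupling, distilled from the Case-1 sources described below; it assumes ``ba`` and ``dm`` are the application's **BoxArray** and **DistributionMapping** and that MPMD/AMReX have already been initialized (as done in the sources).
+
+.. code-block:: cpp
+
+ // ba and dm are this application's BoxArray and DistributionMapping
+ amrex::MPMD::Copier copier(ba, dm, false);
+
+ // Two-component MultiFab; send component 0 and receive into component 1
+ amrex::MultiFab mf(ba, dm, 2, 0);
+ copier.send(mf, 0, 1);
+ copier.recv(mf, 1, 1);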
+
+Case-1
+------
+
+The current case demonstrates the MPMD capability across two C++ applications.
+
+Contents
+^^^^^^^^
+
+The **Source_1** subfolder contains ``main_1.cpp``, which will be treated as the first application.
+Similarly, the **Source_2** subfolder contains ``main_2.cpp``, which will be treated as the second application.
+
+Overview
+^^^^^^^^
+
+The domain in ``main_1.cpp`` is set to ``lo = {0, 0, 0}`` and ``hi = {31, 31, 31}``, while the domain in ``main_2.cpp`` is set to ``lo = {16, 16, 16}`` and ``hi = {31, 31, 31}``.
+Hence, the data transfer will occur for the overlapping region ``lo = {16, 16, 16}`` to ``hi = {31, 31, 31}``.
+Furthermore, the domain in ``main_1.cpp`` is split into boxes using ``max_grid_size=16``, while the domain in ``main_2.cpp`` is split using ``max_grid_size=8``.
+Therefore, the **BoxArray**, and consequently the number of boxes covering the overlapping region, differ between the two applications.
+The data transfer demonstration is performed using a two-component **MultiFab**.
+The first component is populated in ``main_1.cpp`` before it is transferred to ``main_2.cpp``.
+The second component is populated in ``main_2.cpp`` based on the received first component.
+Finally, the second component is transferred from ``main_2.cpp`` to ``main_1.cpp``.
+It can be seen from the plotfile generated by ``main_1.cpp`` that the second component is non-zero only for the overlapping region.
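+
+As a quick illustration, the overlap quoted above is simply the intersection of the two problem domains; the sketch below mirrors the settings used in the two sources.
+
+.. code-block:: cpp
+
+ // Domain of main_1.cpp: 32^3 cells, chopped into boxes with max_grid_size = 16
+ amrex::Box domain_1(amrex::IntVect(0,0,0), amrex::IntVect(31,31,31));
+ // Domain of main_2.cpp: the upper octant only, chopped with max_grid_size = 8
+ amrex::Box domain_2(amrex::IntVect(16,16,16), amrex::IntVect(31,31,31));
+ // Region exchanged by the Copiers: lo = {16,16,16}, hi = {31,31,31}
+ amrex::Box overlap = domain_1 & domain_2;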
+
+Compile
+^^^^^^^
+
+The compile process here assumes that the current working directory is ``ExampleCodes/MPMD/Case-1/``.
+
+.. code-block:: bash
+
+ # cd into Source_1 to compile the first application
+ cd Source_1/
+ # Include USE_CUDA=TRUE for CUDA GPUs
+ make USE_MPI=TRUE
+
+ # cd into Source_2 to compile the second application
+ cd ../Source_2/
+ # Include USE_CUDA=TRUE for CUDA GPUs
+ make USE_MPI=TRUE
+
+Run
+^^^
+
+Here, the case is run using a total of 12 MPI ranks, with 8 allocated to ``main_1.cpp`` and the remaining 4 to ``main_2.cpp``.
+Please note that the MPI ranks assigned to each application/code must be contiguous, i.e., MPI ranks 0-7 are for ``main_1.cpp`` and ranks 8-11 are for ``main_2.cpp``.
+This may be the default behaviour on several systems.
+Furthermore, the run process here assumes that the current working directory is ``ExampleCodes/MPMD/Case-1/``.
+
+.. code-block:: bash
+
+ # Running the MPMD process with 12 ranks
+ mpirun -np 8 Source_1/main3d.gnu.DEBUG.MPI.ex : -np 4 Source_2/main3d.gnu.DEBUG.MPI.ex
+
+Running on Perlmutter (NERSC)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This section provides sample scripts that can be used to run this case on `Perlmutter (NERSC) <https://docs.nersc.gov/systems/perlmutter/>`_.
+The scripts ``mpmd_cpu.sh`` and ``mpmd_cpu.conf`` can be used to run the CPU version.
+Similarly, ``mpmd_gpu.sh`` and ``mpmd_gpu.conf`` can be used to run the GPU version.
+Please note that ``perlmutter_gpu.profile`` must be sourced to compile the GPU version of the applications.
+
+The content presented here is based on the following references:
+
+ * `NERSC documentation <https://docs.nersc.gov/>`_
+ * `WarpX documentation <https://warpx.readthedocs.io>`_
+
+Case-2
+------
+
+The current case demonstrates the MPMD capability across a C++ application and a Python application.
+This language interoperability is achieved through the Python bindings of AMReX, `pyAMReX <https://pyamrex.readthedocs.io>`_.
+
+Contents
+^^^^^^^^
+
+``main.cpp`` will be the C++ application and ``main.py`` will be the Python application.
+
+Overview
+^^^^^^^^
+
+In the previous case (Case-1), each application has its own domain and, therefore, a different **BoxArray**.
+However, there exist scenarios where both applications deal with the same **BoxArray**.
+The current case presents such a scenario, where the **BoxArray** is defined only in the ``main.cpp`` application and this information is relayed to the ``main.py`` application through the **MPMD::Copier**.
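+
+The relevant constructor calls on the two sides are sketched below; they mirror ``main.cpp`` and ``main.py`` in this folder. Passing ``true`` on the C++ side makes the **Copier** send its **BoxArray** information, while the Python side constructs its ``MPMD_Copier`` from the received information.
+
+.. code-block:: cpp
+
+ // C++ side (main.cpp): the Copier sends the BoxArray/DistributionMapping info
+ auto copr = amrex::MPMD::Copier(ba, dm, true);
+
+ // Python side (main.py), for reference:
+ //   copr = amr.MPMD_Copier(True)
+ //   mf   = amr.MultiFab(copr.box_array(), copr.distribution_map(), ncomp, 0)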
+
+**Please ensure that the same AMReX source code is used to compile both applications.**
+
+Compiling and Running on a local system
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The compile process for pyAMReX is only briefly described here.
+Please refer to the `pyAMReX documentation <https://pyamrex.readthedocs.io>`_ for more details.
+**Please note that mpi4py is a required dependency and must be available in the environment being used.**
+For a pyAMReX conda environment, it can be installed with ``conda install -c conda-forge mpi4py`` (see the `mpi4py installation documentation <https://mpi4py.readthedocs.io/en/stable/install.html>`_).
+
+.. code-block:: bash
+
+ # find dependencies & configure
+ # Include -DAMReX_GPU_BACKEND=CUDA for gpu version
+ cmake -S . -B build -DAMReX_SPACEDIM="1;2;3" -DAMReX_MPI=ON -DpyAMReX_amrex_src=/path/to/amrex
+
+ # compile & install, here we use four threads
+ cmake --build build -j 4 --target pip_install
+
+Compiling main.cpp
+^^^^^^^^^^^^^^^^^^
+
+The compile process here assumes that the current working directory is ``ExampleCodes/MPMD/Case-2/``.
+
+.. code-block:: bash
+
+ # Include USE_CUDA=TRUE for CUDA GPUs
+ make USE_MPI=TRUE
+
+Run
+^^^
+
+Here, the case is run using a total of 12 MPI ranks, with 8 allocated to ``main.cpp`` and the remaining 4 to ``main.py``.
+As mentioned earlier, the MPI ranks assigned to each application/code must be contiguous, i.e., MPI ranks 0-7 are for ``main.cpp`` and ranks 8-11 are for ``main.py``.
+This may be the default behaviour on several systems.
+Furthermore, the run process here assumes that the current working directory is ``ExampleCodes/MPMD/Case-2/``.
+
+.. code-block:: bash
+
+ # Running the MPMD process with 12 ranks
+ mpirun -np 8 ./main3d.gnu.DEBUG.MPI.ex : -np 4 python main.py
+
+Compiling and Running on Perlmutter (NERSC)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Running this case on Perlmutter involves creating a Python virtual environment.
+pyAMReX must be compiled and installed into this virtual environment after its creation.
+Similar to the previous case, this case also has supporting scripts to run on CPUs and GPUs.
+
+The process detailed below assumes that the current working directory is ``ExampleCodes/MPMD/Case-2/``.
+
+Creating a virtual environment
+""""""""""""""""""""""""""""""
+
+.. code-block:: bash
+
+ # Setup the required environment variables
+ source perlmutter_gpu.profile
+
+ # BEFORE PERFORMING THE FOLLOWING COMMANDS
+ # MOVE TO A DIRECTORY WHERE THE PYTHON VIRTUAL ENVIRONMENT MUST EXIST
+
+ python3 -m pip install --upgrade pip
+ python3 -m pip install --upgrade virtualenv
+ python3 -m pip cache purge
+ python3 -m venv pyamrex-gpu
+ source pyamrex-gpu/bin/activate
+ python3 -m pip install --upgrade pip
+ python3 -m pip install --upgrade build
+ python3 -m pip install --upgrade packaging
+ python3 -m pip install --upgrade wheel
+ python3 -m pip install --upgrade setuptools
+ python3 -m pip install --upgrade cython
+ python3 -m pip install --upgrade numpy
+ python3 -m pip install --upgrade pandas
+ python3 -m pip install --upgrade scipy
+ MPICC="cc -target-accel=nvidia80 -shared" python3 -m pip install --upgrade mpi4py --no-cache-dir --no-build-isolation --no-binary mpi4py
+ python3 -m pip install --upgrade openpmd-api
+ python3 -m pip install --upgrade matplotlib
+ python3 -m pip install --upgrade yt
+ python3 -m pip install --upgrade cupy-cuda12x # CUDA 12 compatible wheel
+
+The content presented here is based on the following reference:
+
+ * `WarpX documentation <https://warpx.readthedocs.io>`_
diff --git a/Docs/source/index.rst b/Docs/source/index.rst
index 2674a601..374e1d04 100644
--- a/Docs/source/index.rst
+++ b/Docs/source/index.rst
@@ -51,6 +51,7 @@ sorted by the following categories:
- :ref:`heFFTe` -- heFFTe distributed tutorials.
- :ref:`Linear Solvers` -- Examples of several linear solvers.
- :ref:`ML/PYTORCH` -- Use of pytorch models to replace point-wise computational kernels.
+- :ref:`MPMD` -- Usage of the AMReX-MPMD (Multiple Program Multiple Data) framework.
- :ref:`MUI` -- Incorporates the MxUI/MUI (Multiscale Universal interface) frame into AMReX.
- :ref:`Particles` -- Basic usage of AMReX's particle data structures.
- :ref:`Python` -- Using AMReX and interfacing with AMReX applications form Python - via `pyAMReX `__
@@ -75,6 +76,7 @@ sorted by the following categories:
heFFTe_Tutorial
LinearSolvers_Tutorial
ML_Tutorial
+ MPMD_Tutorials
MUI_Tutorial
Particles_Tutorial
Python_Tutorial
@@ -102,6 +104,8 @@ sorted by the following categories:
.. _`Linear Solvers`: LinearSolvers_Tutorial.html
+.. _`MPMD`: MPMD_Tutorials.html
+
.. _`MUI`: MUI_Tutorial.html
.. _`Particles`: Particles_Tutorial.html
diff --git a/ExampleCodes/MPMD/Case-1/Source_1/GNUmakefile b/ExampleCodes/MPMD/Case-1/Source_1/GNUmakefile
new file mode 100644
index 00000000..b6b95e7b
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/Source_1/GNUmakefile
@@ -0,0 +1,20 @@
+AMREX_HOME ?= ../../../../../amrex
+
+DEBUG = TRUE
+
+DIM = 3
+
+COMP = gcc
+
+USE_MPI = TRUE
+
+USE_OMP = FALSE
+USE_CUDA = FALSE
+USE_HIP = FALSE
+
+include $(AMREX_HOME)/Tools/GNUMake/Make.defs
+
+include ./Make.package
+include $(AMREX_HOME)/Src/Base/Make.package
+
+include $(AMREX_HOME)/Tools/GNUMake/Make.rules
diff --git a/ExampleCodes/MPMD/Case-1/Source_1/Make.package b/ExampleCodes/MPMD/Case-1/Source_1/Make.package
new file mode 100644
index 00000000..16bde740
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/Source_1/Make.package
@@ -0,0 +1 @@
+CEXE_sources += main_1.cpp
diff --git a/ExampleCodes/MPMD/Case-1/Source_1/main_1.cpp b/ExampleCodes/MPMD/Case-1/Source_1/main_1.cpp
new file mode 100644
index 00000000..8097faef
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/Source_1/main_1.cpp
@@ -0,0 +1,69 @@
+
+#include <mpi.h>
+#include <AMReX.H>
+#include <AMReX_Print.H>
+#include <AMReX_MPMD.H>
+#include <AMReX_MultiFab.H>
+#include <AMReX_PlotFileUtil.H>
+
+int main(int argc, char* argv[])
+{
+ // Initialize amrex::MPMD to establish communication across the two apps
+ MPI_Comm comm = amrex::MPMD::Initialize(argc, argv);
+ amrex::Initialize(argc,argv,true,comm);
+ {
+ amrex::Print() << "Hello world from AMReX version " << amrex::Version() << "\n";
+ // Number of data components at each grid point in the MultiFab
+ int ncomp = 2;
+ // how many grid cells in each direction over the problem domain
+ int n_cell = 32;
+ // how many grid cells are allowed in each direction over each box
+ int max_grid_size = 16;
+ //BoxArray -- Abstract Domain Setup
+ // integer vector indicating the lower coordinate bounds
+ amrex::IntVect dom_lo(0,0,0);
+ // integer vector indicating the upper coordinate bounds
+ amrex::IntVect dom_hi(n_cell-1, n_cell-1, n_cell-1);
+ // box containing the coordinates of this domain
+ amrex::Box domain(dom_lo, dom_hi);
+ // will contain a list of boxes describing the problem domain
+ amrex::BoxArray ba(domain);
+ // chop the single grid into many small boxes
+ ba.maxSize(max_grid_size);
+ // Distribution Mapping
+ amrex::DistributionMapping dm(ba);
+ // Create an MPMD Copier based on current ba & dm
+ auto copr = amrex::MPMD::Copier(ba,dm,false);
+ //Define MultiFab
+ amrex::MultiFab mf(ba, dm, ncomp, 0);
+ //Geometry -- Physical Properties for data on our domain
+ amrex::RealBox real_box ({0., 0., 0.}, {1. , 1., 1.});
+ amrex::Geometry geom(domain, &real_box);
+ //Calculate Cell Sizes
+ amrex::GpuArray<amrex::Real,3> dx = geom.CellSizeArray(); //dx[0] = dx dx[1] = dy dx[2] = dz
+ //Fill only the first component of the MultiFab
+ for(amrex::MFIter mfi(mf); mfi.isValid(); ++mfi){
+ const amrex::Box& bx = mfi.validbox();
+ const amrex::Array4<amrex::Real>& mf_array = mf.array(mfi);
+
+ amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE(int i, int j, int k){
+
+ amrex::Real x = (i+0.5) * dx[0];
+ amrex::Real y = (j+0.5) * dx[1];
+ amrex::Real z = (k+0.5) * dx[2];
+ amrex::Real r_squared = ((x-0.5)*(x-0.5)+(y-0.5)*(y-0.5)+(z-0.5)*(z-0.5))/0.01;
+
+ mf_array(i,j,k,0) = 1.0 + std::exp(-r_squared);
+
+ });
+ }
+ // Send ONLY the first populated MultiFab component to main_2.cpp
+ copr.send(mf,0,1);
+ // Receive ONLY the second MultiFab component from main_2.cpp
+ copr.recv(mf,1,1);
+ //Plot MultiFab Data
+ WriteSingleLevelPlotfile("plt_cpp_1", mf, {"comp0","comp1"}, geom, 0., 0);
+ }
+ amrex::Finalize();
+ amrex::MPMD::Finalize();
+}
diff --git a/ExampleCodes/MPMD/Case-1/Source_2/GNUmakefile b/ExampleCodes/MPMD/Case-1/Source_2/GNUmakefile
new file mode 100644
index 00000000..b6b95e7b
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/Source_2/GNUmakefile
@@ -0,0 +1,20 @@
+AMREX_HOME ?= ../../../../../amrex
+
+DEBUG = TRUE
+
+DIM = 3
+
+COMP = gcc
+
+USE_MPI = TRUE
+
+USE_OMP = FALSE
+USE_CUDA = FALSE
+USE_HIP = FALSE
+
+include $(AMREX_HOME)/Tools/GNUMake/Make.defs
+
+include ./Make.package
+include $(AMREX_HOME)/Src/Base/Make.package
+
+include $(AMREX_HOME)/Tools/GNUMake/Make.rules
diff --git a/ExampleCodes/MPMD/Case-1/Source_2/Make.package b/ExampleCodes/MPMD/Case-1/Source_2/Make.package
new file mode 100644
index 00000000..b4266b6f
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/Source_2/Make.package
@@ -0,0 +1 @@
+CEXE_sources += main_2.cpp
diff --git a/ExampleCodes/MPMD/Case-1/Source_2/main_2.cpp b/ExampleCodes/MPMD/Case-1/Source_2/main_2.cpp
new file mode 100644
index 00000000..847421fe
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/Source_2/main_2.cpp
@@ -0,0 +1,62 @@
+
+#include <mpi.h>
+#include <AMReX.H>
+#include <AMReX_Print.H>
+#include <AMReX_MPMD.H>
+#include <AMReX_MultiFab.H>
+#include <AMReX_PlotFileUtil.H>
+
+int main(int argc, char* argv[])
+{
+ // Initialize amrex::MPMD to establish communication across the two apps
+ MPI_Comm comm = amrex::MPMD::Initialize(argc, argv);
+ amrex::Initialize(argc,argv,true,comm);
+ {
+ amrex::Print() << "Hello world from AMReX version " << amrex::Version() << "\n";
+ // Number of data components at each grid point in the MultiFab
+ int ncomp = 2;
+ // how many grid cells in each direction over the problem domain
+ int n_cell = 32;
+ // how many grid cells are allowed in each direction over each box
+ int max_grid_size = 8;
+ //BoxArray -- Abstract Domain Setup
+ // integer vector indicating the lower coordinate bounds
+ amrex::IntVect dom_lo(n_cell/2, n_cell/2, n_cell/2);
+ // integer vector indicating the upper coordinate bounds
+ amrex::IntVect dom_hi(n_cell-1, n_cell-1, n_cell-1);
+ // box containing the coordinates of this domain
+ amrex::Box domain(dom_lo, dom_hi);
+ // will contain a list of boxes describing the problem domain
+ amrex::BoxArray ba(domain);
+ // chop the single grid into many small boxes
+ ba.maxSize(max_grid_size);
+ // Distribution Mapping
+ amrex::DistributionMapping dm(ba);
+ // Create an MPMD Copier based on current ba & dm
+ auto copr = amrex::MPMD::Copier(ba,dm,false);
+ //Define MultiFab
+ amrex::MultiFab mf(ba, dm, ncomp, 0);
+ //Geometry -- Physical Properties for data on our domain
+ amrex::RealBox real_box ({0.5, 0.5, 0.5}, {1. , 1., 1.});
+ amrex::Geometry geom(domain, &real_box);
+ // Receive ONLY the first populated MultiFab component from main_1.cpp
+ copr.recv(mf,0,1);
+ //Fill the second component of the MultiFab
+ for(amrex::MFIter mfi(mf); mfi.isValid(); ++mfi){
+ const amrex::Box& bx = mfi.validbox();
+ const amrex::Array4<amrex::Real>& mf_array = mf.array(mfi);
+
+ amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE(int i, int j, int k){
+
+ mf_array(i,j,k,1) = amrex::Real(10.)*mf_array(i,j,k,0);
+
+ });
+ }
+ // Send ONLY the second MultiFab component (populated here) to main_1.cpp
+ copr.send(mf,1,1);
+ //Plot MultiFab Data
+ WriteSingleLevelPlotfile("plt_cpp_2", mf, {"comp0","comp1"}, geom, 0., 0);
+ }
+ amrex::Finalize();
+ amrex::MPMD::Finalize();
+}
diff --git a/ExampleCodes/MPMD/Case-1/mpmd_cpu.conf b/ExampleCodes/MPMD/Case-1/mpmd_cpu.conf
new file mode 100644
index 00000000..cdb07f28
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/mpmd_cpu.conf
@@ -0,0 +1,2 @@
+0-7 ./Source_1/main3d.gnu.x86-milan.DEBUG.MPI.ex
+8-11 ./Source_2/main3d.gnu.x86-milan.DEBUG.MPI.ex
diff --git a/ExampleCodes/MPMD/Case-1/mpmd_cpu.sh b/ExampleCodes/MPMD/Case-1/mpmd_cpu.sh
new file mode 100644
index 00000000..d7ed7832
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/mpmd_cpu.sh
@@ -0,0 +1,17 @@
+#!/bin/bash
+#SBATCH -N 1 # Total number of nodes
+#SBATCH -n 12 # Total number of tasks
+#SBATCH -c 4 # number of processors per MPI task
+#SBATCH -C cpu
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+
+#run the application:
+srun --multi-prog --cpu_bind=cores ./mpmd_cpu.conf
diff --git a/ExampleCodes/MPMD/Case-1/mpmd_gpu.conf b/ExampleCodes/MPMD/Case-1/mpmd_gpu.conf
new file mode 100644
index 00000000..6cc3f336
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/mpmd_gpu.conf
@@ -0,0 +1,2 @@
+0-7 ./Source_1/main3d.gnu.DEBUG.MPI.CUDA.ex
+8-11 ./Source_2/main3d.gnu.DEBUG.MPI.CUDA.ex
diff --git a/ExampleCodes/MPMD/Case-1/mpmd_gpu.sh b/ExampleCodes/MPMD/Case-1/mpmd_gpu.sh
new file mode 100644
index 00000000..f45611a3
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/mpmd_gpu.sh
@@ -0,0 +1,23 @@
+#!/bin/bash
+#SBATCH -N 3 # Total number of nodes
+#SBATCH -n 12 # Total number of tasks
+#SBATCH -c 4 # number of processors per MPI task
+#SBATCH -C gpu
+#SBATCH -G 12 # Total number of GPUs
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+source ./perlmutter_gpu.profile
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+# Taken from WarpX
+export MPICH_OFI_NIC_POLICY=GPU
+GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1"
+
+#run the application:
+srun --multi-prog --cpu_bind=cores --gpu-bind=single:1 ./mpmd_gpu.conf
diff --git a/ExampleCodes/MPMD/Case-1/perlmutter_gpu.profile b/ExampleCodes/MPMD/Case-1/perlmutter_gpu.profile
new file mode 100644
index 00000000..e371e849
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-1/perlmutter_gpu.profile
@@ -0,0 +1,26 @@
+# required dependencies
+module load gpu
+module load PrgEnv-gnu
+module load craype
+module load craype-x86-milan
+module load craype-accel-nvidia80
+module load cudatoolkit
+module load cmake/3.24.3
+
+# necessary to use CUDA-Aware MPI and run a job
+export CRAY_ACCEL_TARGET=nvidia80
+
+# optimize CUDA compilation for A100
+export AMREX_CUDA_ARCH=8.0
+
+# optimize CPU microarchitecture for AMD EPYC 3rd Gen (Milan/Zen3)
+# note: the cc/CC/ftn wrappers below add those
+export CXXFLAGS="-march=znver3"
+export CFLAGS="-march=znver3"
+
+# compiler environment hints
+export CC=cc
+export CXX=CC
+export FC=ftn
+export CUDACXX=$(which nvcc)
+export CUDAHOSTCXX=CC
diff --git a/ExampleCodes/MPMD/Case-2/GNUmakefile b/ExampleCodes/MPMD/Case-2/GNUmakefile
new file mode 100644
index 00000000..a68d0980
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/GNUmakefile
@@ -0,0 +1,20 @@
+AMREX_HOME ?= ../../../../amrex
+
+DEBUG = TRUE
+
+DIM = 3
+
+COMP = gcc
+
+USE_MPI = TRUE
+
+USE_OMP = FALSE
+USE_CUDA = FALSE
+USE_HIP = FALSE
+
+include $(AMREX_HOME)/Tools/GNUMake/Make.defs
+
+include ./Make.package
+include $(AMREX_HOME)/Src/Base/Make.package
+
+include $(AMREX_HOME)/Tools/GNUMake/Make.rules
diff --git a/ExampleCodes/MPMD/Case-2/Make.package b/ExampleCodes/MPMD/Case-2/Make.package
new file mode 100644
index 00000000..6b4b865e
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/Make.package
@@ -0,0 +1 @@
+CEXE_sources += main.cpp
diff --git a/ExampleCodes/MPMD/Case-2/main.cpp b/ExampleCodes/MPMD/Case-2/main.cpp
new file mode 100644
index 00000000..20ecda64
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/main.cpp
@@ -0,0 +1,70 @@
+
+#include <mpi.h>
+#include <AMReX.H>
+#include <AMReX_Print.H>
+#include <AMReX_MPMD.H>
+#include <AMReX_MultiFab.H>
+#include <AMReX_PlotFileUtil.H>
+
+int main(int argc, char* argv[])
+{
+ // Initialize amrex::MPMD to establish communication across the two apps
+ MPI_Comm comm = amrex::MPMD::Initialize(argc, argv);
+ amrex::Initialize(argc,argv,true,comm);
+ {
+ amrex::Print() << "Hello world from AMReX version " << amrex::Version() << "\n";
+ // Number of data components at each grid point in the MultiFab
+ int ncomp = 2;
+ // how many grid cells in each direction over the problem domain
+ int n_cell = 32;
+ // how many grid cells are allowed in each direction over each box
+ int max_grid_size = 16;
+ //BoxArray -- Abstract Domain Setup
+ // integer vector indicating the lower coordinate bounds
+ amrex::IntVect dom_lo(0,0,0);
+ // integer vector indicating the upper coordinate bounds
+ amrex::IntVect dom_hi(n_cell-1, n_cell-1, n_cell-1);
+ // box containing the coordinates of this domain
+ amrex::Box domain(dom_lo, dom_hi);
+ // will contain a list of boxes describing the problem domain
+ amrex::BoxArray ba(domain);
+ // chop the single grid into many small boxes
+ ba.maxSize(max_grid_size);
+ // Distribution Mapping
+ amrex::DistributionMapping dm(ba);
+ // Create an MPMD Copier that
+ // sends the BoxArray information to the other (python) application
+ auto copr = amrex::MPMD::Copier(ba,dm,true);
+ //Define MultiFab
+ amrex::MultiFab mf(ba, dm, ncomp, 0);
+ //Geometry -- Physical Properties for data on our domain
+ amrex::RealBox real_box ({0., 0., 0.}, {1. , 1., 1.});
+ amrex::Geometry geom(domain, &real_box);
+ //Calculate Cell Sizes
+ amrex::GpuArray<amrex::Real,3> dx = geom.CellSizeArray(); //dx[0] = dx dx[1] = dy dx[2] = dz
+ //Fill only the first component of the MultiFab
+ for(amrex::MFIter mfi(mf); mfi.isValid(); ++mfi){
+ const amrex::Box& bx = mfi.validbox();
+ const amrex::Array4<amrex::Real>& mf_array = mf.array(mfi);
+
+ amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE(int i, int j, int k){
+
+ amrex::Real x = (i+0.5) * dx[0];
+ amrex::Real y = (j+0.5) * dx[1];
+ amrex::Real z = (k+0.5) * dx[2];
+ amrex::Real r_squared = ((x-0.5)*(x-0.5)+(y-0.5)*(y-0.5)+(z-0.5)*(z-0.5))/0.01;
+
+ mf_array(i,j,k,0) = 1.0 + std::exp(-r_squared);
+
+ });
+ }
+ // Send ONLY the first populated MultiFab component to the other app
+ copr.send(mf,0,1);
+ // Receive ONLY the second MultiFab component from the other app
+ copr.recv(mf,1,1);
+ //Plot MultiFab Data
+ WriteSingleLevelPlotfile("plt_cpp_001", mf, {"comp0","comp1"}, geom, 0., 0);
+ }
+ amrex::Finalize();
+ amrex::MPMD::Finalize();
+}
diff --git a/ExampleCodes/MPMD/Case-2/main.py b/ExampleCodes/MPMD/Case-2/main.py
new file mode 100644
index 00000000..542761f6
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/main.py
@@ -0,0 +1,56 @@
+from mpi4py import MPI
+
+import amrex.space3d as amr
+
+# Initialize amrex::MPMD to establish communication across the two apps
+# However, leverage MPMD_Initialize_without_split
+# so that communication split can be performed using mpi4py.MPI
+amr.MPMD_Initialize_without_split([])
+# Leverage MPI from mpi4py to perform communication split
+app_comm_py = MPI.COMM_WORLD.Split(amr.MPMD_AppNum(), amr.MPMD_MyProc())
+# Initialize AMReX
+amr.initialize_when_MPMD([], app_comm_py)
+
+amr.Print(f"Hello world from pyAMReX version {amr.__version__}\n")
+# Create a MPMD Copier that gets the BoxArray information from the other (C++) app
+copr = amr.MPMD_Copier(True)
+# Number of data components at each grid point in the MultiFab
+ncomp = 2
+# Define a MultiFab using the created MPMD_Copier
+mf = amr.MultiFab(copr.box_array(), copr.distribution_map(), ncomp, 0)
+mf.set_val(0.0)
+
+# Receive ONLY the FIRST MultiFab component populated in the other (C++) app
+copr.recv(mf, 0, 1)
+
+# Fill the second MultiFab component based on the first component
+for mfi in mf:
+ # Convert Array4 to numpy/cupy array
+ mf_array = mf.array(mfi).to_xp(copy=False, order="F")
+ mf_array[:, :, :, 1] = 10.0 * mf_array[:, :, :, 0]
+
+# Send ONLY the second MultiFab component to the other (C++) app
+copr.send(mf, 1, 1)
+# Plot MultiFab data
+# HERE THE DOMAIN INFORMATION IS ONLY BEING UTILIZED TO SAVE A PLOTFILE
+# How many grid cells in each direction over the problem domain
+n_cell = 32
+# Integer vector indicating the lower coordinate bounds
+dom_lo = amr.IntVect(0, 0, 0)
+# Integer vector indicating the upper coordinate bounds
+dom_hi = amr.IntVect(n_cell - 1, n_cell - 1, n_cell - 1)
+# Box containing the coordinates of this domain
+domain = amr.Box(dom_lo, dom_hi)
+# Geometry: physical properties for data on our domain
+real_box = amr.RealBox([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
+coord = 0 # Cartesian
+is_per = [0, 0, 0] # periodicity
+geom = amr.Geometry(domain, real_box, coord, is_per)
+plotfile = amr.concatenate(root="plt_py_", num=1, mindigits=3)
+varnames = amr.Vector_string(["comp0", "comp1"])
+amr.write_single_level_plotfile(plotfile, mf, varnames, geom, time=0.0, level_step=0)
+
+# Finalize AMReX
+amr.finalize()
+# Finalize AMReX::MPMD
+amr.MPMD_Finalize()
diff --git a/ExampleCodes/MPMD/Case-2/mpmd_cpu.conf b/ExampleCodes/MPMD/Case-2/mpmd_cpu.conf
new file mode 100644
index 00000000..0a4d6f0f
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/mpmd_cpu.conf
@@ -0,0 +1,2 @@
+0-7 main3d.gnu.x86-milan.DEBUG.MPI.ex
+8-11 python main.py
diff --git a/ExampleCodes/MPMD/Case-2/mpmd_cpu.sh b/ExampleCodes/MPMD/Case-2/mpmd_cpu.sh
new file mode 100644
index 00000000..236e1fa2
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/mpmd_cpu.sh
@@ -0,0 +1,22 @@
+#!/bin/bash
+#SBATCH -N 1 # Total number of nodes
+#SBATCH -n 12 # Total number of tasks
+#SBATCH -c 4 # number of processors per MPI task
+#SBATCH -C cpu
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+# Activate the virtual environment
+source /path/to/pyamrex-gpu/bin/activate
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+
+#run the application:
+srun --multi-prog --cpu_bind=cores ./mpmd_cpu.conf
+
+deactivate
diff --git a/ExampleCodes/MPMD/Case-2/mpmd_gpu.conf b/ExampleCodes/MPMD/Case-2/mpmd_gpu.conf
new file mode 100644
index 00000000..d748c3ca
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/mpmd_gpu.conf
@@ -0,0 +1,2 @@
+0-7 main3d.gnu.DEBUG.MPI.CUDA.ex
+8-11 python main.py
diff --git a/ExampleCodes/MPMD/Case-2/mpmd_gpu.sh b/ExampleCodes/MPMD/Case-2/mpmd_gpu.sh
new file mode 100644
index 00000000..571d6cf1
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/mpmd_gpu.sh
@@ -0,0 +1,27 @@
+#!/bin/bash
+#SBATCH -N 3 # Total number of nodes
+#SBATCH -n 12 # Total number of tasks
+#SBATCH -c 4 # number of processors per MPI task
+#SBATCH -C gpu
+#SBATCH -G 12 # Total number of GPUs
+#SBATCH -q debug
+#SBATCH -J mpmd_test
+#SBATCH -t 00:05:00
+#SBATCH -A mpxxx
+
+source ./perlmutter_gpu.profile
+# Activate the virtual environment
+source /path/to/pyamrex-gpu/bin/activate
+
+#OpenMP settings:
+export OMP_NUM_THREADS=1
+export OMP_PLACES=threads
+export OMP_PROC_BIND=spread
+# Taken from WarpX
+export MPICH_OFI_NIC_POLICY=GPU
+GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1"
+
+#run the application:
+srun --multi-prog --cpu_bind=cores --gpu-bind=single:1 ./mpmd_gpu.conf
+
+deactivate
diff --git a/ExampleCodes/MPMD/Case-2/perlmutter_gpu.profile b/ExampleCodes/MPMD/Case-2/perlmutter_gpu.profile
new file mode 100644
index 00000000..b720c1fc
--- /dev/null
+++ b/ExampleCodes/MPMD/Case-2/perlmutter_gpu.profile
@@ -0,0 +1,28 @@
+# required dependencies
+module load gpu
+module load PrgEnv-gnu
+module load craype
+module load craype-x86-milan
+module load craype-accel-nvidia80
+module load cudatoolkit
+module load cmake/3.24.3
+# Required for pyAMReX
+module load cray-python/3.11.5
+
+# necessary to use CUDA-Aware MPI and run a job
+export CRAY_ACCEL_TARGET=nvidia80
+
+# optimize CUDA compilation for A100
+export AMREX_CUDA_ARCH=8.0
+
+# optimize CPU microarchitecture for AMD EPYC 3rd Gen (Milan/Zen3)
+# note: the cc/CC/ftn wrappers below add those
+export CXXFLAGS="-march=znver3"
+export CFLAGS="-march=znver3"
+
+# compiler environment hints
+export CC=cc
+export CXX=CC
+export FC=ftn
+export CUDACXX=$(which nvcc)
+export CUDAHOSTCXX=CC