Update README with reproducibility instructions
kurpicz committed Apr 12, 2024
1 parent 115ece8 commit 2b33992
Showing 21 changed files with 3,152 additions and 3 deletions.
61 changes: 58 additions & 3 deletions README.md
# Comparison of Lines of Code using [KaMPIng][kamping], Plain MPI, and [Thrill][thrill]

This repository contains the code for two suffix array construction algorithms: [Prefix Doubling](https://doi.org/10.1137/0222058) (PD) and [DCX](https://dl.acm.org/doi/10.1145/1217856.1217858).
These algorithms have been implemented using [KaMPIng][kamping] (PD and DCX), Plain MPI (PD and DCX), and [Thrill][thrill] (PD).

The lines of code (LOC) necessary to implement the two algorithms using the different distributed memory programming frameworks are listed below.
We measured the lines of code using [`cloc`](https://github.com/AlDanial/cloc) 2.00.
All code files have been formatted using [`clang-format`](https://releases.llvm.org/14.0.0/tools/clang/docs/ClangFormat.html) 14 using the default Google style (`clang-format --style=Google`).

|     | [KaMPIng][kamping] | plain MPI   | [Thrill][thrill] |
|-----|--------------------|-------------|------------------|
| PD  | 163                | 426 (+1442) | 266              |
| DCX | 1264               | 1396        | ---              |

For the plain MPI PD implementation, the number in parentheses is the MPI wrapper code (`dsss_mpi`) it builds on, which is measured separately.

## Prefix Doubling
Note that the prefix doubling implementations are copied from three different projects.
To keep this repository small, we removed all code that is not directly part of the algorithm, e.g., `main` functions and benchmarking utilities.
The code included here is therefore not meant to be executed.
If you want to run the algorithms, please refer to the corresponding repositories: the [KaMPIng implementation](https://github.com/kamping-site/kamping/blob/main/examples/applications/suffix-sorting/prefix_doubling.hpp), the [plain MPI implementation](https://github.com/kurpicz/dsss/blob/master/dsss/suffix_sorting/prefix_doubling.hpp), and the [Thrill implementation](https://github.com/thrill/thrill/blob/master/examples/suffix_sorting/prefix_doubling.cpp).
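For orientation, the core idea of prefix doubling can be sketched as a short sequential program. This is an illustrative sketch only, not one of the distributed implementations compared above; the function name and structure are ours:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative sequential prefix doubling: suffixes are sorted by their
// first k characters; each round doubles k by comparing rank pairs
// (rank of suffix i, rank of suffix i + k).
std::vector<int> prefix_doubling_sa(const std::string& text) {
  const int n = static_cast<int>(text.size());
  std::vector<int> sa(n), rank(n), new_rank(n);
  for (int i = 0; i < n; ++i) {
    sa[i] = i;
    rank[i] = text[i];  // round 0: rank = first character
  }
  for (int k = 1; n > 1; k *= 2) {
    auto cmp = [&](int a, int b) {
      if (rank[a] != rank[b]) return rank[a] < rank[b];
      // Suffixes shorter than k characters sort first (rank -1).
      const int ra = a + k < n ? rank[a + k] : -1;
      const int rb = b + k < n ? rank[b + k] : -1;
      return ra < rb;
    };
    std::sort(sa.begin(), sa.end(), cmp);
    // Re-rank: suffixes that compare equal keep the same rank.
    new_rank[sa[0]] = 0;
    for (int i = 1; i < n; ++i) {
      new_rank[sa[i]] = new_rank[sa[i - 1]] + (cmp(sa[i - 1], sa[i]) ? 1 : 0);
    }
    rank = new_rank;
    if (rank[sa[n - 1]] == n - 1) break;  // all ranks unique: done
  }
  return sa;
}
```

For example, `prefix_doubling_sa("banana")` yields `{5, 3, 1, 0, 4, 2}`. The distributed variants follow the same doubling scheme but replace the local sort and re-ranking with distributed sorting and communication steps.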

To reproduce the measured lines of code, you can use the following commands:

```bash
cd prefix_doubling_comparison/
echo -e "\e[32mLines of Code: PD KaMPIng Implementation\e[0m"
cloc kamping_prefix_doubling.hpp
echo -e "\e[32mLines of Code: PD Plain MPI Implementation\e[0m"
cloc mpi_prefix_doubling.hpp
echo -e "\e[32mLines of Code: PD Plain MPI Implementation (MPI Wrapper)\e[0m"
cloc dsss_mpi
echo -e "\e[32mLines of Code: PD Thrill Implementation\e[0m"
cloc thrill_prefix_doubling.cpp
cd ..
```

## DCX
Implementation of the [DCX](https://dl.acm.org/doi/10.1145/1217856.1217858) suffix array construction algorithm.
The original [implementation](src/mpi_dc.cpp) is by Timo Bingmann and consists of 1396 lines of code, when removing code shared with the [KaMPIng][kamping] implementation.
In our [KaMPIng][kamping] [implementation](src/kamping_dc.cpp), we replaced all plain MPI calls with our wrapper.
This mainly removed boilerplate, resulting in 1264 lines of code, i.e., 9.5% less code.
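Much of the removed boilerplate is bookkeeping such as gathering per-rank counts and computing receive displacements before a `*v`-collective like `MPI_Allgatherv`, which the wrapper handles internally. A minimal sketch of that bookkeeping (no MPI involved; the function name and counts are hypothetical per-rank element counts):

```cpp
#include <numeric>
#include <vector>

// Exclusive prefix sum over per-rank element counts: rank i's data is
// placed after the data of ranks 0..i-1 in the receive buffer. Plain MPI
// code has to compute this by hand for every *v-collective; a wrapper can
// derive it from the counts automatically.
std::vector<int> receive_displacements(const std::vector<int>& counts) {
  std::vector<int> displs(counts.size(), 0);
  std::exclusive_scan(counts.begin(), counts.end(), displs.begin(), 0);
  return displs;
}
```

With counts `{3, 1, 4}`, the displacements are `{0, 3, 4}`: rank 1's data starts after rank 0's three elements, rank 2's after the first four.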

To reproduce the measured lines of code, you can use the following commands:

```bash
cd src/
echo -e "\e[32mLines of Code: DCX KaMPIng Implementation\e[0m"
cloc kamping_dc.cpp
echo -e "\e[32mLines of Code: Plain MPI Implementation\e[0m"
cloc mpi_dc.cpp
cd ..
```

Since this repository is based on these two implementations, they can be built and executed:

```bash
cmake -B build
cmake --build build
mpirun -np <# MPI processes> ./build/src/[kampingDCX|pDCX] [3|7|13] <input_file>
```

[kamping]: https://github.com/kamping-site/kamping "KaMPIng Repository"
[thrill]: https://project-thrill.org "Thrill's website"
100 changes: 100 additions & 0 deletions prefix_doubling_comparison/dsss_mpi/allgather.hpp
/*******************************************************************************
* mpi/allgather.hpp
*
* Copyright (C) 2018 Florian Kurpicz <florian.kurpicz@tu-dortmund.de>
*
* All rights reserved. Published under the BSD-2 license in the LICENSE file.
******************************************************************************/

#pragma once

#include <mpi.h>

#include <cstdint>
#include <vector>

#include "mpi/alltoall.hpp"
#include "mpi/big_type.hpp"
#include "mpi/environment.hpp"
#include "mpi/type_mapper.hpp"
#include "util/string_set.hpp"

namespace dsss::mpi {

template <typename DataType>
inline std::vector<DataType> allgather(DataType& send_data,
environment env = environment()) {
data_type_mapper<DataType> dtm;
std::vector<DataType> receive_data(env.size());
MPI_Allgather(&send_data, 1, dtm.get_mpi_type(), receive_data.data(), 1,
dtm.get_mpi_type(), env.communicator());
return receive_data;
}

template <typename DataType>
static inline std::vector<DataType> allgatherv_small(
std::vector<DataType>& send_data, environment env = environment()) {
int32_t local_size = send_data.size();
  std::vector<int32_t> receiving_sizes = allgather(local_size, env);

std::vector<int32_t> receiving_offsets(env.size(), 0);
for (size_t i = 1; i < receiving_sizes.size(); ++i) {
receiving_offsets[i] = receiving_offsets[i - 1] + receiving_sizes[i - 1];
}

std::vector<DataType> receiving_data(receiving_sizes.back() +
receiving_offsets.back());

data_type_mapper<DataType> dtm;
MPI_Allgatherv(send_data.data(), local_size, dtm.get_mpi_type(),
receiving_data.data(), receiving_sizes.data(),
receiving_offsets.data(), dtm.get_mpi_type(),
env.communicator());

return receiving_data;
}

template <typename DataType>
static inline std::vector<DataType> allgatherv(
std::vector<DataType>& send_data, environment env = environment()) {
size_t local_size = send_data.size();
  std::vector<size_t> receiving_sizes = allgather(local_size, env);

std::vector<size_t> receiving_offsets(env.size(), 0);
for (size_t i = 1; i < receiving_sizes.size(); ++i) {
receiving_offsets[i] = receiving_offsets[i - 1] + receiving_sizes[i - 1];
}

if (receiving_sizes.back() + receiving_offsets.back() < env.mpi_max_int()) {
    return allgatherv_small(send_data, env);
} else {
std::vector<MPI_Request> mpi_requests(2 * env.size());
std::vector<DataType> receiving_data(receiving_sizes.back() +
receiving_offsets.back());

for (int32_t i = 0; i < env.size(); ++i) {
auto receive_type = get_big_type<DataType>(receiving_sizes[i]);
MPI_Irecv(receiving_data.data() + receiving_offsets[i], 1, receive_type,
i, 44227, env.communicator(), &mpi_requests[i]);
}
auto send_type = get_big_type<DataType>(local_size);
for (int32_t i = env.rank(); i < env.rank() + env.size(); ++i) {
int32_t target = i % env.size();
MPI_Isend(send_data.data(), 1, send_type, target, 44227,
env.communicator(), &mpi_requests[env.size() + target]);
}
MPI_Waitall(2 * env.size(), mpi_requests.data(), MPI_STATUSES_IGNORE);
return receiving_data;
}
}

static inline dsss::string_set allgather_strings(
std::vector<dsss::char_type>& raw_string_data,
environment env = environment()) {
auto receiving_data = allgatherv(raw_string_data, env);
return dsss::string_set(std::move(receiving_data));
}

} // namespace dsss::mpi

/******************************************************************************/
133 changes: 133 additions & 0 deletions prefix_doubling_comparison/dsss_mpi/allreduce.hpp
/*******************************************************************************
* mpi/allreduce.hpp
*
* Copyright (C) 2018 Florian Kurpicz <florian.kurpicz@tu-dortmund.de>
*
* All rights reserved. Published under the BSD-2 license in the LICENSE file.
******************************************************************************/

#pragma once

#include <array>
#include <cmath>
#include <type_traits>
#include <vector>

#include "mpi/environment.hpp"
#include "mpi/type_mapper.hpp"
#include "util/uint_types.hpp"

namespace dsss::mpi {

static inline bool allreduce_and(bool& send_data,
environment env = environment()) {
bool receive_data;
MPI_Allreduce(&send_data, &receive_data, type_mapper<bool>::factor(),
type_mapper<bool>::type(), MPI_LAND, env.communicator());
return receive_data;
}

static inline bool allreduce_or(bool& send_data,
environment env = environment()) {
bool receive_data;
MPI_Allreduce(&send_data, &receive_data, type_mapper<bool>::factor(),
type_mapper<bool>::type(), MPI_LOR, env.communicator());
return receive_data;
}

template <typename DataType>
static inline DataType allreduce_max(DataType& send_data,
environment env = environment()) {
static_assert(std::is_arithmetic<DataType>(),
"Only arithmetic types are allowed for allreduce_max.");
DataType receive_data;
MPI_Allreduce(&send_data, &receive_data, type_mapper<DataType>::factor(),
type_mapper<DataType>::type(), MPI_MAX, env.communicator());
return receive_data;
}

template <typename DataType>
static inline DataType allreduce_min(DataType& send_data,
environment env = environment()) {
static_assert(std::is_arithmetic<DataType>(),
"Only arithmetic types are allowed for allreduce_min.");
DataType receive_data;
MPI_Allreduce(&send_data, &receive_data, type_mapper<DataType>::factor(),
type_mapper<DataType>::type(), MPI_MIN, env.communicator());
return receive_data;
}

template <typename DataType>
static inline DataType allreduce_sum(DataType& send_data,
environment env = environment()) {
static_assert(std::is_arithmetic<DataType>(),
"Only arithmetic types are allowed for allreduce_sum.");
DataType receive_data;
MPI_Allreduce(&send_data, &receive_data, type_mapper<DataType>::factor(),
type_mapper<DataType>::type(), MPI_SUM, env.communicator());
return receive_data;
}

template <typename DataType>
static inline std::vector<DataType> allreduce_sum(
std::vector<DataType>& send_data, environment env = environment()) {
static_assert(std::is_arithmetic<DataType>(),
"Only arithmetic types are allowed for allreduce_sum.");
std::vector<DataType> result(send_data.size());
MPI_Allreduce(send_data.data(), result.data(), send_data.size(),
type_mapper<DataType>::type(), MPI_SUM, env.communicator());
return result;
}

template <typename DataType, size_t Length>
static inline std::array<DataType, Length> allreduce_sum(
std::array<DataType, Length>& send_data, environment env = environment()) {
static_assert(std::is_arithmetic<DataType>(),
"Only arithmetic types are allowed for allreduce_sum.");
std::array<DataType, Length> result;
MPI_Allreduce(send_data.data(), result.data(),
Length * type_mapper<DataType>::factor(),
type_mapper<DataType>::type(), MPI_SUM, env.communicator());
return result;
}

template <size_t Length>
static inline std::array<dsss::uint40, Length> allreduce_sum(
std::array<dsss::uint40, Length>& send_data,
environment env = environment()) {
std::array<size_t, Length> tmp;
for (size_t i = 0; i < Length; ++i) {
tmp[i] = send_data[i];
}
tmp = allreduce_sum(tmp, env);
std::array<dsss::uint40, Length> result;
for (size_t i = 0; i < Length; ++i) {
result[i] = tmp[i];
}
return result;
}

template <typename DataType>
static inline DataType allreduce_avg(DataType& send_data,
environment env = environment()) {
static_assert(std::is_arithmetic<DataType>(),
"Only arithmetic types are allowed for allreduce_avg.");
  DataType receive_data = allreduce_sum(send_data, env);
return receive_data / env.size();
}

// Standard deviation across all ranks (population formula; divides by the
// number of ranks, not by the number of ranks minus one)
template <typename DataType>
static inline DataType allreduce_ssd(DataType& send_data,
environment env = environment()) {
static_assert(std::is_arithmetic<DataType>(),
"Only arithmetic types are allowed for allreduce_ssd.");
  DataType receive_data = allreduce_avg(send_data, env);
  DataType tmp = std::pow(send_data - receive_data, 2);
  receive_data = allreduce_sum(tmp, env);
return std::sqrt(receive_data / env.size());
}

} // namespace dsss::mpi

/******************************************************************************/