Update README with reproducibility instructions
kurpicz committed Apr 12, 2024
1 parent 115ece8 commit 2b33992
Showing 21 changed files with 3,152 additions and 3 deletions.
61 changes: 58 additions & 3 deletions README.md
# Comparison of Lines of Code using [KaMPIng][kamping], Plain MPI, and [Thrill][thrill]

This repository contains the code for two suffix array construction algorithms: [Prefix Doubling](https://doi.org/10.1137/0222058) (PD) and [DCX](https://dl.acm.org/doi/10.1145/1217856.1217858).
These algorithms have been implemented using [KaMPIng][kamping] (PD and DCX), Plain MPI (PD and DCX), and [Thrill][thrill] (PD).

The lines of code (LOC) necessary to implement the two algorithms using the different distributed memory programming frameworks are listed below.
We measured the lines of code using [`cloc`](https://github.com/AlDanial/cloc) 2.00.
All code files have been formatted using [`clang-format`](https://releases.llvm.org/14.0.0/tools/clang/docs/ClangFormat.html) 14 using the default Google style (`clang-format --style=Google`).

|     | [KaMPIng][kamping] | plain MPI   | [Thrill][thrill] |
|-----|--------------------|-------------|------------------|
| PD  | 163                | 426 (+1442) | 266              |
| DCX | 1264               | 1396        | ---              |

For the plain MPI PD implementation, the number in parentheses is the MPI wrapper code (`dsss_mpi`) it builds on, which is measured separately.

## Prefix Doubling
Note that the prefix doubling implementations are copied from three different projects.
To keep this repository small, we removed all code that is not directly part of the algorithm, e.g., `main` functions and benchmarking utilities.
The code included here is therefore not meant to be executed.
If you want to run the algorithms, please refer to the corresponding repositories: the [KaMPIng implementation](https://github.com/kamping-site/kamping/blob/main/examples/applications/suffix-sorting/prefix_doubling.hpp), the [plain MPI implementation](https://github.com/kurpicz/dsss/blob/master/dsss/suffix_sorting/prefix_doubling.hpp), and the [Thrill implementation](https://github.com/thrill/thrill/blob/master/examples/suffix_sorting/prefix_doubling.cpp).
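For orientation, the core idea of prefix doubling can be sketched as a short sequential program. This is an illustrative sketch only, not one of the distributed implementations compared above; the function name and structure are ours:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative sequential prefix doubling: suffixes are sorted by their
// first k characters; each round doubles k by comparing rank pairs
// (rank of suffix i, rank of suffix i + k).
std::vector<int> prefix_doubling_sa(const std::string& text) {
  const int n = static_cast<int>(text.size());
  std::vector<int> sa(n), rank(n), new_rank(n);
  for (int i = 0; i < n; ++i) {
    sa[i] = i;
    rank[i] = text[i];  // round 0: rank = first character
  }
  for (int k = 1; n > 1; k *= 2) {
    auto cmp = [&](int a, int b) {
      if (rank[a] != rank[b]) return rank[a] < rank[b];
      // Suffixes shorter than k characters sort first (rank -1).
      const int ra = a + k < n ? rank[a + k] : -1;
      const int rb = b + k < n ? rank[b + k] : -1;
      return ra < rb;
    };
    std::sort(sa.begin(), sa.end(), cmp);
    // Re-rank: suffixes that compare equal keep the same rank.
    new_rank[sa[0]] = 0;
    for (int i = 1; i < n; ++i) {
      new_rank[sa[i]] = new_rank[sa[i - 1]] + (cmp(sa[i - 1], sa[i]) ? 1 : 0);
    }
    rank = new_rank;
    if (rank[sa[n - 1]] == n - 1) break;  // all ranks unique: done
  }
  return sa;
}
```

For example, `prefix_doubling_sa("banana")` yields `{5, 3, 1, 0, 4, 2}`. The distributed variants follow the same doubling scheme but replace the local sort and re-ranking with distributed sorting and communication steps.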

To reproduce the measured lines of code, you can use the following commands:

```bash
cd prefix_doubling_comparison/
echo -e "\e[32mLines of Code: PD KaMPIng Implementation\e[0m"
cloc kamping_prefix_doubling.hpp
echo -e "\e[32mLines of Code: PD Plain MPI Implementation\e[0m"
cloc mpi_prefix_doubling.hpp
echo -e "\e[32mLines of Code: PD Plain MPI Implementation (MPI Wrapper)\e[0m"
cloc dsss_mpi
echo -e "\e[32mLines of Code: PD Thrill Implementation\e[0m"
cloc thrill_prefix_doubling.cpp
cd ..
```

## DCX
Implementation of the [DCX](https://dl.acm.org/doi/10.1145/1217856.1217858) suffix array construction algorithm.
The original [implementation](src/mpi_dc.cpp) is by Timo Bingmann and consists of 1396 lines of code, when removing code shared with the [KaMPIng][kamping] implementation.
In our [KaMPIng][kamping] [implementation](src/kamping_dc.cpp), we replaced all plain MPI calls with our wrapper.
This mainly removed boilerplate, resulting in 1264 lines of code, i.e., 9.5% less code.
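Much of the removed boilerplate is bookkeeping such as gathering per-rank counts and computing receive displacements before a `*v`-collective like `MPI_Allgatherv`, which the wrapper handles internally. A minimal sketch of that bookkeeping (no MPI involved; the function name and counts are hypothetical per-rank element counts):

```cpp
#include <numeric>
#include <vector>

// Exclusive prefix sum over per-rank element counts: rank i's data is
// placed after the data of ranks 0..i-1 in the receive buffer. Plain MPI
// code has to compute this by hand for every *v-collective; a wrapper can
// derive it from the counts automatically.
std::vector<int> receive_displacements(const std::vector<int>& counts) {
  std::vector<int> displs(counts.size(), 0);
  std::exclusive_scan(counts.begin(), counts.end(), displs.begin(), 0);
  return displs;
}
```

With counts `{3, 1, 4}`, the displacements are `{0, 3, 4}`: rank 1's data starts after rank 0's three elements, rank 2's after the first four.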

To reproduce the measured lines of code, you can use the following commands:

```bash
cd src/
echo -e "\e[32mLines of Code: DCX KaMPIng Implementation\e[0m"
cloc kamping_dc.cpp
echo -e "\e[32mLines of Code: Plain MPI Implementation\e[0m"
cloc mpi_dc.cpp
cd ..
```

Since this repository is based on these two implementations, they can be built and executed:

```bash
cmake -B build
cmake --build build
mpirun -np <# MPI processes> ./build/src/[kampingDCX|pDCX] [3|7|13] <input_file>
```

[kamping]: https://github.com/kamping-site/kamping "KaMPIng Repository"
[thrill]: https://project-thrill.org "Thrill's website"
100 changes: 100 additions & 0 deletions prefix_doubling_comparison/dsss_mpi/allgather.hpp
/*******************************************************************************
* mpi/allgather.hpp
*
* Copyright (C) 2018 Florian Kurpicz <florian.kurpicz@tu-dortmund.de>
*
* All rights reserved. Published under the BSD-2 license in the LICENSE file.
******************************************************************************/

#pragma once

#include <mpi.h>

#include <cstdint>
#include <vector>

#include "mpi/alltoall.hpp"
#include "mpi/big_type.hpp"
#include "mpi/environment.hpp"
#include "mpi/type_mapper.hpp"
#include "util/string_set.hpp"

namespace dsss::mpi {

template <typename DataType>
inline std::vector<DataType> allgather(DataType& send_data,
environment env = environment()) {
data_type_mapper<DataType> dtm;
std::vector<DataType> receive_data(env.size());
MPI_Allgather(&send_data, 1, dtm.get_mpi_type(), receive_data.data(), 1,
dtm.get_mpi_type(), env.communicator());
return receive_data;
}

template <typename DataType>
static inline std::vector<DataType> allgatherv_small(
std::vector<DataType>& send_data, environment env = environment()) {
int32_t local_size = send_data.size();
  std::vector<int32_t> receiving_sizes = allgather(local_size, env);

std::vector<int32_t> receiving_offsets(env.size(), 0);
for (size_t i = 1; i < receiving_sizes.size(); ++i) {
receiving_offsets[i] = receiving_offsets[i - 1] + receiving_sizes[i - 1];
}

std::vector<DataType> receiving_data(receiving_sizes.back() +
receiving_offsets.back());

data_type_mapper<DataType> dtm;
MPI_Allgatherv(send_data.data(), local_size, dtm.get_mpi_type(),
receiving_data.data(), receiving_sizes.data(),
receiving_offsets.data(), dtm.get_mpi_type(),
env.communicator());

return receiving_data;
}

template <typename DataType>
static inline std::vector<DataType> allgatherv(
std::vector<DataType>& send_data, environment env = environment()) {
size_t local_size = send_data.size();
  std::vector<size_t> receiving_sizes = allgather(local_size, env);

std::vector<size_t> receiving_offsets(env.size(), 0);
for (size_t i = 1; i < receiving_sizes.size(); ++i) {
receiving_offsets[i] = receiving_offsets[i - 1] + receiving_sizes[i - 1];
}

if (receiving_sizes.back() + receiving_offsets.back() < env.mpi_max_int()) {
    return allgatherv_small(send_data, env);
} else {
std::vector<MPI_Request> mpi_requests(2 * env.size());
std::vector<DataType> receiving_data(receiving_sizes.back() +
receiving_offsets.back());

for (int32_t i = 0; i < env.size(); ++i) {
auto receive_type = get_big_type<DataType>(receiving_sizes[i]);
MPI_Irecv(receiving_data.data() + receiving_offsets[i], 1, receive_type,
i, 44227, env.communicator(), &mpi_requests[i]);
}
auto send_type = get_big_type<DataType>(local_size);
for (int32_t i = env.rank(); i < env.rank() + env.size(); ++i) {
int32_t target = i % env.size();
MPI_Isend(send_data.data(), 1, send_type, target, 44227,
env.communicator(), &mpi_requests[env.size() + target]);
}
MPI_Waitall(2 * env.size(), mpi_requests.data(), MPI_STATUSES_IGNORE);
return receiving_data;
}
}

static inline dsss::string_set allgather_strings(
std::vector<dsss::char_type>& raw_string_data,
environment env = environment()) {
auto receiving_data = allgatherv(raw_string_data, env);
return dsss::string_set(std::move(receiving_data));
}

} // namespace dsss::mpi

/******************************************************************************/
133 changes: 133 additions & 0 deletions prefix_doubling_comparison/dsss_mpi/allreduce.hpp
/*******************************************************************************
* mpi/allreduce.hpp
*
* Copyright (C) 2018 Florian Kurpicz <florian.kurpicz@tu-dortmund.de>
*
* All rights reserved. Published under the BSD-2 license in the LICENSE file.
******************************************************************************/

#pragma once

#include <array>
#include <cmath>
#include <type_traits>
#include <vector>

#include "mpi/environment.hpp"
#include "mpi/type_mapper.hpp"
#include "util/uint_types.hpp"

namespace dsss::mpi {

static inline bool allreduce_and(bool& send_data,
environment env = environment()) {
bool receive_data;
MPI_Allreduce(&send_data, &receive_data, type_mapper<bool>::factor(),
type_mapper<bool>::type(), MPI_LAND, env.communicator());
return receive_data;
}

static inline bool allreduce_or(bool& send_data,
environment env = environment()) {
bool receive_data;
MPI_Allreduce(&send_data, &receive_data, type_mapper<bool>::factor(),
type_mapper<bool>::type(), MPI_LOR, env.communicator());
return receive_data;
}

template <typename DataType>
static inline DataType allreduce_max(DataType& send_data,
environment env = environment()) {
static_assert(std::is_arithmetic<DataType>(),
"Only arithmetic types are allowed for allreduce_max.");
DataType receive_data;
MPI_Allreduce(&send_data, &receive_data, type_mapper<DataType>::factor(),
type_mapper<DataType>::type(), MPI_MAX, env.communicator());
return receive_data;
}

template <typename DataType>
static inline DataType allreduce_min(DataType& send_data,
environment env = environment()) {
static_assert(std::is_arithmetic<DataType>(),
"Only arithmetic types are allowed for allreduce_min.");
DataType receive_data;
MPI_Allreduce(&send_data, &receive_data, type_mapper<DataType>::factor(),
type_mapper<DataType>::type(), MPI_MIN, env.communicator());
return receive_data;
}

template <typename DataType>
static inline DataType allreduce_sum(DataType& send_data,
environment env = environment()) {
static_assert(std::is_arithmetic<DataType>(),
"Only arithmetic types are allowed for allreduce_sum.");
DataType receive_data;
MPI_Allreduce(&send_data, &receive_data, type_mapper<DataType>::factor(),
type_mapper<DataType>::type(), MPI_SUM, env.communicator());
return receive_data;
}

template <typename DataType>
static inline std::vector<DataType> allreduce_sum(
std::vector<DataType>& send_data, environment env = environment()) {
static_assert(std::is_arithmetic<DataType>(),
"Only arithmetic types are allowed for allreduce_sum.");
std::vector<DataType> result(send_data.size());
MPI_Allreduce(send_data.data(), result.data(), send_data.size(),
type_mapper<DataType>::type(), MPI_SUM, env.communicator());
return result;
}

template <typename DataType, size_t Length>
static inline std::array<DataType, Length> allreduce_sum(
std::array<DataType, Length>& send_data, environment env = environment()) {
static_assert(std::is_arithmetic<DataType>(),
"Only arithmetic types are allowed for allreduce_sum.");
std::array<DataType, Length> result;
MPI_Allreduce(send_data.data(), result.data(),
Length * type_mapper<DataType>::factor(),
type_mapper<DataType>::type(), MPI_SUM, env.communicator());
return result;
}

template <size_t Length>
static inline std::array<dsss::uint40, Length> allreduce_sum(
std::array<dsss::uint40, Length>& send_data,
environment env = environment()) {
std::array<size_t, Length> tmp;
for (size_t i = 0; i < Length; ++i) {
tmp[i] = send_data[i];
}
tmp = allreduce_sum(tmp, env);
std::array<dsss::uint40, Length> result;
for (size_t i = 0; i < Length; ++i) {
result[i] = tmp[i];
}
return result;
}

template <typename DataType>
static inline DataType allreduce_avg(DataType& send_data,
environment env = environment()) {
static_assert(std::is_arithmetic<DataType>(),
"Only arithmetic types are allowed for allreduce_avg.");
  DataType receive_data = allreduce_sum(send_data, env);
return receive_data / env.size();
}

// Standard deviation across all ranks (population formula; divides by the
// number of ranks, not by the number of ranks minus one)
template <typename DataType>
static inline DataType allreduce_ssd(DataType& send_data,
environment env = environment()) {
static_assert(std::is_arithmetic<DataType>(),
"Only arithmetic types are allowed for allreduce_ssd.");
  DataType receive_data = allreduce_avg(send_data, env);
  DataType tmp = std::pow(send_data - receive_data, 2);
  receive_data = allreduce_sum(tmp, env);
return std::sqrt(receive_data / env.size());
}

} // namespace dsss::mpi

/******************************************************************************/