forked from bingmann/pDCX
Update README with reproducibility instructions
Showing 21 changed files with 3,152 additions and 3 deletions.
# Comparison of Lines of Code using [KaMPIng][kamping], Plain MPI, and [Thrill][thrill]

This repository contains the code for two suffix array construction algorithms: [Prefix Doubling](https://doi.org/10.1137/0222058) (PD) and [DCX](https://dl.acm.org/doi/10.1145/1217856.1217858).
These algorithms have been implemented using [KaMPIng][kamping] (PD and DCX), plain MPI (PD and DCX), and [Thrill][thrill] (PD).

The lines of code (LOC) needed to implement the two algorithms in the different distributed-memory programming frameworks are listed below.
We measured the lines of code with [`cloc`](https://github.com/AlDanial/cloc) 2.00.
All code files were formatted with [`clang-format`](https://releases.llvm.org/14.0.0/tools/clang/docs/ClangFormat.html) 14 using the default Google style (`clang-format --style=Google`).

|     | [KaMPIng][kamping] | plain MPI   | [Thrill][thrill] |
|-----|--------------------|-------------|------------------|
| PD  | 163                | 426 (+1442) | 266              |
| DCX | 1264               | 1396        | ---              |

## Prefix Doubling

Note that the prefix doubling implementations are copied from three different projects.
To keep this repository small, we removed all code that is not directly part of the algorithm, e.g., `main` functions and benchmarking utilities.
The code included here is therefore not expected to be executed.
If you want to run the algorithms, please refer to the corresponding repositories ([KaMPIng implementation](https://github.com/kamping-site/kamping/blob/main/examples/applications/suffix-sorting/prefix_doubling.hpp), [plain MPI implementation](https://github.com/kurpicz/dsss/blob/master/dsss/suffix_sorting/prefix_doubling.hpp), and [Thrill implementation](https://github.com/thrill/thrill/blob/master/examples/suffix_sorting/prefix_doubling.cpp)).

To reproduce the measured lines of code, you can use the following commands:

```bash
cd prefix_doubling_comparison/
echo -e "\e[32mLines of Code: PD KaMPIng Implementation\e[0m"
cloc kamping_prefix_doubling.hpp
echo -e "\e[32mLines of Code: PD Plain MPI Implementation\e[0m"
cloc mpi_prefix_doubling.hpp
echo -e "\e[32mLines of Code: Plain MPI Implementation MPI Wrapper\e[0m"
cloc dsss_mpi
echo -e "\e[32mLines of Code: PD Thrill Implementation\e[0m"
cloc thrill_prefix_doubling.cpp
cd ..
```

## DCX

Implementation of the [DCX](https://dl.acm.org/doi/10.1145/1217856.1217858) suffix array construction algorithm.
The original [implementation](src/mpi_dc.cpp) is by Timo Bingmann and consists of 1396 lines of code after removing code shared with the [KaMPIng][kamping] implementation.
In our [KaMPIng][kamping] [implementation](src/kamping_dc.cpp), we replaced all plain MPI calls with our wrapper.
This mainly removed boilerplate, resulting in 1264 lines of code, i.e., 9.5% less code.

To reproduce the measured lines of code, you can use the following commands:

```bash
cd src/
echo -e "\e[32mLines of Code: DCX KaMPIng Implementation\e[0m"
cloc kamping_dc.cpp
echo -e "\e[32mLines of Code: Plain MPI Implementation\e[0m"
cloc mpi_dc.cpp
cd ..
```

Since this repository is based on these two implementations, they can be built and executed:

```bash
cmake -B build
cmake --build build
mpirun -np <# MPI processes> ./build/src/[kampingDCX|pDCX] [3/7/13] <input_file>
```

[kamping]: https://github.com/kamping-site/kamping "KaMPIng Repository"
[thrill]: https://project-thrill.org "Thrill's website"

`mpi/allgather.hpp` (new file, +100 lines):

```cpp
/*******************************************************************************
 * mpi/allgather.hpp
 *
 * Copyright (C) 2018 Florian Kurpicz <florian.kurpicz@tu-dortmund.de>
 *
 * All rights reserved. Published under the BSD-2 license in the LICENSE file.
 ******************************************************************************/

#pragma once

#include <mpi.h>

#include <cstdint>
#include <vector>

#include "mpi/alltoall.hpp"
#include "mpi/big_type.hpp"
#include "mpi/environment.hpp"
#include "mpi/type_mapper.hpp"
#include "util/string_set.hpp"

namespace dsss::mpi {

template <typename DataType>
inline std::vector<DataType> allgather(DataType& send_data,
                                       environment env = environment()) {
  data_type_mapper<DataType> dtm;
  std::vector<DataType> receive_data(env.size());
  MPI_Allgather(&send_data, 1, dtm.get_mpi_type(), receive_data.data(), 1,
                dtm.get_mpi_type(), env.communicator());
  return receive_data;
}

template <typename DataType>
static inline std::vector<DataType> allgatherv_small(
    std::vector<DataType>& send_data, environment env = environment()) {
  int32_t local_size = send_data.size();
  // Pass env through so the nested allgather uses the same communicator.
  std::vector<int32_t> receiving_sizes = allgather(local_size, env);

  std::vector<int32_t> receiving_offsets(env.size(), 0);
  for (size_t i = 1; i < receiving_sizes.size(); ++i) {
    receiving_offsets[i] = receiving_offsets[i - 1] + receiving_sizes[i - 1];
  }

  std::vector<DataType> receiving_data(receiving_sizes.back() +
                                       receiving_offsets.back());

  data_type_mapper<DataType> dtm;
  MPI_Allgatherv(send_data.data(), local_size, dtm.get_mpi_type(),
                 receiving_data.data(), receiving_sizes.data(),
                 receiving_offsets.data(), dtm.get_mpi_type(),
                 env.communicator());

  return receiving_data;
}

template <typename DataType>
static inline std::vector<DataType> allgatherv(
    std::vector<DataType>& send_data, environment env = environment()) {
  size_t local_size = send_data.size();
  std::vector<size_t> receiving_sizes = allgather(local_size, env);

  std::vector<size_t> receiving_offsets(env.size(), 0);
  for (size_t i = 1; i < receiving_sizes.size(); ++i) {
    receiving_offsets[i] = receiving_offsets[i - 1] + receiving_sizes[i - 1];
  }

  if (receiving_sizes.back() + receiving_offsets.back() < env.mpi_max_int()) {
    return allgatherv_small(send_data, env);
  } else {
    // Total size exceeds MPI's int-based counts; fall back to point-to-point
    // transfers using custom "big" MPI datatypes.
    std::vector<MPI_Request> mpi_requests(2 * env.size());
    std::vector<DataType> receiving_data(receiving_sizes.back() +
                                         receiving_offsets.back());

    for (int32_t i = 0; i < env.size(); ++i) {
      auto receive_type = get_big_type<DataType>(receiving_sizes[i]);
      MPI_Irecv(receiving_data.data() + receiving_offsets[i], 1, receive_type,
                i, 44227, env.communicator(), &mpi_requests[i]);
    }
    auto send_type = get_big_type<DataType>(local_size);
    for (int32_t i = env.rank(); i < env.rank() + env.size(); ++i) {
      int32_t target = i % env.size();
      MPI_Isend(send_data.data(), 1, send_type, target, 44227,
                env.communicator(), &mpi_requests[env.size() + target]);
    }
    MPI_Waitall(2 * env.size(), mpi_requests.data(), MPI_STATUSES_IGNORE);
    return receiving_data;
  }
}

static inline dsss::string_set allgather_strings(
    std::vector<dsss::char_type>& raw_string_data,
    environment env = environment()) {
  auto receiving_data = allgatherv(raw_string_data, env);
  return dsss::string_set(std::move(receiving_data));
}

}  // namespace dsss::mpi

/******************************************************************************/
```

`mpi/allreduce.hpp` (new file, +133 lines):

```cpp
/*******************************************************************************
 * mpi/allreduce.hpp
 *
 * Copyright (C) 2018 Florian Kurpicz <florian.kurpicz@tu-dortmund.de>
 *
 * All rights reserved. Published under the BSD-2 license in the LICENSE file.
 ******************************************************************************/

#pragma once

#include <mpi.h>

#include <array>
#include <cmath>
#include <type_traits>
#include <vector>

#include "mpi/environment.hpp"
#include "mpi/type_mapper.hpp"
#include "util/uint_types.hpp"

namespace dsss::mpi {

static inline bool allreduce_and(bool& send_data,
                                 environment env = environment()) {
  bool receive_data;
  MPI_Allreduce(&send_data, &receive_data, type_mapper<bool>::factor(),
                type_mapper<bool>::type(), MPI_LAND, env.communicator());
  return receive_data;
}

static inline bool allreduce_or(bool& send_data,
                                environment env = environment()) {
  bool receive_data;
  MPI_Allreduce(&send_data, &receive_data, type_mapper<bool>::factor(),
                type_mapper<bool>::type(), MPI_LOR, env.communicator());
  return receive_data;
}

template <typename DataType>
static inline DataType allreduce_max(DataType& send_data,
                                     environment env = environment()) {
  static_assert(std::is_arithmetic<DataType>(),
                "Only arithmetic types are allowed for allreduce_max.");
  DataType receive_data;
  MPI_Allreduce(&send_data, &receive_data, type_mapper<DataType>::factor(),
                type_mapper<DataType>::type(), MPI_MAX, env.communicator());
  return receive_data;
}

template <typename DataType>
static inline DataType allreduce_min(DataType& send_data,
                                     environment env = environment()) {
  static_assert(std::is_arithmetic<DataType>(),
                "Only arithmetic types are allowed for allreduce_min.");
  DataType receive_data;
  MPI_Allreduce(&send_data, &receive_data, type_mapper<DataType>::factor(),
                type_mapper<DataType>::type(), MPI_MIN, env.communicator());
  return receive_data;
}

template <typename DataType>
static inline DataType allreduce_sum(DataType& send_data,
                                     environment env = environment()) {
  static_assert(std::is_arithmetic<DataType>(),
                "Only arithmetic types are allowed for allreduce_sum.");
  DataType receive_data;
  MPI_Allreduce(&send_data, &receive_data, type_mapper<DataType>::factor(),
                type_mapper<DataType>::type(), MPI_SUM, env.communicator());
  return receive_data;
}

template <typename DataType>
static inline std::vector<DataType> allreduce_sum(
    std::vector<DataType>& send_data, environment env = environment()) {
  static_assert(std::is_arithmetic<DataType>(),
                "Only arithmetic types are allowed for allreduce_sum.");
  std::vector<DataType> result(send_data.size());
  MPI_Allreduce(send_data.data(), result.data(), send_data.size(),
                type_mapper<DataType>::type(), MPI_SUM, env.communicator());
  return result;
}

template <typename DataType, size_t Length>
static inline std::array<DataType, Length> allreduce_sum(
    std::array<DataType, Length>& send_data, environment env = environment()) {
  static_assert(std::is_arithmetic<DataType>(),
                "Only arithmetic types are allowed for allreduce_sum.");
  std::array<DataType, Length> result;
  MPI_Allreduce(send_data.data(), result.data(),
                Length * type_mapper<DataType>::factor(),
                type_mapper<DataType>::type(), MPI_SUM, env.communicator());
  return result;
}

template <size_t Length>
static inline std::array<dsss::uint40, Length> allreduce_sum(
    std::array<dsss::uint40, Length>& send_data,
    environment env = environment()) {
  // uint40 has no native MPI type; reduce via size_t and convert back.
  std::array<size_t, Length> tmp;
  for (size_t i = 0; i < Length; ++i) {
    tmp[i] = send_data[i];
  }
  tmp = allreduce_sum(tmp, env);
  std::array<dsss::uint40, Length> result;
  for (size_t i = 0; i < Length; ++i) {
    result[i] = tmp[i];
  }
  return result;
}

template <typename DataType>
static inline DataType allreduce_avg(DataType& send_data,
                                     environment env = environment()) {
  static_assert(std::is_arithmetic<DataType>(),
                "Only arithmetic types are allowed for allreduce_avg.");
  // Pass env through so the nested reduction uses the same communicator.
  DataType receive_data = allreduce_sum(send_data, env);
  return receive_data / env.size();
}

// Standard deviation of the per-rank values (population formula, divides by n)
template <typename DataType>
static inline DataType allreduce_ssd(DataType& send_data,
                                     environment env = environment()) {
  static_assert(std::is_arithmetic<DataType>(),
                "Only arithmetic types are allowed for allreduce_ssd.");
  DataType receive_data = allreduce_avg(send_data, env);
  DataType tmp = std::pow(send_data - receive_data, 2);
  receive_data = allreduce_sum(tmp, env);
  return std::sqrt(receive_data / env.size());
}

}  // namespace dsss::mpi

/******************************************************************************/
```