Parallel Processing Systems Lab

Authors
Dimitra Leventi (@dileventi)
Dimitrios Mitropoulos (@dimitrismit)
Apostolis Stamatis (@apostolis1)

Overview

We ran benchmarks across different configurations of resources (nodes, processors, processors per node), tasks (where applicable), and input sizes to assess scalability and identify bottlenecks.

The benchmark results and their in-depth analysis can be found in the report.

The report also contains the critical parts of the source code.

Lab 1 - Conway's Game of Life using OpenMP

Given a serial implementation of Conway's Game of Life:

Detect the parallelization possibilities
Implement a solution using OpenMP in a shared address space architecture
Perform benchmarks
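
As a rough illustration of the kind of parallelism this lab looks for, here is a minimal sketch of one generation step. It assumes a square grid stored row-major, cells outside the grid treated as dead, and separate input and output buffers. Each cell of the next generation depends only on the current generation, so the rows can be split across threads with `parallel for`. The names `life_step` and `live_neighbours` are hypothetical and are not taken from the lab code.

```c
#include <assert.h>
#include <stddef.h>

/* Count the live neighbours of cell (i, j) on an n x n grid.
   Assumption: cells outside the grid count as dead. */
static int live_neighbours(const unsigned char *grid, int n, int i, int j)
{
    int count = 0;
    for (int di = -1; di <= 1; di++) {
        for (int dj = -1; dj <= 1; dj++) {
            int r = i + di, c = j + dj;
            if ((di != 0 || dj != 0) && r >= 0 && r < n && c >= 0 && c < n)
                count += grid[(size_t)r * n + c];
        }
    }
    return count;
}

/* Compute one generation: read from `cur`, write to `next`.
   The two buffers must be distinct, so iterations of the outer loop
   never read data written by another iteration. */
void life_step(const unsigned char *cur, unsigned char *next, int n)
{
    assert(n > 0 && cur != next);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int k = live_neighbours(cur, n, i, j);
            int alive = cur[(size_t)i * n + j];
            next[(size_t)i * n + j] = (unsigned char)(k == 3 || (k == 2 && alive));
        }
    }
}
```

When compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.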

Lab 2 - Parallelization and optimization on shared memory architectures

Given a serial K-means algorithm:

1. Add the necessary synchronization when accessing shared resources, so that the algorithm can run on a parallel system
2. Improve the algorithm from (1) by using local data structures and a reduction to avoid synchronization
3. Perform benchmarks
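
The idea behind the second step can be sketched as follows. This is a minimal example, not the lab's code: it only covers the accumulation of per-cluster sizes and coordinate sums, and it merges hand-written per-thread copies inside a critical section instead of using an OpenMP `reduction` clause. The function name `accumulate_clusters` and all parameter names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Accumulate per-cluster sizes and coordinate sums.
   points:     n x dim, row-major
   membership: membership[i] is the cluster of point i, in [0, k)
   sizes:      k entries (output)
   sums:       k x dim, row-major (output) */
void accumulate_clusters(const double *points, const int *membership,
                         int n, int dim, int k,
                         int *sizes, double *sums)
{
    assert(n >= 0 && dim > 0 && k > 0);
    memset(sizes, 0, (size_t)k * sizeof *sizes);
    memset(sums, 0, (size_t)k * dim * sizeof *sums);

    #pragma omp parallel
    {
        /* Thread-private copies: the hot loop needs no synchronization. */
        int *local_sizes = calloc((size_t)k, sizeof *local_sizes);
        double *local_sums = calloc((size_t)k * dim, sizeof *local_sums);
        assert(local_sizes != NULL && local_sums != NULL);

        #pragma omp for
        for (int i = 0; i < n; i++) {
            int c = membership[i];
            local_sizes[c]++;
            for (int d = 0; d < dim; d++)
                local_sums[(size_t)c * dim + d] += points[(size_t)i * dim + d];
        }

        /* Merge once per thread instead of synchronizing once per point. */
        #pragma omp critical
        {
            for (int c = 0; c < k; c++) {
                sizes[c] += local_sizes[c];
                for (int d = 0; d < dim; d++)
                    sums[(size_t)c * dim + d] += local_sums[(size_t)c * dim + d];
            }
        }

        free(local_sizes);
        free(local_sums);
    }
}
```

The version from step (1) would instead update `sizes` and `sums` directly inside the loop and protect every update, which costs one synchronization per point rather than one per thread.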

Lab 3 - Locks and mutex on shared memory architectures

Benchmark the following lock implementations for parallel systems, then compare and interpret the results:

pthread_mutex_t lock from the Pthreads library
pthread_spinlock_t lock from the Pthreads library
test-and-set lock
test-and-test-and-set lock
array based lock
linked list lock from chapter 7 of "The Art of Multiprocessor Programming"
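
To make the difference between the two simplest spin locks in this list concrete, here is a minimal sketch using C11 atomics. It is an illustration under assumptions, not the implementation benchmarked in the lab: the `spinlock` type and the function names are hypothetical. The test-and-set lock issues an atomic exchange on every attempt, while the test-and-test-and-set lock first spins on a plain load and only attempts the exchange once the lock looks free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool held;   /* true while some thread owns the lock */
} spinlock;

void spin_init(spinlock *l)
{
    atomic_init(&l->held, false);
}

/* test-and-set: every attempt is an atomic read-modify-write. */
void tas_lock(spinlock *l)
{
    while (atomic_exchange_explicit(&l->held, true, memory_order_acquire))
        ;   /* spin */
}

/* test-and-test-and-set: spin on a plain load and attempt the
   exchange only once the lock has been observed free. */
void ttas_lock(spinlock *l)
{
    for (;;) {
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;   /* spin without writing to the lock word */
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return;
    }
}

/* Single attempt; returns true if the lock was acquired. */
bool spin_trylock(spinlock *l)
{
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

void spin_unlock(spinlock *l)
{
    assert(atomic_load(&l->held));   /* must be held when unlocking */
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

Both variants share `spin_unlock`; only the acquisition loop differs, which is what a benchmark of the two would isolate.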

Implement the Floyd-Warshall algorithm using parallel tasks, to understand the limitations of parallel for
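
For context, here is a minimal sketch of the loop-parallel baseline, not the task-based version this lab asks for. It assumes a dense row-major distance matrix with a zero diagonal, non-negative edge weights, and a hypothetical `FW_INF` value for missing edges, chosen so that the sum of two such values still fits in an `int`. It shows where `parallel for` can be applied and where it cannot: the outer `k` loop carries a dependency from one iteration to the next and stays serial.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "no edge" marker; FW_INF + FW_INF still fits in an int. */
#define FW_INF 1000000000

/* In-place all-pairs shortest paths on an n x n row-major matrix.
   Assumptions: dist[i][i] == 0 and all edge weights are non-negative. */
void floyd_warshall(int *dist, int n)
{
    assert(n > 0);
    /* Step k needs the results of step k - 1, so this loop is serial. */
    for (int k = 0; k < n; k++) {
        /* Within one step, the rows can be processed independently. */
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                int via = dist[(size_t)i * n + k] + dist[(size_t)k * n + j];
                if (via < dist[(size_t)i * n + j])
                    dist[(size_t)i * n + j] = via;
            }
        }
    }
}
```

With this structure there is one parallel region and one implicit barrier per value of `k`, which is one reason to look at task-based formulations.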

Lab 4 - Concurrent data structures

Benchmark the following implementations of a concurrent doubly linked list:

Coarse-grain locking
Fine-grain locking
Optimistic synchronization
Lazy synchronization
Non-blocking synchronization
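
The simplest of these strategies, coarse-grain locking, can be sketched as below. This is a minimal illustration under assumptions, not the benchmarked implementation: it assumes a sorted list of `int` keys with a sentinel node, uses a hand-rolled spin lock built on C11 atomics, and all type and function names (`cg_list`, `cg_insert`, `cg_contains`) are hypothetical. A single lock protects the whole list, so every operation is serialized.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct node {
    int key;
    struct node *prev, *next;
} node;

typedef struct {
    node head;          /* sentinel: head.next is first, head.prev is last */
    atomic_bool lock;   /* one lock for the whole list (coarse-grain) */
} cg_list;

void cg_init(cg_list *l)
{
    l->head.prev = l->head.next = &l->head;
    atomic_init(&l->lock, false);
}

static void cg_acquire(cg_list *l)
{
    while (atomic_exchange_explicit(&l->lock, true, memory_order_acquire))
        ;   /* spin */
}

static void cg_release(cg_list *l)
{
    atomic_store_explicit(&l->lock, false, memory_order_release);
}

/* Insert key in sorted order; returns false if it is already present. */
bool cg_insert(cg_list *l, int key)
{
    node *fresh = malloc(sizeof *fresh);
    assert(fresh != NULL);
    fresh->key = key;

    cg_acquire(l);
    node *cur = l->head.next;
    while (cur != &l->head && cur->key < key)
        cur = cur->next;
    bool absent = (cur == &l->head || cur->key != key);
    if (absent) {       /* link fresh in front of cur */
        fresh->next = cur;
        fresh->prev = cur->prev;
        cur->prev->next = fresh;
        cur->prev = fresh;
    }
    cg_release(l);

    if (!absent)
        free(fresh);
    return absent;
}

/* Returns true if key is in the list. */
bool cg_contains(cg_list *l, int key)
{
    cg_acquire(l);
    node *cur = l->head.next;
    while (cur != &l->head && cur->key < key)
        cur = cur->next;
    bool found = (cur != &l->head && cur->key == key);
    cg_release(l);
    return found;
}
```

The other strategies in the list reduce the scope of locking (per node, or only around the final update) or avoid locks entirely.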

Lab 5 - Parallelization and optimization on NVIDIA GPUs using CUDA

Different implementations and optimizations of the K-means algorithm:

Naïve version: The nearest-cluster calculation is offloaded to the GPU
Transpose version: Implement column-based indexing for the arrays (instead of the row-based indexing used in the naïve version)
Shared version: Move the frequently accessed clusters array to the GPU's shared memory
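
The difference between the two layouts used by the naïve and transpose versions can be sketched in plain host-side C (this is not a CUDA kernel and not the lab's code). The sketch assumes arrays of `float` with one entry per object and coordinate; the helper names `row_index`, `col_index` and `to_column_based` are hypothetical. In the column-based layout, the same coordinate of consecutive objects is adjacent in memory, so GPU threads that handle consecutive objects read adjacent addresses.

```c
#include <assert.h>
#include <stddef.h>

/* Row-based layout: the coordinates of one object are contiguous. */
static inline size_t row_index(int obj, int coord, int num_objs, int num_coords)
{
    (void)num_objs;
    return (size_t)obj * num_coords + coord;
}

/* Column-based layout: one coordinate of consecutive objects is contiguous. */
static inline size_t col_index(int obj, int coord, int num_objs, int num_coords)
{
    (void)num_coords;
    return (size_t)coord * num_objs + obj;
}

/* Copy a row-based array into a column-based one (a transpose). */
void to_column_based(const float *row_based, float *col_based,
                     int num_objs, int num_coords)
{
    assert(num_objs > 0 && num_coords > 0 && row_based != col_based);
    for (int o = 0; o < num_objs; o++) {
        for (int c = 0; c < num_coords; c++) {
            col_based[col_index(o, c, num_objs, num_coords)] =
                row_based[row_index(o, c, num_objs, num_coords)];
        }
    }
}
```

For three objects with two coordinates each, the row-based array `{x0, y0, x1, y1, x2, y2}` becomes `{x0, x1, x2, y0, y1, y2}`.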

Lab 6 - Parallelization and optimization on distributed memory architectures

Given the serial versions of Jacobi and Gauss-Seidel kernels for the ... problem:

Identify parallelism possibilities in the Jacobi and Gauss-Seidel kernels
Design and implement a solution for a distributed memory architecture using message passing with MPI
Perform benchmarks
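
The difference in available parallelism between the two kernels can be sketched serially, without MPI. The example below is an illustration only: it assumes a 1D averaging stencil with fixed boundary values, chosen purely for brevity, and does not reproduce the lab's actual problem or stencil. The function names are hypothetical. In the Jacobi sweep every new value depends only on the previous iterate, so all updates are independent. In the Gauss-Seidel sweep each update reads a neighbour that was already updated in the same sweep.

```c
#include <assert.h>

/* Jacobi sweep: reads only the previous iterate `old_vals`, writes `new_vals`.
   Boundary values (first and last entry) are copied unchanged. */
void jacobi_sweep(const double *old_vals, double *new_vals, int n)
{
    assert(n >= 2 && old_vals != new_vals);
    new_vals[0] = old_vals[0];
    new_vals[n - 1] = old_vals[n - 1];
    for (int i = 1; i < n - 1; i++)
        new_vals[i] = 0.5 * (old_vals[i - 1] + old_vals[i + 1]);
}

/* Gauss-Seidel sweep: updates in place, so u[i - 1] has already been
   updated in this sweep when u[i] is computed. */
void gauss_seidel_sweep(double *u, int n)
{
    assert(n >= 2);
    for (int i = 1; i < n - 1; i++)
        u[i] = 0.5 * (u[i - 1] + u[i + 1]);
}
```

Starting from `{1, 0, 0, 0, 0}`, one Jacobi sweep changes only the cell next to the boundary, while one Gauss-Seidel sweep propagates the boundary value across the interior. That same dependency is what makes Gauss-Seidel harder to split across processes.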

