Laboratory Notebook for a Multi-Threaded Version of Quicksort

Project Overview

This project aims to provide an efficient multi-threaded implementation of the quicksort algorithm for multi-core machines. This document records my attempts to evaluate the performance of such an implementation.
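At its core, such an implementation follows the classical scheme: partition the array, then sort the two halves concurrently, handing one of them to a freshly spawned thread. The following is a minimal pthread sketch of this idea (my own illustration, not the code in src/; MAX_DEPTH and all other names are made up):

#include <pthread.h>
#include <stdio.h>

/* Hypothetical sketch, not the code in src/: partition, then sort the
   two halves concurrently, handing one half to a new thread.  Below
   MAX_DEPTH levels of spawning we recurse sequentially so that
   thread-creation overhead does not dominate on small subarrays. */
#define MAX_DEPTH 3

typedef struct { double *a; int lo, hi, depth; } task;

static void swap(double *x, double *y) { double t = *x; *x = *y; *y = t; }

static int partition(double *a, int lo, int hi) {
    double pivot = a[hi];               /* Lomuto partition on the last element */
    int i = lo;
    for (int j = lo; j < hi; j++)
        if (a[j] < pivot) swap(&a[i++], &a[j]);
    swap(&a[i], &a[hi]);
    return i;
}

static void *par_qsort(void *arg) {
    task *t = arg;
    if (t->lo >= t->hi) return NULL;
    int p = partition(t->a, t->lo, t->hi);
    task left  = { t->a, t->lo, p - 1, t->depth + 1 };
    task right = { t->a, p + 1, t->hi, t->depth + 1 };
    if (t->depth < MAX_DEPTH) {
        pthread_t tid;                  /* left half in a new thread, right half here */
        pthread_create(&tid, NULL, par_qsort, &left);
        par_qsort(&right);
        pthread_join(tid, NULL);
    } else {
        par_qsort(&left);
        par_qsort(&right);
    }
    return NULL;
}

int main(void) {
    double a[] = { 3, 1, 4, 1.5, 9, 2.6, 5, 3.5 };
    task root = { a, 0, 7, 0 };
    par_qsort(&root);
    for (int i = 0; i < 8; i++) printf("%g ", a[i]);
    printf("\n");
    return 0;
}

Capping the spawning depth keeps the number of threads roughly proportional to the number of cores, which matters because creating a thread costs far more than sorting a small subarray sequentially.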

General Organization

src/

This directory comprises the parallel implementation and a standard Makefile to compile it.

data/

This is where raw experimental data should go. Each subdirectory holds one set of experiments; its name is built from the machine name and the date. For example:

echo mkdir data/`hostname`_`date +%F`
mkdir data/sama_2014-10-13

Typical usage

Compilation

A simple makefile with various compilation options is provided in the src/ directory. Compilation is thus done by running the following command:

make -C src/
make: Entering directory '/home/alegrand/Work/Documents/Enseignements/M2R_Eval_Perf_13/M2R-ParallelQuicksort/src'
cc   -g -Wall -Wshadow -Wcast-align -Waggregate-return -Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wmissing-noreturn -Wpointer-arith -Wwrite-strings -finline-functions -O0 -pthread -lrt -std=c99  -c -o parallelQuicksort.o parallelQuicksort.c
cc   -g -Wall -Wshadow -Wcast-align -Waggregate-return -Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wmissing-noreturn -Wpointer-arith -Wwrite-strings -finline-functions -O0 -pthread -lrt -std=c99  parallelQuicksort.o  -o parallelQuicksort 
make: Leaving directory '/home/alegrand/Work/Documents/Enseignements/M2R_Eval_Perf_13/M2R-ParallelQuicksort/src'

Running the code

The code is quite simple at the moment and can be run in the following way:

./src/parallelQuicksort [1000000]

When run, the code initializes an array of the size given as argument (1000000 by default) with random values and sorts it using:

  1. a custom sequential implementation;
  2. a custom parallel implementation;
  3. the libc qsort function.

Times are reported in seconds.
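The Makefile links with -lrt, which hints that the timings are wall-clock measurements taken around each sort call with clock_gettime. Here is a minimal sketch of such a timing harness (my guess at the structure; the names and details are not those of parallelQuicksort.c):

#define _POSIX_C_SOURCE 199309L   /* for clock_gettime with -std=c99 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock time in seconds; CLOCK_MONOTONIC is immune to system
   clock adjustments during the run. */
static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static int cmp_double(const void *x, const void *y) {
    double a = *(const double *)x, b = *(const double *)y;
    return (a > b) - (a < b);
}

int main(int argc, char *argv[]) {
    int n = (argc > 1) ? atoi(argv[1]) : 1000000;
    double *a = malloc(n * sizeof *a);
    for (int i = 0; i < n; i++)
        a[i] = (double)rand() / RAND_MAX;   /* random values in [0,1] */

    double start = now_sec();
    qsort(a, n, sizeof *a, cmp_double);     /* stand-in for one of the three sorts */
    printf("Built-in quicksort took: %f sec.\n", now_sec() - start);

    free(a);
    return 0;
}

Compiled with the same flags as in the Makefile, this prints one line in exactly the format the Perl scripts below expect.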

Experimental Reports

2014-10-13

Initial code design

  • I obtained an initial implementation from http://sc12.supercomputing.org/hpceducator/PythonForParallelism/codes/parallelQuicksort.c. According to the header, the original author is Joshua Stough from Washington and Lee University. I hope he will not mind that I reuse this piece of code for educational purposes.
  • Here is a typical first execution on my laptop (an Intel(R) Core(TM) i7 running a Debian with Linux 3.14.15):
    ./src/quicksort
        
    Sequential quicksort took: 0.231571 sec.
    at loc 506315, 5.068226e-01 < 5.068269e-01 
    Oops, lyst did not get sorted by parallelQuicksort.
    Parallel quicksort took: 0.161259 sec.
    Built-in qsort took: 0.241568 sec.
        

    Sweet, in my first attempt, it looks like this parallel version is indeed running faster than the sequential one. I have to say this warning message is stressing me a bit though.

  • On smaller instances, the code would segfault. So I reindented the code and, thanks to valgrind and gdb, found what was wrong. I also renamed the file so that compiling is more convenient. This fixed the previous warning message, so now everything seems fine:
    ./src/parallelQuicksort
        
    Sequential quicksort took: 0.239347 sec.
    Parallel quicksort took: 0.176365 sec.
    Built-in quicksort took: 0.244716 sec.
        

First series of experiments

Let’s see how the three implementations behave when changing the array size. Since one measurement is not enough, I run the code 5 times for each size.

OUTPUT_DIRECTORY=data/`hostname`_`date +%F`
mkdir -p $OUTPUT_DIRECTORY
OUTPUT_FILE=$OUTPUT_DIRECTORY/measurements_`date +%R`.txt

touch $OUTPUT_FILE
for i in 100 1000 10000 100000 1000000; do
    for rep in `seq 1 5`; do
        echo "Size: $i" >> $OUTPUT_FILE;
        ./src/parallelQuicksort $i >> $OUTPUT_FILE;
    done ;
done

I obtained the output file data/sama_2014-10-13/measurements_03:47.txt.

A simple plot with R

Here is a simple Perl script to convert the raw measurements into a CSV file:

use strict;
use warnings;

my($line);
my($size);

print "Size, Type, Time\n";
while($line=<>) {
    chomp $line;
    # Remember the array size announced by the driver script.
    if($line =~ /^Size: ([\d\.]*)$/) {
        $size = $1;
        next;
    }
    # Emit one CSV record per "<Type> quicksort took: <time> sec." line.
    if($line =~ /^(.*) quicksort.*: ([\d\.]*) sec\.$/) {
        print "$size, \"$1\", $2\n";
        next;
    }
}

I can then simply parse my data with the following command:

perl scripts/csv_quicksort_extractor.pl < data/sama_2014-10-13/measurements_03\:47.txt > data/sama_2014-10-13/measurements_03\:47.csv

The resulting CSV can then be loaded and plotted with R:

df <- read.csv("data/sama_2014-10-13/measurements_03:47.csv", header=T)
plot(df$Size, df$Time, col=c("red","blue","green")[df$Type])

data/sama_2014-10-13/measurements_03:47.png

Well, this is not particularly nice and some may not know/like R.

A simple plot with gnuplot

So let’s try to parse the output in another way and use gnuplot:

use strict;
use warnings;

my($line);
my($size);
my($seq,$par,$libc);
print "Size, Seq, Par, Libc\n";
while($line=<>) {
    chomp $line;
    if($line =~ /^Size: ([\d\.]*)$/) {
        $size = $1;
        next;
    }
    # Collect the three measurements for the current size...
    if($line =~ /^Sequential quicksort.*: ([\d\.]*) sec\.$/) {
        $seq = $1; next;
    }
    if($line =~ /^Parallel quicksort.*: ([\d\.]*) sec\.$/) {
        $par = $1; next;
    }
    # ...and emit one wide CSV record once the last one arrives.
    if($line =~ /^Built-in quicksort.*: ([\d\.]*) sec\.$/) {
        $libc = $1;
        print "$size, $seq, $par, $libc\n";
        next;
    }
}
FILENAME="data/sama_2014-10-13/measurements_03:47"
perl scripts/csv_quicksort_extractor2.pl < "$FILENAME.txt" > "${FILENAME}_wide.csv"
echo "
  set terminal png size 600,400 
  set output '${FILENAME}_wide.png'
  set datafile separator ','
  set key autotitle columnhead
  plot '${FILENAME}_wide.csv' using 1:2 with linespoints, '' using 1:3 with linespoints, '' using 1:4 with linespoints
" | gnuplot
echo [[file:${FILENAME}_wide.png]]

data/sama_2014-10-13/measurements_03:47_wide.png

Well, I’m not sure it is nicer, but at least I have lines. A first crude analysis seems to reveal that the parallel version is worth it for arrays larger than 400000.