# L3‐Performance Benchmarking
A very simple client-server program has been developed, which forms the basis for performance benchmarking of the overheads due to L3-logging. This document explains how to use the `./test.sh` script to exercise these micro-benchmarks.
These programs contain conditionally-compiled code that is enabled through environment variables. Run `make help` to see the targets for building the client/server programs with each kind of logging.
To build the client/server performance test programs and run the performance test(s):

```shell
make clean && CC=gcc LD=g++ L3_ENABLED=0          make client-server-perf-test  # Baseline
make clean && CC=gcc LD=g++ L3_LOGT_FPRINTF=1     make client-server-perf-test  # fprintf() logging
make clean && CC=gcc LD=g++ L3_LOGT_WRITE=1       make client-server-perf-test  # write() logging
make clean && CC=gcc LD=g++                       make client-server-perf-test  # L3-logging
make clean && CC=gcc LD=g++ L3_FASTLOG_ENABLED=1  make client-server-perf-test  # L3 Fast logging
make clean && CC=gcc LD=g++ L3_LOC_ENABLED=1      make client-server-perf-test  # L3+LOC logging
make clean && CC=gcc LD=g++ L3_LOC_ENABLED=2      make client-server-perf-test  # L3+LOC-ELF logging
```
To manually run the pair of client/server programs, do:

- Build the programs:

  ```shell
  make clean && CC=gcc LD=g++ L3_ENABLED=0 make client-server-perf-test
  ```

- Start the server:

  ```shell
  ./build/release/bin/use-cases/svmsg_file_server
  ```

- Run one or more instances of the client program:

  ```shell
  ./build/release/bin/use-cases/svmsg_file_client
  ```
These steps are packaged as test-methods in the `test.sh` script, which can be used as described below. This driver script supports different test-methods to build and run the client/server performance benchmark with different test-parameters, such as the clock-ID to use, the number of clients, the number of messages sent by each client, the logging mode, etc.
```
agurajada-Linux-Vm:[3990] $ ./test.sh --list
test.sh: List of builds and test cases to execute:
[...]
run-all-client-server-perf-tests
test-build-and-run-client-server-perf-test
test-build-and-run-client-server-perf-test-l3_loc_eq_1
test-build-and-run-client-server-perf-test-l3_loc_eq_2
```
See the output from `./test.sh --help` for more detailed usage information on the performance test-methods.
The test-method `run-all-client-server-perf-tests` builds the client/server programs in all supported logging-modes and executes the workload. The default configuration is 5 clients, each sending 1000 messages to the server. You can customize this configuration as follows:

```shell
$ ./test.sh run-all-client-server-perf-tests [ server-clock-ID [ num-msgs [ num-clients ] ] ]
```
Specify the string `--clock-default` for the `server-clock-ID` argument to use the default clock-ID.
A typical measurement would be to run 5 concurrent clients, each sending 1 Million messages:

```shell
$ ./test.sh run-all-client-server-perf-tests --clock-default $((1000 * 1000))
```
Single-client measurements can be run as follows:

```shell
$ ./test.sh run-all-client-server-perf-tests --clock-default $((1000 * 1000)) 1
```
A typical execution would be something like the following:

- Execute the workload:

  ```shell
  ./test.sh run-all-client-server-perf-tests --clock-default $((1000 * 1000)) > /tmp/perf-test.1Mil-rows.5ct.1.out 2>&1
  ```

- Post-process the outputs, looking for the aggregated throughput metrics:

  ```shell
  egrep -B1 -E 'Start Server|, num_ops=' /tmp/perf-test.1Mil-rows.5ct.1.out
  ```
There are many other metrics gathered while the micro-benchmarks are running, but the above aggregated throughput metric is a good starting point for comparing logging overheads across different configurations.
Here is a comparison of the performance metrics obtained through this exercise:
- Hardware : Run on an Ubuntu 22.04.4 LTS Linux-VM (on a Mac), 8 CPUs, 15 GB RAM, Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
- Workload : 5 clients, each sending 1 Million messages to a server
- Parameter : The default `CLOCK_REALTIME` clock v/s the `CLOCK_PROCESS_CPUTIME_ID` clock
- Metric : We compare the throughput metric (num-ops/sec) processed on the server-side.
- Run all client-server performance workloads using the default clock-ID (`CLOCK_REALTIME`):

  ```shell
  $ ./test.sh run-all-client-server-perf-tests --clock-default $((1000 * 1000)) > ~/tmp/run-all-perf.defaults.1Mil.5ct.out 2>&1
  ```
- Throughput metrics using the `CLOCK_REALTIME` clock:

  ```
  agurajada-Linux-Vm:[112] $ grep "msg, throughput=" ~/tmp/run-all-perf.defaults.1Mil.5ct.out
  For 5 clients, No logging, num_ops=5000000 (5 Million) ops, Cumulative time=147097405 (~147.09 Million) ns, Avg. Elapsed real time=29 ns/msg, throughput=33991082 (~33.99 Million) ops/sec
  For 5 clients, L3-logging (no LOC), num_ops=5000000 (5 Million) ops, Cumulative time=233036124 (~233.03 Million) ns, Avg. Elapsed real time=46 ns/msg, throughput=21455900 (~21.45 Million) ops/sec
  For 5 clients, L3-fast logging (no LOC), num_ops=5000000 (5 Million) ops, Cumulative time=266615681 (~266.61 Million) ns, Avg. Elapsed real time=53 ns/msg, throughput=18753585 (~18.75 Million) ops/sec
  For 5 clients, L3-logging default LOC, num_ops=5000000 (5 Million) ops, Cumulative time=243770463 (~243.77 Million) ns, Avg. Elapsed real time=48 ns/msg, throughput=20511098 (~20.51 Million) ops/sec
  For 5 clients, L3-logging LOC-ELF, num_ops=5000000 (5 Million) ops, Cumulative time=276021671 (~276.02 Million) ns, Avg. Elapsed real time=55 ns/msg, throughput=18114519 (~18.11 Million) ops/sec
  ```
- Run all client-server performance workloads using the `CLOCK_PROCESS_CPUTIME_ID` clock test-configuration:

  ```shell
  $ ./test.sh run-all-client-server-perf-tests --clock-process-cputime-id $((1000 * 1000)) > ~/tmp/run-all-perf.process-CPU-time.1Mil.5ct.out 2>&1
  ```
- Throughput metrics using the `CLOCK_PROCESS_CPUTIME_ID` clock:

  ```
  agurajada-Linux-Vm:[113] $ grep "msg, throughput=" ~/tmp/run-all-perf.process-CPU-time.1Mil.5ct.out
  For 5 clients, No logging, num_ops=5000000 (5 Million) ops, Cumulative time=1675916269 (~1.67 Billion) ns, Avg. Process-CPU time=335 ns/msg, throughput=2983442 (~2.98 Million) ops/sec
  For 5 clients, L3-logging (no LOC), num_ops=5000000 (5 Million) ops, Cumulative time=1760319156 (~1.76 Billion) ns, Avg. Process-CPU time=352 ns/msg, throughput=2840394 (~2.84 Million) ops/sec
  For 5 clients, L3-fast logging (no LOC), num_ops=5000000 (5 Million) ops, Cumulative time=1723637277 (~1.72 Billion) ns, Avg. Process-CPU time=344 ns/msg, throughput=2900842 (~2.90 Million) ops/sec
  For 5 clients, L3-logging default LOC, num_ops=5000000 (5 Million) ops, Cumulative time=1669255578 (~1.66 Billion) ns, Avg. Process-CPU time=333 ns/msg, throughput=2995347 (~2.99 Million) ops/sec
  For 5 clients, L3-logging LOC-ELF, num_ops=5000000 (5 Million) ops, Cumulative time=1864252472 (~1.86 Billion) ns, Avg. Process-CPU time=372 ns/msg, throughput=2682040 (~2.68 Million) ops/sec
  ```
- Throughput metrics using the `CLOCK_REALTIME` clock on native Linux:
  ```
  **** Performance comparison for NumClients=5, NumOps=5000000 (5 Million) ****
  +-------------------------------+-------------------+----------+-------------------+----------+
  | Run-Type                      | Server throughput | Srv:Drop | Client throughput | Cli:Drop |
  +-------------------------------+-------------------+----------+-------------------+----------+
  | Baseline - No logging         | ~499.13 K ops/sec |   0.00 % | ~201.79 K ops/sec |   0.00 % |
  | L3-logging (no LOC)           | ~499.53 K ops/sec |   0.08 % | ~200.88 K ops/sec |  -0.45 % |
  | L3-fast logging (no LOC)      | ~481.35 K ops/sec |  -3.56 % | ~187.03 K ops/sec |  -7.32 % |
  | L3-fprintf() logging (no LOC) | ~481.41 K ops/sec |  -3.55 % | ~186.47 K ops/sec |  -7.59 % |
  | L3-write() logging (no LOC)   | ~331.74 K ops/sec | -33.54 % | ~99.39 K ops/sec  | -50.74 % |
  | L3-logging default LOC        | ~497.39 K ops/sec |  -0.35 % | ~199.09 K ops/sec |  -1.34 % |
  | L3-logging LOC-ELF            | ~496.46 K ops/sec |  -0.53 % | ~198.98 K ops/sec |  -1.40 % |
  | spdlog-logging                | ~436.88 K ops/sec | -12.47 % | ~155.58 K ops/sec | -22.90 % |
  | spdlog-backtrace-logging      | ~454.83 K ops/sec |  -8.88 % | ~168.04 K ops/sec | -16.73 % |
  +-------------------------------+-------------------+----------+-------------------+----------+
  ```