
L3‐Performance Benchmarking


Client-Server Performance

A very simple client-server program forms the basis for benchmarking the performance overheads of L3-logging.

This document explains how to use the ./test.sh script to exercise these micro-benchmarks.

Makefile

The programs are built with conditionally compiled code, enabled through environment variables passed to make.

make help reports the following section, which shows how to build the client/server programs to exercise each kind of logging.

To build client-server performance test programs and run performance test(s)
 make clean && CC=gcc LD=g++ L3_ENABLED=0         make client-server-perf-test  # Baseline
 make clean && CC=gcc LD=g++ L3_LOGT_FPRINTF=1    make client-server-perf-test  # fprintf() logging
 make clean && CC=gcc LD=g++ L3_LOGT_WRITE=1      make client-server-perf-test  # write() logging
 make clean && CC=gcc LD=g++                      make client-server-perf-test  # L3-logging
 make clean && CC=gcc LD=g++ L3_FASTLOG_ENABLED=1 make client-server-perf-test  # L3 Fast logging
 make clean && CC=gcc LD=g++ L3_LOC_ENABLED=1     make client-server-perf-test  # L3+LOC logging
 make clean && CC=gcc LD=g++ L3_LOC_ENABLED=2     make client-server-perf-test  # L3+LOC-ELF logging

To manually run the pair of client/server programs, do:

  • Build the programs:
make clean && CC=gcc LD=g++ L3_ENABLED=0 make client-server-perf-test
  • Start the server: ./build/release/bin/use-cases/svmsg_file_server

  • Run one or more instances of the client program: ./build/release/bin/use-cases/svmsg_file_client
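For example, a minimal sketch of the two steps combined, assuming the server can be left running in the background while the clients are launched against it:

 # Start the server in the background, then launch 5 concurrent clients:
 ./build/release/bin/use-cases/svmsg_file_server &
 for i in 1 2 3 4 5; do
     ./build/release/bin/use-cases/svmsg_file_client &
 done
 wait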

These steps are packaged as test-methods in the test.sh script, which can be used as described below.

test.sh

This driver script supports several test-methods to build and run the client/server performance benchmark with different parameters: the clock-ID to use, the number of clients, the number of messages sent by each client, the logging mode, and so on.

agurajada-Linux-Vm:[3990] $ ./test.sh --list
test.sh: List of builds and test cases to execute:
[...]
  run-all-client-server-perf-tests
  test-build-and-run-client-server-perf-test
  test-build-and-run-client-server-perf-test-l3_loc_eq_1
  test-build-and-run-client-server-perf-test-l3_loc_eq_2

See the output of ./test.sh --help for more detailed usage information on the performance test-methods.
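An individual test-method from the --list output can be run by passing its name to test.sh (assuming test-methods are invoked by name, as with run-all-client-server-perf-tests below); for example:

 $ ./test.sh test-build-and-run-client-server-perf-test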

The run-all-client-server-perf-tests test-method builds the client/server programs in every supported logging mode and executes the workload. The default configuration is 5 clients, each sending 1,000 messages to the server.

You can customize this configuration as follows:

$ ./test.sh run-all-client-server-perf-tests [ server-clock-ID [ num-msgs [ num-clients ] ] ]

Specify the string --clock-default for the server-clock-ID argument to use the default clock-ID.
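For example, assuming the same positional-argument order, 2 clients sending 100,000 messages each can be measured on the CLOCK_PROCESS_CPUTIME_ID clock (via the --clock-process-cputime-id flag used in the sample runs further below):

 $ ./test.sh run-all-client-server-perf-tests --clock-process-cputime-id $((100 * 1000)) 2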

A typical measurement runs 5 concurrent clients, each sending 1 million messages:

$ ./test.sh run-all-client-server-perf-tests --clock-default $((1000 * 1000))

Single-client measurements can be run as follows:

$ ./test.sh run-all-client-server-perf-tests --clock-default $((1000 * 1000)) 1
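To exercise just the defaults (5 clients, each sending 1,000 messages) as a quick sanity check before a long run, the optional arguments can be omitted entirely:

 $ ./test.sh run-all-client-server-perf-tests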

Sample execution steps

A typical execution would be something like the following:

Execute the workload:

./test.sh run-all-client-server-perf-tests --clock-default $((1000 * 1000)) > /tmp/perf-test.1Mil-rows.5ct.1.out  2>&1

Post-process the outputs looking for aggregated throughput metrics:

grep -B1 -E 'Start Server|, num_ops=' /tmp/perf-test.1Mil-rows.5ct.1.out

There are many other metrics gathered while the micro-benchmarks run, but the aggregated throughput metric above is a good starting point for comparing logging overheads across the different configurations.
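As a rough post-processing sketch, assuming the "msg, throughput=" line format shown in the sample results below and that the baseline (No logging) line appears first, the change in throughput relative to the baseline can be computed with a small pipeline:

 grep "msg, throughput=" /tmp/perf-test.1Mil-rows.5ct.1.out \
   | sed -E 's/.*throughput=([0-9]+).*/\1/' \
   | awk 'NR == 1 { base = $1 }
          { printf "%10d ops/sec  (%+.2f%% vs baseline)\n", $1, 100 * ($1 - base) / base }'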

Sample micro-benchmark results

Here is a comparison of the performance metrics obtained through this exercise:

  • Hardware : Run on an Ubuntu 22.04.4 LTS Linux VM (on a Mac), 8 CPUs, 15 GB RAM, Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  • Workload : 5 clients, each sending 1 million messages to the server
  • Parameter : the default CLOCK_REALTIME clock vs. the CLOCK_PROCESS_CPUTIME_ID clock
  • Metric : server-side throughput (num-ops/sec); see the arithmetic sanity-check after the tables below.

  • Run all client-server performance workloads using the default clock-ID (CLOCK_REALTIME):
$ ./test.sh run-all-client-server-perf-tests --clock-default $((1000 * 1000)) > ~/tmp/run-all-perf.defaults.1Mil.5ct.out 2>&1
  • Throughput metrics using the CLOCK_REALTIME clock:
agurajada-Linux-Vm:[112] $ grep "msg, throughput=" ~/tmp/run-all-perf.defaults.1Mil.5ct.out

For 5 clients, No logging, num_ops=5000000 (5 Million) ops, Cumulative time=147097405 (~147.09 Million) ns, Avg. Elapsed real time=29 ns/msg, throughput=33991082 (~33.99 Million) ops/sec
For 5 clients, L3-logging (no LOC), num_ops=5000000 (5 Million) ops, Cumulative time=233036124 (~233.03 Million) ns, Avg. Elapsed real time=46 ns/msg, throughput=21455900 (~21.45 Million) ops/sec
For 5 clients, L3-fast logging (no LOC), num_ops=5000000 (5 Million) ops, Cumulative time=266615681 (~266.61 Million) ns, Avg. Elapsed real time=53 ns/msg, throughput=18753585 (~18.75 Million) ops/sec
For 5 clients, L3-logging default LOC, num_ops=5000000 (5 Million) ops, Cumulative time=243770463 (~243.77 Million) ns, Avg. Elapsed real time=48 ns/msg, throughput=20511098 (~20.51 Million) ops/sec
For 5 clients, L3-logging LOC-ELF, num_ops=5000000 (5 Million) ops, Cumulative time=276021671 (~276.02 Million) ns, Avg. Elapsed real time=55 ns/msg, throughput=18114519 (~18.11 Million) ops/sec
  • Run all client-server performance workloads using the CLOCK_PROCESS_CPUTIME_ID clock:
$ ./test.sh run-all-client-server-perf-tests --clock-process-cputime-id $((1000 * 1000)) > ~/tmp/run-all-perf.process-CPU-time.1Mil.5ct.out 2>&1
  • Throughput metrics using the CLOCK_PROCESS_CPUTIME_ID clock:
agurajada-Linux-Vm:[113] $ grep "msg, throughput=" ~/tmp/run-all-perf.process-CPU-time.1Mil.5ct.out

For 5 clients, No logging, num_ops=5000000 (5 Million) ops, Cumulative time=1675916269 (~1.67 Billion) ns, Avg. Process-CPU time=335 ns/msg, throughput=2983442 (~2.98 Million) ops/sec
For 5 clients, L3-logging (no LOC), num_ops=5000000 (5 Million) ops, Cumulative time=1760319156 (~1.76 Billion) ns, Avg. Process-CPU time=352 ns/msg, throughput=2840394 (~2.84 Million) ops/sec
For 5 clients, L3-fast logging (no LOC), num_ops=5000000 (5 Million) ops, Cumulative time=1723637277 (~1.72 Billion) ns, Avg. Process-CPU time=344 ns/msg, throughput=2900842 (~2.90 Million) ops/sec
For 5 clients, L3-logging default LOC, num_ops=5000000 (5 Million) ops, Cumulative time=1669255578 (~1.66 Billion) ns, Avg. Process-CPU time=333 ns/msg, throughput=2995347 (~2.99 Million) ops/sec
For 5 clients, L3-logging LOC-ELF, num_ops=5000000 (5 Million) ops, Cumulative time=1864252472 (~1.86 Billion) ns, Avg. Process-CPU time=372 ns/msg, throughput=2682040 (~2.68 Million) ops/sec
  • Throughput metrics using the CLOCK_REALTIME clock on native Linux:
    **** Performance comparison for NumClients=5, NumOps=5000000 (5 Million) ****
+-------------------------------+-------------------+----------+-------------------+----------+
| Run-Type                      | Server throughput | Srv:Drop | Client throughput | Cli:Drop |
+-------------------------------+-------------------+----------+-------------------+----------+
| Baseline - No logging         | ~499.13 K ops/sec |  0.00 %  | ~201.79 K ops/sec |  0.00 %  |
| L3-logging (no LOC)           | ~499.53 K ops/sec |  0.08 %  | ~200.88 K ops/sec | -0.45 %  |
| L3-fast logging (no LOC)      | ~481.35 K ops/sec | -3.56 %  | ~187.03 K ops/sec | -7.32 %  |
| L3-fprintf() logging (no LOC) | ~481.41 K ops/sec | -3.55 %  | ~186.47 K ops/sec | -7.59 %  |
| L3-write() logging (no LOC)   | ~331.74 K ops/sec | -33.54 % |  ~99.39 K ops/sec | -50.74 % |
| L3-logging default LOC        | ~497.39 K ops/sec | -0.35 %  | ~199.09 K ops/sec | -1.34 %  |
| L3-logging LOC-ELF            | ~496.46 K ops/sec | -0.53 %  | ~198.98 K ops/sec | -1.40 %  |
| spdlog-logging                | ~436.88 K ops/sec | -12.47 % | ~155.58 K ops/sec | -22.90 % |
| spdlog-backtrace-logging      | ~454.83 K ops/sec | -8.88 %  | ~168.04 K ops/sec | -16.73 % |
+-------------------------------+-------------------+----------+-------------------+----------+
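As a quick sanity check of how the throughput figures relate to the other columns (throughput is num_ops divided by the cumulative elapsed time in seconds), the first CLOCK_REALTIME row above can be reproduced as:

 # 5,000,000 ops over 147,097,405 ns of cumulative elapsed time:
 awk 'BEGIN { printf "%.0f ops/sec\n", 5000000 / (147097405 / 1e9) }'   # ~33.99 Million ops/sec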