This repository implements scrambling linear feedback shift registers in C++. There are three implementations:
- Fibonacci (bit at a time)
- Fibonacci (byte at a time)
- Galois (bit at a time)
They are templated via the LFSRs taps. The reason for using templates is to give the compiler every opportunity to optimise the various bit-level operations that are required, i.e., shifts, XORs, ANDs etc.
There are three executables included:
-
test_lfsr.cpp
: This tests instantiations of every polynomial listed on the wikipedia page for the following:- That each LFSR is maximal length by continually clocking in zeroes and
testing that the state resets as expected every
$2^{degree} - 1$ iterations; - That the scrambled output does not equal it's input at any stage. Note that this is not a guarantee that this won't happen; I believe it is possible for a pathological series of inputs to cause the output byte to equal the input
- And finally, that the output of a scrambled and then descrambled byte equals the input byte.
- That each LFSR is maximal length by continually clocking in zeroes and
testing that the state resets as expected every
-
bench_lfsr.cpp
: Using Google Benchmark, benchmark the following for each type ofLFSR:- Individual scramble byte operations
- Scrambling a range of values
-
tune_lfsr.cpp
: Contains and runs an instantiation of each LFSR type, writing the output to a file. This is mainly for profiling using Intel VTune orcallgrind
/kcachegrind
as doing so on either the benchmarks or tests produces a lot of call stack noise.
Both the bit at a time Galois and Fibonacci LFSRs are written in the usual way, save for the template madness.
- Fibonacci's input is on the right (LSB) of the shift register, and shifted to the left;
- Galois' input is on the left (MSB) of the shift register and shifted to the right.
The "bulk" Fibonacci is written differently, in a manner I haven't personally seen done elsewhere, but is probably obvious to folks who have dealt with this sort of thing before, particularly those who are into their mathematics.
- Since the scramble calculation involves XORing the input with the bits in the
shift register at the various tap positions, this can instead be done by:
- including the input bits into the register, as opposed to separately, as per literature;
- Getting the "population count" (the number of bits set) of the
combined register state
AND
a mask representing the tap positions. This result$mod2$ is equivalent to XORing them all together as one would do.
- Where an input bit has a dependency on another input bit (in the input byte), the dependencies can be merged into a new dependency (tap) mask, making each operation independent.
As an illustrative example, we can take relatively small LFSR with taps at
5 & 6 (being fancy, this would be polynomial
- The
$In_{n}$ columns represent the part of the shift register dedicated to input. - The
$B_{n}$ columns represent the part of the shift register that keeps state.
Each row represents a mask of taps that can be AND
ed with the shift register
to accumulate the bit values at the respective tap positions.
Naively, we might think we could do the XOR + popcount by sliding the tap mask
back 1 bit for each bit, however the bits at
Note: Since the input is being shifted left to right (MSB to LSB) in a
std::bitset
, the tap indexes are reversed and 0-indexed. Therefore the tap indexes for this polynomial become:
Tap Index $x^6$ 0
$x^5$ 1
$1$ 6
Taps for bit |
||||||||||||||
Taps for bit |
||||||||||||||
Taps for bit |
||||||||||||||
Taps for bit |
||||||||||||||
Taps for bit |
||||||||||||||
Taps for bit |
||||||||||||||
Taps for bit |
||||||||||||||
Taps for bit |
However, this can be solved by accumulating the dependencies for each dependee in the input, XORing them together, producing a set of mask that look like this:
Taps for bit |
||||||||||||||
Taps for bit |
||||||||||||||
Taps for bit |
||||||||||||||
Taps for bit |
||||||||||||||
Taps for bit |
||||||||||||||
Taps for bit |
||||||||||||||
Taps for bit |
||||||||||||||
Taps for bit |
0 |
Note that the zeroes occur in the updated matrix when a dependee occurs
Performance was benchmarked using instantiations for the following polynomial:
According to Google Benchmark, there are substantially different results across MSVC and GCC. On Linux, GCC produces significantly faster bit-by-bit LFSRs than MSVC, with the Galois version being the fastest, and the bulk Fibonacci disappointingly sitting somewhere in the middle. On Windows, however, the bulk Fibonacci version blows everybody out of the water with a figure that is faster than any of the Linux implementations. The disappointment here lies in the bit-by-bit versions, which are considerably slower than their Linux counterparts.
For this polynomial, we could say (given my machine configuration), that on Linux we would want the Galois implementation, and on Windows the bulk Fibonacci.
CPU Caches:
L1 Data 48 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 2048 KiB (x16)
L3 Unified 36864 KiB (x1)
Load Average: 0.25, 0.17, 0.11
------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------
LFSR_FeedthroughGalois 6.32 ns 6.32 ns 109504352
LFSR_FeedthroughFibonacci 13.1 ns 13.1 ns 53089466
LFSR_FeedthroughFibonacciBulk 10.4 ns 10.4 ns 66044340
LFSR_FeedthroughGalois_Range 815 ns 815 ns 845428
LFSR_FeedthroughFibonacci_Range 1808 ns 1808 ns 389237
LFSR_FeedthroughFibonacciBulk_Range 1404 ns 1378 ns 506681
CPU Caches:
L1 Data 48 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 2048 KiB (x16)
L3 Unified 36864 KiB (x1)
------------------------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------------------------
LFSR_FeedthroughGalois 27.5 ns 13.8 ns 49777778
LFSR_FeedthroughFibonacci 24.5 ns 12.7 ns 64000000
LFSR_FeedthroughFibonacciBulk 5.37 ns 2.25 ns 298666667
LFSR_FeedthroughGalois_Range 3831 ns 1967 ns 373333
LFSR_FeedthroughFibonacci_Range 3199 ns 1549 ns 494345
LFSR_FeedthroughFibonacciBulk_Range 745 ns 296 ns 2007040