Galois Field math implementation
I might come back and revise this at a later date, but for now I'll point to the Wikiversity article for a basic understanding of characteristic 2 Galois Fields and how to construct them, as that is what I used.
Instead I'll be discussing implementation details that differ from the norm and why I did them that way.
We choose the primitive polynomial with the lowest Hamming weight (fewest bits set/terms) and the lowest degree among its remaining terms (the highest term/bit isn't counted since it must be set by definition). This isn't necessarily out of the ordinary, as most implementations probably do the same, but there is additional reasoning behind the choice that isn't required by software lookup-table based implementations and wasn't mentioned in any examples I saw: it simplifies parallel modular reduction in the case that one actually performs long field multiplication as opposed to a log lookup, add, and exponent lookup. It's especially efficient in the case that only bits 0 and 1 are set.
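For example, GF(8) can be built from x^3 + x + 1, where only bits 0 and 1 are set below the implicit top bit. The following is a minimal sketch of my own (not necessarily the repository's routine) of a single-symbol shift-and-XOR multiply; it shows why a low-weight polynomial makes the reduction cheap, since folding an overflowing bit back into the field is just an XOR with a small constant.

```c
#include <stdint.h>

/* Sketch: multiply two GF(8) symbols by carry-less shift-and-XOR,
 * reducing with the primitive polynomial x^3 + x + 1 (0b1011). */
static uint8_t gf8_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (int i = 0; i < 3; i++) {      /* 3 bits per GF(8) symbol          */
        if (b & 1)
            p ^= a;                    /* accumulate a for this bit of b   */
        b >>= 1;
        uint8_t carry = a & 0x4;       /* bit about to leave the field     */
        a <<= 1;
        if (carry)
            a ^= 0xB;                  /* clear the overflow and add x + 1 */
    }
    return p & 0x7;
}
```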
The polynomials in my implementation consist of multiple symbols' worth of terms packed end to end so that there are no unused bits between them. The lowest-order terms are in the least significant bits, with increasing-order terms in increasingly significant bits. In practice this means that if you use printf() to print a gf8_poly value in octal (%o) or a gf16_poly value in hexadecimal (%llX), each digit corresponds to one symbol, with the rightmost being the degree 0 aka constant term. This term order means that indexing an individual term is always (poly >> (term_order * symbol_size)) & symbol_mask. In practice the index is carried around already scaled by the symbol size (i.e. as a bit offset) to avoid frequently multiplying by the symbol size.
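To make the layout concrete, here is a small hypothetical illustration of the GF(16) case; the macro and helper names are mine for the example, not necessarily the repository's.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t gf16_poly;        /* up to 16 packed 4-bit GF(16) symbols */

#define GF16_SYM_SZ   4
#define GF16_SYM_MASK 0xFULL

/* read the term of the given order (0 = constant term) */
static inline uint64_t gf16_poly_get_term(gf16_poly p, unsigned order)
{
    return (p >> (order * GF16_SYM_SZ)) & GF16_SYM_MASK;
}

int main(void)
{
    gf16_poly p = 0x321;                               /* 3x^2 + 2x + 1 */
    printf("%llX\n", (unsigned long long)p);           /* prints 321    */
    printf("%llX\n", (unsigned long long)gf16_poly_get_term(p, 2)); /* 3 */
    return 0;
}
```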
The benefits of this are that whole Reed-Solomon polynomial blocks fit in a uint32 (GF(8)) or uint64 (GF(16)), are easily passed on the stack, and many of the most frequently used math operations can be applied to the whole block at once. The simplest example is that polynomial addition/subtraction is a single XOR operation on the whole block for these characteristic 2 fields. Additionally, Reed-Solomon encoding and decoding of arbitrary data is simplified because, regardless of the data being split across several symbols, it is still physically contiguous and doesn't require padding to be inserted or removed at either end.
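As a sketch of what such a whole-block operation looks like (again using my illustrative gf16_poly typedef from above, not necessarily the repository's API):

```c
/* In a characteristic 2 field, adding two packed polynomials is a single
 * XOR across the whole block, whatever the symbol size; subtraction is
 * the identical operation. */
static inline gf16_poly gf16_poly_add(gf16_poly a, gf16_poly b)
{
    return a ^ b;
}
```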
This is where the main benefit of packed polynomials comes into play. Without special hardware support for carry-less polynomial multiply, such as the CLMUL instruction set on x86, the typical implementation uses log and exponent table look-ups to compute multiplies on a term-by-term basis. If we have polynomials of length m and n, that is on the order of m*n individual lookup, add, lookup steps; with the packed representation, a long multiplication instead combines one term of one operand with every term of the other in a handful of shifts and XORs.
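The core step of such a long multiply is scaling the whole packed polynomial by a single term. Below is a rough sketch of my own, assuming the primitive polynomial x^4 + x + 1 and the illustrative gf16_poly names from above (not necessarily matching the repository's code); a full multiply is essentially this step repeated and shifted for each term of the other operand.

```c
/* Scale every 4-bit symbol of a packed GF(16) polynomial by the single
 * field element s, using SWAR-style shift-and-XOR with per-symbol
 * reduction by x^4 + x + 1. */
static gf16_poly gf16_poly_scale(gf16_poly p, uint8_t s)
{
    const gf16_poly LOW3 = 0x7777777777777777ULL; /* bits 0-2 of each symbol */
    const gf16_poly TOP  = 0x8888888888888888ULL; /* bit 3 of each symbol    */
    gf16_poly r = 0;

    for (int i = 0; i < 4; i++) {         /* 4 bits per GF(16) symbol        */
        if (s & 1)
            r ^= p;                       /* accumulate p for this bit of s  */
        s >>= 1;

        gf16_poly carry = p & TOP;        /* symbols about to overflow       */
        p = (p & LOW3) << 1;              /* multiply every symbol by x      */
        p ^= (carry >> 3) ^ (carry >> 2); /* reduce: x^4 == x + 1 per symbol */
    }
    return r;
}
```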
There are however some minor caveats to the packed approach. If the native register size is not enough to fit an entire block, operations must be split again, eating into the improvement, and that penalty scales with however many register-sized pieces the block has to be broken into.
In terms of what this practically means for Reed-Solomon, we see the most speed benefit with small fields and high numbers of check symbols, since that is typically the defining length of what is being multiplied. Since I intend to use this for black and white fiducial markers that have occlusion resistance, this is exactly the kind of environment it would excel in.
Division and evaluation are implemented according to the synthetic division algorithm. Since one of the steps of the algorithm involves multiplying the divisor polynomial by a single term, it can be computed using polynomial scaling.
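As a sketch of how that might look with the packed layout, reusing the illustrative gf16_poly_get_term(), gf16_poly_scale() and GF16_SYM_SZ from the examples above (again my own hedged example rather than the repository's API), here is a remainder computation against a monic divisor, which covers the Reed-Solomon case since the generator polynomial is monic:

```c
/* Synthetic-style long division of a packed GF(16) dividend by a monic
 * packed divisor, returning the remainder.  Each step scales the whole
 * divisor by one quotient term and XORs it into the running dividend.
 * Requires gf16_poly, GF16_SYM_SZ, gf16_poly_get_term and gf16_poly_scale
 * as sketched earlier. */
static gf16_poly gf16_poly_rem(gf16_poly dividend, unsigned dividend_terms,
                               gf16_poly divisor,  unsigned divisor_terms)
{
    /* walk from the highest-order dividend term down to the divisor degree */
    for (unsigned i = dividend_terms; i-- >= divisor_terms; ) {
        /* with a monic divisor the quotient term is simply the current
         * leading term of the running dividend */
        uint8_t q = (uint8_t)gf16_poly_get_term(dividend, i);
        if (q == 0)
            continue;
        unsigned shift = (i - (divisor_terms - 1)) * GF16_SYM_SZ;
        dividend ^= gf16_poly_scale(divisor, q) << shift;
    }
    /* the low divisor_terms - 1 symbols now hold the remainder */
    return dividend & ((1ULL << ((divisor_terms - 1) * GF16_SYM_SZ)) - 1);
}
```

Evaluation at a point follows the same pattern via Horner's rule, which is synthetic division by a degree 1 divisor where only the remainder (the value of the polynomial) is kept.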