Edit detail for LibrsyncRollsumAnalysis revision 2 of 8

1 2 3 4 5 6 7 8
Editor: DonovanBaarda
Time: 2016/08/04 23:27:40 GMT+10
Note:


        

There are two ways to think about it; modulus of sums and bit information propagation.

The modulus options

The truncation to 16 bits is doing a mod 0x10000 and chops of information stored in higher bits. Information in bits of the bytes added only propagates up to higher bits, never down into lower bits.

Doing a mod 0xFFFF instead of just truncating to 16 bits is the same as using 1's compliment addition. This does an "end around carry" effect, propagating information that overflows into the higher bits back into the lower bits. This ensures all bits get treated more evenly. This is what Fletcher uses.

Note mod 0xffff can be implemented as (v >> 16) + (v & 0xffff) for repeated summing, followed by "if v>0xffff then v-= 0xffff" at the end. This works because V % N can be implemented by subtracting N until V<N. So V % (N - 1) can be implemented by subtracting N and adding 1 until V<(N-1). The mask and shift sum operation is the same as this. See http://en.wikipedia.org/wiki/Fletcher%27s_checksum#Optimizations for examples. Also the zlib adler32 implementation does something similar for it's mod operation.

Doing a mod of the largest 16bit prime is what adler32 does. This has been shown to just reduce the range of possible values for no additional gains over mod 0xffff.

The +31 to each byte

The +31 is adding 0xf to each byte. On byte by byte basis this is spreading the information in the lower 4 bits up into the higher bits faster. Without an end-around-carry this ensures the information overflows out of the s0 word faster. In in the s0 word of the checksum it is equivalent to adding 31*B to the sum of all the bytes. If the block size B is a power of two, it's equivalent to adding 0xf << N to the sum. If the blocksize is > 2^12 some of those bits overflow out of s0 anyway.

For the s1 word it's equivalent to adding 31*B/2*(B+1) = 31*(B^2+B)/2 to the s1 sum. When B is a power of 2 this is equivalent to 0xf << (2N-1) + 0xf << (N-1). For B > 2^9 the larger term overflows out of the s1 word, so only the lower term counts.

For very small block sizes the +31 does help propagate information into higher order bits faster, which probably helps make the hash appear more evenly balanced when using the sum as a hashtable key, but for even normal sized blocks all it does is ensure the information overflows out of the sums faster.

So the +31 doesn't help, at least when just truncating the sums to 16bits. Initializing s0=1 would be simpler and achieve the same objective of ensuring the length is encoded in the s1 value

Characteristics of s0 sum

A sum of B evenly distributed random bytes will have a normal distribution with a min=0, a max=B*255, a mean=B*127.5 and a variance=B*5461.25. For B=2^10 this gives mean=0x1fe00 and stddev=0x93c. Since the mean is more than 16 bits, bit information has already started overflowing out of the sum. Since 99.7% of possible sums will fall within a contiguous range of +-3*stddev, this means a range 0x3768 or within less that 1/4 of the possible range of values. Also, 68% will fall within +-1*stddev, or a range of only 0x1278, or just over 1/16 of the possible values. Any truncation or modulus of this sum just means the range used wraps around, and any offset added just shifts where the range lies. It's only when B>21843 or 21KB that 99.7% of samples use the full 16bit range, and B>196611 or 192kB till 68% of samples use the whole 16bit range.

Taking a naive modulus of the rollsum as the index into a hash table means for hashtables sized smaller than 2^16 you only use the s0 value, and only the lowest bits. The lowest bits should give good diversity, but for large hash tables and small block sizes you can easily end up only populating a fraction of the hashtable with lots of collisions.

Characteristics of s1 sum

The s1 sum of bytes x1,x2,...,xB is B*x1 + (B-1)*x2 + ... + xB.

This means the second last byte is shifted left 1, the 4th last left 2, the 8th last 3, ... the first left N for block size B=2^N. For a 16bit truncate that means for a 1KB block the high end 2 bits of the first byte have overflowed out of the sum.

For B evenly distributed random bytes, the s1 sum, excluding any offsets, will have min=0, max=255*(B^2+B)/2, mean=127.5*(B^2+B)/2, and variance=???