There are two ways to think about it: how bit information propagates in the s1 and s2 sums, and how the distribution of the total s1 and s2 sums ends up in the truncated 16bit s1 and s2 values.
The truncation to 16 bits is doing a mod 0x10000 and chops off information stored in the higher bits. Information in the bits of bytes added together only propagates up to higher bits, never down into lower bits.
Doing mod 0xFFFF instead of just truncating to 16 bits is the same as using 1's complement addition. This does an "end around carry", propagating information that overflows into the higher bits back into the lower bits. This ensures all bits get treated more evenly and preserves bit information overflowing past 16bits. This is what Fletcher uses.
Note mod 0xffff can be implemented as (v >> 16) + (v & 0xffff) for repeated summing, followed by "if v>0xffff then v-= 0xffff" at the end. This works because V % N can be implemented by subtracting N until V<N. So V % (N - 1) can be implemented by subtracting N and adding 1 until V<(N-1). The mask and shift sum operation is the same as this. See http://en.wikipedia.org/wiki/Fletcher%27s_checksum#Optimizations for examples. Also the zlib adler32 implementation does something similar for its mod operation.
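As a rough Python sketch of that optimization (the mod_ffff name is just for illustration, not librsync's API)::

    def mod_ffff(v):
        # Folding the high 16 bits back into the low 16 bits works because
        # 2^16 == 1 (mod 0xffff), so each fold is the same as repeatedly
        # subtracting 0x10000 and adding 1.
        while v > 0xffff:
            v = (v >> 16) + (v & 0xffff)
        # A final value of 0xffff is congruent to 0 mod 0xffff.
        return 0 if v == 0xffff else v

    assert all(mod_ffff(v) == v % 0xffff for v in (0, 0xffff, 0x1fffe, 0x12345678))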
Doing a mod of the largest 16bit prime (65521) is what adler32 does. This has been shown to just reduce the range of possible values for no additional gains over mod 0xffff.
The +31 is adding 0x1f to each byte. On a byte by byte basis this spreads the information in the lower 5 bits up into the higher bits faster. Without an end-around-carry this ensures the information overflows out of the s1 word faster. In the s1 word of the checksum it is equivalent to adding 31*B to the sum of all the bytes. If the block size is a power of two B=2^N, it's equivalent to adding 0x1f << N to the sum. If the blocksize is > 2^12 some of those bits overflow out of s1 anyway.
For the s2 word it's equivalent to adding 31*B*(B+1)/2 = 31*(B^2+B)/2 to the s2 sum. When B is a power of 2 this is equivalent to (0x1f << (2N-1)) + (0x1f << (N-1)). For B > 2^9 the larger term overflows entirely out of the s2 word, so only the lower term counts.
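A quick way to sanity-check those contributions is to compare naive (untruncated) s1/s2 sums with and without the offset; this is just a sketch of the definitions above, not librsync's implementation::

    def sums(data, offset=0):
        # Naive s1/s2 sums with no truncation, so the offset terms stay visible.
        s1 = s2 = 0
        for c in data:
            s1 += c + offset
            s2 += s1
        return s1, s2

    B = 1024
    data = bytes(range(256)) * (B // 256)
    s1a, s2a = sums(data)
    s1b, s2b = sums(data, offset=31)
    assert s1b - s1a == 31 * B                  # +31*B in s1
    assert s2b - s2a == 31 * B * (B + 1) // 2   # +31*(B^2+B)/2 in s2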
For very small block sizes the +31 does help propagate information into higher order bits faster, which probably helps a naive truncation appear more evenly balanced when using the sum as a hashtable key, but even for normal sized blocks all it does is ensure the information overflows out of the sums faster.
So the +31 doesn't help. When truncating the sums to 16bits it overflows bit information out of the s1 and s2 sums faster. When doing mod 0xffff the information doesn't overflow out of the sums, but it doesn't make the sums any better.
Initializing s1=1 would be simpler and achieve the same objective of ensuring the length is encoded in the s2 value. This is what adler32 does.
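A small sketch of why initializing s1=1 encodes the length: after B bytes the initial 1 has been folded into s2 exactly B times, so even all-zero blocks of different lengths get different s2 values (naive untruncated sums again)::

    def s2_with_init(data, s1_init=0):
        # Naive s1/s2 sums with a configurable initial s1 and no truncation.
        s1, s2 = s1_init, 0
        for c in data:
            s1 += c
            s2 += s1
        return s2

    # For an all-zero block the weighted byte sum is 0, so s2 is exactly the length.
    assert s2_with_init(bytes(100), 1) == 100
    assert s2_with_init(bytes(200), 1) == 200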
Note for evenly distributed random bytes, for a single byte we can calculate::
    from math import sqrt

    sum, sum2 = 0, 0
    for i in range(256):
        sum += i
        sum2 += i*i
    meanB = sum/256.0
    varB = sum2/256.0 - meanB*meanB
    stddevB = sqrt(varB)
Which gives meanB=127.5, varB=5461.25, stddevB=73.9.
For a sum of N bytes, we end up with a normal distribution with::
    mean = N*meanB
    var = N*varB
For a single byte multiplied by N we get::
    mean = N*meanB
    var = N*N*varB
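The first of these scaling rules is easy to spot-check empirically, e.g. with something like::

    import os

    N, samples = 1024, 5000
    meanB, varB = 127.5, 5461.25

    # Empirical mean/variance of the sum of N evenly distributed random bytes.
    sums = [sum(os.urandom(N)) for _ in range(samples)]
    mean = sum(sums) / samples
    var = sum((s - mean) ** 2 for s in sums) / samples
    print(mean, N * meanB)   # both should be close to 0x1fe00 = 130560
    print(var, N * varB)     # both should be close to 5592320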
The s1 sum of bytes x1,x2,...,xB is just::
    s1 = x1 + x2 + ... + xB
For N evenly distributed random bytes we have::
    min = 0
    max = N * 255
    mean = N * meanB = N*127.5
    var = N * varB = N*5461.25
For a blocksize of 1kB, s1 is the sum of 2^10 evenly distributed random bytes, giving max=0x3fc00, mean=0x1fe00 and stddev=0x93c. Since the mean and max are more than 16 bits, bit information has already started overflowing out of a truncated sum.
Since 99.7% of possible sums will fall within a contiguous range of +-3*stddev, this means a range of 0x3768, or less than 1/4 of the possible range of 16bit values, or only the bottom 14bits. Also, 68% will fall within +-1*stddev, or a range of only 0x1278, or just over 1/16 of the possible values, or barely more than the bottom 12bits. Any truncation or modulus of this sum just means the common range wraps around, and any offset added just shifts where the range starts. It's only when B>21843 (about 21kB) that 99.7% of samples use the full 16bit range, and only when B>196611 (about 192kB) that 68% of samples use the whole 16bit range.
So the s1 sum will have a poor hash distribution for normal block sizes. It's only for block sizes >21kB that it starts to look OK, and only for block sizes >192kB that it is pretty good.
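Those s1 figures can be reproduced directly from the formulas above (a rough sketch using the meanB/varB values calculated earlier)::

    from math import sqrt

    meanB, varB = 127.5, 5461.25
    B = 1024
    stddev = sqrt(B * varB)
    print(hex(B * 255), hex(int(B * meanB)), hex(int(stddev)))   # 0x3fc00 0x1fe00 0x93c
    print(hex(6 * int(stddev)))        # 0x3768: the 99.7% (+-3*stddev) range
    print((0x10000 / 6) ** 2 / varB)   # ~21845: block size where +-3*stddev spans 16 bits
    print((0x10000 / 2) ** 2 / varB)   # ~196611: block size where +-1*stddev spans 16 bits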
The s2 sum of bytes x1,x2,...,xB is::
    s2 = B*x1 + (B-1)*x2 + ... + xB
This means the second last byte is shifted left 1, the 4th last left 2, the 8th last left 3, ... and the first left N for block size B=2^N. After truncating to 16bits that means for a 1KB block the high 2 bits of the first byte have overflowed out of the sum. Larger block sizes mean even more of the first byte will have overflowed out, and for a 64KB block size there is no remnant of the first byte left in s2.
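A concrete check of that last claim (naive untruncated sums, no offset): for a 64KB block, changing the first byte has no effect on the truncated 16bit s2::

    def s2_of(data):
        s1 = s2 = 0
        for c in data:
            s1 += c
            s2 += s1
        return s2

    B = 1 << 16
    a = bytearray(B)    # all zeros
    b = bytearray(B)
    b[0] = 0xff         # change only the first byte
    # The first byte's weight is B = 2^16, so it is shifted entirely out of 16 bits.
    assert s2_of(a) != s2_of(b)
    assert s2_of(a) & 0xffff == s2_of(b) & 0xffff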
For N evenly distributed random bytes we have::
    min = 0
    max = N*255 + (N-1)*255 + ... + 1*255
        = (N + (N-1) + ... + 1) * 255
        = N*(N+1)/2 * 255
    mean = N*meanB + (N-1)*meanB + ... + 1*meanB
         = (N + (N-1) + ... + 1) * meanB
         = N*(N+1)/2 * meanB
         = N*(N+1)/2 * 127.5
    var = N^2*varB + (N-1)^2*varB + ... + varB
        = (N^2 + (N-1)^2 + ... + 1^2) * varB
        = N*(N+1)*(2*N+1)/6 * varB
        = N*(N+1)*(2*N+1)/6 * 5461.25
See https://proofwiki.org/wiki/Sum_of_Sequence_of_Squares for why the variance sum reduces to that value.
For a blocksize of 1kB of 2^10 evenly distributed random bytes, s2 has max=0x7f9fe00, mean=0x3fcff00, stddev=0x15594a. Since the mean is about 26 bits, quite a lot of bit information has overflowed out of a truncated 16bit sum. Since the stddev is also much more than 16bits the truncated 16bit sums should be well distributed over the whole 16bit range.
So the s2 sum should have a pretty good hash distribution for even small block sizes.
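Those s2 figures follow from the formulas above (same caveats as the s1 sketch)::

    from math import sqrt

    meanB, varB = 127.5, 5461.25
    B = 1024
    tri = B * (B + 1) // 2                  # 1 + 2 + ... + B
    sq = B * (B + 1) * (2 * B + 1) // 6     # 1^2 + 2^2 + ... + B^2
    print(hex(tri * 255))                   # max    = 0x7f9fe00
    print(hex(int(tri * meanB)))            # mean   = 0x3fcff00
    print(hex(int(sqrt(sq * varB))))        # stddev = 0x15594a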
The overall 32bit rollsum is rollsum = ((s2 & 0xffff) << 16) + (s1 & 0xffff), so s2 is the upper 16bits, and s1 is the lower 16bits.
Taking a naive modulus of the rollsum as the index into a 2^N sized hash table (i.e. just masking the lower N bits) means for hashtables smaller than 2^16 you only use the s1 value, and only its lowest bits. The lowest bits should give good diversity, but since s1 only uses a fraction of the 16bit range the higher bits will not be well distributed. So for large hash tables and small block sizes you can easily end up populating only a fraction of the hashtable, with lots of collisions.
Using hashtable sizes of (2^N - 1) and taking a simple mod (2^N - 1) of the combined 32bit rollsum will mix s2 into s1. Since s2 will have a good distribution, this should give a pretty good hash distribution and minimize collisions. This should be as good as any integer hashing algorithm applied to the rollsum, and much simpler.
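As a sketch of the difference (example table size and sums, not librsync code)::

    s1, s2 = 0x1e3a, 0xb4c7                 # example truncated 16bit sums
    rollsum = ((s2 & 0xffff) << 16) + (s1 & 0xffff)

    # 2^N sized table: the mask only ever sees the low N bits of s1, s2 is ignored.
    index_mask = rollsum & ((1 << 14) - 1)

    # (2^N - 1) sized table: 2^16 mod (2^N - 1) != 0, so the mod mixes s2 into the index.
    index_mod = rollsum % ((1 << 14) - 1)

    print(hex(index_mask), hex(index_mod))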
To make the rollsum simpler and better we could:
- drop the +31 to each byte and replace it with initializing s1=1.
- change the sums to use mod 0xffff instead of just truncating to 16bits.
These will simplify and improve the quality of the rollsum, but it's unclear by how much. These changes could be implemented easily with little to no extra overhead (maybe even less). However, they will break backwards compatibility with older versions. I'm not sure how to handle that.
When using the rollsum as a hashtable index:
- Using 2^N hashtable sizes and just taking the lower N bits will give a bad hash distribution with lots of collisions.
- Using 2^N-1 hashtable sizes and doing mod 2^N-1 should give a good hash distribution, and will be much simpler than using integer hash functions.