[Mono-bugs] [Bug 59537][Nor] Changed - improve NameTable performance
bugzilla-daemon@bugzilla.ximian.com
bugzilla-daemon@bugzilla.ximian.com
Thu, 16 Sep 2004 01:53:08 -0400 (EDT)
Please do not reply to this email- if you want to comment on the bug, go to the
URL shown below and enter your comments there.
Changed by mono@bitfurnace.com.
http://bugzilla.ximian.com/show_bug.cgi?id=59537
--- shadow/59537 2004-09-16 01:52:09.000000000 -0400
+++ shadow/59537.tmp.28783 2004-09-16 01:53:08.000000000 -0400
@@ -454,6 +454,54 @@
Please make sure that it does not break existing nunit testcases.
NameTableTests are few, but it seems to break many other tests which
depend on NameTable (i.e. almost all tests).
Am also very interested in what is the key of this performance
improvement.
+
+------- Additional Comments From mono@bitfurnace.com 2004-09-16 01:53 -------
+Hmm, an explanation of Knuth's multiplication method and universal
+hashing...
+
+Well, without the multiplier, on doubling, an extra bit of hash is
+introduced, implying that each current hashtable entry will be moved
+to one of two alternative slots. In fact, given that the extra bit is
+a high bit, it implies that each current hashtable entry will be moved
+to either the high or low half of the new hashtable. This isnt so bad,
+but for a poor initial distribution, it isnt the best possible result.
+
+If we take our starting hash and multiply it by a random but odd
+number chosen each time we double, we randomly re-distribute the
+hashtable entries fresh at each doubling. Multiplying by a random odd
+number tends to concentrate entropy (the scrambling factor) in the
+high bits, so our final probe number is achived by grabbing N of the
+highest bits (by right shifting by 32-N bits), where N bits is the
+size of our hashtable.
+
+Why does this multiply-and-right-shift work? I guess the best way to
+see this in operation is to write out binary multiplication on paper
+as a long multiplication. Its a series of shifts and adds, and you
+should see that multiplying by a random number will result in nearly
+every bit of the starting number having an impact on the high bits of
+the result.
+
+The choice of multiplier is important. Most literature is satisfied
+with a simple random odd multiplier, but clearly multipliers with only
+a few bits bit set in them are likely bad multipliers. I dont have a
+rigorous proof of what constitutes a good random multiplier, but my
+best guess is that a good random multiplier is one with a even-ish mix
+of ones and zeros, relatively evenly distributed across the 32 bits of
+a word.
+
+My solution is to create a random multiplier as a 4-byte word, where
+each byte is chosen in the range 1-254. This should ensure that no
+individual byte of the multiplier is composed of all zeros or all
+ones. This restriction comprises around 3% of the possible range of
+multipliers, and should help with avoiding bad multipliers.
+
+Of course, every random hashing scheme has its degenerate cases, but
+the very same randomness will make those cases exceedingly unlikely.
+In practise, linear probing with random hash multipliers is extremely
+fast and the worst case results are not seen.
+
+If worst-case results are a worry, we can rehash with a new multiplier
+ when exceeding long probe lengths are encountered.