[Mono-dev] String.GetHashCode Discussion.
billholmes54 at gmail.com
Wed Jul 16 17:06:34 EDT 2008
I have been asked to move this discussion to the e-mail list from IRC.
Basically we (my company and I) have new unit tests errors in 2.0
that did not occur at 1.9. The errors were traced to the
String.GetHashCode change. I had one of our interns (Mike) research
the change and I wanted to share his findings with you. His final
document s attached. I can also send the test case Mike used to
generate his data. I believe it is too large to send to all of your
(1:28:15 PM) holmes: Have there been any discussions about r106887
which was a change to String.GetHashCode?
(1:31:54 PM) holmes: This change is causing some of our tests to fail.
I will admit that our tests are relying on the sting hash code a bit
too much. However after some investigation I am wondering if the new
algorithm is generating a unique enough result. (I say that as
respectfully as I can.)
(1:34:33 PM) holmes: We conducted a short test where 58000 English
string were tested and the new code only results in 60% unique results
compared to the previous code's performance of 100%
(1:38:36 PM) holmes: We have found the current algorithm only
considers the last ~6 characters of a string, which may be the
problem. That is the last thing I will say here I promise.
(1:42:21 PM) kumpera: marek: how have you tested the distribution of
your new string hashing algorithm?
(1:42:54 PM) marek: kumpera: I used english dictionary
(1:43:57 PM) marek: holmes: how big is your dictionary?
(1:45:05 PM) holmes: marek: 58110
(1:46:12 PM) holmes: I think the real problem is not that a word
hashes unique enough as a sentence producing a unique result.
(1:46:56 PM) holmes: the last 6 characters define the whole algorithm.
(or at least I think so)
(1:56:50 PM) lupus: it's likely better to revert the change, string
hashing is also supposed to be synced with the implementation in C in
the runtime anyway
More information about the Mono-devel-list