[Mono-dev] Dictionary`2: optimized and serialization-compatible with MS.net

Mon Jun 11 12:27:27 EDT 2007

On Mon, 2007-06-11 at 16:51 +0200, Paolo Molaro wrote:
> You're missing that the hashcode check can be completely removed
> since it's not needed. Having it may have some performance benefit
> with many collisions or for presence checks that fail and happen
> to hit a bucket with values that also have different hashcodes.
> But this benefit (I think it is small, would love to see numbers for a
> few test cases) needs to be balanced against a memory usage overhead
> that in most cases is at about 20%.

Yes, I've realized this shortly after sending my last email. How could
I've missed that?!

I changed the code accordingly, getting rid of the hashcodes completely.
Unfortunately, the result of the change is a slow down which is often
considerable (especially for string keys):

Dictionary<int, int>:       6-12%
Dictionary<double, double>: 25-35%
Dictionary<string, int>:    30-80% *

*: depending on average string length and the strings themselves (I
tried quite a few string variations here)

For key types with slow Equals() methods (e.g. String.Equals, whose
running time is proportional to the length of the matching substring),
the slow down will always be substantial.

I would vote for keeping the hashcode around, as my guess (read: gut
feeling) for the average slow down in realistic scenarios would be
around 30%.

> Well, my opionion is that using a hashtable there is completely overkill
> and the wrong thing to do (even if it is a super-optimized Dictionary<T>)
> performance-wise, in fact I think I removed that sometime in the past
> because it gave an enourmous performance vantage to not do it in a wide
> range of cases. Just limiting the chars that are handled with the skip
> table to latin1 would be enough to have so much better performance in
> 95% of the cases that the remaining cases would become uninteresting.
> But if we want to pay some service to people not using only the latin1
> subset of unicode, we could still use a 256-entry array for latin1
> and a hash/dictionary that is created and used only for the chars > 255.
> GetShiftDistance would become:
> 
> 	if (c < 256)
> 		return latin1_skip_table [(byte)c];
> 	...current code...

Yes, that's true, a skip table based purely on Hashtable/Dictinary looks
like overkill in this case. Especially compared to your solution.

Juraj