[Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

Wed Apr 12 08:04:49 EDT 2006

Hi,

Here's the real answer.

I might not be fully understanding you, but if you are saying that
your current patch should be applied as is, then it's a no-go (due
to the big performance difference in ASCIIEncoding and UnicodeEncoding
that you showed us).

Note that I'm not saying that performance always matters more than
compatibility (I'm actually less of a pro-optimization guy than
others). If there is a way to achieve this "compatibility" without
harming performance, it'd be awesome. I'm just not for *extremism*.
The reason you were once convinced was not that the evidence was
numeric but that the differences were significant.

(Hey, there is no doubt that I love your detailed analysis BTW :-)

I agree with you that we should feel free to override virtual
members as long as doing so does not result in MissingMethodException
(but that might be only me).

Apart from that performance loss, there is some good in the individual
changes in your patches. But I'm not convinced by some of them (such
as giving new byte [1]) because you don't provide NUnit tests that
demonstrate the problem.

If you don't write any, I will create some for the changes that I am
convinced by. But as I wrote in my first reply, the difference is
so minor that it is low priority for me.

BTW thanks for the decent tester code. It convinced me that there are
still some things that can be optimized.

Atsushi Eno

Kornél Pál wrote:
> Hi,
> 
> I've done some tests:
> New 1.1:
> UnicodeEncoding: 6750
> ASCIIEncoding: 18609
> UTF8Encoding: 9922
> CP932: 14641
> 
> New 2.0:
> UnicodeEncoding: 13594
> ASCIIEncoding: 19562
> UTF8Encoding: 16625
> CP932: 38906
> 
> Old 1.1:
> UnicodeEncoding: 6906
> ASCIIEncoding: 18859
> UTF8Encoding: 10062
> CP932: 21719
> 
> Old 2.0:
> UnicodeEncoding: 6750
> ASCIIEncoding: 7297
> UTF8Encoding: 16719
> CP932: 45469
> 
> I have the following conclusion:
> 
> UnicodeEncoding in 2.0 is slower because GetBytes(string) is not 
> overridden. But performance is improved in 1.1 because the overridden 
> implementation is optimized for UnicodeEncoding.
> 
> In ASCIIEncoding you can see the drawback of doing optimizations in 
> the Encoding class because the current code is only faster on 2.0. With 
> the new code, 1.1 didn't change because it does not use unsafe code.
> 
> There is no change in UTF8Encoding (or, if there is any, the improvement 
> is minimal).
> 
> CP932 is faster because optimization is done in MonoEncoding.
> 
> In conclusion, I think that Encoding should be MS.NET compatible 
> because it's more likely to be used by users. And no improvement can be 
> made in profile 1.1 because there are no unsafe methods, so there is no 
> point in sacrificing compatibility for performance.
> 
> I think that the best solution for encoding optimization is to use a 
> single unsafe implementation (for each functionality: GetBytes, GetChars, 
> GetByteCount, GetCharCount) and have the other methods (string, char[], 
> byte[]) call this single implementation. This makes the code more 
> maintainable as well. This is what I've done in UnicodeEncoding.
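
For readers of the archive, the "single unsafe implementation" layout
described above could look roughly like the sketch below. It is not the
code from the patch: the class name SketchUtf16Encoder is hypothetical,
it handles only little-endian UTF-16 without a BOM, and it assumes the
compiler is invoked with unsafe code enabled.

```csharp
using System;

// Hypothetical sketch, not Mono's UnicodeEncoding. Little-endian UTF-16 only,
// no BOM handling. Must be compiled with unsafe code enabled.
public sealed class SketchUtf16Encoder
{
    // The single core implementation. Every other overload funnels into it.
    public unsafe int GetBytes(char* chars, int charCount, byte* bytes, int byteCount)
    {
        if (charCount < 0 || byteCount < 0)
            throw new ArgumentOutOfRangeException();
        if (byteCount < charCount * 2)
            throw new ArgumentException("Insufficient space", "bytes");

        for (int i = 0; i < charCount; i++) {
            char c = chars[i];
            bytes[2 * i] = (byte)(c & 0xFF);            // low byte first
            bytes[2 * i + 1] = (byte)((c >> 8) & 0xFF); // then high byte
        }
        return charCount * 2;
    }

    // char[] wrapper: validates arguments, pins the arrays, calls the core.
    public unsafe int GetBytes(char[] chars, int charIndex, int charCount,
                               byte[] bytes, int byteIndex)
    {
        if (chars == null) throw new ArgumentNullException("chars");
        if (bytes == null) throw new ArgumentNullException("bytes");
        if (charIndex < 0 || charCount < 0 || charIndex > chars.Length - charCount)
            throw new ArgumentOutOfRangeException("chars");
        if (byteIndex < 0 || byteIndex > bytes.Length)
            throw new ArgumentOutOfRangeException("byteIndex");
        if (charCount == 0)
            return 0;

        fixed (char* cptr = chars)
        fixed (byte* bptr = bytes)
            return GetBytes(cptr + charIndex, charCount,
                            bptr + byteIndex, bytes.Length - byteIndex);
    }

    // string wrapper: allocates the result and calls the same core.
    public unsafe byte[] GetBytes(string s)
    {
        if (s == null) throw new ArgumentNullException("s");
        byte[] bytes = new byte[s.Length * 2];
        if (s.Length == 0)
            return bytes;

        fixed (char* cptr = s)
        fixed (byte* bptr = bytes)
            GetBytes(cptr, s.Length, bptr, bytes.Length);
        return bytes;
    }
}
```

The wrappers only validate arguments and pin memory, so the conversion
loop exists in one place; GetChars, GetByteCount and GetCharCount would
presumably follow the same pattern.
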
> 
> And I think the place where we shouldn't care about MS.NET compatibility 
> is the derived public encoding classes; we should override as many 
> methods as we need even if they aren't overridden in MS.NET. (For 
> private encoding classes, layout compatibility is not a requirement.)
> 
> For example, if I remove the !NET_2_0 and NET_2_0 conditions from 
> GetBytes(string) and GetString(byte[], int, int) in UnicodeEncoding, a 
> significant performance improvement can be achieved in all profiles.
> 
> Is this "deal" acceptable? If you have any objections please let me know.
> 
> Kornél