[Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

Wed Apr 12 06:07:34 EDT 2006

Hi,

I've done some tests:
New 1.1:
UnicodeEncoding: 6750
ASCIIEncoding: 18609
UTF8Encoding: 9922
CP932: 14641

New 2.0:
UnicodeEncoding: 13594
ASCIIEncoding: 19562
UTF8Encoding: 16625
CP932: 38906

Old 1.1:
UnicodeEncoding: 6906
ASCIIEncoding: 18859
UTF8Encoding: 10062
CP932: 21719

Old 2.0:
UnicodeEncoding: 6750
ASCIIEncoding: 7297
UTF8Encoding: 16719
CP932: 45469

My conclusions are as follows:

UnicodeEncoding in 2.0 is slower because GetBytes(string) is not overridden.
But performance is improved in 1.1 because the overriding implementation is
optimized for UnicodeEncoding.

ASCIIEncoding shows the drawback of doing optimizations in the Encoding
class: the current code is faster only on 2.0. With the new code, 1.1
did not change because unsafe code is not used there.

There is little or no change in UTF8Encoding; any improvement is minimal.

CP932 is faster because the optimization is done in MonoEncoding.

In conclusion, I think that Encoding should be MS.NET compatible because
users are more likely to use it. No improvement can be made in profile 1.1
because there are no unsafe methods there, so there is no point in
sacrificing compatibility for performance.

I think that the best solution for encoding optimization is to use a single
unsafe implementation for each functionality (GetBytes, GetChars,
GetByteCount, GetCharCount) and have the other overloads (string, char[],
byte[]) call that single implementation. This also makes the code more
maintainable. This is what I've done in UnicodeEncoding.
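
As an illustration of that layout, here is a minimal sketch. It is not the
code in the patch: Utf16LeSketch and GetBytesCore are hypothetical names, the
class is standalone rather than derived from Encoding, and it has to be
compiled with unsafe code enabled. Only GetBytes is shown; GetChars,
GetByteCount and GetCharCount would presumably follow the same shape.

```csharp
using System;

// Hypothetical illustration only; this is not Mono's UnicodeEncoding.
public static class Utf16LeSketch
{
    // The single unsafe core. Every public overload below ends up here.
    static unsafe int GetBytesCore (char* chars, int charCount,
                                    byte* bytes, int byteCount)
    {
        if (byteCount < charCount * 2)
            throw new ArgumentException ("Output buffer is too small.", "bytes");
        for (int i = 0; i < charCount; i++) {
            char c = chars [i];
            bytes [2 * i] = (byte) (c & 0xFF);   // low byte first (little-endian)
            bytes [2 * i + 1] = (byte) (c >> 8); // then the high byte
        }
        return charCount * 2;
    }

    // string wrapper: allocates the result and delegates to the core.
    public static unsafe byte [] GetBytes (string s)
    {
        if (s == null)
            throw new ArgumentNullException ("s");
        byte [] bytes = new byte [s.Length * 2];
        if (s.Length == 0)
            return bytes; // nothing to pin or convert
        fixed (char* cptr = s)
        fixed (byte* bptr = bytes)
            GetBytesCore (cptr, s.Length, bptr, bytes.Length);
        return bytes;
    }

    // char[]/byte[] wrapper: validates the ranges, then delegates to the core.
    public static unsafe int GetBytes (char [] chars, int charIndex, int charCount,
                                       byte [] bytes, int byteIndex)
    {
        if (chars == null)
            throw new ArgumentNullException ("chars");
        if (bytes == null)
            throw new ArgumentNullException ("bytes");
        if (charIndex < 0 || charCount < 0 || charIndex > chars.Length - charCount)
            throw new ArgumentOutOfRangeException ("charIndex");
        if (byteIndex < 0 || byteIndex > bytes.Length)
            throw new ArgumentOutOfRangeException ("byteIndex");
        int byteCount = bytes.Length - byteIndex;
        if (byteCount < charCount * 2)
            throw new ArgumentException ("Output buffer is too small.", "bytes");
        if (charCount == 0)
            return 0; // no work, and no need to pin a possibly empty array
        fixed (char* cptr = chars)
        fixed (byte* bptr = bytes)
            return GetBytesCore (cptr + charIndex, charCount,
                                 bptr + byteIndex, byteCount);
    }
}
```

With this sketch, Utf16LeSketch.GetBytes ("A") returns the two bytes 0x41
0x00. The wrappers contain only argument checks and pinning, so the
conversion loop exists in one place.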

And I think the place where we shouldn't care about MS.NET compatibility is
the derived public encoding classes; we should override as many methods as
we need even if they aren't overridden in MS.NET. (For private encoding
classes, layout compatibility is not a requirement.)

For example, if I remove the !NET_2_0 and NET_2_0 conditions from
GetBytes(string) and GetString(byte[], int, int) in UnicodeEncoding, a
significant performance improvement can be achieved in all profiles.

Is this "deal" acceptable? If you have any objections please let me know.

Kornél

----- Original Message ----- 
From: "Kornél Pál" <kornelpal at gmail.com>
To: "Atsushi Eno" <atsushi at ximian.com>
Cc: <mono-devel-list at lists.ximian.com>
Sent: Wednesday, April 12, 2006 12:35 AM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0
and modify some Encoding wrappers

> Hi,
>
> Numbers are things that can convince me.:)
>
> Now I have three questions:
> - Are there parts of the patch that are OK to commit?
> - Do we care about class signature (what methods are overridden)?
> - Do we care about the implementation of virtual methods (what methods do 
> they call)?
>
> I can follow any guidelines - although I don't believe in performance above
> everything else - but I would like to know them, otherwise I cannot follow
> them.
>
> Kornél
>
> ----- Original Message ----- 
> From: "Atsushi Eno" <atsushi at ximian.com>
> To: "Kornél Pál" <kornelpal at gmail.com>
> Cc: <mono-devel-list at lists.ximian.com>
> Sent: Tuesday, April 11, 2006 6:56 PM
> Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0
> and modify some Encoding wrappers
>
>
>> Hi,
>>
>> I'm not interested in how your patch accomplishes MS.NET compatibility.
>> My question is simple: is your patch *good* for Mono?
>>
>> using System;
>> using System.Diagnostics;
>> using System.IO;
>> using System.Text;
>>
>> public class Test
>> {
>>         public static void Main (string [] args)
>>         {
>>                 int loop = args.Length > 1 ? int.Parse (args [1]) : 100;
>>                 string s = File.OpenText (args [0]).ReadToEnd ();
>>                 Encoding e = Encoding.Unicode;
>>                 Stopwatch sw = Stopwatch.StartNew ();
>>                 for (int i = 0; i < loop; i++)
>>                         e.GetBytes (s);
>>                 sw.Stop ();
>>                 Console.WriteLine (sw.ElapsedMilliseconds);
>>         }
>> }
>>
>> Before your patch:
>> mono ./unicode.exe ../../svn/mono/web/web/masterinfos/System.Web.xml
>> 5038
>>
>> After the patch:
>> $ rundev2 mono ./unicode.exe 
>> ../../svn/mono/web/web/masterinfos/System.Web.xml
>> 10175
>>
>> Atsushi Eno
>>
>> Kornél Pál wrote:
>>> Hi,
>>>
>>> I had some time and looked at all the encoding classes in I18N and in 
>>> System.Text.
>>>
>>> byte* and char* are only used in UnicodeEncoding and in GetByteCount and
>>> GetBytes in I18N.
>>>
>>> This means that keeping the #if NET_2_0 code that you don't want to
>>> remove will cause a performance loss on profile 2.0 in System.Text, while
>>> it will not improve performance in profile 1.1, as no such optimization
>>> is done.
>>>
>>> The solution is to use arrays in Encoding, which improves simple,
>>> old-fashioned encoding classes, but to override these methods to use
>>> pointers in classes that implement their core functionality using unsafe
>>> code.
>>>
>>> Encodings in System.Text (except UnicodeEncoding) use arrays, and I think
>>> custom encodings created by users are array based as well, so using
>>> arrays in Encoding results in better performance. If custom encodings
>>> use unsafe code, they will have to override other methods anyway because
>>> of MS.NET to get the performance improvement.
>>>
>>> By overriding GetByteCount (string) and GetBytes (string) in
>>> MonoEncoding, the performance improvement from unsafe code will be
>>> preserved and, in addition, will be available in all profiles.
>>>
>>> MonoEncoding was already good so I just added these two methods and 
>>> added the following code to GetBytes methods:
>>>
>>> int byteCount = bytes.Length - byteIndex;
>>> if (bytes.Length == 0)
>>>         bytes = new byte [1];
>>>
>>> Some check is required because &bytes[0] will fail for zero-length
>>> arrays. A "bytes.Length == byteIndex" check could avoid this (it was
>>> present in only one of the methods), but it would prevent
>>> ArgumentException from being thrown for output buffers that are too
>>> small. Creating a small array adds little overhead, and an exception
>>> will probably be thrown because charCount is non-zero.
>>>
>>> I've attached an improved patch. Please review it.
>>>
>>> Kornél
> 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Test.cs
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20060412/460389f0/attachment.pl