[Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 andmodifysome Encoding wrappers

Kornél Pál kornelpal at gmail.com
Tue Apr 11 07:28:01 EDT 2006


Hi,

I had some time and looked at all the encoding classes in I18N and in 
System.Text.

byte* and char* is only used in UnicodeEncoding and GetByteCount and 
GetBytes in I18N.

This means that having the #if NET_2_0 codes that you don't want to remove 
will cause performance loss on profile 2.0 in System.Text while will not 
improve performance in profile 1.1 as no such optimization is done.

The solution is to use arrays in Encoding that improves simple, old 
fashioned encoding classes but override these methods to use pointers in 
classes that implement their core functionality using unsafe code.

Encodings in System.Text (except UnicodeEncoding) use arrays and I think 
custom encodings created by users are array based as well so it results in 
better performance if we use arrays in Encoding. If custom encodings are 
using unsafe code they will have to override other methods because of MS.NET 
anyway to get the performance improvement.

By overriding GetByteCount (string) and GetBytes (string) in MonoEncoding 
performance improvement on unsafe code will be preserved in addition it will 
be available in all profiles.

MonoEncoding was already good so I just added these two methods and added 
the following code to GetBytes methods:

int byteCount = bytes.Length - byteIndex;
if (bytes.Length == 0)
 bytes = new byte [1];

Some check is required because &bytes[0] will fail for zero-size arrays. 
"bytes.Length == byteIndex" could avoid this (but was present in only one of 
the methods) but this would prevent ArgumentException being thrown for too 
small output buffers. Creating a small array is little overhead and an 
exception will probably be thrown because charCount is non-zero.

Attached an improved patch. Please review the patch.

Kornél

----- Original Message ----- 
From: "Kornél Pál" <kornelpal at gmail.com>
To: "Atsushi Eno" <atsushi at ximian.com>
Cc: <mono-devel-list at lists.ximian.com>
Sent: Monday, April 10, 2006 3:53 PM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0
andmodifysome Encoding wrappers


> Now I haven't got enough time for a detailed reply (I'll later do that)
> but I have the following general opinion:
>
> For a public abstract base class like Encoding I think it's important to
> use the same wrappers as MS.NET does because people probably assume this.
> And their code will behave differently on Mono otherwise if the only
> override some methods.
>
> I18N encodings however are totally internal so nobody can override them
> (if obtained using GetEncoding) so they can do whatever we want.
>
> I saw that you use pointer based conversion in MonoEncoding that is good
> and fast and should not be reverted and is not affected by the bad
> performance of Encoding. I think Encoding should be MS.NET compatible but
> MonoEncoding can override any methods of Encoding that are inefficient.
>
> So I think we should "retard" Encoding to provide compatibility and update
> (or leave if it's already good enough) MonoEncoding for better
> performance.
>
> Note that Encoding in MS.NET 2.0 is just as inefficient (if not more
> inefficient) as in 1.x but they created their own EncodingNLS (I guess
> this is that class) to address this problem. We could do the same using
> MonoEncoding.
>
> Type type = Encoding.GetEncoding(1252).GetType();
> while (type != null)
> {
> Console.WriteLine(type.FullName);
> type = type.BaseType;
> }
>
> System.Text.SBCSCodePageEncoding
> System.Text.BaseCodePageEncoding
> System.Text.EncodingNLS
> System.Text.Encoding
> System.Object
>
> Kornél
>
> ----- Original Message ----- 
> From: "Atsushi Eno" <atsushi at ximian.com>
> Cc: <mono-devel-list at lists.ximian.com>
> Sent: Monday, April 10, 2006 2:48 PM
> Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0
> andmodifysome Encoding wrappers
>
>
>> Hi,
>>
>> Kornél Pál wrote:
>>> Hi,
>>>
>>> Now I understood why is UnicodeEncodig.GetBytes(string) overridden in
>>> MS.NET 1.x but not in MS.NET 2.0.
>>
>>> Encoding of MS.NET uses char[] to convert strings in all versions and
>>> the call an overload that takes char[] in GetBytes(string) as well.
>>> (This is a difference compared to Mono as it uses char* in 2.0.) And I
>>> think MS realized that the should make GetBytes(string) a higher level
>>> wrapper just like the other ones and call GetBytes(string, int, int,
>>> byte[], int) like the overridden method in UnicodeEncoding does.
>>>
>>> But then they realized that this would break compatibility with MS.NET
>>> 1.x so they dropped the modification done to Encodig.GetBytes(string)
>>> but forgot to put back the override in UnicodeEncoding so 2.0
>>> UnicodeEncodig.GetBytes(string) is actually less efficient than in 1.0.
>>>
>>> I updated the patch to call the right method in
>>> UnicodeEncodig.GetBytes(string).
>>>
>>> Also note that Encoding of Mono is using the new unsafe methods for
>>> GetBytes that takes string but MS.NET is not doing this optimization and
>>> is using char[] instead that is more efficient when the new unsafe
>>> methods are not overridden as they convert pointers back to arrays by
>>> default. In addition calling the same methods improves compatibility.
>>
>> Umm, I don't think that should be our way to go. Creating char[] in
>> GetBytes(string) is *obviously* inefficient. Since I actually added
>> pointer-based overrides in I18N classes, there is no
>> "pointers-goes-back-to-arrays" problem. This kind of "compatibility"
>> change rather harms Mono performance.
>>
>> Without exact, practical, clear and present danger of incompatiblity
>> problem, this change makes no sense to me. I'd love to make Mono not
>> suck in the name of compatibility which is anyways broken promise in
>> .NET land.
>>
>> (You should also notice that Encoding implementation in 2.0 is totally
>> different than 1.x. In .NET 2.0 they are managed. In 1.x they are
>> just WIN32API wrappers. They are anyways incompatible, for example
>> having different fallback replacement characters.)
>>
>>> (Note that all of these information were obtained by overriding Encoding
>>> and printing notification to the console when a method is called.)
>>>
>>> The updated patch is attached.
>>
>> I wouldn't apply this new patch. If other mono hackers do that I won't
>> stop (but then it is very likely to happen that I stop helping
>> Encoding improvements anymore).
>>
>> Atsushi Eno
>> _______________________________________________
>> Mono-devel-list mailing list
>> Mono-devel-list at lists.ximian.com
>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Encodings.diff
Type: application/octet-stream
Size: 9297 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20060411/b99834a2/attachment.obj 


More information about the Mono-devel-list mailing list