[Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers

Kornél Pál kornelpal at gmail.com
Mon Apr 10 08:01:59 EDT 2006


Hi,

Now I understand why UnicodeEncoding.GetBytes(string) is overridden in
MS.NET 1.x but not in MS.NET 2.0.

In all versions, Encoding in MS.NET uses char[] to convert strings, and
GetBytes(string) calls an overload that takes char[] as well. (This differs
from Mono, which uses char* in 2.0.) I think MS realized that they should
make GetBytes(string) a higher level wrapper, just like the other ones, and
have it call GetBytes(string, int, int, byte[], int), as the overridden
method in UnicodeEncoding does.

But then they realized that this would break compatibility with MS.NET 1.x,
so they dropped the modification to Encoding.GetBytes(string) but forgot to
put back the override in UnicodeEncoding. As a result,
UnicodeEncoding.GetBytes(string) in 2.0 is actually less efficient than in
1.0.

I updated the patch to call the right method in
UnicodeEncoding.GetBytes(string).
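
To make the idea concrete, here is a minimal sketch of a GetBytes(string)
that forwards to GetBytes(string, int, int, byte[], int). It is not the
attached patch: the subclass name ForwardingUnicodeEncoding is made up for
illustration, and the real change would live inside UnicodeEncoding itself.

```csharp
using System;
using System.Text;

// Hypothetical subclass, for illustration only (not the attached patch).
public class ForwardingUnicodeEncoding : UnicodeEncoding
{
    public override byte[] GetBytes(string s)
    {
        // Check explicitly so callers get ArgumentNullException rather than
        // a NullReferenceException from s.Length below.
        if (s == null)
            throw new ArgumentNullException("s");

        byte[] bytes = new byte[GetByteCount(s)];
        // Forward to the string-based overload instead of converting the
        // string to a temporary char[] first.
        GetBytes(s, 0, s.Length, bytes, 0);
        return bytes;
    }
}
```

With the default (little-endian) UnicodeEncoding settings, GetBytes("A") on
this class should give the two bytes 0x41 0x00.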

Also note that Mono's Encoding uses the new unsafe methods for the GetBytes
overloads that take a string, whereas MS.NET does not do this optimization
and uses char[] instead. Using char[] is more efficient when the new unsafe
methods are not overridden, because by default they convert the pointers
back to arrays. In addition, calling the same methods improves
compatibility.

(Note that all of this information was obtained by overriding Encoding and
printing a notification to the console whenever a method is called.)
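
A probe of that kind could look roughly like the sketch below. The class
name ProbeEncoding and the Calls list are my own choices for illustration;
the encoding itself is a trivial 2-bytes-per-char scheme so that the class
is complete. It records calls in a list instead of printing them, which
makes the trace easier to compare between runtimes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical probe: records which Encoding overloads the wrappers reach.
public class ProbeEncoding : Encoding
{
    public readonly List<string> Calls = new List<string>();

    public override int GetByteCount(char[] chars, int index, int count)
    {
        Calls.Add("GetByteCount(char[], int, int)");
        return count * 2;
    }

    public override int GetBytes(char[] chars, int charIndex, int charCount,
                                 byte[] bytes, int byteIndex)
    {
        Calls.Add("GetBytes(char[], int, int, byte[], int)");
        for (int i = 0; i < charCount; i++) {
            char c = chars[charIndex + i];
            bytes[byteIndex + 2 * i] = (byte) c;            // low byte
            bytes[byteIndex + 2 * i + 1] = (byte) (c >> 8); // high byte
        }
        return charCount * 2;
    }

    public override int GetBytes(string s, int charIndex, int charCount,
                                 byte[] bytes, int byteIndex)
    {
        Calls.Add("GetBytes(string, int, int, byte[], int)");
        return base.GetBytes(s, charIndex, charCount, bytes, byteIndex);
    }

    public override int GetCharCount(byte[] bytes, int index, int count)
    {
        Calls.Add("GetCharCount(byte[], int, int)");
        return count / 2;
    }

    public override int GetChars(byte[] bytes, int byteIndex, int byteCount,
                                 char[] chars, int charIndex)
    {
        Calls.Add("GetChars(byte[], int, int, char[], int)");
        int n = byteCount / 2;
        for (int i = 0; i < n; i++)
            chars[charIndex + i] = (char) (bytes[byteIndex + 2 * i]
                                   | (bytes[byteIndex + 2 * i + 1] << 8));
        return n;
    }

    public override string GetString(byte[] bytes, int index, int count)
    {
        Calls.Add("GetString(byte[], int, int)");
        return base.GetString(bytes, index, count);
    }

    public override int GetMaxByteCount(int charCount) { return charCount * 2; }
    public override int GetMaxCharCount(int byteCount) { return byteCount / 2; }
}
```

Calling GetBytes("hi") or GetString(someBytes) on an instance and then
dumping Calls shows which overloads the wrappers went through. The trace is
expected to differ between runtimes and versions, which is the point of the
exercise, so I am not listing an expected output here.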

The updated patch is attached.

Kornél

----- Original Message ----- 
From: "Kornél Pál" <kornelpal at gmail.com>
To: "Atsushi Eno" <atsushi at ximian.com>
Cc: <mono-devel-list at lists.ximian.com>
Sent: Monday, April 10, 2006 1:12 PM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and
modify some Encoding wrappers


> OK, now I understand what you mean. :)
>
> UnicodeEncoding.GetString should have exactly the same result as
> previously but it should be much faster because data is written directly
> to the string instead of creating a temporary char[] buffer.
>
> All the modified Encoding methods are very high level wrappers that
> currently have faster implementations than those in my patch because some
> unnecessary checks are avoided.
>
> The problem however is that Encoding is an abstract class with a lot of
> virtual methods so people can implement their own encodings. In this case
> they may assume that these wrappers call (or wrap) the same methods as in
> MS.NET.
>
> I noticed this when implementing UnicodeEncoding.GetString because
> GetString (byte[]) calls GetString (byte[] bytes, int, int) on MS.NET
> while Mono uses new string (GetChars (byte[])). As a result, on MS.NET
> overriding GetString (byte[] bytes, int, int) also changes the behaviour
> of GetString (byte[]), while on Mono you have to override both of them
> to get the same effect.
>
> The same goes for the two other modified Encoding methods. They are higher
> level wrappers on MS.NET than on Mono.
>
> So the difference only shows up when overriding methods in Encoding: you
> will get different results on MS.NET and Mono.
>
> The exact behaviour of these wrappers could be enforced with a test case:
> create an inherited class that throws exceptions when methods that should
> not be called are called, and fail the test when the expected method was
> not called. I'm not sure whether we need this kind of test, though.
>
> Note that explicitly throwing ArgumentNullException is required because
> accessing .Length on a null argument would otherwise result in a
> NullReferenceException. These methods throw ArgumentNullException on
> MS.NET. Previously I thought that ArgumentNullException was already thrown
> by the wrapped methods in the current implementation, but now I realize
> that GetBytes (char[]) currently throws NullReferenceException, which is a
> bug.
>
> So the goal of this patch is to implement a fast UnicodeEncoding.GetString
> and to modify some wrapper methods of Encoding to call the same methods as
> they do on MS.NET (which actually means making them higher level wrappers
> than they are currently).
>
> Kornél
>
> ----- Original Message ----- 
> From: "Atsushi Eno" <atsushi at ximian.com>
> To: "Kornél Pál" <kornelpal at gmail.com>
> Cc: <mono-devel-list at lists.ximian.com>
> Sent: Monday, April 10, 2006 12:54 PM
> Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and
> modify some Encoding wrappers
>
>
>> Kornél Pál wrote:
>>> No, but if you have ideas about what kind of new tests we need,
>>> I can create some.
>>
>> Things that fail without your patch but do not fail with it.
>> (Otherwise why did you create the patch? ;-)
>>
>> For GetBytes() fix I can imagine the difference (though it is very
>> minor: whether it throws NullReferenceException or
>> ArgumentNullException) but not sure for others.
>>
>> Atsushi Eno
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: UnicodeEncoding.diff
Type: application/octet-stream
Size: 4484 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20060410/2a1fe142/attachment.obj 
