[Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0 and modify some Encoding wrappers
kornelpal at gmail.com
Wed Apr 12 06:07:34 EDT 2006
I've done some tests, and I have come to the following conclusions:
UnicodeEncoding in 2.0 is slower because GetBytes(string) is not overridden.
But performance is improved in 1.1 because the overridden implementation is
optimized for UnicodeEncoding.
In ASCIIEncoding you can see the drawback of doing optimizations in the
Encoding class: the current code is only faster on 2.0. With the new code,
1.1 didn't change, because it doesn't use unsafe code.
There is no change in UTF8Encoding (or the improvement is minimal).
CP932 is faster because the optimization is done in MonoEncoding.
In conclusion, I think that Encoding should be MS.NET compatible, because
it's more likely to be used directly by users. And no improvement can be made
in profile 1.1, because there are no unsafe methods there, so there is no
point in sacrificing compatibility for performance.
I think that the best solution for encoding optimization is to use a single
unsafe implementation for each piece of functionality (GetBytes, GetChars,
GetByteCount, GetCharCount), with the other methods (the string, char[] and
byte[] overloads) calling this single implementation. This makes the code
more maintainable as well. This is what I've done in UnicodeEncoding.
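A minimal sketch of the "single unsafe implementation" layout described above follows. It covers GetBytes only. The class name Utf16LeSketch and its methods are hypothetical illustrations, not Mono's actual UnicodeEncoding code. To keep it short, it is a static helper rather than an Encoding subclass. It must be compiled with -unsafe.

```csharp
using System;

// Hypothetical sketch (not Mono's actual UnicodeEncoding): a UTF-16LE
// encoder with ONE unsafe core routine for GetBytes. Every public
// overload only validates and pins its input, then delegates to the core.
// Requires compiling with -unsafe (mcs/csc) or AllowUnsafeBlocks.
public static class Utf16LeSketch
{
    // The single place where encoding happens: little-endian UTF-16.
    // Returns the number of bytes written; throws if the output is too small.
    static unsafe int GetBytesCore (char* chars, int charCount, byte* bytes, int byteCount)
    {
        int needed = charCount * 2;
        if (byteCount < needed)
            throw new ArgumentException ("Output buffer is too small.", "bytes");
        for (int i = 0; i < charCount; i++) {
            char c = chars [i];
            bytes [2 * i] = (byte) c;            // low byte first
            bytes [2 * i + 1] = (byte) (c >> 8); // then high byte
        }
        return needed;
    }

    // string overload: pins the string itself, so no temporary char[] copy.
    public static unsafe byte [] GetBytes (string s)
    {
        if (s == null)
            throw new ArgumentNullException ("s");
        byte [] result = new byte [s.Length * 2];
        if (s.Length == 0)
            return result;
        fixed (char* pc = s)
        fixed (byte* pb = result)
            GetBytesCore (pc, s.Length, pb, result.Length);
        return result;
    }

    // char[] overload: same core, different pinning and bounds checks.
    public static unsafe int GetBytes (char [] chars, int charIndex, int charCount,
                                       byte [] bytes, int byteIndex)
    {
        if (chars == null)
            throw new ArgumentNullException ("chars");
        if (bytes == null)
            throw new ArgumentNullException ("bytes");
        if (charIndex < 0 || charCount < 0 || charIndex > chars.Length - charCount)
            throw new ArgumentOutOfRangeException ("charIndex");
        if (byteIndex < 0 || byteIndex > bytes.Length)
            throw new ArgumentOutOfRangeException ("byteIndex");
        if (charCount == 0)
            return 0;
        // For an empty 'bytes' array this pins null, but byteCount is then 0,
        // so the core throws before dereferencing it.
        fixed (char* pc = chars)
        fixed (byte* pb = bytes)
            return GetBytesCore (pc + charIndex, charCount, pb + byteIndex,
                                 bytes.Length - byteIndex);
    }
}
```

GetByteCount, GetChars and GetCharCount would each get their own core routine in the same way. In a real Encoding subclass, these wrappers would be the overrides, for example GetBytes (string), which avoids the char[] copy that the base class would otherwise make.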
And I think the place where we shouldn't care about MS.NET compatibility is
the derived public encoding classes: we should override as many methods as
we need, even if they aren't overridden in MS.NET. (For private encoding
classes, layout compatibility is not a requirement.)
For example, if I remove the !NET_2_0 and NET_2_0 conditions from
GetBytes(string) and GetString(byte[], int, int) in UnicodeEncoding, a
significant performance improvement can be achieved in all profiles.
Is this "deal" acceptable? If you have any objections, please let me know.
----- Original Message -----
From: "Kornél Pál" <kornelpal at gmail.com>
To: "Atsushi Eno" <atsushi at ximian.com>
Cc: <mono-devel-list at lists.ximian.com>
Sent: Wednesday, April 12, 2006 12:35 AM
Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0
and modify some Encoding wrappers
> Numbers are things that can convince me. :)
> Now I have three questions:
> - Are there parts of the patch that are OK to commit?
> - Do we care about class signature (which methods are overridden)?
> - Do we care about the implementation of virtual methods (which methods do
> they call)?
> I can follow any guidelines (although I don't believe in performance above
> everything else), but I would like to know them; otherwise I cannot follow
> them.
> ----- Original Message -----
> From: "Atsushi Eno" <atsushi at ximian.com>
> To: "Kornél Pál" <kornelpal at gmail.com>
> Cc: <mono-devel-list at lists.ximian.com>
> Sent: Tuesday, April 11, 2006 6:56 PM
> Subject: Re: [Mono-dev] [PATCH] Add GetString to UnicodeEncoding 2.0
> and modify some Encoding wrappers
>> I'm not interested in how your patch accomplishes MS.NET compatibility.
>> My question is simple: is your patch *good* for Mono?
>> using System;
>> using System.Diagnostics;
>> using System.IO;
>> using System.Text;
>>
>> // Usage: unicode.exe <input-file> [iterations]
>> public class Test
>> {
>>     public static void Main (string [] args)
>>     {
>>         int loop = args.Length > 1 ? int.Parse (args [1]) : 100;
>>         string s = File.OpenText (args [0]).ReadToEnd ();
>>         Encoding e = Encoding.Unicode;
>>         // Time 'loop' calls to Encoding.GetBytes (string).
>>         Stopwatch sw = Stopwatch.StartNew ();
>>         for (int i = 0; i < loop; i++)
>>             e.GetBytes (s);
>>         sw.Stop ();
>>         Console.WriteLine (sw.ElapsedMilliseconds);
>>     }
>> }
>> Before your patch:
>> mono ./unicode.exe ../../svn/mono/web/web/masterinfos/System.Web.xml
>> After the patch:
>> $ rundev2 mono ./unicode.exe
>> Atsushi Eno
>> Kornél Pál wrote:
>>> I had some time and looked at all the encoding classes in I18N and in
>>> byte* and char* are only used in UnicodeEncoding, and in GetByteCount and
>>> GetBytes in I18N.
>>> This means that keeping the #if NET_2_0 code that you don't want to
>>> remove will cause a performance loss on profile 2.0 in System.Text, while
>>> it will not improve performance in profile 1.1, as no such optimization is
>>> The solution is to use arrays in Encoding, which benefits simple,
>>> old-fashioned encoding classes, and to override these methods to use
>>> pointers in classes that implement their core functionality using unsafe
>>> code.
>>> Encodings in System.Text (except UnicodeEncoding) use arrays, and I think
>>> custom encodings created by users are array-based as well, so using arrays
>>> in Encoding results in better performance. If custom encodings use unsafe
>>> code, they will have to override the other methods anyway (because of
>>> MS.NET) to get the performance improvement.
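The sketch below illustrates the kind of "simple, old fashioned" array-based encoding meant here. Latin1Sketch is a hypothetical class in the style of ISO-8859-1, writing '?' for unmappable characters. It overrides only the six abstract array-based members of Encoding. Overloads such as GetBytes (string) and GetString (byte[]) are inherited. As far as I know, MS.NET's base implementations copy the string to a char[] and call these array methods, which is why array-based defaults in Encoding suit such classes.

```csharp
using System;
using System.Text;

// Hypothetical array-based encoding: one byte per char, chars above
// U+00FF become '?'. Only the abstract array-based members are overridden;
// GetBytes (string), GetString (byte[]) etc. come from Encoding.
public class Latin1Sketch : Encoding
{
    public override int GetByteCount (char [] chars, int index, int count)
    {
        return count; // exactly one byte per char
    }

    public override int GetBytes (char [] chars, int charIndex, int charCount,
                                  byte [] bytes, int byteIndex)
    {
        if (bytes.Length - byteIndex < charCount)
            throw new ArgumentException ("Output buffer is too small.", "bytes");
        for (int i = 0; i < charCount; i++) {
            char c = chars [charIndex + i];
            // Map Latin-1 chars directly; replace everything else with '?'.
            bytes [byteIndex + i] = c <= '\u00FF' ? (byte) c : (byte) '?';
        }
        return charCount;
    }

    public override int GetCharCount (byte [] bytes, int index, int count)
    {
        return count; // exactly one char per byte
    }

    public override int GetChars (byte [] bytes, int byteIndex, int byteCount,
                                  char [] chars, int charIndex)
    {
        if (chars.Length - charIndex < byteCount)
            throw new ArgumentException ("Output buffer is too small.", "chars");
        for (int i = 0; i < byteCount; i++)
            chars [charIndex + i] = (char) bytes [byteIndex + i];
        return byteCount;
    }

    public override int GetMaxByteCount (int charCount) { return charCount; }
    public override int GetMaxCharCount (int byteCount) { return byteCount; }
}
```

A pointer-based encoding, by contrast, would also override the string overloads and, on 2.0, the virtual char*/byte* overloads, so that calls reach its unsafe core without intermediate arrays.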
>>> By overriding GetByteCount (string) and GetBytes (string) in
>>> MonoEncoding, the performance improvement of unsafe code will be
>>> preserved; in addition, it will be available in all profiles.
>>> MonoEncoding was already good, so I just added these two methods and
>>> added the following code to the GetBytes methods:
>>> int byteCount = bytes.Length - byteIndex;
>>> if (bytes.Length == 0)
>>>         bytes = new byte [1];
>>> Some check is required because &bytes [0] will fail for zero-size arrays.
>>> A "bytes.Length == byteIndex" check could avoid this (but it was present
>>> in only one of the methods), but it would prevent ArgumentException from
>>> being thrown for output buffers that are too small. Creating a small array
>>> is little overhead, and an exception will probably be thrown because charCount is
>>> Attached is an improved patch. Please review it.
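The zero-length workaround described above can be sketched as follows. PinningSketch and its one-byte-per-char core are hypothetical, not MonoEncoding's code. What matters is the order of operations. First, byteCount is computed from the caller's array. Only then is an empty array swapped for a one-element dummy, so that &bytes [0] cannot throw IndexOutOfRangeException. Because byteCount stays 0, a caller who passes an empty buffer for non-empty input still gets ArgumentException from the core. Compile with -unsafe.

```csharp
using System;

// Hypothetical sketch (not MonoEncoding's code) of the zero-length output
// buffer workaround. Requires compiling with -unsafe.
public static class PinningSketch
{
    // Hypothetical pointer-based core: one byte per char (low byte only).
    // Throws ArgumentException if the output space is too small.
    static unsafe int GetBytesCore (char* chars, int charCount, byte* bytes, int byteCount)
    {
        if (byteCount < charCount)
            throw new ArgumentException ("Output buffer is too small.", "bytes");
        for (int i = 0; i < charCount; i++)
            bytes [i] = (byte) chars [i];
        return charCount;
    }

    public static unsafe int GetBytes (char [] chars, int charIndex, int charCount,
                                       byte [] bytes, int byteIndex)
    {
        if (chars == null)
            throw new ArgumentNullException ("chars");
        if (bytes == null)
            throw new ArgumentNullException ("bytes");
        if (charIndex < 0 || charCount < 0 || charIndex > chars.Length - charCount)
            throw new ArgumentOutOfRangeException ("charIndex");
        if (byteIndex < 0 || byteIndex > bytes.Length)
            throw new ArgumentOutOfRangeException ("byteIndex");
        if (charCount == 0)
            return 0; // nothing to encode, so &chars [0] is never needed

        // Available space is taken from the caller's array BEFORE substitution.
        int byteCount = bytes.Length - byteIndex;
        // &bytes [0] would throw IndexOutOfRangeException on an empty array,
        // so pin a one-element dummy instead. byteCount stays 0, so the core
        // still reports the too-small buffer as ArgumentException.
        if (bytes.Length == 0)
            bytes = new byte [1];

        fixed (char* pc = &chars [0])
        fixed (byte* pb = &bytes [0])
            return GetBytesCore (pc + charIndex, charCount, pb + byteIndex, byteCount);
    }
}
```

Another option is fixed (byte* p = bytes), which the C# specification defines to yield a null pointer for an empty array instead of throwing. With that form, the core must never dereference the pointer when byteCount is 0.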