[Mono-dev] [PATCH] Boost speed of UnicodeEncoding

Zac Bowling zac at zacbowling.com
Fri Mar 17 07:07:47 EST 2006


Awesome work!

I disappeared for a few days but managed to get my patch nearly ready
as well. It looks like yours runs a few microseconds faster than
mine in all my tests, though.

The part that beats mine is on the bigEndian text where you modded the
memcpy technique in the CopyChars function for doing the byte swapping:

...
dest[0] = src[1];
dest[1] = src[0];
dest[2] = src[3];
dest[3] = src[2];
dest[4] = src[5];
dest[5] = src[4];
...

(absolutely amazing how much faster that is! :-P)
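
For readers following along, the unrolled swap quoted above can be sketched in C roughly as follows. This is only an illustration of the technique, not the code from the patch (which is C#): the function name copy_chars_swapped, the four-code-unit unroll factor, and the tail loop are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy char_count UTF-16 code units from src to dest,
   swapping the two bytes of each unit. The buffers must not overlap. */
void copy_chars_swapped(uint8_t *dest, const uint8_t *src, size_t char_count)
{
    /* Swapping in place would overwrite src[0] before it is read. */
    assert(dest != src);

    /* Unrolled part: four code units (8 bytes) per iteration. */
    while (char_count >= 4) {
        dest[0] = src[1]; dest[1] = src[0];
        dest[2] = src[3]; dest[3] = src[2];
        dest[4] = src[5]; dest[5] = src[4];
        dest[6] = src[7]; dest[7] = src[6];
        dest += 8;
        src += 8;
        char_count -= 4;
    }

    /* Tail: the remaining 0 to 3 code units, one at a time. */
    while (char_count > 0) {
        dest[0] = src[1];
        dest[1] = src[0];
        dest += 2;
        src += 2;
        char_count--;
    }
}
```

Whether unrolling like this actually beats a simple loop depends on the runtime and JIT or compiler, so it is worth measuring rather than assuming.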

One big thing different in my patch is that I did almost all of this
inside the String.cs file instead. Sort of a throwback to Java being
able to do some stuff like this inside Java's String class without
having to call java.nio.charset, but this makes more sense. :-)

This should work so much better now and make my life a little
nicer when reading these UTF-16 geo data CSV files.

Good thinking :-)

-- 
Zac Bowling
http://zacbowling.com/


----- Message from kornelpal at hotmail.com ---------
    Date: Thu, 16 Mar 2006 23:59:53 +0100
    From: Kornél Pál <kornelpal at hotmail.com>
Reply-To: Kornél Pál <kornelpal at hotmail.com>
Subject: Re: [Mono-dev] [PATCH] Boost speed of UnicodeEncoding
      To: Atsushi Eno <atsushi at ximian.com>


> Hi,
>
> Originally I didn't plan to create a patch; I only made some suggestions. But
> then I realized that the current UnicodeEncoding is too inefficient.
>
> So I implemented my idea in UnicodeEncoding.
>
> UnicodeEncodingPerformance.cs is the test I used.
>
> Results:
> Before:
> 1, string to byte[], same: 265
> 1, char[] to byte[], same: 282
> 1, byte[] to char[], same: 453
> 1, string to byte[], diff: 265
> 1, char[] to byte[], diff: 266
> 1, byte[] to char[], diff: 453
> 4, string to byte[], same: 672
> 4, char[] to byte[], same: 703
> 4, byte[] to char[], same: 594
> 4, string to byte[], diff: 656
> 4, char[] to byte[], diff: 609
> 4, byte[] to char[], diff: 641
> 1024, string to byte[], same: 1406
> 1024, char[] to byte[], same: 1391
> 1024, byte[] to char[], same: 922
> 1024, string to byte[], diff: 1297
> 1024, char[] to byte[], diff: 1281
> 1024, byte[] to char[], diff: 1250
> 1048576, string to byte[], same: 3453
> 1048576, char[] to byte[], same: 2500
> 1048576, byte[] to char[], same: 1515
> 1048576, string to byte[], diff: 2734
> 1048576, char[] to byte[], diff: 1407
> 1048576, byte[] to char[], diff: 1312
>
>
> After:
> 1, string to byte[], same: 578
> 1, char[] to byte[], same: 563
> 1, byte[] to char[], same: 844
> 1, string to byte[], diff: 328
> 1, char[] to byte[], diff: 359
> 1, byte[] to char[], diff: 578
> 4, string to byte[], same: 578
> 4, char[] to byte[], same: 563
> 4, byte[] to char[], same: 812
> 4, string to byte[], diff: 391
> 4, char[] to byte[], diff: 406
> 4, byte[] to char[], diff: 594
> 1024, string to byte[], same: 47
> 1024, char[] to byte[], same: 47
> 1024, byte[] to char[], same: 62
> 1024, string to byte[], diff: 203
> 1024, char[] to byte[], diff: 204
> 1024, byte[] to char[], diff: 203
> 1048576, string to byte[], same: 391
> 1048576, char[] to byte[], same: 375
> 1048576, byte[] to char[], same: 375
> 1048576, string to byte[], diff: 984
> 1048576, char[] to byte[], diff: 391
> 1048576, byte[] to char[], diff: 375
>
> Note these are the results of two actual executions so they are not fully
> representative.
>
> As you can see, converting 1 character became slower. But longer strings are
> converted much faster (4 bytes, for example). Just to show how inefficient
> the old code was: converting 1024 characters is about 20-30 times faster than
> it was before.
>
> I think converting a single character should not be optimized, as doing so is
> already inefficient. It's much faster to convert it inline using shift
> operators.
>
> Please review and approve the patch.
>
> Kornél
>
> ----- Original Message -----
> From: "Atsushi Eno" <atsushi at ximian.com>
> To: "Kornél Pál" <kornelpal at hotmail.com>
> Cc: <mono-devel-list at lists.ximian.com>; "Zac Bowling" <zac at zacbowling.com>
> Sent: Wednesday, March 15, 2006 11:10 PM
> Subject: Re: [Mono-dev] Patch to boost speed of UnicodeEncoding
>
>
>> Hi,
>>
>> It's always nice if encoding conversion stuff gets faster. Can you
>> also show how much faster it becomes when you finish writing the patch?
>>
>> Thx,
>> Atsushi Eno
>>
>>
>> Kornél Pál wrote:
>>> Hi,
>>>
>>> I think doing something like in the attached draft is faster. No new
>>> String object is created. Arrays are accessed using pointers. And I
>>> think there is no use in a more complicated conversion method for
>>> short strings.
>>>
>>> This draft is very unsafe. It lacks any checks and does not perform
>>> any special character or byte sequence handling.
>>>
>>> Note that I haven't done any tests to determine whether using byte
>>> pointers or using int pointers and shift operations to swap bytes is
>>> faster. But mixing bytes and ints results in two different code paths
>>> for big and little endian encodings, while byte swapping can be
>>> performed using a single code path when using only bytes or only ints.
>>>
>>> Kornél
>


----- End message from kornelpal at hotmail.com -----
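
The quoted message mentions two shift-based ideas: converting a single character inline with shift operators, and swapping bytes through int-sized values instead of byte pointers. A minimal C sketch of both follows. It is an illustration only, not code from the patch, and the names encode_utf16_unit and swap_two_units are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one UTF-16 code unit into two bytes
   using shifts only, in big-endian or little-endian byte order. */
void encode_utf16_unit(uint16_t c, int big_endian, uint8_t *out)
{
    assert(out != NULL);
    if (big_endian) {
        out[0] = (uint8_t)(c >> 8);   /* high byte first */
        out[1] = (uint8_t)(c & 0xFF);
    } else {
        out[0] = (uint8_t)(c & 0xFF); /* low byte first */
        out[1] = (uint8_t)(c >> 8);
    }
}

/* Hypothetical helper: swap the two bytes inside each 16-bit half of a
   32-bit value, i.e. byte-swap two UTF-16 code units at once. */
uint32_t swap_two_units(uint32_t v)
{
    return ((v & 0x00FF00FFu) << 8) | ((v >> 8) & 0x00FF00FFu);
}
```

Because swap_two_units exchanges adjacent bytes within each 16-bit half, the same expression applies regardless of the host byte order, which matches the point about needing only one code path when working purely with ints.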


