[Mono-dev] [PATCH] Boost speed of UnicodeEncoding

Kornél Pál kornelpal at hotmail.com
Fri Mar 17 13:30:09 EST 2006


Hi,

I didn't modify string.memcpy, and that needs a boost as well, especially for
short strings (I mean short blocks of memory, in fact :). Modifying
string.memcpy will affect the String class as well, so it could speed up the
entire Mono framework. If you have any patches, please post them to the list.
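
(To be concrete about the kind of change I have in mind, here is a rough,
untested sketch; it is not the actual string.memcpy code and the method name
is made up:)

// Rough sketch only, not the real string.memcpy: special-case very short
// copies, then fall back to copying 4 bytes at a time.
static unsafe void CopySmall (byte* dest, byte* src, int count)
{
    // Very short blocks: a plain byte loop avoids any setup overhead.
    if (count < 8) {
        for (int i = 0; i < count; i++)
            dest [i] = src [i];
        return;
    }
    // Longer blocks: copy 4 bytes at a time, then the remainder.
    // (Alignment is ignored here; real code would have to care about it.)
    int* d = (int*) dest;
    int* s = (int*) src;
    int words = count >> 2;
    for (int i = 0; i < words; i++)
        d [i] = s [i];
    for (int i = count & ~3; i < count; i++)
        dest [i] = src [i];
}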

Kornél

----- Original Message -----
From: "Zac Bowling" <zac at zacbowling.com>
To: <mono-devel-list at lists.ximian.com>
Sent: Friday, March 17, 2006 1:07 PM
Subject: Re: [Mono-dev] [PATCH] Boost speed of UnicodeEncoding


Awesome work!

I disappeared for a few days but managed to get my patch nearly ready as
well. However, it looks like yours runs a few microseconds faster than mine
in all my tests.

The part that beats mine is the big-endian case, where you modified the
memcpy technique in the CopyChars function to do the byte swapping:

...
dest[0] = src[1]; dest[1] = src[0];
dest[2] = src[3]; dest[3] = src[2];
dest[4] = src[5]; dest[5] = src[4];
...

(absolutely amazing how much faster that is! :-P)
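
(For the archive, the surrounding pattern is roughly something like this; my
reconstruction of the idea, not the exact code from the patch:)

// Reconstruction of the idea, not the patch itself: copy UTF-16 code units
// while swapping the byte order of each 2-byte unit.
static unsafe void CopyCharsSwapped (byte* dest, byte* src, int byteCount)
{
    byte* end = src + (byteCount & ~1);
    while (src < end) {
        dest [0] = src [1];
        dest [1] = src [0];
        dest += 2;
        src += 2;
    }
}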

One big difference in my patch is that I did almost all of this inside the
String.cs file instead. It's sort of a throwback to Java, where you can do
some of this inside the String class without having to go through
java.nio.charset, but this makes more sense. :-)

This should work so much better now and make my life a little nicer when
reading these UTF-16 geo data CSV files.

good thinking :-)

--
Zac Bowling
http://zacbowling.com/


----- Message from kornelpal at hotmail.com ---------
    Date: Thu, 16 Mar 2006 23:59:53 +0100
    From: Kornél Pál <kornelpal at hotmail.com>
Reply-To: Kornél Pál <kornelpal at hotmail.com>
Subject: Re: [Mono-dev] [PATCH] Boost speed of UnicodeEncoding
      To: Atsushi Eno <atsushi at ximian.com>


> Hi,
>
> Originally I didn't plan to create a patch, I only made some suggestions.
> But then I realized that the current UnicodeEncoding is too inefficient.
>
> So I implemented my idea in UnicodeEncoding.
>
> UnicodeEncodingPerformance.cs is the test I used.
>
> Results:
> Before:
> 1, string to byte[], same: 265
> 1, char[] to byte[], same: 282
> 1, byte[] to char[], same: 453
> 1, string to byte[], diff: 265
> 1, char[] to byte[], diff: 266
> 1, byte[] to char[], diff: 453
> 4, string to byte[], same: 672
> 4, char[] to byte[], same: 703
> 4, byte[] to char[], same: 594
> 4, string to byte[], diff: 656
> 4, char[] to byte[], diff: 609
> 4, byte[] to char[], diff: 641
> 1024, string to byte[], same: 1406
> 1024, char[] to byte[], same: 1391
> 1024, byte[] to char[], same: 922
> 1024, string to byte[], diff: 1297
> 1024, char[] to byte[], diff: 1281
> 1024, byte[] to char[], diff: 1250
> 1048576, string to byte[], same: 3453
> 1048576, char[] to byte[], same: 2500
> 1048576, byte[] to char[], same: 1515
> 1048576, string to byte[], diff: 2734
> 1048576, char[] to byte[], diff: 1407
> 1048576, byte[] to char[], diff: 1312
>
>
> After:
> 1, string to byte[], same: 578
> 1, char[] to byte[], same: 563
> 1, byte[] to char[], same: 844
> 1, string to byte[], diff: 328
> 1, char[] to byte[], diff: 359
> 1, byte[] to char[], diff: 578
> 4, string to byte[], same: 578
> 4, char[] to byte[], same: 563
> 4, byte[] to char[], same: 812
> 4, string to byte[], diff: 391
> 4, char[] to byte[], diff: 406
> 4, byte[] to char[], diff: 594
> 1024, string to byte[], same: 47
> 1024, char[] to byte[], same: 47
> 1024, byte[] to char[], same: 62
> 1024, string to byte[], diff: 203
> 1024, char[] to byte[], diff: 204
> 1024, byte[] to char[], diff: 203
> 1048576, string to byte[], same: 391
> 1048576, char[] to byte[], same: 375
> 1048576, byte[] to char[], same: 375
> 1048576, string to byte[], diff: 984
> 1048576, char[] to byte[], diff: 391
> 1048576, byte[] to char[], diff: 375
>
> Note that these are the results of two single executions, so they are not
> fully representative.
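>
> (For clarity, the harness is roughly of this shape; this is only a sketch,
> not the actual UnicodeEncodingPerformance.cs, and my reading of "same"/"diff"
> as matching/non-matching byte order is an assumption:)
>
> // Sketch only: time string/char[]/byte[] conversions for various lengths
> // with an encoding whose byte order matches the platform ("same") and one
> // whose byte order does not ("diff").
> using System;
> using System.Diagnostics;
> using System.Text;
>
> class UnicodeEncodingBench
> {
>     const int Iterations = 1000; // arbitrary for this sketch
>
>     static void Main ()
>     {
>         foreach (int count in new int [] { 1, 4, 1024, 1048576 }) {
>             string s = new string ('a', count);
>             Measure (count, "same", new UnicodeEncoding (false, false), s);
>             Measure (count, "diff", new UnicodeEncoding (true, false), s);
>         }
>     }
>
>     static void Measure (int count, string label, Encoding enc, string s)
>     {
>         char [] chars = s.ToCharArray ();
>         byte [] bytes = enc.GetBytes (s);
>
>         Stopwatch sw = Stopwatch.StartNew ();
>         for (int i = 0; i < Iterations; i++)
>             enc.GetBytes (s);
>         Console.WriteLine ("{0}, string to byte[], {1}: {2}", count, label, sw.ElapsedMilliseconds);
>
>         sw = Stopwatch.StartNew ();
>         for (int i = 0; i < Iterations; i++)
>             enc.GetBytes (chars);
>         Console.WriteLine ("{0}, char[] to byte[], {1}: {2}", count, label, sw.ElapsedMilliseconds);
>
>         sw = Stopwatch.StartNew ();
>         for (int i = 0; i < Iterations; i++)
>             enc.GetChars (bytes);
>         Console.WriteLine ("{0}, byte[] to char[], {1}: {2}", count, label, sw.ElapsedMilliseconds);
>     }
> }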
>
> As you can see, converting 1 character became slower, but longer inputs are
> converted much faster (see the results for length 4, for example). Just to
> show how inefficient the old code was: converting 1024 characters is about
> 20-30 times faster than it was before.
>
> I think converting a single character should not be optimized, as doing so
> is inherently inefficient anyway. It's much faster to convert it inline
> using shift operators.
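>
> (What I mean by converting inline, roughly; a sketch, not code from the
> patch:)
>
> // Sketch only: write one UTF-16 char as two bytes using shift operators
> // instead of going through an Encoding instance.
> static void PutCharLittleEndian (byte [] dest, int index, char c)
> {
>     dest [index]     = (byte) c;        // low byte first
>     dest [index + 1] = (byte) (c >> 8); // then high byte
> }
>
> static void PutCharBigEndian (byte [] dest, int index, char c)
> {
>     dest [index]     = (byte) (c >> 8); // high byte first
>     dest [index + 1] = (byte) c;        // then low byte
> }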
>
> Please review and approve the patch.
>
> Kornél
>
> ----- Original Message -----
> From: "Atsushi Eno" <atsushi at ximian.com>
> To: "Kornél Pál" <kornelpal at hotmail.com>
> Cc: <mono-devel-list at lists.ximian.com>; "Zac Bowling" <zac at zacbowling.com>
> Sent: Wednesday, March 15, 2006 11:10 PM
> Subject: Re: [Mono-dev] Patch to boost speed of UnicodeEncoding
>
>
>> Hi,
>>
>> It's always nice if encoding conversion stuff gets faster. Can you also
>> provide numbers showing how much faster it becomes when you finish writing
>> the patch?
>>
>> Thx,
>> Atsushi Eno
>>
>>
>> Kornél Pál wrote:
>>> Hi,
>>>
>>> I think doing something like the attached draft is faster. No new String
>>> object is created, the arrays are accessed using pointers, and I think
>>> there is no point in using a more complicated conversion method for short
>>> strings.
>>>
>>> This draft is very unsafe. It lacks any checks and does not perform any
>>> special character or byte sequence handling.
>>>
>>> Note that I haven't done any tests to determine whether using byte
>>> pointers or using int pointers and shift operations to swap bytes is
>>> faster. But mixing bytes and ints results in two different code paths for
>>> big- and little-endian encodings, while the byte swapping can be done with
>>> a single code path when using only bytes or only ints.
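>>>
>>> (To illustrate the int-pointer variant I mean, as a rough sketch and not
>>> the draft itself:)
>>>
>>> // Sketch only: pin the arrays, then swap the two bytes of each 16-bit
>>> // unit, handling two units (4 bytes) per iteration with shifts and masks.
>>> static unsafe void SwapCopy (byte [] destArray, byte [] srcArray, int byteCount)
>>> {
>>>     fixed (byte* dest = destArray, src = srcArray) {
>>>         int* d = (int*) dest;
>>>         int* s = (int*) src;
>>>         for (int i = 0; i < byteCount / 4; i++) {
>>>             int v = s [i];
>>>             d [i] = ((v & 0x00ff00ff) << 8) | ((v >> 8) & 0x00ff00ff);
>>>         }
>>>         // A trailing lone 2-byte unit is ignored in this sketch.
>>>     }
>>> }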
>>>
>>> Kornél
>> _______________________________________________
>> Mono-devel-list mailing list
>> Mono-devel-list at lists.ximian.com
>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>>
>


----- End message from kornelpal at hotmail.com -----
_______________________________________________
Mono-devel-list mailing list
Mono-devel-list at lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list



