[Mono-dev] [PATCH] Boost speed of UnicodeEncoding

Kornél Pál kornelpal at hotmail.com
Thu Mar 16 17:59:53 EST 2006


Hi,

Originally I didn't plan to create a patch; I only made some suggestions. But
then I realized that the current UnicodeEncoding is too inefficient.

So I implemented my idea in UnicodeEncoding.

UnicodeEncodingPerformance.cs is the test I used.

Results:
Before:
1, string to byte[], same: 265
1, char[] to byte[], same: 282
1, byte[] to char[], same: 453
1, string to byte[], diff: 265
1, char[] to byte[], diff: 266
1, byte[] to char[], diff: 453
4, string to byte[], same: 672
4, char[] to byte[], same: 703
4, byte[] to char[], same: 594
4, string to byte[], diff: 656
4, char[] to byte[], diff: 609
4, byte[] to char[], diff: 641
1024, string to byte[], same: 1406
1024, char[] to byte[], same: 1391
1024, byte[] to char[], same: 922
1024, string to byte[], diff: 1297
1024, char[] to byte[], diff: 1281
1024, byte[] to char[], diff: 1250
1048576, string to byte[], same: 3453
1048576, char[] to byte[], same: 2500
1048576, byte[] to char[], same: 1515
1048576, string to byte[], diff: 2734
1048576, char[] to byte[], diff: 1407
1048576, byte[] to char[], diff: 1312


After:
1, string to byte[], same: 578
1, char[] to byte[], same: 563
1, byte[] to char[], same: 844
1, string to byte[], diff: 328
1, char[] to byte[], diff: 359
1, byte[] to char[], diff: 578
4, string to byte[], same: 578
4, char[] to byte[], same: 563
4, byte[] to char[], same: 812
4, string to byte[], diff: 391
4, char[] to byte[], diff: 406
4, byte[] to char[], diff: 594
1024, string to byte[], same: 47
1024, char[] to byte[], same: 47
1024, byte[] to char[], same: 62
1024, string to byte[], diff: 203
1024, char[] to byte[], diff: 204
1024, byte[] to char[], diff: 203
1048576, string to byte[], same: 391
1048576, char[] to byte[], same: 375
1048576, byte[] to char[], same: 375
1048576, string to byte[], diff: 984
1048576, char[] to byte[], diff: 391
1048576, byte[] to char[], diff: 375

Note that these are the results of only two actual runs, so they are not
fully representative.

As you can see, converting 1 character became slower. But longer strings are
converted much faster (4 characters, for example, in most of the tests). Just
to show how inefficient the old code was: converting 1024 characters is now
about 15-30 times faster in the "same" tests (1406 vs. 47, for example) and
about 6 times faster in the "diff" tests.

I think the single-character case should not be optimized, as converting one
character at a time is already inefficient. It's much faster to convert it
inline using shift operators.
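
A minimal sketch of what converting one character inline with shift
operators might look like. It is written in C for brevity rather than the C#
of the patch, and the function names are hypothetical; they are not part of
UnicodeEncoding.

```c
#include <stdint.h>

/* Hypothetical helper: write one UTF-16 code unit in little-endian order. */
void put_utf16le(uint16_t ch, uint8_t out[2])
{
    out[0] = (uint8_t)(ch & 0xFF); /* low byte first */
    out[1] = (uint8_t)(ch >> 8);   /* high byte second */
}

/* Hypothetical helper: write one UTF-16 code unit in big-endian order. */
void put_utf16be(uint16_t ch, uint8_t out[2])
{
    out[0] = (uint8_t)(ch >> 8);   /* high byte first */
    out[1] = (uint8_t)(ch & 0xFF); /* low byte second */
}
```

For a single character, two shifts and two stores like these avoid the call
and argument-checking overhead of going through an encoder.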

Please review and approve the patch.

Kornél

----- Original Message -----
From: "Atsushi Eno" <atsushi at ximian.com>
To: "Kornél Pál" <kornelpal at hotmail.com>
Cc: <mono-devel-list at lists.ximian.com>; "Zac Bowling" <zac at zacbowling.com>
Sent: Wednesday, March 15, 2006 11:10 PM
Subject: Re: [Mono-dev] Patch to boost speed of UnicodeEncoding


> Hi,
>
> It's always nice if encoding conversion stuff gets faster. Can you
> also show how much faster it becomes when you finish writing the patch?
>
> Thx,
> Atsushi Eno
>
>
> Kornél Pál wrote:
>> Hi,
>>
>> I think doing something like in the attached draft is faster. No new String
>> object is created. Arrays are accessed using pointers. And I think there is
>> no point in using a more complicated conversion method for short strings.
>>
>> This draft is very unsafe. It lacks any checks and does not perform any
>> special character or byte sequence handling.
>>
>> Note that I haven't done any tests to determine whether using byte pointers
>> or using int pointers and shift operations to swap bytes is faster. But
>> mixing bytes and ints results in two different code paths for big and
>> little endian encodings, while byte swapping can be performed using a
>> single code path when using only bytes or only ints.
>>
>> Kornél
> _______________________________________________
> Mono-devel-list mailing list
> Mono-devel-list at lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>
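
The trade-off mentioned in the quoted draft notes (byte accesses versus wider
integers with shifts) can be sketched as follows. This is a rough
illustration in C, not the code from UnicodeEncoding.diff; the function names
are hypothetical, and neither variant has been benchmarked here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Swap the two bytes of every 16-bit code unit using byte accesses only. */
void swap16_bytes(const uint8_t *src, uint8_t *dst, size_t nbytes)
{
    for (size_t i = 0; i + 1 < nbytes; i += 2) {
        uint8_t first = src[i];
        dst[i] = src[i + 1];
        dst[i + 1] = first;
    }
}

/* Swap two code units per step using a 32-bit word, masks and shifts. */
void swap16_words(const uint8_t *src, uint8_t *dst, size_t nbytes)
{
    size_t i = 0;
    for (; i + 4 <= nbytes; i += 4) {
        uint32_t w;
        memcpy(&w, src + i, 4); /* avoids unaligned pointer casts */
        w = ((w & 0x00FF00FFu) << 8) | ((w >> 8) & 0x00FF00FFu);
        memcpy(dst + i, &w, 4);
    }
    /* Handle a remaining code unit, if any, with the byte version. */
    swap16_bytes(src + i, dst + i, nbytes - i);
}
```

Both functions give the same result on little- and big-endian hosts, which
is in line with the remark that a single code path suffices when using only
bytes or only ints. A trailing odd byte is left unwritten by both.
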
-------------- next part --------------
A non-text attachment was scrubbed...
Name: UnicodeEncoding.diff
Type: application/octet-stream
Size: 10271 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20060316/0890aada/attachment.obj 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: UnicodeEncodingPerformance.cs
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20060316/0890aada/attachment.pl 

