[Mono-dev] [PATCH] Boost speed of UnicodeEncoding

Zac Bowling zac at zacbowling.com
Fri Mar 17 19:24:14 EST 2006


Hey,

What I meant before was the way you copy the bytes in 16-, 8-, 4-, and 2-byte 
chunks, like memcpy does, inside your CopyChars function :-)
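
To make the chunking idea concrete, here is a rough C sketch of what I mean. This is not the actual CopyChars code or Mono's memcpy; chunked_copy is a made-up name, and I'm assuming plain byte buffers that don't overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not Mono code): copy n bytes in 16-, 8-, 4-,
 * 2- and 1-byte steps. The fixed-size memcpy calls sidestep alignment
 * and aliasing problems; compilers typically turn them into plain
 * loads and stores. dest and src must not overlap. */
static void chunked_copy(uint8_t *dest, const uint8_t *src, size_t n)
{
    /* Bulk of the data: 16 bytes per iteration. */
    while (n >= 16) {
        memcpy(dest, src, 16);
        dest += 16; src += 16; n -= 16;
    }
    /* At most 15 bytes remain, so each smaller step runs at most once. */
    if (n >= 8) {
        memcpy(dest, src, 8);
        dest += 8; src += 8; n -= 8;
    }
    if (n >= 4) {
        memcpy(dest, src, 4);
        dest += 4; src += 4; n -= 4;
    }
    if (n >= 2) {
        memcpy(dest, src, 2);
        dest += 2; src += 2; n -= 2;
    }
    if (n >= 1)
        *dest = *src;
}
```

The point of the descending sizes is that the tail never needs a loop: after the 16-byte loop, each smaller step fires once at most.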

This may or may not be useful. I was doing something like this before I 
broke it about an hour ago: converting a byte array into a string (by 
adding two new internal constructors and icalls to String). In the code 
I broke, I have some similar internal static functions on String 
that convert things around in unmanaged code in a similar way. 
I'll send it in a minute when it works again.

Index: mono/metadata/string-icalls.c
===================================================================
--- mono/metadata/string-icalls.c       (revision 58130)
+++ mono/metadata/string-icalls.c       (working copy)
@@ -96,6 +96,12 @@
        return res;
}

+MonoString *
+ves_icall_System_String_ctor_bytep_int_int (gpointer dummy, guint16 *value, gint32 sindex, gint32 length)
+{
+       return ves_icall_System_String_ctor_charp_int_int(dummy, value, sindex, length);
+}
+
MonoString *
ves_icall_System_String_ctor_charp_int_int (gpointer dummy, gunichar2 *value, gint32 sindex, gint32 length)
{
@@ -179,6 +185,28 @@
}

MonoString *
+ves_icall_System_String_ctor_bytea_int_int (gpointer dummy, MonoArray *value,
+                                        gint32 sindex, gint32 length)
+{
+       MonoDomain *domain;
+
+       MONO_ARCH_SAVE_REGS;
+
+       if (value == NULL)
+               mono_raise_exception (mono_get_exception_argument_null ("value"));
+       if (sindex < 0)
+               mono_raise_exception (mono_get_exception_argument_out_of_range ("startIndex"));
+       if (length < 0)
+               mono_raise_exception (mono_get_exception_argument_out_of_range ("length"));
+       if (sindex + length > mono_array_length (value))
+               mono_raise_exception (mono_get_exception_argument_out_of_range ("Out of range"));
+
+       domain = mono_domain_get ();
+       return mono_string_new_utf16 (domain, (gunichar2 *) mono_array_addr(value, guint16, sindex), length);
+
+}
+
+MonoString *
ves_icall_System_String_ctor_chara_int_int (gpointer dummy, MonoArray *value,
                                         gint32 sindex, gint32 length)
{
Index: mono/metadata/string-icalls.h
===================================================================
--- mono/metadata/string-icalls.h       (revision 58130)
+++ mono/metadata/string-icalls.h       (working copy)
@@ -14,6 +14,9 @@
#include <mono/metadata/object.h>

MonoString *
+ves_icall_System_String_ctor_bytep_int_int (gpointer dummy, guint16 *value, gint32 sindex, gint32 length);
+
+MonoString *
ves_icall_System_String_ctor_charp (gpointer dummy, gunichar2 *value);

MonoString *
@@ -32,6 +35,9 @@
ves_icall_System_String_ctor_chara (gpointer dummy, MonoArray *value);

MonoString *
+ves_icall_System_String_ctor_bytea_int_int (gpointer dummy, MonoArray *value,  gint32 sindex, gint32 length);
+
+MonoString *
ves_icall_System_String_ctor_chara_int_int (gpointer dummy, MonoArray *value,  gint32 sindex, gint32 length);

MonoString *
Index: mono/metadata/icall.c
===================================================================
--- mono/metadata/icall.c       (revision 58130)
+++ mono/metadata/icall.c       (working copy)
@@ -6925,6 +6925,8 @@
};

static const IcallEntry string_icalls [] = {
+       {".ctor(byte*,int,int)", ves_icall_System_String_ctor_bytep_int_int},
+       {".ctor(byte[],int,int)", ves_icall_System_String_ctor_bytea_int_int},
        {".ctor(char*)", ves_icall_System_String_ctor_charp},
        {".ctor(char*,int,int)", ves_icall_System_String_ctor_charp_int_int},
        {".ctor(char,int)", ves_icall_System_String_ctor_char_int},
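
One thing to watch in the byte[] constructor above: `sindex + length` is a signed 32-bit addition, so two large positive values could wrap before the comparison. As far as I can tell, the index and length are also applied in 16-bit units while the array length of a byte[] counts bytes, so the units deserve a second look. A minimal standalone sketch of a check that avoids the addition (range_ok is a made-up name, not a Mono function, and it assumes all three values use the same unit):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical standalone check (not Mono code). start, length and
 * array_len are all assumed to be counted in the same unit. */
static bool range_ok(int32_t start, int32_t length, uint32_t array_len)
{
    if (start < 0 || length < 0)
        return false;
    if ((uint32_t)start > array_len)
        return false;
    /* Subtract instead of adding: start + length could overflow,
     * array_len - start cannot once start <= array_len holds. */
    return (uint32_t)length <= array_len - (uint32_t)start;
}
```

For a byte[] that holds UTF-16 data, the caller would first have to decide whether start and length mean bytes or 16-bit code units and convert array_len to match.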




and in corlib I added:


Index: System/String.cs
===================================================================
--- System/String.cs    (revision 58130)
+++ System/String.cs    (working copy)
@@ -1945,6 +1945,12 @@
                        memcpy4 (dest, src, size);
                }

+               [MethodImplAttribute (MethodImplOptions.InternalCall)]
+               unsafe internal extern String (byte *value, int startIndex, int length);
+
+               [MethodImplAttribute (MethodImplOptions.InternalCall)]
+               internal extern String (byte [] val, int startIndex, int length);
+
                [CLSCompliant (false), MethodImplAttribute (MethodImplOptions.InternalCall)]
                unsafe public extern String (char *value);


-- 
Zac Bowling
http://zacbowling.com/


----- Message from kornelpal at hotmail.com ---------
    Date: Fri, 17 Mar 2006 19:30:09 +0100
    From: Kornél Pál <kornelpal at hotmail.com>
Reply-To: Kornél Pál <kornelpal at hotmail.com>
Subject: Re: [Mono-dev] [PATCH] Boost speed of UnicodeEncoding
      To: Zac Bowling <zac at zacbowling.com>, mono-devel-list at lists.ximian.com


> Hi,
>
> I didn't modify string.memcpy, and that needs a boost as well, especially
> for short strings (I mean memory, in fact :). Modifying string.memcpy will
> affect the String class as well, so it can boost the entire Mono framework.
> If you have some patches, please post them to the list.
>
> Kornél
>
> ----- Original Message -----
> From: "Zac Bowling" <zac at zacbowling.com>
> To: <mono-devel-list at lists.ximian.com>
> Sent: Friday, March 17, 2006 1:07 PM
> Subject: Re: [Mono-dev] [PATCH] Boost speed of UnicodeEncoding
>
>
> Awesome work!
>
> I disappeared for a few days but managed to get my patch nearly ready
> as well, but it looks like yours runs a few microseconds faster than
> mine in all my tests.
>
> The part that beats mine is on the big-endian text, where you modded the
> memcpy technique in the CopyChars function to do the byte swapping:
>
> ...
> dest[0] = src[1];
> dest[1] = src[0];
> dest[2] = src[3];
> dest[3] = src[2];
> dest[4] = src[5];
> dest[5] = src[4];
> ...
>
> (absolutely amazing how much faster that is! :-P)
>
> One big difference in my patch is that I did almost all of this
> inside the String.cs file instead. Sort of a throwback to Java being
> able to do some stuff like this inside Java's String class without
> having to call java.nio.charset, but this makes more sense. :-)
>
> This should work so much better now and make my life a little
> nicer reading these UTF-16 geo data CSV files.
>
> good thinking :-)
>
> --
> Zac Bowling
> http://zacbowling.com/
>
>
> ----- Message from kornelpal at hotmail.com ---------
>    Date: Thu, 16 Mar 2006 23:59:53 +0100
>    From: Kornél Pál <kornelpal at hotmail.com>
> Reply-To: Kornél Pál <kornelpal at hotmail.com>
> Subject: Re: [Mono-dev] [PATCH] Boost speed of UnicodeEncoding
>      To: Atsushi Eno <atsushi at ximian.com>
>
>
>> Hi,
>>
>> Originally I didn't plan to create a patch; I only made some suggestions.
>> But then I realized that the current UnicodeEncoding is too inefficient.
>>
>> So I implemented my idea in UnicodeEncoding.
>>
>> UnicodeEncodingPerformance.cs is the test I used.
>>
>> Results:
>> Before:
>> 1, string to byte[], same: 265
>> 1, char[] to byte[], same: 282
>> 1, byte[] to char[], same: 453
>> 1, string to byte[], diff: 265
>> 1, char[] to byte[], diff: 266
>> 1, byte[] to char[], diff: 453
>> 4, string to byte[], same: 672
>> 4, char[] to byte[], same: 703
>> 4, byte[] to char[], same: 594
>> 4, string to byte[], diff: 656
>> 4, char[] to byte[], diff: 609
>> 4, byte[] to char[], diff: 641
>> 1024, string to byte[], same: 1406
>> 1024, char[] to byte[], same: 1391
>> 1024, byte[] to char[], same: 922
>> 1024, string to byte[], diff: 1297
>> 1024, char[] to byte[], diff: 1281
>> 1024, byte[] to char[], diff: 1250
>> 1048576, string to byte[], same: 3453
>> 1048576, char[] to byte[], same: 2500
>> 1048576, byte[] to char[], same: 1515
>> 1048576, string to byte[], diff: 2734
>> 1048576, char[] to byte[], diff: 1407
>> 1048576, byte[] to char[], diff: 1312
>>
>>
>> After:
>> 1, string to byte[], same: 578
>> 1, char[] to byte[], same: 563
>> 1, byte[] to char[], same: 844
>> 1, string to byte[], diff: 328
>> 1, char[] to byte[], diff: 359
>> 1, byte[] to char[], diff: 578
>> 4, string to byte[], same: 578
>> 4, char[] to byte[], same: 563
>> 4, byte[] to char[], same: 812
>> 4, string to byte[], diff: 391
>> 4, char[] to byte[], diff: 406
>> 4, byte[] to char[], diff: 594
>> 1024, string to byte[], same: 47
>> 1024, char[] to byte[], same: 47
>> 1024, byte[] to char[], same: 62
>> 1024, string to byte[], diff: 203
>> 1024, char[] to byte[], diff: 204
>> 1024, byte[] to char[], diff: 203
>> 1048576, string to byte[], same: 391
>> 1048576, char[] to byte[], same: 375
>> 1048576, byte[] to char[], same: 375
>> 1048576, string to byte[], diff: 984
>> 1048576, char[] to byte[], diff: 391
>> 1048576, byte[] to char[], diff: 375
>>
>> Note that these are the results of two actual executions, so they are not
>> fully representative.
>>
>> As you can see, converting 1 character became slower. But longer strings
>> are converted much faster (4 bytes for example). Just to show how inefficient
>> the old code was: converting 1024 characters is about 20-30 times faster
>> than it was before.
>>
>> I think converting a single character should not be optimized, as doing so
>> is already inefficient. It's much faster to convert it inline using shift
>> operators.
>>
>> Please review and approve the patch.
>>
>> Kornél
>>
>> ----- Original Message -----
>> From: "Atsushi Eno" <atsushi at ximian.com>
>> To: "Kornél Pál" <kornelpal at hotmail.com>
>> Cc: <mono-devel-list at lists.ximian.com>; "Zac Bowling" <zac at zacbowling.com>
>> Sent: Wednesday, March 15, 2006 11:10 PM
>> Subject: Re: [Mono-dev] Patch to boost speed of UnicodeEncoding
>>
>>
>>> Hi,
>>>
>>> It's always nice if encoding conversion stuff gets faster. Can you
>>> also show how much faster it becomes when you finish writing the patch?
>>>
>>> Thx,
>>> Atsushi Eno
>>>
>>>
>>> Kornél Pál wrote:
>>>> Hi,
>>>>
>>>> I think doing something like in the attached draft is faster. No new
>>>> String object is created. Arrays are accessed using pointers. And I think
>>>> there is no point in using a more complicated conversion method for short
>>>> strings.
>>>>
>>>> This draft is very unsafe. It lacks any checks and does not perform any
>>>> special character or byte sequence handling.
>>>>
>>>> Note that I haven't done any tests to determine whether using byte
>>>> pointers or using int pointers and shift operations to swap bytes is
>>>> faster. But mixing bytes and ints results in two different code paths for
>>>> big- and little-endian encodings, while byte swapping can be performed
>>>> with a single code path when using only bytes or only ints.
>>>>
>>>> Kornél
>>> _______________________________________________
>>> Mono-devel-list mailing list
>>> Mono-devel-list at lists.ximian.com
>>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>>>
>>
>
>
> ----- End message from kornelpal at hotmail.com -----


----- End message from kornelpal at hotmail.com -----
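
P.S. For anyone skimming the quoted thread, here is a rough C sketch of the two byte-swapping styles it talks about: swapping through byte pointers, and swapping whole 16-bit units with shifts. Neither is the code from Kornél's patch; the function names are made up, and I'm assuming an even byte count and non-overlapping buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-pointer variant (not the patch's CopyChars):
 * swap each pair of bytes while copying. byte_count is assumed even;
 * a trailing odd byte is left untouched. */
static void swap_copy_bytes(uint8_t *dest, const uint8_t *src, size_t byte_count)
{
    for (size_t i = 0; i + 1 < byte_count; i += 2) {
        dest[i] = src[i + 1];
        dest[i + 1] = src[i];
    }
}

/* Hypothetical shift variant: swap the two bytes of each 16-bit
 * code unit with shifts instead of touching individual bytes. */
static void swap_copy_units(uint16_t *dest, const uint16_t *src, size_t unit_count)
{
    for (size_t i = 0; i < unit_count; i++)
        dest[i] = (uint16_t)((src[i] << 8) | (src[i] >> 8));
}
```

Both variants do the same swap regardless of host byte order, which is the "single code" property mentioned in the thread; which one is faster would have to be measured.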




