[Mono-dev] Patch to boost speed of UnicodeEncoding

Kornél Pál kornelpal at hotmail.com
Sat Mar 11 08:40:33 EST 2006


Hi,

I think something like the attached draft would be faster. No new String
object is created, and the arrays are accessed through pointers. I also see
no point in using a more complicated conversion method for short strings.

This draft is very unsafe. It lacks any checks and does not perform any
special character or byte sequence handling.

Note that I haven't tested whether using byte pointers or using int pointers
with shift operations to swap bytes is faster. But mixing bytes and ints
results in two different code paths for big- and little-endian encodings,
while byte swapping can be done with a single code path when using only
bytes or only ints.
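
To illustrate the bytes-only variant, here is a rough sketch. It is not the
attached draft: the class name Utf16Sketch and its GetBytes signature are
made up for this example, and it adds the argument checks the draft leaves
out. It still does no surrogate or special character handling.

```csharp
using System;

// Hypothetical helper for illustration only; not the attached
// ByteArrayCharArray.cs and not Mono's UnicodeEncoding.
static class Utf16Sketch
{
    // Writes charCount chars as UTF-16 into bytes, using only byte
    // pointers. Returns the number of bytes written.
    public static unsafe int GetBytes(char[] chars, int charIndex, int charCount,
                                      byte[] bytes, int byteIndex, bool bigEndian)
    {
        // Argument checks that the draft deliberately omits.
        if (chars == null) throw new ArgumentNullException("chars");
        if (bytes == null) throw new ArgumentNullException("bytes");
        if (charIndex < 0 || charCount < 0 || charIndex > chars.Length - charCount)
            throw new ArgumentOutOfRangeException("charIndex");
        if (byteIndex < 0 || byteIndex > bytes.Length ||
            (bytes.Length - byteIndex) / 2 < charCount)
            throw new ArgumentException("Destination array is too small.", "bytes");
        if (charCount == 0) return 0;

        // Swap only when the requested byte order differs from the
        // byte order chars have in memory on this machine.
        bool swap = bigEndian == BitConverter.IsLittleEndian;

        fixed (char* charBase = chars)
        fixed (byte* byteBase = bytes)
        {
            byte* src = (byte*)(charBase + charIndex);
            byte* dst = byteBase + byteIndex;
            byte* end = src + charCount * 2;

            if (swap)
            {
                // Exchange the two bytes of every UTF-16 code unit.
                while (src < end)
                {
                    dst[0] = src[1];
                    dst[1] = src[0];
                    src += 2;
                    dst += 2;
                }
            }
            else
            {
                // Same byte order: plain byte-by-byte copy.
                while (src < end) *dst++ = *src++;
            }
        }
        return charCount * 2;
    }
}
```

This has to be compiled with unsafe code enabled. Because the loop touches
only bytes, the same code serves both encodings: the swap flag alone decides
whether each pair is exchanged. For example, encoding "Hi" with bigEndian set
to true yields the bytes 00 48 00 69 on any machine.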

Kornél

----- Original Message -----
From: "Zac Bowling" <zac at zacbowling.com>
To: <mono-devel-list at lists.ximian.com>
Sent: Saturday, March 11, 2006 1:09 PM
Subject: [Mono-dev] Patch to boost speed of UnicodeEncoding


> Alright guys,
>
> Here is a cool (and still incomplete) patch to speed up
> System.Text.UnicodeEncoding that I'm working on. Before I finish it, I
> just want to get everyone's opinions to make sure this is sane.
>
> I was tinkering with this idea. Since the strings are already stored in
> memory as UTF-16 (UCS-2), converting them the way we do now, with a
> while loop, one char at a time, was really bothering me. Directly
> copying what's in memory seems a bit more sane. I don't want to make it
> sound that easy, because it isn't (and that may be why it wasn't done
> like this when it was first written). :-P
>
> The biggest problem is that UnicodeEncoding can be bigEndian or
> littleEndian, so I worked through the logic and test whether the
> system's endianness (with 'BitConverter.IsLittleEndian') matches the
> endianness of the current Encoding class (using the 'bigEndian' bool
> field); if it doesn't, we use the same method we already use. (Is that
> right? Is the internal version of UTF-16 we use in our strings specific
> to the endianness of the system? I assumed yes here, but if it's not,
> it's a simple change to comment it out.)
>
> Also, since the memcpy function in String.cs uses some unsafe logic,
> taking a possible hit for that with a really small string seems silly,
> so I put in a condition: if the char count is less than or equal to 10
> chars, use the existing method. (Maybe 10 chars should be adjusted, or
> is that idea silly?)
>
> Below is an unfinished sample of my idea. Of course I will have to
> reverse this logic for GetChars() (instead of GetBytes below) and
> finish the overloads in System.Text.UnicodeEncoding's GetBytes and
> GetChars methods, but I want to see what everyone thinks.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ByteArrayCharArray.cs
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20060311/ba14fc3a/attachment.pl 

