[Mono-devel-list] String.Replace patch to Managed code.

Andreas Nahr ClassDevelopment at A-SoftTech.com
Wed Jun 2 15:41:22 EDT 2004


Hi,

ok - sorry about the pos thing. I should have noted that.
As for the speed. I just ran a small microbenchmark to show you *how* big the differences are.
Using a long string: yours: 2994ms, mine: 501ms
Using a short string: yours: 8382ms, mine: 1071ms
So for these two cases we are talking about 600-800% slower code...

I DO know that the code is ugly. If one of my students would deliver that in a normal app, he would probably get a bad grade for it ;)
But in this scenario (String class and potentially a few others of System) I think we should sacrifice that.
Also its only ugly code from the developer's point of view. Once compiled this is perfectly managed code which is much more 'clean' than the internalcalls.

I looked at the (string, string) case but i'm not sure if this code is correct. I didn't dive too much into icu (until now), because there is really bad stuff in there. E.g. IndexOf(string) might deliver a correct match index even if the sought string is longer than the searched string in some locales and cases. This really makes that kind of things complicated.
Another example: if you compare a string with 7 chars with a string with 8 chars length the compare might succeed.

Andreas

  ----- Original Message ----- 
  From: grompf 
  To: Andreas Nahr 
  Sent: Tuesday, June 01, 2004 11:48 PM
  Subject: Re: [Mono-devel-list] String.Replace patch to Managed code.


  Adreas,

  Also, looking at mine vs your second. This should be the same speed as your second, if not faster (due to dropping the second rolled loop). Your nested loop is pretty much the same as the one in IndexOf(char), the only place there would be a "hit" would be on the checks in IndexOf. I'm of the opinion that reusing functions to keep things "cleaner" is probably a good idea.

  Have you put any thought into Replace(String, String). I'm trying to think of an elegant way to realloc the tmp string length without having to realloc (which is expensive on some platforms). I'm not sure there is a way around it. I think there could be a performance gain doing it something like so tho (this is top of head, so am sure could be tweaked)

  Get all IndexOf (oldValue.Substring(0, 1) )
  check them all
  allocate new string
  build

  This would have to be tested obviously.. It has some expensive indexof ops (multiple loops at offsets), but wouldn't need to realloc as would could keep a buffer of positions and allocate (real_length) at the end.

  -kangaroo

  On 1-Jun-04, at 5:36 PM, Andreas Nahr wrote:


    If I would have to guess (without testing!) this should be faster than my first (simple) solution and slower that the second one, however the code looks nicer and is more readable ;)
     
    You could also write:
    if(this.IndexOf(oldChar); == -1)
        return this;
    to save a variable.
    However the additional function call is going to suck away a good amount of speed.
     
    Andreas
    ----- Original Message -----
    From: grompf 
    To: Andreas Nahr 
    Cc: mono-devel-list at lists.ximian.commono-devel-list@lists.ximian.com 
    Sent: Tuesday, June 01, 2004 7:33 PM
    Subject: Re: [Mono-devel-list] String.Replace patch to Managed code.

    Err.. sorry that should be

    /* This method is culture insensitive */
    public unsafe String Replace (char oldChar, char newChar)
    {
    // Change Replace(char, char) to managed code, because ICU doesn't deal with replacing \0
    int pos = this.IndexOf(oldChar);
    if(pos == -1)
    return this;
    string tmp = InternalAllocateStr(length);
    fixed(char* s = &start_char, d = tmp) {
    CharCopy(s, d, pos);
    for(int i = pos; i < length ; i++)
    {
    if(s[i] == oldChar)
    d[i] = newChar;
    else
    d[i] = s[i];
    }
    }
    return tmp;
    }

    -kangaroo

    On 1-Jun-04, at 1:26 PM, grompf wrote:


    Andreas,

    Admittedly and fully understood that what I initially presented wasn't optimized. I was working on tackling the bug first (with "safe" managed code), before moving it to the optimized unsafe version (which you posted for Replace(char, char) and looks pretty good.

    Now, that being said, considering your latest patch with IndexOf improvments. Your second "optimized" patch, wouldn't it be faster to do:

    int pos = this.IndexOf(oldChar);
    if(pos == -1)
    return this;

    fixed (char *source = &start_char) {
    string tmp = InternalAllocateStr(length);
    CharCopy(source, dest, pos);
    for(int i = 0; i < length; i++) {
    if(source[x] == oldChar)
    dest[x] = newChar;
    else
    dest[x] = source[x];
    }
    return tmp;
    }


    ??

    -kangaroo

    On 31-May-04, at 7:06 PM, Andreas Nahr wrote:


    Hi,
     
    I'm working on string for quite some time now, but only when I find spare time to do so, so things are progressing relatively slow in that field (especially as things are very performance sensitive in there and need a *lot* of testing)
     
    And I think that also a slight problem with your patch. Just from looking at it (without too much testing): The performance of it would be really bad.
    For the invariant version a relatively fast managed implementation is relatively easy (unfortunatelly it is still a little bit slower than native, but a lot faster than your solution):
     
       string tmp = InternalAllocateStr (length);
       fixed (char* s = &start_char, d = tmp) {
        char* source = s, dest = d;
        for (int x = 0; x < length; x++) {
         if (*source == oldChar)
          *dest = newChar;
         else
          *dest = *source;
         source++;
         dest++;
        }
       }
       return tmp;
     
     
     
    If you like to test things a little bit you could also look at this optimized version (needs the CharCopy patch I submitted to this list):
     
       fixed (char* source = &start_char) {
        for (int x = 0; x < length; x++) {
         if (source[x] == oldChar) {
          string tmp = InternalAllocateStr (length);
          fixed (char* dest = tmp) {
           CharCopy (source, dest, x);
           source[x] = newChar;
           for (; x < length; x++) {
            if (source[x] == oldChar)
             dest[x] = newChar;
            else
             dest[x] = source[x];
           }
          }
          return tmp;
         }
        }
       }
       return this;
     
     
     
    Andreas
     
    ----- Original Message -----
    From: grompf
    To: mono-devel-list at lists.ximian.com
    Sent: Monday, May 31, 2004 9:03 PM
    Subject: [Mono-devel-list] String.Replace patch to Managed code.

    In my effort to address bug #59274, I tracked the problem down to icu/glib in locales.c not replacing \0.

    Attached is a patch for String.cs to replace the internal methods with managed code.

    There is probably a more efficient way of Replace(String, String), but both of these methods have been tested and working for bug #59274 as well as other Replace testings. However, I'm a little unsure how to test the culture dependancy of Replace(String, String). Considering the IndexOf call should determine the culture as well, it _should_ be ok as far I understand the culture dependancy stuff.

    If this looks good, let me know and I'll move on to some other methods (like IndexOf).

    -kangaroo


     
    !DSPAM:40bcf718106132073820473!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20040602/8330efe0/attachment.html 


More information about the Mono-devel-list mailing list