[Mono-dev] [PATCH] String.Split() broken behavior

Tyler Larson mono-devel at tlarson.com
Sun Oct 7 05:40:21 EDT 2007


Rodrigo Kumpera wrote:
> Hi Tyler,
>
> I some coments about our patch, some are just minor details like 
> formating issues. You can read about our coding guidelines in 
> http://www.mono-project.com/Coding_Guidelines 
> <http://www.mono-project.com/Coding_Guidelines>
>
> For the options it's a good idea to create an enum in the c side. 
> "options & SPLIT_OPTIONS_REMOVE_EMPTY_ENTRIES", or something alika, is 
> way better than "options & 0x01" - which uses some sort of magical 
> number. Take a look at the GenericParameterAttributes enum in 
> object-internals.h.
>
> +            if (string_icall_is_in_array(separator, arrsize, src[i])) {
> +                splitsize++;
>
> Please add a space after function name and array index, it should be 
> something like:
> +            if (string_icall_is_in_array (separator, arrsize, src [i])) {
> +                splitsize++;
>
> +                if (lastpos == 1)
> +                {
> +                    splitsize++;
> +                }
>
> Avoid braces for single line ifs.
>
> About fixing the broken code, we are going to need new unit tests for 
> all this code. You could sent the patch for 
> mcs/class/corlib/Test/System/StringTest.cs too ;)
>
>
> Cheers,
> Rodrigo
>
>
Attached is a revised patch to the String.Split problem. This patch now 
includes code cleanup to bring it in conformance with the published 
coding standards as well as unit tests relevant to the changes I made.

For what its worth, I would suspect that this function is probably the 
fastest basic tokenization mechanism available though the class 
libraries. The reason why I found the bugs in this code to begin with is 
because I always used this method to tokenize input strings. Indeed, I'm 
surprised that these bugs went unnoticed and unrepaired for so long 
(literally years) for such a critical core component as the system 
string library.

Is there some other (perhaps faster) mechanism other people are using 
for string tokenization that I don't know about?

-Tyler
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringsplit.patch
Type: text/x-patch
Size: 11802 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20071007/a8fc2880/attachment.bin 


More information about the Mono-devel-list mailing list