[Mono-dev] [PATCH] String.Split() broken behavior

Paolo Molaro lupus at ximian.com
Fri Oct 12 14:36:51 EDT 2007


On 10/07/07 Tyler Larson wrote:
> Attached is a revised patch to the String.Split problem. This patch now 
> includes code cleanup to bring it in conformance with the published coding 
> standards as well as unit tests relevant to the changes I made.

I guess this is the patch that Joel committed to svn.
I had to revert it, mostly because it broke the build: the same tests
it introduces were failing and it wasn't reviewed and approved.
There are other issues with the patch that I'll point out below.

> --- mcs/class/corlib/System/String.cs	(revision 87045)
> +++ mcs/class/corlib/System/String.cs	(working copy)
> @@ -216,12 +216,11 @@
>  			if (count == 1) 
>  				return new String[1] { ToString() };
>  
> -			return InternalSplit (separator, count);
> +			return InternalSplit (separator, count, 0);
>  		}
>  
>  #if NET_2_0
>  		[ComVisible (false)]
> -		[MonoDocumentationNote ("optimization")]

Since the code is not optimized, please don't remove this.

> @@ -2461,7 +2441,7 @@
>  		private extern void InternalCopyTo (int sIndex, char[] dest, int destIndex, int count);
>  
>  		[MethodImplAttribute (MethodImplOptions.InternalCall)]
> -		private extern String[] InternalSplit (char[] separator, int count);
> +		private extern String[] InternalSplit (char[] separator, int count, int options);

A change in an icall signature requires a change to the corlib internal
version.

> --- mcs/class/corlib/Test/System/StringTest.cs	(revision 87045)
> +++ mcs/class/corlib/Test/System/StringTest.cs	(working copy)
> @@ -1898,6 +1898,68 @@
>  		AssertEquals ("B", res [1]);
>  		AssertEquals ("C", res [2]);
>  	}
> +	
> +	[Test]
> +	public void SplitStringChars()
> +	{
> +		String[] res;
> +
> +		// count == 0
> +		res = "..A..B..".Split (new Char[] { '.' }, 0, StringSplitOptions.None);
> +		AssertEquals ("#01-01", 0, res.Length);
> +
> +		// count == 1
> +		res = "..A..B..".Split (new Char[] { '.' }, 1, StringSplitOptions.None);
> +		AssertEquals ("#02-01", 1, res.Length);
> +		AssertEquals ("#02-02", "..A..B..", res[0]);
> +
> +		// count == 1 + RemoveEmpty
> +		res = "..A..B..".Split (new Char[] { '.' }, 1, StringSplitOptions.RemoveEmptyEntries);
> +		AssertEquals ("#03-01", 1, res.Length);

I think it was this test failing. You can check by running
	make run-test PROFILE=net_2_0
in mcs/class/corlib.

> Index: mono/mono/metadata/string-icalls.c
> ===================================================================
> --- mono/mono/metadata/string-icalls.c	(revision 87045)
> +++ mono/mono/metadata/string-icalls.c	(working copy)
> @@ -97,8 +97,14 @@
>  	memcpy(destptr, src + sindex, sizeof(gunichar2) * count);
>  }
>  
> +/* System.StringSplitOptions */
> +typedef enum {
> +	STRINGSPLITOPTIONS_NONE = 0,
> +	STRINGSPLITOPTIONS_REMOVE_EMPTY_ENTRIES = 1
> +} StringSplitOptions;
> +
>  MonoArray * 
> -ves_icall_System_String_InternalSplit (MonoString *me, MonoArray *separator, gint32 count)
> +ves_icall_System_String_InternalSplit (MonoString *me, MonoArray *separator, gint32 count, gint32 options)
>  {
>  	MonoString * tmpstr;
>  	MonoArray * retarr;
> @@ -106,65 +112,143 @@
>  	gint32 arrsize, srcsize, splitsize;
>  	gint32 i, lastpos, arrpos;
>  	gint32 tmpstrsize;
> +	gint32 remempty;
> +	gint32 flag;
>  	gunichar2 *tmpstrptr;
>  
>  	gunichar2 cmpchar;
>  
>  	MONO_ARCH_SAVE_REGS;
>  
> -	src = mono_string_chars(me);
> -	srcsize = mono_string_length(me);
> -	arrsize = mono_array_length(separator);
> +	remempty = options & STRINGSPLITOPTIONS_REMOVE_EMPTY_ENTRIES;
> +	src = mono_string_chars (me);
> +	srcsize = mono_string_length (me);
> +	arrsize = mono_array_length (separator);
>  
> -	cmpchar = mono_array_get(separator, gunichar2, 0);
> +	splitsize = 1;
> +	/* Count the number of elements we will return. Note that this operation
> +	 * guarantees that we will return exactly splitsize elements, and we will
> +	 * have enough data to fill each. This allows us to skip some checks later on.
> +	 */
> +	if (remempty == 0) {
> +		for (i = 0; i != srcsize && splitsize < count; i++) 
> +			if (string_icall_is_in_array (separator, arrsize, src[i]))

You need a space before square brackets.

> +				splitsize++;
> +	}
> +	else {

This needs to go into a sinfle line.

> +		/* Require pattern "Nondelim + Delim + Nondelim" to increment counter.
> +		 * Lastpos != 0 means first nondelim found.
> +		 * Flag = 0 means last char was delim.
> +		 * Efficient, though perhaps confusing.
> +		 */
> +		lastpos = 0;
> +		flag = 0;
> +		for (i = 0; i != srcsize && splitsize < count; i++) {
> +			if (string_icall_is_in_array (separator, arrsize, src[i]))
> +				flag = 0;
> +			else if (flag == 0) {

If you introduce parens, use them in all the branches of an if/else
sequence.

> +				if (lastpos == 1)
> +					splitsize++;
> +				flag = 1;
> +				lastpos = 1;
> +			}
> +		}
>  
> -	splitsize = 0;
> -	for (i = 0; i != srcsize && splitsize < count; i++) {
> -		if (string_icall_is_in_array(separator, arrsize, src[i]))
> -			splitsize++;
> +		/* Nothing but separators */
> +		if (lastpos == 0)
> +		{

The open curly brace goes in the same line as the if.

> +	if (splitsize == 1) {
> +		if (remempty == 0 || count == 1)
> +		{

Same.

> +			/* Copy the whole string */
> +			retarr = mono_array_new (mono_domain_get(), mono_get_string_class (), 1);
> +			mono_array_setref (retarr, 0, me);
> +		}
> +		else 
> +		{

Wrong formatting, it should be " } else {" in one line.

> +			/* otherwise we have to filter out leading & trailing delims */
>  
> +			/* find first non-delim char */
> +			for ( ; srcsize != 0; srcsize--, src++)
> +				if (!string_icall_is_in_array (separator, arrsize, src[0]))
> +					break;

Non trivial for loops should have curly braces. A space is needed before
brackets. Some of the same style issues are all over the place,I won't
list them all.

When the changes are made and the tests actually pass, since this is
runtime code we need either an explicit signoff that the code is
contributed under the X11 license or a copyright assignment.

Sorry it took so long for the patch review.

Thanks!

lupus

-- 
-----------------------------------------------------------------
lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better



More information about the Mono-devel-list mailing list