[Mono-dev] Optimized QuickSearch (System.Text.RegularExpressions)

Paolo Molaro lupus at ximian.com
Wed Jun 20 12:25:08 EDT 2007


On 06/15/07 Juraj Skripsky wrote:
> The attached patch implements an optimization for QuickSearch as
> suggested by Paolo.
> 
> I've split the skip table ("Hashtable shift") into two tables:
> "byte[] shift" contains skip distances for chars <= 255. "Hashtable
> shiftExtended" contains skip dist. for chars > 255 (and those having
> skip distances > 255). The second table is created only if necessary.
> 
> This will make the class more efficient in both time and space (for most
> search strings).

Did you actually measure so we can see what kind of speedup there is?


> Index: System.Text.RegularExpressions/quicksearch.cs
> ===================================================================
> --- System.Text.RegularExpressions/quicksearch.cs	(revision 79685)
> +++ System.Text.RegularExpressions/quicksearch.cs	(working copy)
> @@ -160,44 +160,77 @@
>  			// private
>  
>  		private void SetupShiftTable () {
> -			shift = new Hashtable ();
> -			if (reverse)
> -			{
> -				for (int i = len ; i > 0; -- i) 
> -				{
> -					char c = str[i -1];
> -					shift[GetChar(c)] = i;
> -				}
> +			bool needsExtendedTable = len > (byte.MaxValue - 1);

If I'm not mistaken, we could use a short skip (255) even if it's not optimal,
thus avoiding more uses of the hashtable.

> +
> +			byte maxLowChar = 0;
> +			for (int i = 0; i < len; i++) {
> +				char cur = str [i];
> +				if (cur <= (char)byte.MaxValue) {
> +					if ((byte)cur > maxLowChar)
> +						maxLowChar = (byte)cur;				
> +				} else
> +					needsExtendedTable = true;
>  			}
> -			else
> -			{
> -				for (int i = 0; i < len; ++ i) 
> -				{
> -					char c = str[i];
> -					shift[GetChar(c)] = len - i;
> +			
> +			shift = new byte [maxLowChar + 1];
> +			if (needsExtendedTable)
> +				shiftExtended = new Hashtable ();
> +
> +			for (int i = 0, j = len; i < len; i++, j--) {
> +				char c = str [(!reverse ? i : j - 1)];
> +				if (c < shift.Length) {
> +					if (j <= byte.MaxValue) {
> +						shift [c] = (byte)(j - 1);
> +						continue;
> +					} else {
> +						shift [c] = byte.MaxValue;
> +					}
>  				}
> +				shiftExtended [c] = j;
>  			}
> -			
>  		}
>  	    
>  		private int GetShiftDistance (char c) {
> -			if(shift == null)
> +			if (shiftExtended == null)
>  				return 1;
> +				
> +			c = GetChar (c);
> +			if (c < shift.Length) {
> +				int dist = shift [c];
> +				switch (dist) {
> +					case 0:
> +						return len + 1;
>  
> -			object s = shift [GetChar (c)];
> +					case byte.MaxValue:
> +						break;
> +
> +					default:
> +						return dist + 1;
> +				}

It's likely better to use a chained if sequnce instead of switch, though
I suppose mcs does it for you...
It looks ok to commit once we have some perf numbers.
Thanks!

lupus

-- 
-----------------------------------------------------------------
lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better



More information about the Mono-devel-list mailing list