[Mono-dev] Optimized QuickSearch (System.Text.RegularExpressions)
Paolo Molaro
lupus at ximian.com
Wed Jun 20 12:25:08 EDT 2007
On 06/15/07 Juraj Skripsky wrote:
> The attached patch implements an optimization for QuickSearch as
> suggested by Paolo.
>
> I've split the skip table ("Hashtable shift") into two tables:
> "byte[] shift" contains skip distances for chars <= 255. "Hashtable
> shiftExtended" contains skip dist. for chars > 255 (and those having
> skip distances > 255). The second table is created only if necessary.
>
> This will make the class more efficient in both time and space (for most
> search strings).
Did you actually measure so we can see what kind of speedup there is?
> Index: System.Text.RegularExpressions/quicksearch.cs
> ===================================================================
> --- System.Text.RegularExpressions/quicksearch.cs (revision 79685)
> +++ System.Text.RegularExpressions/quicksearch.cs (working copy)
> @@ -160,44 +160,77 @@
> // private
>
> private void SetupShiftTable () {
> - shift = new Hashtable ();
> - if (reverse)
> - {
> - for (int i = len ; i > 0; -- i)
> - {
> - char c = str[i -1];
> - shift[GetChar(c)] = i;
> - }
> + bool needsExtendedTable = len > (byte.MaxValue - 1);
If I'm not mistaken, we could use a short skip (255) even if it's not optimal,
thus avoiding more uses of the hashtable.
> +
> + byte maxLowChar = 0;
> + for (int i = 0; i < len; i++) {
> + char cur = str [i];
> + if (cur <= (char)byte.MaxValue) {
> + if ((byte)cur > maxLowChar)
> + maxLowChar = (byte)cur;
> + } else
> + needsExtendedTable = true;
> }
> - else
> - {
> - for (int i = 0; i < len; ++ i)
> - {
> - char c = str[i];
> - shift[GetChar(c)] = len - i;
> +
> + shift = new byte [maxLowChar + 1];
> + if (needsExtendedTable)
> + shiftExtended = new Hashtable ();
> +
> + for (int i = 0, j = len; i < len; i++, j--) {
> + char c = str [(!reverse ? i : j - 1)];
> + if (c < shift.Length) {
> + if (j <= byte.MaxValue) {
> + shift [c] = (byte)(j - 1);
> + continue;
> + } else {
> + shift [c] = byte.MaxValue;
> + }
> }
> + shiftExtended [c] = j;
> }
> -
> }
>
> private int GetShiftDistance (char c) {
> - if(shift == null)
> + if (shiftExtended == null)
> return 1;
> +
> + c = GetChar (c);
> + if (c < shift.Length) {
> + int dist = shift [c];
> + switch (dist) {
> + case 0:
> + return len + 1;
>
> - object s = shift [GetChar (c)];
> + case byte.MaxValue:
> + break;
> +
> + default:
> + return dist + 1;
> + }
It's likely better to use a chained if sequnce instead of switch, though
I suppose mcs does it for you...
It looks ok to commit once we have some perf numbers.
Thanks!
lupus
--
-----------------------------------------------------------------
lupus at debian.org debian/rules
lupus at ximian.com Monkeys do it better
More information about the Mono-devel-list
mailing list