[Mono-dev] Optimized QuickSearch (System.Text.RegularExpressions)

Juraj Skripsky js at hotfeet.ch
Thu Jun 21 08:20:35 EDT 2007


On Wed, 2007-06-20 at 18:25 +0200, Paolo Molaro wrote:
> On 06/15/07 Juraj Skripsky wrote:
> > The attached patch implements an optimization for QuickSearch as
> > suggested by Paolo.
> > 
> > I've split the skip table ("Hashtable shift") into two tables:
> > "byte[] shift" contains skip distances for chars <= 255. "Hashtable
> > shiftExtended" contains skip dist. for chars > 255 (and those having
> > skip distances > 255). The second table is created only if necessary.
> > 
> > This will make the class more efficient in both time and space (for most
> > search strings).
> 
> Did you actually measure so we can see what kind of speedup there is?

Running the attached test as follows...

wget http://en.wikipedia.org/wiki/Linux

a) mono RegexPerf.exe 10000 Linux "open source"
b) mono RegexPerf.exe 10000 Linux "open source" true

gives us following results (ticks per run)

     SVN    Patched
a)  1830        470
b)  1750        750

In a) we search for all occurrences of "open source" in the html from
"http://en.wikipedia.org/wiki/Linux". In b) wo do the same after
replacing "e" and "o" by "\x0411\x04AA" and "\x02A0\x03E0" in both the
html and the search string.


> > Index: System.Text.RegularExpressions/quicksearch.cs
> > ===================================================================
> > --- System.Text.RegularExpressions/quicksearch.cs	(revision 79685)
> > +++ System.Text.RegularExpressions/quicksearch.cs	(working copy)
> > @@ -160,44 +160,77 @@
> >  			// private
> >  
> >  		private void SetupShiftTable () {
> > -			shift = new Hashtable ();
> > -			if (reverse)
> > -			{
> > -				for (int i = len ; i > 0; -- i) 
> > -				{
> > -					char c = str[i -1];
> > -					shift[GetChar(c)] = i;
> > -				}
> > +			bool needsExtendedTable = len > (byte.MaxValue - 1);
> 
> If I'm not mistaken, we could use a short skip (255) even if it's not optimal,
> thus avoiding more uses of the hashtable.

I think an array of bytes handles most cases just fine as search strings
longer than 255 chars _and_ resulting in skip distances > 255 are very
seldom. 


> > +				
> > +			c = GetChar (c);
> > +			if (c < shift.Length) {
> > +				int dist = shift [c];
> > +				switch (dist) {
> > +					case 0:
> > +						return len + 1;
> >  
> > -			object s = shift [GetChar (c)];
> > +					case byte.MaxValue:
> > +						break;
> > +
> > +					default:
> > +						return dist + 1;
> > +				}
> 
> It's likely better to use a chained if sequnce instead of switch, though
> I suppose mcs does it for you...

Yes, I've replaced with "if"s making sure the most important comparison
is done first.


> It looks ok to commit once we have some perf numbers.
> Thanks!

System.dll with the patch applied passes all unit tests. May I commit?


> lupus
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: quicksearch.patch
Type: text/x-patch
Size: 3078 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20070621/a95e142e/attachment.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RegexPerf.cs
Type: text/x-csharp
Size: 845 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20070621/a95e142e/attachment-0001.bin 


More information about the Mono-devel-list mailing list