[Mono-dev] Regular expression performance

Thomas Harning Jr. harningt at gmail.com
Sun Sep 11 16:37:50 EDT 2005


Hello, I'm working on a wiki and am trying to use regular
expressions to parse the wiki markup.

I was trying to figure out how performance is affected by processing
the string in pieces (since there are areas I don't want parsed),
and by "joining" two conditions into a single pattern.

Splitting the input up line by line seemed to hurt performance a bit,
even though it matched fewer items. That is probably due to poorer
memory locality.

When I merged two conditions, for example:
 (\*\*|Bob)
the merged pattern took more than double the time that each part
takes on its own.
I ran each part (\*\* and Bob) separately over the entire string to
see whether that was slower than the combined pattern, but it took
only 1/4 of the time the merged regex took. I would have thought
that merging would let the regex engine scan for a * or a B and
continue from there without much trouble, and that a single pass
over the string would be faster because it incurs fewer cache misses.
I ran these tests with "compiled" regexes on Mono. Strangely enough,
when I re-ran them non-compiled I got the same performance figures.
Does Mono not support regex compilation yet?
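
For anyone who wants to reproduce the comparison, here is a minimal
timing sketch. The sample text, the class and method names, and the
repeat count are made up for illustration; they are not from my
actual test, and the numbers will vary by runtime and machine.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

static class AlternationBenchmark
{
    // Counts every match of `pattern` in `input` and returns the
    // elapsed wall-clock time in milliseconds.
    public static long Time(string pattern, RegexOptions options,
                            string input, out int count)
    {
        Regex regex = new Regex(pattern, options);
        regex.Match("");  // warm-up, so first-use cost is not timed

        Stopwatch watch = Stopwatch.StartNew();
        count = regex.Matches(input).Count;  // Count forces a full scan
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // Hypothetical sample text: two "**" and one "Bob" per line.
    public static string BuildSample(int lines)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++)
            sb.Append("Some **bold** text that mentions Bob once.\n");
        return sb.ToString();
    }

    static void Main()
    {
        string input = BuildSample(100000);
        RegexOptions[] modes = { RegexOptions.None, RegexOptions.Compiled };

        foreach (RegexOptions mode in modes)
        {
            int merged, stars, bob;
            long tMerged = Time(@"(\*\*|Bob)", mode, input, out merged);
            long tStars = Time(@"\*\*", mode, input, out stars);
            long tBob = Time("Bob", mode, input, out bob);

            // The match counts should agree; only the times differ.
            Console.WriteLine(
                "{0}: merged {1} ms ({2} matches), separate {3} ms ({4} matches)",
                mode, tMerged, merged, tStars + tBob, stars + bob);
        }
    }
}
```

Running both RegexOptions.None and RegexOptions.Compiled in the same
program makes it easy to see whether the Compiled flag changes
anything on a given runtime.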

If anyone has any suggestions on how to approach parsing a wiki,
any help would be greatly appreciated.
A few notes on how I'm doing the wiki:
  I'm parsing the wiki into a tree of elements so that output could
potentially go to targets other than HTML, such as re-outputting to
wiki markup (to clean things up: the regexes eat up a little bit of
junk and accept a few things that are 'ok' but not in the spec), or
even to an application so that the wiki could be used offline.
  I'm using Lua as a scripting language so that various things can
be handled programmatically. It should allow for easy extension of
the wiki's abilities.
  It's all coded in C# (well, with Lua as a little glue for putting
together the main pages and other stuff).
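
To make the element-tree idea concrete, here is a rough sketch of
what such a tree could look like, with one writer for HTML and one
for wiki markup. All of the type names are hypothetical, and treating
** as bold is an assumption made only for this example; it is not
the real code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical node types, for illustration only.
abstract class WikiNode
{
    public readonly List<WikiNode> Children = new List<WikiNode>();

    public abstract void WriteHtml(StringBuilder output);
    public abstract void WriteWiki(StringBuilder output);

    // Writes all children in order, in the requested format.
    protected void WriteChildren(StringBuilder output, bool asHtml)
    {
        foreach (WikiNode child in Children)
        {
            if (asHtml) child.WriteHtml(output);
            else child.WriteWiki(output);
        }
    }
}

class TextNode : WikiNode
{
    private readonly string text;
    public TextNode(string text) { this.text = text; }

    public override void WriteHtml(StringBuilder output)
    {
        // Minimal escaping; "&" must be replaced first.
        output.Append(text.Replace("&", "&amp;")
                          .Replace("<", "&lt;")
                          .Replace(">", "&gt;"));
    }

    public override void WriteWiki(StringBuilder output) { output.Append(text); }
}

class BoldNode : WikiNode
{
    public override void WriteHtml(StringBuilder output)
    {
        output.Append("<b>"); WriteChildren(output, true); output.Append("</b>");
    }

    public override void WriteWiki(StringBuilder output)
    {
        // Assumes ** delimits bold text in the wiki syntax.
        output.Append("**"); WriteChildren(output, false); output.Append("**");
    }
}

class ParagraphNode : WikiNode
{
    public override void WriteHtml(StringBuilder output)
    {
        output.Append("<p>"); WriteChildren(output, true); output.Append("</p>");
    }

    public override void WriteWiki(StringBuilder output) { WriteChildren(output, false); }
}

static class WikiTreeDemo
{
    public static WikiNode BuildSample()
    {
        WikiNode bold = new BoldNode();
        bold.Children.Add(new TextNode("Bob"));
        WikiNode paragraph = new ParagraphNode();
        paragraph.Children.Add(new TextNode("Hello "));
        paragraph.Children.Add(bold);
        return paragraph;
    }

    public static string ToHtml(WikiNode node)
    {
        StringBuilder sb = new StringBuilder();
        node.WriteHtml(sb);
        return sb.ToString();
    }

    public static string ToWiki(WikiNode node)
    {
        StringBuilder sb = new StringBuilder();
        node.WriteWiki(sb);
        return sb.ToString();
    }

    static void Main()
    {
        WikiNode tree = BuildSample();
        Console.WriteLine(ToHtml(tree));  // <p>Hello <b>Bob</b></p>
        Console.WriteLine(ToWiki(tree));  // Hello **Bob**
    }
}
```

Because the regexes only build the tree, each new output target is
just another pair of Write methods and never touches the parsing.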


I've looked into using Jay and other parsers, but those look like
overkill. If anyone thinks a parser/lexer would work better or
faster, though, please let me know.
Thanks!
-- 
Thomas Harning Jr.