[Mono-dev] Regular expression performance
Thomas Harning Jr.
harningt at gmail.com
Sun Sep 11 16:37:50 EDT 2005
Hello, I'm working on a Wiki and trying to use regular expressions
to parse the wiki markup.
I was trying to figure out how performance was affected by dealing
with the string split up (since I have areas I don't really want
parsed) and how "joining" 2 conditions together affected performance.
Splitting it up line by line seemed to hurt performance a bit, even
though it matched fewer items. My guess is that this is because the
pieces are no longer contiguous in memory.
When I merged 2 conditions into one pattern, like this for example:
(\*\*|Bob)
the merged pattern took more than twice as long as running each
condition on its own.
I tried running each part (the \*\* match and the Bob match) over the
entire string to see whether that was slower than the combined
pattern, but I found that it took 1/4 of the time the merged Regex
took. I would have thought that having it merged would let the Regex
engine scan for a * or a B and go on from there without much trouble,
and that going through the string only once would be faster since it
causes fewer cache misses.
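For anyone who wants to reproduce the shape of this comparison, here is a minimal timing sketch. It is written in Python with the standard `re` and `timeit` modules purely for brevity, so it does not measure Mono's Regex engine and the numbers will not transfer; the sample text and the function names are made up for illustration.

```python
import re
import timeit

# Hypothetical sample input: filler text containing "**" twice and "Bob" once
# per repetition.
text = "some wiki text with **bold** markers and Bob in it. " * 2000

merged = re.compile(r"(\*\*|Bob)")  # the combined pattern from the post
stars = re.compile(r"\*\*")         # first condition on its own
bob = re.compile(r"Bob")            # second condition on its own


def count_merged(s):
    # One pass over the string using the alternation.
    return sum(1 for _ in merged.finditer(s))


def count_separate(s):
    # Two passes over the string, one per literal pattern.
    return sum(1 for _ in stars.finditer(s)) + sum(1 for _ in bob.finditer(s))


if __name__ == "__main__":
    # Both approaches must find the same number of matches before the
    # timings are worth comparing.
    assert count_merged(text) == count_separate(text)
    for fn in (count_merged, count_separate):
        secs = timeit.timeit(lambda: fn(text), number=20)
        print(f"{fn.__name__}: {secs:.4f}s")
```

One possible explanation for a gap like the one described above is that an engine can search for a pure literal with a plain substring search, while an alternation falls back to the general matcher. Whether that is what happens inside Mono is an assumption, not something these tests show.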
I ran these tests with "compiled" regexes on Mono.
Strangely enough, when I re-ran the regexes non-compiled I got the
same performance stats. Does Mono not have regex compiling yet?
If anyone has any suggestions as to how to work with parsing a Wiki,
any help would be greatly appreciated.
A few notes as to how I'm doing the Wiki:
I'm parsing the wiki into a tree of elements so that output could
potentially go to things other than HTML, such as re-outputting to
Wiki (to clean things up? the regexes eat up a little bit of junk
that's output and accept a few things that are 'ok' but not in the
spec), or even an application so that the Wiki could be used offline.
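The tree idea above could look roughly like the following. This is only a sketch under stated assumptions, in Python rather than the C# the Wiki is written in: the `Text` and `Bold` node types, the two renderer functions, and the reading of `**` as a bold marker are all hypothetical and not taken from the actual code.

```python
import html
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Text:
    # A run of plain text (hypothetical node type).
    value: str


@dataclass
class Bold:
    # A bold span holding child nodes (hypothetical node type).
    children: List["Node"] = field(default_factory=list)


Node = Union[Text, Bold]


def to_html(nodes):
    # Render the tree as HTML, escaping plain text.
    out = []
    for n in nodes:
        if isinstance(n, Text):
            out.append(html.escape(n.value))
        else:
            out.append("<b>" + to_html(n.children) + "</b>")
    return "".join(out)


def to_wiki(nodes):
    # Render the same tree back to wiki markup, assuming "**" means bold.
    out = []
    for n in nodes:
        if isinstance(n, Text):
            out.append(n.value)
        else:
            out.append("**" + to_wiki(n.children) + "**")
    return "".join(out)


tree = [Text("Hi "), Bold([Text("Bob")])]
print(to_html(tree))
print(to_wiki(tree))
```

Because both renderers walk the same tree, adding another output target means adding one more function and leaves the parsing untouched.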
I'm using Lua as a scripting language to allow for various things
to be dealt with programmatically. It could allow for easy extension
of the abilities of the Wiki.
It's all coded in C# (well, with Lua as a little glue for putting
together the main pages and other stuff).
I've looked into using Jay and other parser generators, but those
look like overkill. If anyone thinks a parser/lexer could work
better/faster, though, please let me know.
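For comparison with a generated parser, a hand-written scanner for a single marker is quite small. The sketch below is in Python for brevity and is an assumption-laden illustration, not part of the Wiki: the `tokenize` function and its token names are hypothetical, and it only knows about the `**` marker.

```python
def tokenize(src):
    """Split src into ("text", s) and ("bold", "**") tokens in one pass."""
    tokens = []
    pos = 0
    while pos < len(src):
        hit = src.find("**", pos)  # plain substring search, no regex
        if hit == -1:
            # No more markers: the rest of the input is plain text.
            tokens.append(("text", src[pos:]))
            break
        if hit > pos:
            # Plain text between the current position and the marker.
            tokens.append(("text", src[pos:hit]))
        tokens.append(("bold", "**"))
        pos = hit + 2  # continue after the two-character marker
    return tokens


print(tokenize("a **b** c"))
```

A real wiki grammar has many more markers, so whether this stays simpler than a generated lexer once they are all added is an open question.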
Thomas Harning Jr.