[Mono-list] System.Text.RegularExpressions

Dan Lewis dihlewis@yahoo.co.uk
29 Jan 2002 10:30:47 +0000


I've just about completed an implementation of the .NET regular
expressions package, System.Text.RegularExpressions. It's not part of
the corlib AFAIK, but is still a pretty useful package (how come Sun
never made a regex class for Java?). For those not familiar with the
Microsoft regular expressions library, it's got quite a lot of features:

  * full bi-directional matching
  * unrestricted lookahead/lookbehind
  * optimizing constructs: lazy repeats, non-backtracking subexpressions
  * named groups, backreferences and substitutions
  * capture histories for repeating groups
  * Unicode character classes and block ranges
  * extended syntax with whitespace and comments
  * optional ECMAScript compliance
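
A few of the features listed above can be illustrated with a short
sketch. Note the assumptions: this uses Python's `re` module purely as
a stand-in, not the .NET API and not the code from this package. The
pattern syntax differs in places (.NET writes named groups as
(?<name>...), named backreferences as \k<name>, and substitutions as
${name}), and Python's `re` has no equivalent of right-to-left
matching, capture histories, or variable-width lookbehind, so those
are left out. The function names are made up for the example.

```python
import re

def find_doubled_word(text):
    # Named group plus a backreference to it: matches a repeated word.
    m = re.search(r"\b(?P<word>\w+)\s+(?P=word)\b", text)
    return m.group("word") if m else None

def greedy_vs_lazy(text):
    # A greedy repeat takes as much as it can; a lazy repeat (+?) stops
    # at the first position where the rest of the pattern can match.
    greedy = re.search(r"<.+>", text).group()
    lazy = re.search(r"<.+?>", text).group()
    return greedy, lazy

def dollars_before_point(text):
    # Lookbehind and lookahead assert context without consuming it.
    # (Python only allows fixed-width lookbehind, unlike .NET.)
    return re.findall(r"(?<=\$)\d+(?=\.)", text)

# Extended syntax: whitespace and comments are ignored in the pattern.
DATE = re.compile(r"""
    (?P<year>\d{4})  -   # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
""", re.VERBOSE)

def reorder_date(text):
    # Substitution that refers to the named groups.
    return DATE.sub(r"\g<day>/\g<month>/\g<year>", text)
```

For instance, reorder_date("2002-01-29") rewrites the date in
day/month/year order using only the group names.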

The only thing I haven't done yet is compilation of patterns to CIL.
This is something I plan to add when the interpreter stabilizes a bit.

Although I haven't really profiled the package under a large corpus yet,
I included some of the more common regex optimizations. Currently the
compiler and interpreter support anchor points, simplified repeats and
Boyer-Moore style fast substring matching. We'll be able to tweak the
code later as more profiling data is collected, and perhaps turn it into
the behemoth that Perl's (extremely fast) regexec.c has become.
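
To give an idea of what "Boyer-Moore style" substring matching means,
here is a minimal sketch of the Horspool simplification of
Boyer-Moore, written in Python. It is an illustration under
assumptions, not the code in this package: the function name is
hypothetical, and a real regex engine would typically apply something
like this only to a literal prefix or required substring of the
pattern.

```python
def horspool_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift table: for every character of the pattern except the last,
    # the distance from its rightmost occurrence to the pattern's end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        # Compare the window against the pattern from right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Skip ahead based on the text character under the pattern's
        # last position; characters not in the pattern allow a full jump.
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

The point of the technique is that a mismatch often lets the search
skip several characters at once instead of advancing one position at a
time, which is why it pays off for longer literal substrings.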

At the moment I'm using the Perl regex test suite to verify behavior
against the .NET implementation. I had to take some tests out (mostly
the ones using the (?{...}) evaluation construct) for .NET compliance, but
after that my code passes all 727 tests. Since they're Perl tests,
however, there are a lot of .NET-specific features they don't exercise. So
if anyone is feeling brave and would like to try using this package in
their code, I'd be grateful for the bug reports. I'm sure there are loads
of little things I've missed.


PS: How do I go about contributing this code to the class library?
