[Mono-bugs] [Bug 541823] Regex class doesn't match when it should

bugzilla_noreply at novell.com bugzilla_noreply at novell.com
Sun Jun 12 18:05:33 EDT 2011


https://bugzilla.novell.com/show_bug.cgi?id=541823

https://bugzilla.novell.com/show_bug.cgi?id=541823#c5


--- Comment #5 from Michael Letterle <michael.letterle at gmail.com> 2011-06-12 22:05:31 UTC ---
What appears to be happening is that alternations are more eager than in the
.NET implementation (or most other ones, for that matter) in the case where the
alternation is optional.

Consider the following regex: (a|ab)?c

Given the following input: abc

.NET, Ruby, and JavaScript all match "abc"; Mono, however, matches only "c". It
seems that when interpreting the alternation, Mono gets to the first 'a', checks
whether it is followed by 'c', and, since it is not AND the alternation is
optional, skips the entire alternation. Removing the optional quantifier from
the alternation (i.e. making the regex '(a|ab)c') causes "abc" to be matched.
Note that this occurs with both the CIL-compiled regexes and the interpreted
regexes, which leads me to believe it's a problem in the parser.
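
As a quick cross-check of the expected behaviour, the same pattern can be run
through one more backtracking engine. The sketch below uses Python's standard
re module, which is not one of the implementations tested above and is used
here only as an additional reference point; the variable names are
illustrative.

```python
import re

# Optional alternation: a backtracking engine should retry the second
# alternative ("ab") after "a" followed by "c" fails at position 1.
optional = re.search(r"(a|ab)?c", "abc")

# Same pattern without the "?" quantifier, for comparison.
required = re.search(r"(a|ab)c", "abc")

print(optional.group(0), optional.group(1))  # abc ab
print(required.group(0))                     # abc
```

Both forms match the whole input here, which is the result reported above for
.NET, Ruby, and JavaScript.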

Outputs of tracing information with both the old Regex compiler and the CIL
compiler:


mono RegexTester.exe "(a|ab)?c" "abc" 
Regex:'(a|ab)?c'
Input:'abc'
    info group count 1 match_min 1 match_max 3
    anchor reverse False offset 0 tail L1
    true
L1:    repeat min 0 max 1 lazy False until L2
    open 1
    branch next L4
    character a negate False ignore False reverse False
    jmp target L3
L4:    branch next L5
    string 'ab' ignore False reverse False
    jmp target L3
L5:    false
L3:    close 1
L2:    end until L2
    character c negate False ignore False reverse False
    true
True: '(a|ab)?c' => 'c'

env MONO_NEW_RX=1 MONO_TRACE_RX=1 MONO_TRACE_RX_COMPILE=1 mono RegexTester.exe "(a|ab)?c" "abc"
Regex:'(a|ab)?c'
Input:'abc'
evaluating: Anchor at pc: 11, strpos: 0, cge: 0
evaluating: Repeat at pc: 17, strpos: 0, cge: 0
evaluating: Until at pc: 53, strpos: 0, cge: 0
recurse with count 1.
evaluating: OpenGroup at pc: 28, strpos: 0, cge: 0
evaluating: Branch at pc: 31, strpos: 0, cge: 0
evaluating: Char at pc: 34, strpos: 0, cge: 0
evaluating: Jump at pc: 36, strpos: 1, cge: 0
evaluating: CloseGroup at pc: 50, strpos: 1, cge: 0
evaluating: Until at pc: 53, strpos: 1, cge: 0
matching tail: 1 pc=54
evaluating: Char at pc: 54, strpos: 1, cge: 0
backtracking to 0 expr=28 pc=53
evaluating: Char at pc: 54, strpos: 0, cge: 0
evaluating: Repeat at pc: 17, strpos: 1, cge: 0
evaluating: Until at pc: 53, strpos: 1, cge: 0
recurse with count 1.
evaluating: OpenGroup at pc: 28, strpos: 1, cge: 0
evaluating: Branch at pc: 31, strpos: 1, cge: 0
evaluating: Char at pc: 34, strpos: 1, cge: 0
evaluating: Branch at pc: 39, strpos: 1, cge: 0
evaluating: String at pc: 42, strpos: 1, cge: 0
evaluating: False at pc: 49, strpos: 1, cge: 0
matching tail: 1 pc=54
evaluating: Char at pc: 54, strpos: 1, cge: 0
evaluating: Repeat at pc: 17, strpos: 2, cge: 0
evaluating: Until at pc: 53, strpos: 2, cge: 0
recurse with count 1.
evaluating: OpenGroup at pc: 28, strpos: 2, cge: 0
evaluating: Branch at pc: 31, strpos: 2, cge: 0
evaluating: Char at pc: 34, strpos: 2, cge: 0
evaluating: Branch at pc: 39, strpos: 2, cge: 0
evaluating: String at pc: 42, strpos: 2, cge: 0
evaluating: False at pc: 49, strpos: 2, cge: 0
matching tail: 2 pc=54
evaluating: Char at pc: 54, strpos: 2, cge: 0
evaluating: True at pc: 56, strpos: 3, cge: 0
True: '(a|ab)?c' => 'c'
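
In the trace above, after the tail fails at strpos 1 the engine backtracks to
strpos 0 and evaluates the tail (pc 54) directly, without ever evaluating the
second branch (pc 39) at strpos 0. The toy model below is only a sketch of
that difference and is not Mono's code: the function names and the
retry_alternatives flag are hypothetical, and it handles only patterns of the
shape (alt1|alt2|...)?tail with literal strings.

```python
def match_at(alternatives, tail, text, start, retry_alternatives=True):
    """Match (alt1|alt2|...)?tail at `start`; return the end index or None."""
    for alt in alternatives:
        if text.startswith(alt, start):
            end = start + len(alt)
            if text.startswith(tail, end):
                return end + len(tail)
            if not retry_alternatives:
                # Models the traced behaviour: once an alternative matched
                # but the tail failed, give up on the whole group.
                break
    # Zero repetitions of the optional group: match the tail alone.
    if text.startswith(tail, start):
        return start + len(tail)
    return None


def search(alternatives, tail, text, retry_alternatives=True):
    """Scan start positions left to right, like an unanchored regex search."""
    for start in range(len(text) + 1):
        end = match_at(alternatives, tail, text, start, retry_alternatives)
        if end is not None:
            return text[start:end]
    return None


print(search(["a", "ab"], "c", "abc"))                            # abc
print(search(["a", "ab"], "c", "abc", retry_alternatives=False))  # c
```

With retry_alternatives=True the remaining alternatives are tried before the
group is skipped, giving "abc"; with it set to False the model reproduces the
"c" result shown in both traces.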

references:

http://www.regular-expressions.info/alternation.html
http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
http://www.rubular.com/
http://regexpal.com/
