[Mono-list] Regular expressions help

Loren Bandiera lorenb at mmgsecurity.com
Tue Oct 11 14:51:29 EDT 2005

I'm trying to write a parser for the Mozilla/Firefox history file. The
format of the file is very ugly.

The file starts off with a comment marking the version:

// <!-- <mdb:mork:z v="1.4"/> -->
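Here is a minimal sketch of checking and stripping that first line. It assumes the whole file fits in memory and that the comment is always the first line. `StripVersionComment` is a hypothetical helper name, and the pattern simply mirrors the comment shown above.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: verifies the Mork version comment on the first line,
// reports the version string and returns everything after that line.
static string StripVersionComment (string text, out string version)
{
    // Matches e.g. // <!-- <mdb:mork:z v="1.4"/> -->  plus the rest of the line.
    Match m = Regex.Match (text,
        @"\A//\s*<!--\s*<mdb:mork:z\s+v=""([^""]+)""\s*/>\s*-->[^\n]*\n?");
    if (!m.Success)
        throw new FormatException ("missing Mork version comment");
    version = m.Groups[1].Value;
    return text.Substring (m.Length);
}

// Usage: pass the path of a history file on the command line.
if (args.Length > 0) {
    string data = StripVersionComment (File.ReadAllText (args[0]), out string v);
    Console.WriteLine ("Mork version {0}, {1} chars of data", v, data.Length);
}
```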

Then you get the table that defines what the fields mean:

< <(a=c)> // (f=iso-8859-1)

After that you start to get into the data:

    =L$00o$00r$00e$00n$00 $00B$00a$00n$00d$00i$00e$00r$00a$00$19 s$00
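The `$00` sequences look like `$hh` hex-escaped bytes. A letter followed by a zero byte is what UTF-16LE text looks like byte by byte. The sketch below decodes such a value under three assumptions: each `$` is followed by exactly two hex digits, every other character stands for a single byte (for example because the file was read as ISO-8859-1), and the bytes are UTF-16LE. `DecodeMorkValue` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper: "$hh" becomes one byte, any other char becomes one
// byte (assumes chars <= 0xFF), and the bytes are decoded as UTF-16LE.
static string DecodeMorkValue (string raw)
{
    var bytes = new List<byte> ();
    for (int i = 0; i < raw.Length; i++) {
        if (raw[i] == '$' && i + 2 < raw.Length) {
            bytes.Add (byte.Parse (raw.Substring (i + 1, 2), NumberStyles.HexNumber));
            i += 2; // skip the two hex digits
        } else {
            bytes.Add ((byte) raw[i]);
        }
    }
    return Encoding.Unicode.GetString (bytes.ToArray ()); // Encoding.Unicode is UTF-16LE
}

Console.WriteLine (DecodeMorkValue ("L$00o$00r$00e$00n$00")); // prints Loren
```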

This is where I start to run into problems. I want to extract that block
of data which appears to be in the format:

<(key=value)(...repeating pattern...)>
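One way to describe that format without a lazy `.+?` is to spell out the cells: a `<`, one or more `(...)` groups, and then a `>`. This sketch assumes that a literal `)` inside a value is written as `\)`. The patterns and the sample string are illustrative and were not taken from a real history file. `Regex.Matches` returns every such block, not only the first one.

```csharp
using System;
using System.Text.RegularExpressions;

// A block: '<', one or more "(...)" cells, then '>'.
// Inside a cell a backslash escapes the next character, so "\)" does not close it.
var dictPattern = new Regex (@"<\s*((?:\((?:\\.|[^)\\])*\)\s*)+)>",
    RegexOptions.Singleline);

// One cell: the key runs up to the first '=', the value up to the closing ')'.
var cellPattern = new Regex (@"\(([^=)]+)=((?:\\.|[^)\\])*)\)",
    RegexOptions.Singleline);

// Illustrative input: two blocks, one value containing an escaped ')'.
string sample = "<(80=abc)(81=x\\)y)>\n<(82=def)>";

foreach (Match dict in dictPattern.Matches (sample))
    foreach (Match cell in cellPattern.Matches (dict.Groups[1].Value))
        Console.WriteLine ("{0} = {1}", cell.Groups[1].Value, cell.Groups[2].Value);
```

With the sample input this prints `80 = abc`, `81 = x\)y` and `82 = def`. The escapes inside values are left for a separate decoding step.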

I read the file into a string and then strip the comment on the first line.
Next I use the following Regex to get the key table:

Regex keyTable = new Regex (@"\s*<\(a=c\)>\s*(?:\/\/)?\s*(\(.+?\))\s*>",
   RegexOptions.Compiled | RegexOptions.Singleline);

Match m = keyTable.Match (morkData);

I can then use the Match and parse the table without problems. Next I
create a substring running from the end of the key table to the end of
the data I read from the file.
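The sketch below shows those two steps and reuses the key-table pattern above. It splits the captured `(key=value)` list into a `Dictionary` and starts the substring at `m.Index + m.Length`. The sample input and its field names are made up. Using `Index + Length` as the end of the key table is an assumption about how `pos` is computed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Key-table pattern from the post.
Regex keyTable = new Regex (@"\s*<\(a=c\)>\s*(?:\/\/)?\s*(\(.+?\))\s*>",
    RegexOptions.Compiled | RegexOptions.Singleline);
// One "(key=value)" pair; assumes keys contain no '=' or ')' and values no ')'.
Regex pair = new Regex (@"\(([^=)]+)=([^)]*)\)");

// Made-up input shaped like the lines quoted earlier.
string morkData = "< <(a=c)> // (f=iso-8859-1)\n  (B8=Name)(B9=URL)>\n<(80=abc)>";

Match m = keyTable.Match (morkData);
var keys = new Dictionary<string, string> ();
foreach (Match p in pair.Matches (m.Groups[1].Value))
    keys[p.Groups[1].Value] = p.Groups[2].Value;

// Everything after the end of the key-table match.
int pos = m.Index + m.Length;
string sub = morkData.Substring (pos);

Console.WriteLine (keys["B9"]);  // prints URL
Console.WriteLine (sub.Trim ()); // prints <(80=abc)>
```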

I then use the following Regex to pull out the value table:

Regex valueTable = new Regex (@"<\s*(\(.+?\))\s*>",
   RegexOptions.Compiled | RegexOptions.Singleline);

// pos: index just past the end of the key table
string sub = morkData.Substring (pos);
m = valueTable.Match (sub);

This doesn't work at all. I get a chunk of the data (around 3623 bytes),
but I'm expecting something more like 800,000 bytes. The strange thing is
that the last part of the string I get back is:

"// <!-- <mdb:mork:z v="1.4"/> -->< <(a=c)>"

That shouldn't even be there. I'm not sure where that is coming from.
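One quick way to find where that text comes from is to list every offset at which the header occurs in the string being searched. Compare those offsets with the start and end of the match (`m.Index` and `m.Index + m.Length`). Here is a minimal sketch. `FindAll` is a hypothetical helper, and the sample data is made up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: every offset at which `needle` occurs in `text`
// (ordinal comparison; `needle` must not be empty).
static List<int> FindAll (string text, string needle)
{
    var hits = new List<int> ();
    int i = text.IndexOf (needle, StringComparison.Ordinal);
    while (i >= 0) {
        hits.Add (i);
        i = text.IndexOf (needle, i + 1, StringComparison.Ordinal);
    }
    return hits;
}

// Made-up data in which the header text occurs twice.
string data = "// <!-- <mdb:mork:z v=\"1.4\"/> -->\n<(80=a)>\n// <!-- <mdb:mork:z v=\"1.4\"/> -->\n";
foreach (int offset in FindAll (data, "<mdb:mork:z"))
    Console.WriteLine ("header text at offset {0}", offset);
```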

I get the same output on Mono as I do with MS.NET, so it appears the
problem is something I'm doing.

I've tried looking at some of the other solutions to this problem to
see what they do:


But that didn't really help. Does anyone have any suggestions on how I can
extract that value data ("<(key=value)(...)>") from the string?

Loren Bandiera, CISSP <lorenb at mmgsecurity.com>
MMG Security, Inc.
