[Mono-bugs] [Bug 76102][Nor] New - XmlTextReader fails if Unicode
surrogate pair straddles 1024-character boundary
bugzilla-daemon at bugzilla.ximian.com
Wed Sep 14 11:14:49 EDT 2005
Please do not reply to this email. If you want to comment on the bug, go to the
URL shown below and enter your comments there.
Changed by brion at pobox.com.
http://bugzilla.ximian.com/show_bug.cgi?id=76102
--- shadow/76102 2005-09-14 11:14:49.000000000 -0400
+++ shadow/76102.tmp.28008 2005-09-14 11:14:49.000000000 -0400
@@ -0,0 +1,56 @@
+Bug#: 76102
+Product: Mono: Class Libraries
+Version: 1.1
+OS: GNU/Linux [Other]
+OS Details: Ubuntu Hoary/x86
+Status: NEW
+Resolution:
+Severity:
+Priority: Normal
+Component: Sys.XML
+AssignedTo: atsushi at ximian.com
+ReportedBy: brion at pobox.com
+QAContact: mono-bugs at ximian.com
+TargetMilestone: ---
+URL:
+Cc:
+Summary: XmlTextReader fails if Unicode surrogate pair straddles 1024-character boundary
+
+Description of Problem:
+If the input stream read by an XmlTextReader contains a Unicode character
+outside the Basic Multilingual Plane (U+10000 or above, hence a UTF-16
+surrogate pair in the 16-bit char stream) and that pair straddles a
+1024-character boundary, the pair is not recognized and an exception is thrown.
+
+Steps to reproduce the problem:
+1. Create an XML file with an open tag, a series of ASCII characters
+padding out to 1023 bytes, a single high character (>=0x10000) in UTF-8,
+and a close tag. (Sample attached)
+2. Read the .xml file with an XmlTextReader (sample program attached)
+
+Actual Results:
+Unhandled Exception: System.Xml.XmlException: Not allowed character was
+found. Line 1, position 1024.
+in <0x002f8> System.Xml.XmlTextReader:ReadText (Boolean notWhitespace)
+in <0x0021c> System.Xml.XmlTextReader:ReadContent ()
+in <0x00174> System.Xml.XmlTextReader:Read ()
+in <0x00070> MainClass:Main (System.String[] args)
+
+
+Expected Results:
+No exception; program should print "Done!"
+
+How often does this happen?
+Every time, if the character straddles a 1024-character boundary. (Insert
+or remove one "*" from the file and it will start working; insert another
+1024 stars and it will continue to fail.)
+
+
+Additional Information:
+XmlTextReader appears to keep a 1024-char read buffer in peekChars; probably
+a surrogate pair split across two buffer fills is not validated as a pair.
+
+Tested on Mono SVN r49928 on Linux/x86 and 1.1.9 release on Mac OS X.
+
+This bug caused processing of an article export dump from Wikipedia to fail
+unexpectedly.
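
A minimal sketch of the reproduction input described in the steps above,
written in Python rather than C# so it can be run without Mono. It builds
the document, converts it to UTF-16 code units (what a .NET char stream
would hold), and checks whether the surrogate pair falls across a
1024-unit boundary. The tag name <test>, the helper names, and the choice
of U+10000 are assumptions made for illustration; the attached sample may
differ. The 1024 figure is the buffer size suspected in the report, not
something confirmed from the Mono source.

```python
BUFFER_SIZE = 1024        # buffer size suspected in the report (assumption)
HIGH_CHAR = "\U00010000"  # any code point >= U+10000 needs a surrogate pair
OPEN_TAG = "<test>"       # hypothetical tag name
CLOSE_TAG = "</test>"


def build_repro_xml(padding_len):
    """Open tag, ASCII padding, one non-BMP character, close tag."""
    return OPEN_TAG + "*" * padding_len + HIGH_CHAR + CLOSE_TAG


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def pair_straddles_boundary(text, size=BUFFER_SIZE):
    """True if a high surrogate is the last unit of a size-unit chunk
    and the matching low surrogate is the first unit of the next chunk."""
    units = utf16_units(text)
    for i in range(len(units) - 1):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        is_low = 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and is_low and (i + 1) % size == 0:
            return True
    return False


def write_repro(path, padding_len):
    """Write the document as UTF-8, the encoding used in the report."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_repro_xml(padding_len))


# Open tag plus padding should occupy exactly 1023 characters.
FAILING_PADDING = BUFFER_SIZE - 1 - len(OPEN_TAG)
```

Calling write_repro("repro.xml", FAILING_PADDING) should give a file of the
shape described in step 1. Under the same assumptions, adding or removing
one "*" moves the pair off the boundary, and adding another 1024 moves it
onto the next one, which matches the behaviour reported above.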