[Mono-bugs] [Bug 76102][Nor] New - XmlTextReader fails if Unicode surrogate pair straddles 1024-character boundary

bugzilla-daemon at bugzilla.ximian.com
Wed Sep 14 11:14:49 EDT 2005


Please do not reply to this email. If you want to comment on the bug, go to the
URL shown below and enter your comments there.

Changed by brion at pobox.com.

http://bugzilla.ximian.com/show_bug.cgi?id=76102

--- shadow/76102	2005-09-14 11:14:49.000000000 -0400
+++ shadow/76102.tmp.28008	2005-09-14 11:14:49.000000000 -0400
@@ -0,0 +1,56 @@
+Bug#: 76102
+Product: Mono: Class Libraries
+Version: 1.1
+OS: GNU/Linux [Other]
+OS Details: Ubuntu Hoary/x86
+Status: NEW   
+Resolution: 
+Severity: 
+Priority: Normal
+Component: Sys.XML
+AssignedTo: atsushi at ximian.com                            
+ReportedBy: brion at pobox.com               
+QAContact: mono-bugs at ximian.com
+TargetMilestone: ---
+URL: 
+Cc: 
+Summary: XmlTextReader fails if Unicode surrogate pair straddles 1024-character boundary
+
+Description of Problem:
+If the input stream read by an XmlTextReader contains a character outside the 
+Basic Multilingual Plane (>=0x10000, so a UTF-16 surrogate pair in the 16-bit 
+char stream) and that pair straddles a 1024-character boundary, the pair 
+isn't recognized properly and an XmlException is thrown.
+
+Steps to reproduce the problem:
+1. Create an XML file with an open tag, a series of ASCII characters 
+padding the file out to 1023 bytes, a single character above the BMP 
+(>=0x10000) encoded in UTF-8, and a close tag. (Sample attached)
+2. Read the .xml file with an XmlTextReader (sample program attached)
+
+Actual Results:
+Unhandled Exception: System.Xml.XmlException: Not allowed character was 
+found.  Line 1, position 1024.
+in <0x002f8> System.Xml.XmlTextReader:ReadText (Boolean notWhitespace)
+in <0x0021c> System.Xml.XmlTextReader:ReadContent ()
+in <0x00174> System.Xml.XmlTextReader:Read ()
+in <0x00070> MainClass:Main (System.String[] args)
+
+
+Expected Results:
+No exception; program should print "Done!"
+
+How often does this happen? 
+Every time, if the character straddles a 1024-character boundary. (Insert 
+or remove one "*" from the file and it will start working; insert another 
+1024 stars and it will continue to fail.)
+
+
+Additional Information:
+XmlTextReader appears to keep a 1024-char read buffer in peekChars; probably 
+a surrogate pair split across two fills of that buffer isn't handled correctly.
+
+Tested on Mono SVN r49928 on Linux/x86 and 1.1.9 release on Mac OS X.
+
+This bug caused processing of an article export dump from Wikipedia to fail 
+unexpectedly.
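
The attachments mentioned in the report are not included in this message, so the following is only a rough sketch of how a comparable input could be built and checked. It is written in Python rather than C#, and it does not call XmlTextReader or reproduce Mono's code. The tag name "test", the helper names, and the assumption that the reader refills its buffer every 1024 UTF-16 code units (as the report guesses) are all illustrative. The script builds the document described in the steps above and reports which surrogate pairs would be split by such a buffer.

```python
# Sketch only: the tag name, the helper names, and BUFFER_SIZE are assumptions
# based on the report's description, not taken from Mono's source.
BUFFER_SIZE = 1024


def build_sample(units_before=1023, tag="test", astral="\U00010000"):
    """Return an XML string with exactly units_before ASCII characters
    (open tag plus '*' padding) in front of one character >= U+10000."""
    open_tag = f"<{tag}>"
    padding = "*" * (units_before - len(open_tag))
    return f"{open_tag}{padding}{astral}</{tag}>"


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def split_pairs(units, size=BUFFER_SIZE):
    """Return the 0-based indices of high surrogates that are the last unit
    of a size-unit chunk while their low surrogate opens the next chunk."""
    return [
        i
        for i, unit in enumerate(units)
        if is_high_surrogate(unit)
        and (i + 1) % size == 0
        and i + 1 < len(units)
        and is_low_surrogate(units[i + 1])
    ]


sample = build_sample()
print(split_pairs(utf16_units(sample)))  # [1023]
```

Calling build_sample(1024) moves the pair off the boundary and the list comes back empty, while build_sample(2047) puts it on the next boundary, which matches the "insert one star / insert another 1024 stars" behaviour described above. To obtain a file for XmlTextReader, the returned string can be written out with UTF-8 encoding under any file name.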

