[Mono-bugs] [Bug 76102][Nor] New - XmlTextReader fails if Unicode surrogate pair straddles 1024-character boundary

bugzilla-daemon at bugzilla.ximian.com
Wed Sep 14 11:14:49 EDT 2005

Please do not reply to this email. If you want to comment on the bug, go to
the URL shown below and enter your comments there.

Changed by brion at pobox.com.


--- shadow/76102	2005-09-14 11:14:49.000000000 -0400
+++ shadow/76102.tmp.28008	2005-09-14 11:14:49.000000000 -0400
@@ -0,0 +1,56 @@
+Bug#: 76102
+Product: Mono: Class Libraries
+Version: 1.1
+OS: GNU/Linux [Other]
+OS Details: Ubuntu Hoary/x86
+Status: NEW   
+Priority: Normal
+Component: Sys.XML
+AssignedTo: atsushi at ximian.com                            
+ReportedBy: brion at pobox.com               
+QAContact: mono-bugs at ximian.com
+TargetMilestone: ---
+Summary: XmlTextReader fails if Unicode surrogate pair straddles 1024-character boundary
+Description of Problem:
+If the input stream read by an XmlTextReader contains a character outside 
+the Basic Multilingual Plane (code point >= 0x10000, and therefore a UTF-16 
+surrogate pair in the 16-bit char stream) and that pair straddles a 
+1024-character boundary, the pair isn't recognized properly and an 
+XmlException is thrown.
+Steps to reproduce the problem:
+1. Create an XML file with an open tag, a series of ASCII characters 
+padding out to 1023 bytes, a single high character (>=0x10000) in UTF-8, 
+and a close tag. (Sample attached)
+2. Read the .xml file with an XmlTextReader (sample program attached)
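The attached sample file is not reproduced in this message. As a rough sketch of step 1, the following Python script builds an equivalent file under stated assumptions: the element name `x`, the output file name, and the helper names `build_sample` and `utf16_units` are all made up for illustration and do not come from the original attachment.

```python
def build_sample(boundary: int = 1024, open_tag: str = "<x>", close_tag: str = "</x>") -> str:
    """Build an XML document whose only non-ASCII character starts at
    0-based char index boundary - 1, so that its UTF-16 surrogate pair
    occupies indices boundary - 1 and boundary.

    The tag names are hypothetical; the report does not say which
    element the attached sample used.
    """
    high_char = "\U00010000"  # lowest code point that needs a surrogate pair
    # The open tag plus the padding together fill the first boundary - 1 chars.
    padding = "*" * (boundary - 1 - len(open_tag))
    return open_tag + padding + high_char + close_tag


def utf16_units(text: str) -> list:
    """Return the UTF-16 code units of text, i.e. what a .NET char stream holds."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    doc = build_sample()
    # The file on disk is UTF-8, as in the report; the character above
    # the BMP takes 4 bytes there.
    with open("surrogate-boundary.xml", "wb") as f:
        f.write(doc.encode("utf-8"))
    units = utf16_units(doc)
    print("code units at 1023 and 1024:", hex(units[1023]), hex(units[1024]))
```

Changing the length of the padding by one character moves the pair off the boundary, which matches the behaviour described under "How often does this happen?".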
+Actual Results:
+Unhandled Exception: System.Xml.XmlException: Not allowed character was 
+found.  Line 1, position 1024.
+in <0x002f8> System.Xml.XmlTextReader:ReadText (Boolean notWhitespace)
+in <0x0021c> System.Xml.XmlTextReader:ReadContent ()
+in <0x00174> System.Xml.XmlTextReader:Read ()
+in <0x00070> MainClass:Main (System.String[] args)
+Expected Results:
+No exception; program should print "Done!"
+How often does this happen? 
+Every time, if the character straddles a 1024-character boundary. (Insert 
+or remove one "*" from the file and it will start working; insert another 
+1024 stars and it will continue to fail.)
+Additional Information:
+XmlTextReader appears to keep a 1024-char read buffer in peekChars; 
+probably the high and low surrogates aren't matched up when they land in 
+different buffer fills.
+Tested on Mono SVN r49928 on Linux/x86 and 1.1.9 release on Mac OS X.
+This bug caused processing of an article export dump from Wikipedia to fail 
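The buffer hypothesis under "Additional Information" can be modelled outside Mono. The sketch below is not XmlTextReader's code and makes no claim about how peekChars is actually used. It only assumes a reader that validates fixed 1024-unit chunks in isolation, and contrasts it with one that holds a trailing high surrogate back until the next chunk arrives. All function names are hypothetical.

```python
def to_utf16_units(text):
    """UTF-16 code units of text, modelling a 16-bit char stream."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    return 0xDC00 <= unit <= 0xDFFF


def chunk_has_unpaired_surrogate(chunk):
    """True if any surrogate in chunk lacks its partner inside the same chunk."""
    i = 0
    while i < len(chunk):
        if is_high(chunk[i]):
            if i + 1 >= len(chunk) or not is_low(chunk[i + 1]):
                return True
            i += 2  # skip the complete pair
        elif is_low(chunk[i]):
            return True
        else:
            i += 1
    return False


def naive_check(units, size=1024):
    """Validate each chunk on its own; return the index of the first bad chunk, or None."""
    for n, start in enumerate(range(0, len(units), size)):
        if chunk_has_unpaired_surrogate(units[start:start + size]):
            return n
    return None


def boundary_aware_check(units, size=1024):
    """Same scan, but a high surrogate at the end of a chunk is carried into the next one."""
    carry = []
    last_chunk = None
    for n, start in enumerate(range(0, len(units), size)):
        last_chunk = n
        chunk = carry + units[start:start + size]
        carry = []
        if chunk and is_high(chunk[-1]):
            carry = [chunk.pop()]  # defer judgement until the next chunk arrives
        if chunk_has_unpaired_surrogate(chunk):
            return n
    # A high surrogate still pending at end of input really is unpaired.
    return last_chunk if carry else None
```

For input shaped like the reproduction case (a 3-character open tag, 1020 stars, then U+10000), `naive_check` flags the first chunk while `boundary_aware_check` accepts the whole stream. With one more star the pair no longer straddles the boundary and both checks pass.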
