[Mono-bugs] [Bug 76247][Nor] New - XmlTextReader corrupts UTF-16 surrogate characters in strings

bugzilla-daemon at bugzilla.ximian.com
Tue Sep 27 08:07:41 EDT 2005


Please do not reply to this email. If you want to comment on the bug, go to the
URL shown below and enter your comments there.

Changed by brion at pobox.com.

http://bugzilla.ximian.com/show_bug.cgi?id=76247

--- shadow/76247	2005-09-27 08:07:41.000000000 -0400
+++ shadow/76247.tmp.12762	2005-09-27 08:07:41.000000000 -0400
@@ -0,0 +1,60 @@
+Bug#: 76247
+Product: Mono: Class Libraries
+Version: 1.1
+OS: 
+OS Details: Ubuntu Hoary/x86
+Status: NEW   
+Resolution: 
+Severity: 
+Priority: Normal
+Component: Sys.XML
+AssignedTo: atsushi at ximian.com                            
+ReportedBy: brion at pobox.com               
+QAContact: mono-bugs at ximian.com
+TargetMilestone: ---
+URL: 
+Cc: 
+Summary: XmlTextReader corrupts UTF-16 surrogate characters in strings
+
+Description of Problem:
+Supplementary Unicode characters (code points 0x10000 and above) are
+represented in 16-bit strings as UTF-16 surrogate pairs of two code units.
+
+When an XML file containing such characters is read with XmlTextReader,
+strings returned by ReadString() and similar methods contain corrupted UTF-16
+surrogate pairs instead of the correct values.
+
+
+Steps to reproduce the problem:
+1. Use an XmlTextReader to read a file containing the high char U-000289c0
+2. Read a value into a string, e.g. with the ReadString() method
+3. Examine the characters in the string
+
+
+Actual Results:
+The character is read as the invalid pair: 0xd801, 0x65c0
+
+
+Expected Results:
+Should be read as: 0xd862, 0xddc0
+
+
+How often does this happen? 
+Every time.
+
+
+Additional Information:
+The PeekChars method correctly combines surrogate pairs in the input, but
+several methods that write characters back into strings all use an
+incorrect formula for creating surrogate pairs:
+
+WRONG:
+ (char) (ch / 0x10000 + 0xD800 - 1)
+ (char) (ch % 0x10000 + 0xDC00)
+
+RIGHT:
+ (char) ((ch - 0x10000) / 0x400 + 0xD800)
+ (char) ((ch - 0x10000) % 0x400 + 0xDC00)
+
+
+Tested with SVN r50834 on Linux and with the 1.1.9 release on Mac OS X.
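
The arithmetic behind the two formulas can be checked with a small standalone
program. This is only a sketch, not code from the Mono sources: the class name
SurrogateDemo and the method names Buggy and Correct are hypothetical, and the
cross-check assumes a runtime that provides char.ConvertFromUtf32. It applies
both formulas quoted in the report to U-000289c0 and prints the resulting
UTF-16 code units.

```csharp
using System;

static class SurrogateDemo
{
    // Formula quoted as WRONG in the report. The unchecked cast keeps only
    // the low 16 bits, which is how 0x165C0 ends up as 0x65C0.
    public static char[] Buggy(int ch)
    {
        return new char[] {
            unchecked((char) (ch / 0x10000 + 0xD800 - 1)),
            unchecked((char) (ch % 0x10000 + 0xDC00))
        };
    }

    // Formula quoted as RIGHT in the report: subtract 0x10000, then split
    // the remaining 20 bits into a high and a low group of 10 bits each.
    public static char[] Correct(int ch)
    {
        int v = ch - 0x10000;
        return new char[] {
            (char) (v / 0x400 + 0xD800),
            (char) (v % 0x400 + 0xDC00)
        };
    }

    static void Main()
    {
        int ch = 0x289C0; // the character used in the report
        char[] bad = Buggy(ch);
        char[] good = Correct(ch);
        Console.WriteLine("buggy:   0x{0:x4}, 0x{1:x4}", (int) bad[0], (int) bad[1]);
        Console.WriteLine("correct: 0x{0:x4}, 0x{1:x4}", (int) good[0], (int) good[1]);

        // Cross-check against the framework's own conversion (assumes a
        // runtime where char.ConvertFromUtf32 is available).
        string s = char.ConvertFromUtf32(ch);
        Console.WriteLine("runtime: 0x{0:x4}, 0x{1:x4}", (int) s[0], (int) s[1]);
    }
}
```

For 0x289C0 the buggy formula gives 0xd801, 0x65c0 (the Actual Results above),
while the corrected formula and the runtime conversion both give
0xd862, 0xddc0 (the Expected Results).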

