[Mono-bugs] [Bug 76247][Nor] New - XmlTextReader corrupts UTF-16 surrogate characters in strings
bugzilla-daemon at bugzilla.ximian.com
Tue Sep 27 08:07:41 EDT 2005
Please do not reply to this email; if you want to comment on the bug, go to the
URL shown below and enter your comments there.
Changed by brion at pobox.com.
http://bugzilla.ximian.com/show_bug.cgi?id=76247
--- shadow/76247 2005-09-27 08:07:41.000000000 -0400
+++ shadow/76247.tmp.12762 2005-09-27 08:07:41.000000000 -0400
@@ -0,0 +1,60 @@
+Bug#: 76247
+Product: Mono: Class Libraries
+Version: 1.1
+OS:
+OS Details: Ubuntu Hoary/x86
+Status: NEW
+Resolution:
+Severity:
+Priority: Normal
+Component: Sys.XML
+AssignedTo: atsushi at ximian.com
+ReportedBy: brion at pobox.com
+QAContact: mono-bugs at ximian.com
+TargetMilestone: ---
+URL:
+Cc:
+Summary: XmlTextReader corrupts UTF-16 surrogate characters in strings
+
+Description of Problem:
+High Unicode characters (code points 0x10000 and above) are represented in
+16-bit strings as UTF-16 surrogate pairs, i.e. two 16-bit code units.
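For reference, this is the standard UTF-16 mapping between a supplementary code point c (0x10000 to 0x10FFFF) and its high/low surrogate pair, as defined by the Unicode Standard. It is worked through for U-000289c0, the character used in the reproduction steps. The first three lines encode c into a pair, and the fourth decodes a pair back into c.

```latex
\begin{aligned}
c' &= c - \mathtt{0x10000}, \qquad 0 \le c' < 2^{20} \\
\text{high} &= \mathtt{0xD800} + \lfloor c' / \mathtt{0x400} \rfloor \\
\text{low}  &= \mathtt{0xDC00} + (c' \bmod \mathtt{0x400}) \\
c &= \mathtt{0x10000} + (\text{high} - \mathtt{0xD800}) \cdot \mathtt{0x400} + (\text{low} - \mathtt{0xDC00}) \\[4pt]
c = \mathtt{0x289C0}: \quad c' &= \mathtt{0x189C0}, \quad
\lfloor c'/\mathtt{0x400} \rfloor = \mathtt{0x62}, \quad
c' \bmod \mathtt{0x400} = \mathtt{0x1C0} \\
\Rightarrow (\text{high}, \text{low}) &= (\mathtt{0xD862}, \mathtt{0xDDC0})
\end{aligned}
```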
+
+When reading an XML file containing such characters with XmlTextReader,
+strings returned by ReadString() and similar methods contain corrupted
+UTF-16 surrogate pairs instead of the correct values.
+
+
+Steps to reproduce the problem:
+1. Use an XmlTextReader to read a file containing the high character U-000289c0
+2. Read a value into a string, e.g. with the ReadString() method
+3. Examine the characters in the string
+
+
+Actual Results:
+The character is read as the invalid pair: 0xd801, 0x65c0
+
+
+Expected Results:
+Should be read as: 0xd862, 0xddc0
+
+
+How often does this happen?
+Every time.
+
+
+Additional Information:
+The PeekChars method correctly combines surrogate pairs in the input, but
+several methods that write characters back into strings all use an
+incorrect formula for splitting a code point into a surrogate pair:
+
+WRONG:
+ (char) (ch / 0x10000 + 0xD800 - 1)
+ (char) (ch % 0x10000 + 0xDC00)
+
+RIGHT:
+ (char) ((ch - 0x10000) / 0x400 + 0xD800)
+ (char) ((ch - 0x10000) % 0x400 + 0xDC00)
+
+
+Tested with SVN r50834 on Linux and with the 1.1.9 release on Mac OS X.
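The arithmetic behind the Actual and Expected Results can be checked with a small standalone program. The sketch below applies the WRONG and RIGHT formulas quoted in the report to 0x289C0 and cross-checks the result against the JDK's own UTF-16 encoder. It is written in Java, not C#. The two cast expressions compile unchanged in Java and truncate int to char the same way as C#'s default unchecked context. It does not exercise Mono's XmlTextReader. The class and method names (SurrogateFormulas, wrongPair, rightPair) are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;

// Hypothetical helper: compares the two surrogate-pair formulas from the bug report.
class SurrogateFormulas {

    // WRONG formula from the report: divides by 0x10000 instead of 0x400 and
    // never subtracts 0x10000, so the low unit can exceed 0xFFFF and is then
    // silently truncated to 16 bits by the (char) cast.
    static char[] wrongPair(int ch) {
        return new char[] {
            (char) (ch / 0x10000 + 0xD800 - 1),
            (char) (ch % 0x10000 + 0xDC00)
        };
    }

    // RIGHT formula from the report: subtract 0x10000 to get a 20-bit value,
    // put its high 10 bits in the high surrogate and its low 10 bits in the low one.
    static char[] rightPair(int ch) {
        return new char[] {
            (char) ((ch - 0x10000) / 0x400 + 0xD800),
            (char) ((ch - 0x10000) % 0x400 + 0xDC00)
        };
    }

    // Formats a pair of UTF-16 code units as lowercase hex, like the report does.
    static String hex(char[] pair) {
        return String.format("0x%04x, 0x%04x", (int) pair[0], (int) pair[1]);
    }

    public static void main(String[] args) {
        int ch = 0x289C0; // U-000289c0 from the reproduction steps
        char[] wrong = wrongPair(ch);
        char[] right = rightPair(ch);
        System.out.println("wrong: " + hex(wrong)); // wrong: 0xd801, 0x65c0
        System.out.println("right: " + hex(right)); // right: 0xd862, 0xddc0

        // Cross-check the RIGHT formula against the JDK's UTF-16 encoder.
        System.out.println("matches JDK: " + Arrays.equals(right, Character.toChars(ch))); // matches JDK: true

        // 0x65c0 is not a low surrogate, so the WRONG output is not a valid pair.
        System.out.println("wrong pair valid: " + Character.isSurrogatePair(wrong[0], wrong[1])); // wrong pair valid: false
    }
}
```

The printed values match the Actual Results (0xd801, 0x65c0) and Expected Results (0xd862, 0xddc0) given in the report.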