[Mono-devel-list] Problems with UTF-8 Decoder
Svetlana Zholkovsky
svetlanaz at mainsoft.com
Sun Feb 27 06:07:58 EST 2005
Hi, All!
I am using the UTF-8 encoding to encode/decode the following Unicode strings:
"\u4f00\u302a\ud800\udc00\u4f01",
"\uFEFF",
"\u0041\u2262\u0391\u002e",
"\ud55c\uad6d\uc5b4",
"\u65e5\u672c\u8a9e",
"\ufeff\u233b4"
The encoding works fine, and the code looks like an exact implementation
of the RFC 3629 spec, but the decoder does not return the original
characters.
The character "\uFEFF" (bytes EF BB BF) is not returned at all.
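For reference, here is a minimal round-trip sketch of the expected behaviour. It is written in Python only as an independent illustration and is not Mono code; the names SAMPLES and round_trips are made up for this example. It assumes a decoder that does not strip a leading U+FEFF, which is how Python's plain "utf-8" codec behaves.

```python
# Sample strings from the list above. The surrogate pair \ud800\udc00 is
# written as the single code point U+10000, because Python strings hold
# code points rather than UTF-16 code units.
SAMPLES = [
    "\u4f00\u302a\U00010000\u4f01",
    "\uFEFF",
    "\u0041\u2262\u0391\u002e",
    "\ud55c\uad6d\uc5b4",
    "\u65e5\u672c\u8a9e",
    "\ufeff\u233b4",
]


def round_trips(text):
    """Return True if encoding to UTF-8 and decoding again gives text back."""
    return text.encode("utf-8").decode("utf-8") == text


# U+FEFF encodes to the three bytes EF BB BF.
bom_bytes = "\uFEFF".encode("utf-8")

if __name__ == "__main__":
    print(bom_bytes.hex())  # prints "efbbbf"
    for sample in SAMPLES:
        print(ascii(sample), round_trips(sample))
```

A decoder that passes this check returns every sample unchanged, including the strings that start with or consist only of U+FEFF.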
I've checked UTF8Encoding.cs, and I have to admit that, unlike the
encoder, the decoder has some strange logic that tries to decode
sequences of 5 or 6 bytes (the standard defines only 1- to 4-byte
sequences for valid Unicode characters).
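To make the 1 to 4 byte limit concrete, here is a small sketch of how a strict decoder could classify the lead byte of a sequence according to the ranges in RFC 3629. It is an illustration in Python, not the logic in UTF8Encoding.cs, and the function name utf8_sequence_length is hypothetical.

```python
def utf8_sequence_length(lead):
    """Return the length of the UTF-8 sequence that starts with byte lead.

    Follows the lead-byte ranges of RFC 3629. Raises ValueError for bytes
    that cannot start a valid sequence: continuation bytes (80-BF), the
    overlong lead bytes C0 and C1, and F5-FF, which includes the old
    5- and 6-byte lead bytes F8-FB and FC-FD.
    """
    if 0x00 <= lead <= 0x7F:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError("invalid UTF-8 lead byte: 0x%02X" % lead)


if __name__ == "__main__":
    print(utf8_sequence_length(0xEF))  # prints 3 (lead byte of EF BB BF)
    print(utf8_sequence_length(0xF0))  # prints 4 (lead byte of U+10000)
```

With this classification, a lead byte such as F8 or FC is rejected immediately instead of starting a 5- or 6-byte sequence.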
So, before I try to fix the problem, maybe someone can clarify the
logic of the current UTF-8 decoder implementation?
I've opened a bug for this UTF-8 problem:
http://bugzilla.ximian.com/show_bug.cgi?id=73086
Thanks,
Svetlana.