[Mono-devel-list] Problems with UTF-8 Decoder
monoman at gmail.com
Mon Feb 28 13:53:20 EST 2005
You are using outdated documentation for the UTF-8 standard. As of
Unicode 3.x, we have more than 1 million code points (20 bits), and
UTF-8 was extended to encode some of those in 5 or 6 bytes.
Get some updated documentation.
Also, off the top of my head, \uFEFF is the continuation prefix in
UTF-16, which is what CLI strings contain. If so, you are trying to give
the encoder an invalid character...
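
For reference, here is a quick way to check what U+FEFF is and how UTF-16 represents it. This is a minimal sketch in Python rather than C#, chosen only because the result does not depend on the language. The variable names are illustrative and nothing here comes from the Mono sources.

```python
import unicodedata

ch = "\ufeff"

# Official Unicode character name for U+FEFF (it also serves as the byte order mark).
name = unicodedata.name(ch)

# Big-endian UTF-16 without a BOM prefix: a single 16-bit code unit.
utf16_be = ch.encode("utf-16-be")

# UTF-16 surrogates occupy U+D800..U+DFFF; U+FEFF lies outside that range.
is_surrogate = 0xD800 <= ord(ch) <= 0xDFFF

print(name, utf16_be.hex(), is_surrogate)
```

If this holds, U+FEFF is an ordinary BMP character that occupies one UTF-16 code unit, so a string containing it is well-formed input for an encoder.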
On Sun, 27 Feb 2005 13:07:58 +0200, Svetlana Zholkovsky
<svetlanaz at mainsoft.com> wrote:
> Hi, All!
> I am using a UTF-8 Encoding to encode/decode the following unicode strings:
> The encoding works fine and the code looks like an exact implementation
> of the RFC 3629 spec, but the decoder
> does not return the original characters.
> The character "\uFEFF" (bytes EF BB BF) is not returned
> at all.
> I've checked UTF8Encoding.cs, and I have to admit that, unlike the
> encoder, the decoder has some strange logic that tries to decode
> sequences of 5 or 6 bytes (the standard defines only 1- to 4-byte
> sequences for valid Unicode characters).
> So, before I try to fix the problem, maybe someone can clarify
> the current UTF-8 decoder implementation logic for me?
> I've opened a bug: http://bugzilla.ximian.com/show_bug.cgi?id=73086
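
As a point of comparison for the sequence lengths discussed in the quoted message, the ranges in RFC 3629 can be sketched as follows. This is a Python illustration under stated assumptions, not the Mono UTF8Encoding.cs logic: `utf8_length` is a hypothetical helper written for this example, and the round trip at the end uses Python's built-in codecs only to show how a BOM-stripping decoder can make U+FEFF disappear.

```python
def utf8_length(code_point: int) -> int:
    """Bytes needed to encode one Unicode scalar value under RFC 3629."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("outside the Unicode range U+0000..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not encodable in UTF-8")
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4  # U+10000..U+10FFFF


# U+FEFF falls in the 3-byte range.
bom_bytes = "\ufeff".encode("utf-8")

# A plain UTF-8 decode keeps the character.
round_trip = bom_bytes.decode("utf-8")

# A decoder that treats a leading EF BB BF as a signature drops it.
stripped = bom_bytes.decode("utf-8-sig")

print(utf8_length(0xFEFF), bom_bytes.hex(), len(round_trip), len(stripped))
```

One possible explanation for the missing character, which this sketch only illustrates and does not confirm for Mono, is that the decoder treats a leading EF BB BF as a byte order mark and skips it.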
Rafael "Monoman" Teixeira
I'm trying to become a "Rosh Gadol" before my own eyes.
See http://www.joelonsoftware.com/items/2004/12/06.html for enlightenment.