[Mono-dev] Maybe a System.Data.OracleClient.dll bug
Jonathan Pryor
jonpryor at vt.edu
Mon Sep 25 20:51:41 EDT 2006
On Mon, 2006-09-25 at 21:36 -0300, Rafael Teixeira wrote:
> Just some info, UTF-8 for Unicode 3.x, goes up to 6 bytes per character.
>
> :|
IIRC, UTF-8 should never be 6 bytes per character. It *can* be, to
encode the entire 31-bit address space of UCS-4, but since IIRC they
limited Unicode & ISO 10646 to be a 21-bit character set you'll
generally only see 3-byte long UTF-8 sequences (anything in the BMP),
with 4 bytes as the maximum for characters outside the BMP.
(They limited it to a 21-bit space because of UTF-16, as the "escape"
mechanism within UTF-16, i.e. surrogate pairs, can only reach code
points up to U+10FFFF, which fits in 21 bits.)
Regardless, allocating two bytes/character isn't valid. You'd need at
least 4, assuming a UTF-8 encoding.
The "correct" value should be returned by Encoding.GetMaxByteCount().
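
To make the byte counts above concrete, here is a rough sketch in Python
(not the OracleClient code, and not how Mono computes anything) that
measures how many bytes UTF-8 needs at the top of each length class. The
helper names utf8_len and max_utf8_bytes are made up for illustration, and
the worst-case figure counts whole Unicode code points, not the UTF-16
code units that a .NET char holds.

```python
def utf8_len(code_point: int) -> int:
    """Return how many bytes UTF-8 uses to encode one Unicode code point."""
    return len(chr(code_point).encode("utf-8"))


# Highest code point that still fits in 1, 2, 3 and 4 UTF-8 bytes.
# U+10FFFF is the top of the 21-bit Unicode range.
BOUNDARIES = [0x7F, 0x7FF, 0xFFFF, 0x10FFFF]


def max_utf8_bytes(code_point_count: int) -> int:
    """Worst-case buffer size, assuming 4 bytes for every code point."""
    return 4 * code_point_count


if __name__ == "__main__":
    for cp in BOUNDARIES:
        print(f"U+{cp:04X}: {utf8_len(cp)} byte(s)")
    print("worst case for 10 code points:", max_utf8_bytes(10), "bytes")
```

Under these assumptions a buffer sized at two bytes per character is
already too small for anything at or above U+0800.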
- Jon