[Mono-dev] Maybe a System.Data.OracleClient.dll bug

Jonathan Pryor jonpryor at vt.edu
Mon Sep 25 20:51:41 EDT 2006


On Mon, 2006-09-25 at 21:36 -0300, Rafael Teixeira wrote:
> Just some info, UTF-8 for Unicode 3.x, goes up to 6 bytes per character.
> 
> :|

IIRC, UTF-8 should never be 6 bytes per character.  It *can* be, to
encode the entire 31-bit address space of UCS-4, but since IIRC they
limited Unicode & ISO 10646 to be a 21-bit character set (U+0000
through U+10FFFF), you'll only see UTF-8 sequences up to 4 bytes long.

(They limited it to a 21-bit code space for UTF-16's sake, as the
surrogate-pair "escape" mechanism within UTF-16 can only reach U+10FFFF.)
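For illustration, here's a rough Python sketch (not Mono code) of the
byte-length rules.  `utf8_len` is a hypothetical helper; the `legacy`
flag models the old 31-bit scheme with 5- and 6-byte forms, while the
default follows the current U+10FFFF limit.  The results are checked
against Python's own UTF-8 encoder:

```python
def utf8_len(cp: int, legacy: bool = False) -> int:
    """UTF-8 byte length of code point `cp` (hypothetical helper).

    legacy=True allows the full 31-bit UCS-4 range (up to 6 bytes);
    the default restricts to U+0000..U+10FFFF (at most 4 bytes).
    """
    limit = 0x7FFFFFFF if legacy else 0x10FFFF
    if not 0 <= cp <= limit:
        raise ValueError(f"U+{cp:X} is outside the allowed code space")
    # Each entry: (bytes needed, first code point that needs MORE bytes).
    for nbytes, bound in ((1, 0x80), (2, 0x800), (3, 0x10000),
                          (4, 0x200000), (5, 0x4000000)):
        if cp < bound:
            return nbytes
    return 6


# Range boundaries, compared with Python's UTF-8 codec.  Surrogate code
# points (U+D800..U+DFFF) are left out: the strict codec refuses them.
for cp in (0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
    encoded = chr(cp).encode("utf-8")
    assert len(encoded) == utf8_len(cp)
    print(f"U+{cp:06X} -> {len(encoded)} bytes")

# Only the old 31-bit scheme ever needed 6 bytes.
print(utf8_len(0x7FFFFFFF, legacy=True))

# UTF-16 surrogate pairs add 2**20 code points above U+FFFF, topping
# out at U+10FFFF, which needs 21 bits.
top = 0x10000 + 2**20 - 1
print(hex(top), top.bit_length())
```

Every code point up to U+10FFFF fits in 4 bytes, which is where the
figure below comes from.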

Regardless, allocating two bytes per character isn't enough.  You'd need
at least 4 per character, assuming a UTF-8 encoding.

The "correct" value should be returned by Encoding.GetMaxByteCount().
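A quick Python sketch of why a 2-bytes-per-character buffer overflows.
`max_utf8_bytes` is a hypothetical helper that assumes the count is in
code points (Python's `len()` on a str); .NET's
Encoding.GetMaxByteCount(int charCount) counts UTF-16 chars instead and
uses its own formula, which this sketch doesn't try to reproduce:

```python
def max_utf8_bytes(code_points: int) -> int:
    """Worst-case UTF-8 size for `code_points` code points (hypothetical)."""
    return 4 * code_points


samples = [
    "abc",
    "h\u00e9llo",                # one 2-byte character
    "\u4e2d\u6587\u5b57",        # three CJK characters, 3 bytes each
    "\U0001F600\U0001F600",      # two astral-plane characters, 4 bytes each
]
for s in samples:
    n = len(s)                     # code points
    actual = len(s.encode("utf-8"))
    naive = 2 * n                  # the two-bytes-per-character allocation
    status = "OVERFLOW" if actual > naive else "ok"
    print(f"{s!r}: {n} code points, {actual} bytes, "
          f"naive={naive} ({status}), max={max_utf8_bytes(n)}")
    assert actual <= max_utf8_bytes(n)
```

The CJK and astral-plane samples both exceed the naive buffer, while the
4-bytes-per-code-point bound always holds.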

 - Jon




