[Mono-dev] Maybe a System.Data.OracleClient.dll bug
    Jonathan Pryor 
    jonpryor at vt.edu
       
    Mon Sep 25 20:51:41 EDT 2006
    
    
  
On Mon, 2006-09-25 at 21:36 -0300, Rafael Teixeira wrote:
> Just some info, UTF-8 for Unicode 3.x, goes up to 6 bytes per character.
> 
> :|
IIRC, UTF-8 should never be 6 bytes per character.  It *can* be, to
encode the entire 31-bit address space of UCS-4, but since IIRC they
limited Unicode & ISO 10646 to be a 21-bit character set you'll
only see 4-byte long UTF-8 sequences as a maximum.
(They limited it to 21-bit sequences for UTF-16, as the "escape"
mechanism within UTF-16 can only support up to a 21 bit space.)
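For reference, the length of a UTF-8 sequence depends only on which
range the code point falls in.  A quick sketch in Python (used purely
for illustration, this is not Mono code; utf8_len is a made-up helper
name) that checks the top code point of each range:

```python
def utf8_len(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for a single code point."""
    return len(chr(code_point).encode("utf-8"))

# Highest code point that still fits in 1, 2, 3 and 4 bytes.
# U+10FFFF is the last code point in the 21-bit Unicode range.
BOUNDARIES = [0x7F, 0x7FF, 0xFFFF, 0x10FFFF]

lengths = [utf8_len(cp) for cp in BOUNDARIES]

for cp, n in zip(BOUNDARIES, lengths):
    print(f"U+{cp:04X}: {n} byte(s)")
```

Nothing inside the 21-bit range needs a 5- or 6-byte sequence.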
Regardless, allocating two bytes/character isn't valid.  You'd need at
least 4, assuming a UTF-8 encoding.
The "correct" value should be returned by
Encoding.GetMaxByteCount(charCount).
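To illustrate why a 2 bytes/character budget falls short, here is a
rough sketch in Python (again illustration only; max_utf8_buffer_size
is a hypothetical helper, not the Mono or .NET implementation, and it
counts whole code points whereas .NET counts UTF-16 code units, so
GetMaxByteCount may return a different number):

```python
# Assumption: worst case of 4 bytes per code point, per the 21-bit limit.
MAX_UTF8_BYTES_PER_CODE_POINT = 4

def max_utf8_buffer_size(char_count: int) -> int:
    """Worst-case UTF-8 byte count for char_count code points."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return char_count * MAX_UTF8_BYTES_PER_CODE_POINT

# One character from each UTF-8 length class: 1, 2, 3 and 4 bytes.
sample = "a\u00e9\u20ac\U0001F600"
encoded = sample.encode("utf-8")

two_byte_budget = 2 * len(sample)                 # the undersized guess
safe_budget = max_utf8_buffer_size(len(sample))   # the worst-case bound

print(len(encoded), two_byte_budget, safe_budget)
```

The encoded length overflows the 2 bytes/character budget but stays
within the 4 bytes/character one.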
 - Jon
    
    