[Mono-devel-list] Re: [patch] support utf8 strings in c# module

Joel Reed joel.reed at ddiworld.com
Tue Nov 2 21:05:54 EST 2004


On Tue, Nov 02, 2004 at 06:53:32PM -0200, Rafael Teixeira wrote:
> See the /codepage option for mcs. Also, mcs does try to detect from
> byte order marks whether the file is in some unicode encoding (utf8,
> utf16 little endian or big endian...) but defaults to iso-8859-1 or
> cp1252. mbas still uses the Encoding.Default class and so follows the
> LANG environment variable.

thanks Rafael. what i'm referring to is .NET strings which have been
initialized from C/C++ strings via pinvoke. in particular, i was examining
Marshal.PtrToStringAnsi and Marshal.PtrToStringUni.

under mono, 8-bit strings fed to Marshal.PtrToStringAnsi are assumed to
be utf8, afaict, whereas under M$ .net they are assumed to be cp1252.
the "/codepage" option for mcs seems to deal with reading source files,
so i assume it would not help here. whether or not the mono team decides
to be more consistent with M$ on this API, swig will benefit, imho,
from (optional?) typemaps that support passing UTF8 strings back and
forth between c/c++ and .Net. utf8 is easier to deal with than
utf16 (which also isn't yet supported in swig), and APIs that work
only with the CP1252 charset don't get us that far anyway.
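
to make the mismatch concrete, here is a minimal sketch in Python (not
C#, and not the actual Marshal code) of what happens when the same
native bytes are decoded under each assumption. the sample string and
variable names are made up for illustration; the only thing taken from
the discussion above is the utf8-versus-cp1252 difference.

```python
# Bytes as a C/C++ library might hand them back: "café" encoded as UTF-8.
# The "é" becomes the two bytes 0xC3 0xA9.
raw = "café".encode("utf-8")

# Decoding under the assumption described above for mono (UTF-8).
as_utf8 = raw.decode("utf-8")

# Decoding under the assumption described above for MS .NET (cp1252):
# each of the two bytes is treated as its own character.
as_cp1252 = raw.decode("cp1252")

# Pure ASCII input decodes identically either way, which is why
# ASCII-only strings behave the same on both runtimes.
ascii_raw = b"plain ascii"
ascii_same = ascii_raw.decode("utf-8") == ascii_raw.decode("cp1252")

print(repr(as_utf8))
print(repr(as_cp1252))
print(ascii_same)
```

the two decodes agree only while every byte is below 0x80, so a typemap
that fixes the encoding explicitly avoids depending on either runtime's
default.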

jr


> 
> My 2 bits,
> 
> 
> On Tue, 2 Nov 2004 13:54:21 -0500, Joel Reed <joel.reed at ddiworld.com> wrote:
> > > >Afaict, SWIG cvs will only work properly across M$ .net and mono for
> > > >ASCII strings. 8-bit single byte encodings will not work
> > > >properly because M$ interprets them as ANSI, whereas mono expects UTF8.
> > > >
> > > Is this something that Mono are aware of and what is specified in the C#
> > > standard? The Mono team will probably want to fix any inconsistencies
> > > with MS .NET.
> > 
> > i did ask on #mono and it was an acknowledged difference
> > that _might_ be changed if someone submitted a patch which
> > determined the charset based on the LANG environment variable.
> > But assuming 8-bit strings are UTF8 happens to make working with
> > Gtk# nice for them - which i think they care about a lot.
> > 
> > after thinking about the two options i actually like mono's
> > choice better than assuming 8-bit strings are ANSI (cp1252).
> > 
> > jr
> > 
> > 
> 
> 
> -- 
> Rafael "Monoman" Teixeira
> ---------------------------------------
> Just the 'crazy' me in a sane world, or would it be the reverse? I dunno...

-- 
------------------------------------------------------------
Joel W. Reed                                    412-257-3881
----------   http://home.comcast.net/~joelwreed/  ----------


