[Mono-list] Encoding problems
Francisco Figueiredo Jr.
fxjrlists@yahoo.com.br
Tue, 11 Jan 2005 15:50:53 -0200
Jonathan Pryor wrote:
> On Mon, 2005-01-10 at 22:31 -0200, Francisco Figueiredo Jr. wrote:
>
>>I received a report about problems with encoding on mono.
>
>
> It probably isn't Mono, but I'm willing to be proven wrong. :-)
>
:)
> From the outset, I'm guessing that this is a codepage/charset issue. US
> English and Spanish use different codepages, and characters within one
> codepage may map to a different character in another codepage. In
> particular, only ASCII is guaranteed to be consistent between them;
> characters above codepoint 127 may differ, and this is where "funky"
> characters like n-tilde and a-acute are placed.
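To make the ASCII-versus-high-byte point concrete, here is a small Python sketch that decodes every byte value with two single-byte codepages. Latin-1 (ISO-8859-1) and Latin-2 (ISO-8859-2) are picked only as illustrative examples; they are not necessarily the codepages involved in this report.

```python
# Compare how two single-byte codepages interpret the same byte values.
# Latin-1 (ISO-8859-1) and Latin-2 (ISO-8859-2) are illustrative choices.

def decode_byte(value: int, codepage: str) -> str:
    """Decode a single byte value (0-255) using the given codepage."""
    return bytes([value]).decode(codepage)

# Every byte in the ASCII range (0-127) decodes to the same character.
ascii_identical = all(
    decode_byte(b, "latin-1") == decode_byte(b, "iso8859_2") for b in range(128)
)

# Above 127 the codepages diverge: 0xF1 is n-tilde in Latin-1
# but n-acute in Latin-2.
f1_as_latin1 = decode_byte(0xF1, "latin-1")
f1_as_latin2 = decode_byte(0xF1, "iso8859_2")

# Collect the high bytes whose meaning differs between the two codepages.
differing = [
    b for b in range(128, 256)
    if decode_byte(b, "latin-1") != decode_byte(b, "iso8859_2")
]

if __name__ == "__main__":
    print("ASCII range identical:", ascii_identical)
    print("0xF1 as Latin-1:", f1_as_latin1)
    print("0xF1 as Latin-2:", f1_as_latin2)
    print("high bytes that differ:", len(differing), "of 128")
```

In this comparison the ASCII range always matches and 0xF1 changes meaning, while a few high bytes (a-acute at 0xE1, for instance) happen to coincide; outside ASCII, a match is a coincidence you cannot rely on.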
>
> The only way to preserve sanity is to ensure that either (1) you only use
> characters that are in both codepages (read: stick with ASCII), or
> (2) you use a codepage that represents the union of all required
> codepages. That's Unicode, typically encoded as UTF-8.
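A minimal Python sketch of why a single codepage is not enough: the sample string below mixes n-tilde (present in Latin-1 but not in Latin-2) with l-stroke (present in Latin-2 but not in Latin-1), so only a Unicode encoding such as UTF-8 can hold both. The codepages and sample text are illustrative choices.

```python
# A string mixing characters from different single-byte codepages:
# n-tilde (Latin-1 only) and l-stroke (Latin-2 only).
TEXT = "Magri\u00f1\u00e1 \u0142"

def try_encode(text: str, codepage: str):
    """Return the encoded bytes, or None if the codepage cannot represent text."""
    try:
        return text.encode(codepage)
    except UnicodeEncodeError:
        return None

results = {cp: try_encode(TEXT, cp) for cp in ("latin-1", "iso8859_2", "utf-8")}

if __name__ == "__main__":
    for cp, data in results.items():
        print(cp, "->", "cannot encode" if data is None else data)
```

Both single-byte codepages fail on one of the characters, while UTF-8 encodes the whole string and round-trips it unchanged.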
>
Ok.
>
>>The following text isn't being returned correctly from database:
>>
>>Magriñá
>>
>>The characters n-tilde and a-acute appear as strange characters.
>>
>>On Mono 1.0.4 on Linux, if you set LANG to en_US the text reads
>>correctly; with es_ES it does not.
>
>
> Is it LANG=en_US or LANG=en_US.UTF-8? The text after the '.' specifies
> the codepage to use. If the codepage isn't explicitly specified, then
> the default is used (latin1 for English, latin2 for Spanish, IIRC).
> This is likely where you're experiencing problems.
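The LANG value follows the usual POSIX layout language[_territory][.codeset][@modifier]. The hypothetical parse_locale helper below is a sketch that splits a value into those parts; when codeset comes back as None, the system's default for that locale applies, which is the situation described above.

```python
import re
from typing import NamedTuple, Optional

class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]

# language[_territory][.codeset][@modifier]
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)

def parse_locale(value: str) -> Optional[LocaleName]:
    """Hypothetical helper: split a LANG-style value into its parts.

    Returns None for values that do not match the layout (e.g. "").
    A codeset of None means no codepage was given explicitly.
    """
    match = _LOCALE_RE.match(value)
    if match is None:
        return None
    return LocaleName(**match.groupdict())

if __name__ == "__main__":
    for value in ("en_US", "es_ES.UTF-8", "es_ES.ISO-8859-15@euro"):
        print(value, "->", parse_locale(value))
```

For example, "en_US" parses with codeset None (default codepage), whereas "en_US.UTF-8" pins the codepage explicitly.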
>
Hmm, I admit that I forgot to set the codepage. I had only tested with
en_US. I will try with en_US.UTF-8 and es_ES.UTF-8.
>
>>I tested here with the SVN version, and it works with both en_US and
>>es_ES. Only if I export an empty LANG= does it return wrong characters.
>>What is the default encoding when I don't set LANG?
>
>
> You say that you tested "here", which potentially implies that it's a
> different machine than the one experiencing the problem. Is this
> correct?
Yep, it is different. The user also reported that nothing works on Windows.
>
> Regardless, the default LANG value varies between distros; in FC2 it's
> set in /etc/sysconfig/i18n (read by /etc/profile.d/lang.sh, which is read
> by /etc/profile, which is read by bash). I'm sure the file where it's set
> also varies between distros.
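Wherever a distro sets it, the variable that decides the character encoding is resolved in a fixed order: POSIX says LC_ALL wins if it is set and non-empty, then LC_CTYPE, then LANG, and otherwise an implementation-defined default, in practice usually the "C"/POSIX locale. The hypothetical effective_ctype_locale function below sketches that lookup over an environment dictionary. On glibc the C locale's codeset is plain ASCII, which would plausibly explain why an empty LANG= turns n-tilde and a-acute into wrong characters.

```python
from typing import Mapping

def effective_ctype_locale(env: Mapping[str, str], default: str = "C") -> str:
    """Hypothetical helper: locale used for character classification/encoding.

    Follows the POSIX precedence LC_ALL > LC_CTYPE > LANG > default.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # POSIX treats a defined-but-empty variable as unset
            return value
    return default

if __name__ == "__main__":
    import os
    print("effective LC_CTYPE locale:", effective_ctype_locale(os.environ))
```

With only LANG= exported, the function falls through to "C", the same fallback the C library would use.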
>
I'm using Gentoo. I will try to find where it is set.
> Furthermore, the only distro I'm aware of that defaults to using UTF-8
> throughout is Red Hat and associated distros such as Fedora Core. This
> may have changed (I hope so; it's been 3 years since I heard anything
> about this), but until all distros migrate to UTF-8 there will be
> behavioral differences in *any* locale-aware program. (Just look at the
> locale-related problems in Gnome and the use of G_BROKEN_FILENAMES...)
>
OK, thanks for the info! :)
>
>>Do you know whether there is any problem with 1.0.4 or 1.0.5 and, if
>>so, whether there is a fix?
>
>
> The fix is to *always* specify your codepage and use it consistently.
> This may (will) require configuring your database so that it uses the
> correct codepage to store strings (as Aleksandar Dezelin mentioned, SQL
> Server requires the nchar data type for Unicode strings).
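A minimal sketch of "specify the codepage and use it consistently", assuming a temporary file stands in for the database: the encoding is passed explicitly on both write and read instead of being inherited from the locale. How a real database or driver is told its encoding depends on the product and is not shown here. Reading the same bytes back with a different codepage produces the kind of strange characters described in the report.

```python
import os
import tempfile

TEXT = "Magri\u00f1\u00e1"

def write_text(path: str, text: str, encoding: str) -> None:
    """Store text using an explicitly named encoding."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)

def read_text(path: str, encoding: str) -> str:
    """Load text using an explicitly named encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "name.txt")  # stand-in for the database
    write_text(path, TEXT, "utf-8")
    consistent = read_text(path, "utf-8")   # same encoding on both ends
    mismatched = read_text(path, "latin-1")  # UTF-8 bytes read as Latin-1

if __name__ == "__main__":
    print("consistent:", consistent)
    print("mismatched:", mismatched)
```

The consistent read returns the original name; the mismatched read shows each two-byte UTF-8 character as two Latin-1 characters.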
>
> Mono isn't a mind reader, and can't tell what codepage a given string is
> in. It's up to you to ensure codepages are correct and consistent.
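To illustrate why the runtime cannot guess: the same bytes decode without error in several single-byte codepages, each giving a different string. The sketch below uses Latin-1, Latin-2 and Windows-1251 (cp1251) as arbitrary examples. A strict UTF-8 decode happens to reject these particular bytes, but a successful decode in some codepage never proves that it was the intended one.

```python
# Bytes as they might arrive from a database or file, with no encoding label.
RAW = b"Magri\xf1\xe1"

# Arbitrary single-byte codepages used for illustration.
CODEPAGES = ("latin-1", "iso8859_2", "cp1251")

def decodes_cleanly(raw: bytes, codepage: str) -> bool:
    """Return True if raw is a valid byte sequence in the given codepage."""
    try:
        raw.decode(codepage)
        return True
    except UnicodeDecodeError:
        return False

# Every single-byte codepage accepts the bytes, but each yields a different string.
decoded = {cp: RAW.decode(cp) for cp in CODEPAGES}

# UTF-8 rejects this sequence (0xF1 must be followed by continuation bytes).
utf8_valid = decodes_cleanly(RAW, "utf-8")

if __name__ == "__main__":
    for cp, text in decoded.items():
        print(f"{cp:>10}: {text}")
    print("valid UTF-8:", utf8_valid)
```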
>
OK, I will do some more tests to check what is happening. I think the
problem is what you described: the system encoding.
Thanks for your info.
Regards,
Francisco Figueiredo Jr.