[Mono-list] Encoding problems

Francisco Figueiredo Jr. fxjrlists@yahoo.com.br
Tue, 11 Jan 2005 15:50:53 -0200


Jonathan Pryor wrote:
> On Mon, 2005-01-10 at 22:31 -0200, Francisco Figueiredo Jr. wrote:
> 
>>I received a report about problems with encoding on mono.
> 
> 
> It probably isn't Mono, but I'm willing to be proven wrong. :-)
> 

:)

> From the outset, I'm guessing that this is a codepage/charset issue.  US
> English and Spanish use different codepages, and characters within one
> codepage may map to a different character in another codepage.  In
> particular, only ASCII is consistent between them; everything above
> codepoint 127 will differ, and this is where "funky" characters like n-
> tilde and a-acute are placed.
> 

> The only way to preserve sanity is to ensure that (1) you only use
> characters that are in both codepages (read: stick with ASCII), or 
> (2) use a codepage that represents the union of all required codepages.
> That's Unicode, typically UTF-8.
> 

Ok.
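
To convince myself of the codepage point quoted above, here is a minimal
sketch. It is in Python only for illustration (it is not the Mono or
database code involved), and the helper name decode_with is made up for
this example. It encodes the reported string in two encodings and then
reads the bytes back with the right and the wrong codec:

```python
# Illustrative only: Python stands in for whatever client reads the database.
TEXT = "Magri\u00f1\u00e1"  # "Magriñá", the string from the report


def decode_with(raw: bytes, codec: str) -> str:
    """Decode raw bytes with the named codec, replacing undecodable bytes."""
    return raw.decode(codec, errors="replace")


utf8_bytes = TEXT.encode("utf-8")      # n-tilde and a-acute take two bytes each
latin1_bytes = TEXT.encode("latin-1")  # n-tilde and a-acute take one byte each

# The ASCII prefix is byte-for-byte identical in both encodings.
ascii_prefix_matches = utf8_bytes[:5] == latin1_bytes[:5] == b"Magri"

# Same codec on both sides: the text survives.
round_trip = decode_with(utf8_bytes, "utf-8")

# Mismatched codecs: only the non-ASCII characters are damaged.
utf8_read_as_latin1 = decode_with(utf8_bytes, "latin-1")
latin1_read_as_utf8 = decode_with(latin1_bytes, "utf-8")
```

The ASCII part "Magri" comes through in every case; only the two
characters above 127 change, which matches the symptom in the report.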

> 
>>The following text isn't being returned correctly from database:
>>
>>Magriñá
>>
>>The chars n-tilde and a-acute are appearing as strange chars.
>>
>>On Mono 1.0.4 on Linux, if you change LANG to en_US the text reads
>>correctly, but with es_ES it does not.
> 
> 
> Is it LANG=en_US or LANG=en_US.UTF-8?  The text after the '.' specifies
> the codepage to use.  If the codepage isn't explicitly specified, then
> the default is used (latin1 for english, latin2 for spanish, IIRC).
> This is likely where you're experiencing problems.
> 

Hmmmm, I admit that I forgot to set the code page. I just tested with
en_US. I will try with en_US.UTF-8 and es_ES.UTF-8.
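
For reference, a small sketch of how the part after the '.' can be pulled
out of a LANG value, assuming the usual POSIX shape
language[_territory][.codeset][@modifier]. The function name
codeset_from_locale_name is hypothetical, and Python is used only for
illustration. When no codeset is given it returns None rather than
guessing, since the default depends on the system's locale data:

```python
from typing import Optional


def codeset_from_locale_name(name: str) -> Optional[str]:
    """Return the codeset part of a POSIX-style locale name, or None.

    Expected shape: language[_territory][.codeset][@modifier]
    """
    base = name.split("@", 1)[0]  # drop an "@modifier" suffix if present
    if "." not in base:
        return None               # no explicit codeset, e.g. "en_US"
    codeset = base.split(".", 1)[1]
    return codeset or None        # treat a trailing "." as no codeset
```

So my en_US test gives None (system default), while en_US.UTF-8 and
es_ES.UTF-8 both give "UTF-8".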


> 
>>I tested here with svn version and with both en_US and es_ES it works.
>>Only if I export LANG= it returns wrong chars. What is the default
>>encoding when I don't set LANG?
> 
> 
> You say that you tested "here", which potentially implies that it's a
> different machine than the one experiencing the problem.  Is this
> correct?

Yep, it is a different machine. Also, the user reported that nothing works on Windows.

> 
> Regardless, the default LANG value varies between distros; in FC2 it's
> set in /etc/sysconfig/i18n (read by /etc/profile.d/lang.sh, read
> by /etc/profile, read by bash).  I'm sure where it's set will also vary.
> 

I'm using Gentoo. I will try to find where it is set.
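
While looking, it may help to remember that LANG is not the only variable
involved. As far as I understand POSIX, for character handling LC_ALL
wins over LC_CTYPE, which wins over LANG, and an empty value counts as
unset (which is my "export LANG=" case). A rough sketch of that lookup,
in Python only for illustration; the function name is hypothetical and
the "C" fallback is an assumption, since the real default is up to the
implementation:

```python
from typing import Mapping


def effective_ctype_locale(env: Mapping[str, str]) -> str:
    """Pick the locale name that governs character encoding.

    Precedence: LC_ALL, then LC_CTYPE, then LANG. Empty values are
    treated as unset. Falls back to "C" (assumed default).
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"
```

Calling effective_ctype_locale(os.environ) (after "import os") on each
machine should show which setting is actually in effect.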

> Furthermore, the only distro I'm aware of that defaults to using UTF-8
> throughout is Red Hat and associated distros such as Fedora Core.  This
> may have changed (I hope so; it's been 3 years since I heard anything
> about this), but until all distros migrate to UTF-8 there will be
> behavioral differences in *any* locale-aware program.  (Just look at the
> locale-related problems in Gnome and the use of G_BROKEN_FILENAMES...)
> 

Ok, thx for info! :)

> 
>>Do you know if there is any problem with 1.0.4 or 1.0.5 and if so if
>>there is any fix?
> 
> 
> The fix is to *always* specify your codepage and consistently use it.
> This may (will) require configuring your database so that it uses the
> correct codepage to store strings (as Aleksandar Dezelin mentioned, SQL
> Server requires the nchar data type for Unicode strings).
> 
> Mono isn't a mind reader, and can't tell what codepage a given string is
> in.  It's up to you to ensure codepages are correct and consistent.
> 

Ok, I will run some more tests to see what is happening. I think
the problem is, as you said, the encoding of the system.

Thanks for your info.

Regards,

Francisco Figueiredo Jr.