[Mono-dev] mcs default encoding: Latin1 or not

Kornél Pál kornelpal at hotmail.com
Mon Aug 29 04:49:19 EDT 2005


Hi,

>> We shouldn't use non-ASCII characters inside code for identifiers but we
>> can use other characters in strings and comments. Of course we could use
>> ASCII but I think UTF-8 is a better solution.
>
> Actually I don't really understand the reason why you can't use
> non-ASCII identifiers, but in general it's OK.

I said we shouldn't, because we could: both the compiler and the runtime
support Unicode identifiers.
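
For example, a snippet like this compiles and runs (the Hungarian
identifier is just an illustration):

  class Test
  {
      static void Main ()
      {
          int számláló = 0; // "számláló" is Hungarian for "counter"
          System.Console.WriteLine (számláló);
      }
  }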

We shouldn't use them because English is our communication language and we
use English in the code as well. Special characters are not allowed in
identifiers anyway, and the non-ASCII letters that are allowed are not used
in English.

Furthermore, if you use non-Latin writing systems (CJK, Cyrillic, Greek,
Arabic, ...), a lot of developers who can read only Latin characters will
not be able to read the identifiers, even if they have a text editor that
displays them correctly.

But we could, for example, use special Unicode characters in comments
and/or strings. And we could write names in their original forms: you could
even use Japanese characters to put your name in the files. But I think
that when a non-Latin writing system is used, the name should be listed in
Latin characters as well. :)
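
For example (the Japanese name below is made up, only to show the idea):

  // Author: 中村太郎 (Taro Nakamura)
  class Greeter
  {
      static void Main ()
      {
          // Non-ASCII characters in strings are fine if the file is UTF-8.
          System.Console.WriteLine ("Szervusz, világ!");
      }
  }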

> What I said is that vim *on cygwin* does not support UTF-8. Actually
> there is no way on vim's side to support UTF-8, since cygwin itself does
> not support UTF-8 console output.

It is Windows that has no Unicode console. WriteConsole, for example,
supports Unicode output, but the console itself does not. If you want to
take advantage of Unicode, use graphical text editors. Note that if vim has
UTF-8 support, you will be able to edit files even if you see '?'s instead
of some characters, and you will see the correct characters as long as you
use only characters that are listed in the Windows ACP (ANSI code page).
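
A quick sketch of what I mean: you can hand WriteConsoleW UTF-16 text
through P/Invoke, but whether the characters actually show up still
depends on the console code page and font (the sample string is
arbitrary):

  using System;
  using System.Runtime.InteropServices;

  class ConsoleUnicode
  {
      [DllImport ("kernel32.dll", CharSet = CharSet.Unicode)]
      static extern bool WriteConsole (IntPtr handle, string text,
          uint charsToWrite, out uint charsWritten, IntPtr reserved);

      [DllImport ("kernel32.dll")]
      static extern IntPtr GetStdHandle (int handle);

      const int STD_OUTPUT_HANDLE = -11;

      static void Main ()
      {
          // WriteConsoleW accepts UTF-16 text, but characters outside
          // the console code page/font may still be displayed as '?'.
          string s = "árvíztűrő tükörfúrógép\n";
          uint written;
          WriteConsole (GetStdHandle (STD_OUTPUT_HANDLE), s,
              (uint) s.Length, out written, IntPtr.Zero);
      }
  }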

>> At least on Windows you can open texts in any code page, edit them and
>> when you save them no characters will be corrupted so you can open it
>> again using the correct code page. This is true for UTF-8 as well. Some
>> control chars
>
> This is simply not true. As I wrote before, usually Japanese text
> editors (including notepad and vs.net) don't support Latin1 encoding.

What I said is that editors will not corrupt UTF-8 text even if they don't
understand UTF-8. They will interpret the text file according to some code
page and then save it using the same one, which means you will get the same
binary representation after a round trip. This is true for all the SBCS
(single-byte) code pages; I don't know the DBCS behaviour.
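
You can see the SBCS round trip with the .NET Encoding classes (Latin1
stands in here for any SBCS code page):

  using System;
  using System.Text;

  class Roundtrip
  {
      static void Main ()
      {
          // UTF-8 bytes of a non-ASCII string.
          byte [] original = Encoding.UTF8.GetBytes ("Kornél");

          // Misread them as Latin1, the way an editor that doesn't
          // understand UTF-8 would, then "save" them again.
          Encoding latin1 = Encoding.GetEncoding ("iso-8859-1");
          string misread = latin1.GetString (original);
          byte [] saved = latin1.GetBytes (misread);

          // Every byte maps to exactly one character and back, so the
          // binary representation survives the round trip unchanged.
          Console.WriteLine (Convert.ToBase64String (original) ==
                             Convert.ToBase64String (saved)); // prints True
      }
  }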

Kornél



