Fwd: [mono-vb] UTF-8 Processing

Rafael Teixeira <monoman@gmail.com>
Wed, 25 Aug 2004 15:27:49 -0300


Sorry, didn't reply to the list.


---------- Forwarded message ----------
From: Rafael Teixeira <monoman@gmail.com>
Date: Wed, 25 Aug 2004 15:26:38 -0300
Subject: Re: [mono-vb] UTF-8 Processing
To: "Reese, Terry" <terry.reese@oregonstate.edu>

You may try the Encoding.GetEncoding method with the Windows codepage
1252, or another codepage if you know which one your files use.

Console.WriteLine("Source File: ")
sSource = Console.ReadLine()
sr = New System.IO.StreamReader(sSource, _
    System.Text.Encoding.GetEncoding(1252))

Nevertheless, the Read* methods will always return UTF-16 Unicode
characters. That is the reason you have to pass the file's encoding to
the constructor (the default is UTF-8): the reader uses it to decode
the bytes of the file into those characters.
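
To illustrate that decoding step, here is a minimal sketch that decodes
a few codepage 1252 bytes in memory instead of from a file. The names
DecodeSketch and DecodeCp1252 and the sample bytes are made up for the
example, and it assumes the runtime provides codepage 1252.

```vbnet
Imports System
Imports System.Text

Module DecodeSketch
    ' Hypothetical helper: decodes bytes stored in Windows codepage 1252
    ' into a .NET String, which is always UTF-16 internally.
    Function DecodeCp1252(ByVal raw As Byte()) As String
        Return Encoding.GetEncoding(1252).GetString(raw)
    End Function

    Sub Main()
        ' Sample data (an assumption): "caf", then &HE9 (e-acute in 1252)
        ' and &H80 (the euro sign in 1252, which is U+20AC in Unicode).
        Dim raw As Byte() = {&H63, &H61, &H66, &HE9, &H80}
        Dim text As String = DecodeCp1252(raw)

        ' Print the UTF-16 code unit of every decoded character.
        For Each c As Char In text
            Console.WriteLine("U+{0:X4}", Convert.ToInt32(c))
        Next
    End Sub
End Module
```

The single byte &H80 comes back as the 16-bit character U+20AC, so what
you get from the reader is never the original byte value.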

If you REALLY want to access the bytes as they are in the file and
translate them yourself (maybe you are using an unsupported encoding),
you have to use a BinaryReader, probably with the ReadBytes method (or
maybe the ReadByte method), to read the raw bytes, which you then map
to other bytes or to characters as needed.
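
A rough sketch of the BinaryReader route follows. The names
RawBytesSketch and ReadRawBytes are invented for the example, and the
hex dump only stands in for whatever lookup-table mapping you need. It
assumes the file is small enough to fit in memory.

```vbnet
Imports System
Imports System.IO

Module RawBytesSketch
    ' Hypothetical helper: returns every byte of the file exactly as it
    ' is stored, with no decoding applied.
    Function ReadRawBytes(ByVal path As String) As Byte()
        Dim fs As New FileStream(path, FileMode.Open, FileAccess.Read)
        Dim br As New BinaryReader(fs)
        Try
            ' CInt throws for files over 2 GB; fine for a sketch.
            Return br.ReadBytes(CInt(fs.Length))
        Finally
            ' Closing the reader also closes the underlying stream.
            br.Close()
        End Try
    End Function

    Sub Main()
        Console.WriteLine("Source File: ")
        Dim sSource As String = Console.ReadLine()

        For Each b As Byte In ReadRawBytes(sSource)
            ' Placeholder: replace this dump with your own mapping.
            Console.Write("{0:X2} ", b)
        Next
        Console.WriteLine()
    End Sub
End Module
```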

But don't try to reimplement what the class library does for you,
unless it is really needed.

Just remember: Mono/.NET chars are always 16-bit UTF-16 code units,
and StreamReader/StreamWriter already do the appropriate conversions
when reading from or writing to text files.
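
For instance, converting a file from codepage 1252 to UTF-8 needs no
lookup tables at all. This is only a sketch: TranscodeSketch and
Transcode are made-up names, and 1252 is an assumption about what your
source files actually contain.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module TranscodeSketch
    ' Hypothetical helper: decodes sSource as codepage 1252 and writes
    ' the same text to sDest as UTF-8. In between, the text is held as
    ' ordinary UTF-16 chars.
    Sub Transcode(ByVal sSource As String, ByVal sDest As String)
        Dim sr As New StreamReader(sSource, Encoding.GetEncoding(1252))
        Try
            ' Encoding.UTF8 writes a byte order mark first; pass
            ' New UTF8Encoding(False) instead if you do not want one.
            Dim sw As New StreamWriter(sDest, False, Encoding.UTF8)
            Try
                sw.Write(sr.ReadToEnd())
            Finally
                sw.Close()
            End Try
        Finally
            sr.Close()
        End Try
    End Sub

    Sub Main()
        Console.WriteLine("Source File: ")
        Dim sSource As String = Console.ReadLine()
        Console.WriteLine("Destination File: ")
        Dim sDest As String = Console.ReadLine()
        Transcode(sSource, sDest)
    End Sub
End Module
```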

Hope it is clear now,



----- Original Message -----
From: Reese, Terry <terry.reese@oregonstate.edu>
Date: Wed, 25 Aug 2004 11:03:22 -0700
Subject: RE: [mono-vb] UTF-8 Processing
To: Rafael Teixeira <monoman@gmail.com>

I sent a code snippet in my second email, but here's how I'm reading
the files again:

Dim objChar As meCharacterSet.MARCDictionary
Dim sr As System.IO.StreamReader
Dim sSource, sDest As String

Console.WriteLine("Source File: ")
sSource = Console.ReadLine()
sr = New System.IO.StreamReader(sSource, System.Text.Encoding.Default)

The problem is definitely related to the codepage on the system.  So
here's my question -- how then would you get just plain ANSI text from
the StreamReader class?  I tried using the GetEncoding function to
retrieve the encoding and then re-set it, but still no
luck.  Since my files use an encoded ANSI stream, it's important that I
be able to read the unencoded characters (which is what the Windows
ANSI method appears to allow).

--Terry

>-----Original Message-----
>From: Rafael Teixeira [mailto:monoman@gmail.com]
>Sent: Wednesday, August 25, 2004 10:54 AM
>To: Reese, Terry
>Subject: Re: [mono-vb] UTF-8 Processing
>
>
>Could you show your code? At least the specific part where you
>get the unconverted strings?
>
>If you are using console input, Linux will probably do some
>filtering itself, as it always follows what is set in
>LC_ALL. Normally in Fedora it is set to something like
>en_US.UTF-8, which means that everything you type is already
>translated to UTF-8, which has explicit rules for its
>multibyte sequences, so the input could not even look like some
>other encoding.
>
>Hope it helps,
>
>----- Original Message -----
>From: Reese, Terry <terry.reese@oregonstate.edu>
>Date: Wed, 25 Aug 2004 08:38:45 -0700
>Subject: [mono-vb] UTF-8 Processing
>To: mono-vb@lists.ximian.com
>
>
>
>I have a quick question.  I've been playing with the VB
>runtime support in mono and I've run into a problem between
>platforms.  I do most of my development on Windows using
>VS.2003 but after I compile a project, I test it against the
>mono runtimes and I'd created a sample console program to test
>the mapping from one character encoding to another.  On
>Windows, the test program works perfectly.  I'm able to pass a
>string of one encoding type into my conversion assembly and
>have the program go through the requisite lookup tables to
>pull the corresponding UTF-8 values and return the re-mapped
>string.  However, when I move this test case to Fedora, all
>the special characters are filtered out.  I'm wondering if
>anyone might have any advice.
>
>Thanks,
>
>--Terry
>
>
>
>--
>Rafael "Monoman" Teixeira
>---------------------------------------
>Cognition is not a representation of an objectively existing
>world but is a bringing forth of a world in the process of living.
>-- Fritjof Capra, citing
>   Humberto Maturana and Francisco Varella's "Santiago Theory
>of Cognition"
>

--
Rafael "Monoman" Teixeira
---------------------------------------
Cognition is not a representation of an objectively existing world
but is a bringing forth of a world in the process of living.
-- Fritjof Capra, citing
  Humberto Maturana and Francisco Varella's "Santiago Theory of Cognition"
