[mono-vb] UTF-8 Processing

Rob.Tillie@Student.tUL.EDU Rob.Tillie@Student.tUL.EDU
Thu, 26 Aug 2004 11:45:22 +0200

Hey Reese,

First off all: I didn't give you the solution, I believe Rafael did :).
Second: You can best send this to the mono-list because you're using Class
Library classes. This list is for the vb compiler and vb runtime
specifically, and is very low traffic. You will get much more response on
the high-traffic mono-list.

-- Rob.

-----Original Message-----
From: Reese, Terry
To: mono-vb@lists.ximian.com
Sent: 26-8-04 0:50
Subject: [mono-vb] UTF-8 Processing

I'd forwarded these test programs to Rob, but he'd said I should post
these to this list as well.  As I mentioned in the text below, the
solution was as Rob stated, explicitly invoking a particular codepage
(in my case, 1252) to ensure my desired result.  My only concern is that
the behavior of the System.Text.Encoding.Default isn't consistant with
the .NET documentation.  The documentation states that this property
should invoke the default ANSI codepage, but on Linux, I've found this
not to be the case.  The two zipped solutions work exactly the same on
my windows machine using either the .NET or MONO runtimes.  Simply
reading the btest.txt file and then rewritting its contents creates
identifical files.  However, on two Linux systems that I've tried (one
where ASCII was the default code page and the other where I've specified
UTF-8) the text file will not be openned and then re-written without
changes.  On the ASCII default system, the Default encoding that mono
uses is the ASCII encoding, so all high ascii data is represented as ?,
while on the UTF-8 system, MONO uses UTF-8 as the Default encoding so
the high ascii data are actually dropped since they aren't in the UTF-8
characterset.  This behavior is occurs in both VB.NET and C#.  On linux,
if I specify the codepage, then it works -- but I just through I'd bring
this up since it may be a cross platform issue and I wasn't sure if this
was desired behavior.



P.S.  I'm not sure if these attachments will work.  If they don't, I've
posted them here:

-----Original Message-----
From: Rob.Tillie@Student.tUL.EDU [mailto:Rob.Tillie@Student.tUL.EDU]
Sent: Wed 8/25/2004 1:57 PM
To: Reese, Terry
Subject: RE: [mono-vb] UTF-8 Processing
Hey Terry,


Sorry, I thought you where using some VB specific functions. I'm only
in the VB runtime.

You should post the list to see if this is the desired behaviour for
Text.Encoding or if it should behave otherwise, the people who
that can answer if it is the desired behaviour or not.



-- Rob.

-----Original Message-----
From: Reese, Terry
Sent: Wed 8/25/2004 11:33 AM
To: 'Rob.Tillie@Student.tUL.EDU'
Subject: RE: [mono-vb] UTF-8 Processing

Sure, I'll get something leaned down so it just isolates the problem.
However, just as an FYI, I was able to figure out how to get it to work
like I wanted.  My assumption was that Mono would use the ANSI codepage
when setting encoding using the System.Text.Encoding.Default property.
It doesn't.  It appears that it takes whatever the system's default code
page is -- which isn't what the .NET specs specify.  To get around the
problem, I needed to specify the ansi code page using GetEncoding.
Here's how I did it:

        Dim sr As System.IO.StreamReader
        Dim sSource, sDest As String
        Dim test As String

        Console.WriteLine("Source File: ")
        sSource = Console.ReadLine()

        Dim myEncoding As System.Text.Encoding

        myEncoding = System.Text.Encoding.GetEncoding(1252)
        sr = New System.IO.StreamReader(sSource, myEncoding)
        Dim sw As System.IO.StreamWriter

I specified the codepage number (1252) and at least on Fedora, it worked
perfectly.  BTW, this isn't just something that occurs in VB.NET, but
occurs in the C# compiler as well (see the attached zips which includes
a vb and a C# project and file that I'm working with.).  Because you can
specify the code page, I'm not sure this is a problem, but it doesn't
match the default behavior on a Windows System -- which is just
something that I need to file away as a platform incompatiblity.


 <<ConsoleApplication2.zip>>  <<ConsoleApplication4.zip>>  <<btest.zip>>

-----Original Message-----
From: Rob.Tillie@Student.tUL.EDU [mailto:Rob.Tillie@Student.tUL.EDU] 
Sent: Wednesday, August 25, 2004 11:07 AM
To: Reese, Terry; mono-vb@lists.ximian.com
Subject: RE: [mono-vb] UTF-8 Processing

Hey Terry,
The VB runtime is in alpha state right now and under heavy development.
Please send your program to me, to the list, or create a bugzilla entry
for it, and I'll have a look at it.
-- Rob.

From: Reese, Terry [mailto:terry.reese@oregonstate.edu] 
Sent: Wednesday, August 25, 2004 5:39 PM
To: mono-vb@lists.ximian.com
Subject: [mono-vb] UTF-8 Processing
I have a quick question.  I've been playing with the VB runtime support
in mono and I've run into a problem between platforms.  I do most of my
development on Windows using VS.2003 but after I compile a project, I
test it against the mono runtimes and I'd created a sample console
program to test the mapping from one character encoding to another.  On
Windows, the test program works perfectly.  I'm able to pass a string of
one encoding type into my conversion assembly and have the program go
through the requisite lookup tables to pull the corresponding UTF-8
values and return the re-mapped string.  However, when I move this test
case to fedora, all the special characters are filtered out.  I'm
wondering if anyone might have any advice.

 <<ConsoleApplication2.zip>>  <<ConsoleApplication4.zip>>  <<btest.zip>>