[Mono-list] Unhandled Exception: System.ArgumentException: Arg_InvalidUTF8

btouchet btouchet@drakonis.dyndns.org
27 Jan 2003 22:43:11 -0500


I have an application which generates the following message when trying to
read a line containing a character with a value greater than 0x7f (in this
particular case it is the (C) symbol in a text file):

Unhandled Exception: System.ArgumentException: Arg_InvalidUTF8
Parameter name: bytes
in <0x003a2> 00 System.Text.UTF8Encoding:InternalGetChars
(byte[],int,int,char[],int,uint&,uint&,bool,bool)
in <0x00039> 00 .UTF8Decoder:GetChars (byte[],int,int,char[],int)
in <0x00374> 00 System.IO.StreamReader:ReadBuffer ()
in <0x00088> 00 System.IO.StreamReader:Read ()
in <0x00049> 00 System.IO.StreamReader:ReadLine ()
in <0x002c3> 00 lc.Class1:ConvertFile (System.IO.FileInfo)
in <0x001cc> 00 lc.Class1:Main (string[])

I have tried this same code in MS .NET, which just discards the
character, whereas Mono throws the above exception.
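
For reference, the .NET class library lets the caller choose between strict
and lenient decoding through the two-argument UTF8Encoding constructor. The
following is only a minimal sketch, assuming a runtime that honors the
throwOnInvalidBytes flag; Utf8Strictness and Throws are hypothetical names
made up for this example. What a lenient decoder does with the bad byte (drop
it or substitute a replacement character) depends on the runtime version, so
no output is shown.

```csharp
using System;
using System.Text;

public static class Utf8Strictness
{
    // Hypothetical helper: reports whether decoding `bytes` as UTF-8
    // raises an ArgumentException for the given strictness setting.
    public static bool Throws(byte[] bytes, bool throwOnInvalidBytes)
    {
        // First argument: do not emit a byte order mark when encoding.
        // Second argument: throw on invalid input instead of tolerating it.
        Encoding utf8 = new UTF8Encoding(false, throwOnInvalidBytes);
        try
        {
            utf8.GetString(bytes);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

Calling Throws with the bytes 0x28 0xA9 0x29 (a lone 0xA9 between
parentheses) should report an exception only when throwOnInvalidBytes is true.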

I followed the code down to UTF8Encoding.cs, and it seems that in
InternalGetChars, if leftSize == 0 and the leftover byte is neither a value
under 0x80 nor a UTF-8 start byte, an exception is thrown.
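
To illustrate that check, here is a minimal sketch of how a byte can be
classified as a possible UTF-8 start byte. This is not the Mono code;
Utf8LeadByte and SequenceLength are hypothetical names, and the sketch is
simplified (it does not reject overlong lead bytes such as 0xC0 and 0xC1, or
lead bytes above 0xF4). It assumes the (C) symbol is stored as the single
ISO-8859-1 byte 0xA9, whose bit pattern 10101001 marks a continuation byte
rather than a start byte.

```csharp
using System;

public static class Utf8LeadByte
{
    // Returns the sequence length announced by a lead byte,
    // or 0 if the byte cannot start a UTF-8 sequence.
    public static int SequenceLength(byte b)
    {
        if (b < 0x80) return 1;            // 0xxxxxxx: plain ASCII
        if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx: two-byte sequence
        if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx: three-byte sequence
        if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx: four-byte sequence
        return 0;                          // 10xxxxxx continuation byte, or invalid
    }
}
```

Under these assumptions SequenceLength(0xA9) returns 0, which corresponds to
the "invalid UTF-8 start character" branch.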

I replaced the following code:

if (leftSize == 0) {
...
...
} else {
	// Invalid UTF-8 start character.
	if (throwOnInvalid) {
		throw new ArgumentException (_("Arg_InvalidUTF8"), "bytes1");
	}

with :

if (leftSize == 0) {
...
...
} else {
	if (posn >= length) {
		throw new ArgumentException (_("Arg_InsufficientSpace"), "chars");
	}
	chars[posn++] = (char)ch;

and my program is now happy.


What I would like to know is: wouldn't it be better if the character were
just added to the buffer without any fuss (or at least discarded without an
exception being thrown, as seems to happen under .NET)?
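
For comparison, here is a minimal sketch of a workaround that does not
require patching UTF8Encoding.cs. It assumes the input file is really
ISO-8859-1 (Latin-1) rather than UTF-8, which a lone byte above 0x7f for the
(C) symbol suggests, and that the runtime provides the "iso-8859-1" encoding.
Latin1Reader and ReadFirstLine are hypothetical names. If ch holds the raw
byte at that point in InternalGetChars, casting it to char has the same
effect as Latin-1 decoding, so this asks for that behaviour explicitly.

```csharp
using System;
using System.IO;
using System.Text;

public static class Latin1Reader
{
    // Reads the first line of `input`, decoding every byte as ISO-8859-1,
    // so that byte 0xA9 becomes U+00A9 (the copyright sign).
    public static string ReadFirstLine(Stream input)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        // Disposing the reader also closes the underlying stream.
        using (StreamReader reader = new StreamReader(input, latin1))
        {
            return reader.ReadLine();
        }
    }
}
```

The same encoding object can be passed to the StreamReader constructor that
takes a file path, so a ConvertFile-style method would only need to change
how it opens the reader.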

-- 
btouchet <btouchet@drakonis.dyndns.org>
