[Mono-list] ASCII bytes to string?
Ian Norton
ian.norton-badrul at thales-esecurity.com
Fri Jan 11 14:48:11 UTC 2013
On Fri, Jan 11, 2013 at 02:20:26PM +0000, edward.harvey.mono wrote:
> > From: mono-list-bounces at lists.ximian.com [mailto:mono-list-
> > bounces at lists.ximian.com] On Behalf Of mickeyf
> >
> > I am reading bytes from hardware device as a stream, as abbreviated here:
> >
> > The string itself displays as expected, but shows a length of twice the
> > number of characters, as if String.Length is reporting the number of bytes
> > (UTF16) rather than the number of Unicode characters in the string.
> >
> > If I simply assign a literal string:
> >
> > s = "abcdefg";
> >
> > The length reported is as expected (7 in this case).
> >
> > The documentation for string.length says "number of characters", not
> > "number
> > of bytes", and this is what I have always seen. I'm quite sure I have done
> > this same thing successfully in Windows .NET with the behavior differing
> > from what I'm seeing now in mono. The C# (not mono) docs, if I am
> > understanding them correctly, say that GetString() should return a unicode
> > string, which apparently it does (?).
>
> I'm not completely sure what your question was, but it seems to be just some general confusion about strings and characters?
>
> It's not like the old days - when we could just assume a string was actually an array of chars, and every char was the same size. Depending on the encoding and the individual char, each character may be a different size, from 1 to 4 bytes, but typically 2.
>
> When you're reading a byte array from your device and converting to a string, each byte gets translated separately to a char, and in this case, apparently ends up being typically 2 bytes per character. But that would be different, if only the byte values you read were different. Some of them would become a 1 byte char, and some become up to 4 bytes char.
>
> The length of a string is the number of chars, not the number of bytes. I don't know how you find the number of bytes.
In my experience (on windows too) Encoding.X.GetString( buf ) will not
terminate your string on NUL,
eg:
csharp> Encoding.ASCII.GetBytes("foo");
{ 102, 111, 111 }
csharp> Encoding.ASCII.GetString( new byte[] { 102, 111, 111, 0 } ).Length;
4
csharp> Encoding.ASCII.GetString( new byte[] { 102, 111, 111, 0, 0 } ).Length;
5
This can be quite annoying. The resulting string contains NUL characters ( a
char with a value of zero ).
You can use a little bit of linq to trim these off.
var buf = read_from_device_or_pinvoke();
var strbuf = buf.TakeWhile( (c) => { return c != 0x00; } ).ToArray();
var str = Encoding.ASCII.GetString(strbuf);
Ian
> _______________________________________________
> Mono-list maillist - Mono-list at lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-list
More information about the Mono-list
mailing list