[Mono-list] ASCII bytes to string?

Fri Jan 11 14:48:11 UTC 2013

On Fri, Jan 11, 2013 at 02:20:26PM +0000, edward.harvey.mono wrote:
> > From: mono-list-bounces at lists.ximian.com [mailto:mono-list-
> > bounces at lists.ximian.com] On Behalf Of mickeyf
> > 
> > I am reading bytes from hardware device as a stream, as abbreviated here:
> > 
> > The string itself displays as expected, but shows a length of twice the
> > number of characters, as if String.Length is reporting the number of bytes
> > (UTF16) rather than the number of Unicode characters in the string.
> > 
> > If I simply assign a literal string:
> > 
> > s = "abcdefg";
> > 
> > The length reported is as expected (7 in this case).
> > 
> > The documentation for String.Length says "number of characters", not
> > "number of bytes", and this is what I have always seen. I'm quite sure I
> > have done this same thing successfully in Windows .NET with the behavior
> > differing from what I'm seeing now in Mono. The C# (not Mono) docs, if I
> > am understanding them correctly, say that GetString() should return a
> > Unicode string, which apparently it does (?).
> 
> I'm not completely sure what your question was, but it seems to be just some general confusion about strings and characters?
> 
> It's not like the old days, when we could just assume a string was an array of chars and every character was one byte.  In .NET a string is a sequence of UTF-16 chars: each char is 2 bytes in memory, and a character outside the Basic Multilingual Plane takes two chars (a surrogate pair).  Once a string is encoded to bytes, a character may take from 1 to 4 bytes, depending on the encoding.
> 
> When you're reading a byte array from your device and converting it to a string with an ASCII decoder, each byte gets translated separately to one char, and each char occupies 2 bytes in memory.  With a different encoding, such as UTF-8, several bytes may combine into a single character instead.
> 
> The length of a string is the number of chars, not the number of bytes.  I don't know how you find the number of bytes.
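
For what it's worth, the number of bytes depends on which encoding you ask
about.  A minimal sketch, assuming the standard System.Text encodings (the
StringSizes class and its method names are made up for illustration):

```csharp
using System;
using System.Text;

static class StringSizes   // hypothetical helper, not part of any library
{
    // Number of chars (UTF-16 code units) in the string; same as s.Length.
    public static int CharCount(string s)
    {
        return s.Length;
    }

    // Number of bytes the string would occupy once encoded with enc.
    public static int ByteCount(string s, Encoding enc)
    {
        return enc.GetByteCount(s);
    }
}
```

For "abcdefg" this should give 7 chars, 7 bytes with Encoding.ASCII and 14
bytes with Encoding.Unicode (UTF-16).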

In my experience (on Windows too), Encoding.X.GetString( buf ) will not
terminate your string at the first NUL byte,

e.g.:

csharp> Encoding.ASCII.GetBytes("foo");
{ 102, 111, 111 }

csharp> Encoding.ASCII.GetString( new byte[] { 102, 111, 111, 0 } ).Length; 
4
csharp> Encoding.ASCII.GetString( new byte[] { 102, 111, 111, 0, 0 } ).Length; 
5

This can be quite annoying.  The resulting string contains NUL characters
(chars with a value of zero), and each one counts towards Length.

You can use a little bit of LINQ to cut the buffer at the first NUL before
decoding.

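// needs: using System.Linq; using System.Text;
var buf = read_from_device_or_pinvoke();   // placeholder for your own read routine
var strbuf = buf.TakeWhile( b => b != 0x00 ).ToArray();   // bytes up to the first NUL
var str = Encoding.ASCII.GetString(strbuf);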
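
If you would rather avoid the extra array copy that TakeWhile/ToArray makes,
something along these lines should also work.  This is only a sketch: the
NulTerminated class and ToAsciiString name are hypothetical, and it assumes
the device sends plain ASCII.

```csharp
using System;
using System.Text;

static class NulTerminated   // hypothetical helper name
{
    // Decode buf as ASCII, stopping at the first NUL byte
    // (or at the end of the buffer if there is no NUL).
    public static string ToAsciiString(byte[] buf)
    {
        int end = Array.IndexOf(buf, (byte)0);
        if (end < 0)
            end = buf.Length;
        return Encoding.ASCII.GetString(buf, 0, end);
    }
}
```

It uses the GetString( bytes, index, count ) overload, so only the bytes
before the first NUL are decoded.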

Ian
