[Mono-dev] Handling UTF8 strings containing nul

Rob Wilkens robwilkens at gmail.com
Sun Jun 24 23:51:56 UTC 2012


I am not an expert, I just have a suggestion, and I don't know that my
suggestion is any better than your solution.  But I figure it couldn't
hurt to share.

From what I saw, someone replied to your message here about how to do it:
https://mail.gnome.org/archives/gtk-list/2012-June/msg00023.html

I agree the reallocs may be bad, so, not knowing anything else, I wonder
if you couldn't pre-allocate a buffer up front of length x 2 bytes (going
from 8-bit to 16-bit units is in theory at most double the size, since a
UTF-8 byte never expands to more than one 16-bit UTF-16 unit).
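
To make the length x 2 guess concrete, here is a small standalone sketch
that counts how many 16-bit units a UTF-8 buffer needs. The name
utf16_units_needed is made up for illustration (it is not a glib or eglib
function), and it assumes the input is already valid UTF-8. Embedded nuls
are counted like any other one-byte character.

```c
#include <stddef.h>

/* Hypothetical helper, not part of glib or eglib: counts the 16-bit units
 * needed to hold `len` bytes of UTF-8 once converted to UTF-16.
 * Assumes valid UTF-8; an embedded nul byte counts as one unit. */
static size_t utf16_units_needed(const unsigned char *s, size_t len)
{
    size_t units = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = s[i];
        if ((c & 0xC0) == 0x80)
            continue;                  /* continuation byte: adds nothing */
        units += (c >= 0xF0) ? 2 : 1;  /* 4-byte lead needs a surrogate pair */
    }
    return units;
}
```

Since every character uses at least one byte and only 4-byte sequences
need two units, the result can never exceed len, so len * 2 bytes is
always enough room for the converted text.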

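Something like this (an untested sketch; it assumes g_utf8_to_utf16 stops
at the first nul even when a length is passed, and that error is a
GError * initialised to NULL):

/* one 16-bit unit per input byte is the worst case, plus a terminator */
gunichar2 *buf = g_malloc0 ((length + 1) * sizeof (gunichar2));
glong in_pos = 0, out_pos = 0, bytes_read, words_written;
while (in_pos < length) {
  /* convert the run up to the next nul (or the end of the input) */
  gunichar2 *ut = g_utf8_to_utf16 (text + in_pos, length - in_pos,
                                   &bytes_read, &words_written, &error);
  if (ut == NULL) break;  /* conversion failed, error is set */
  memcpy (buf + out_pos, ut, words_written * sizeof (gunichar2));
  g_free (ut);
  in_pos += bytes_read;
  out_pos += words_written;
  if (in_pos >= length || text[in_pos] != '\0') break;  /* done, or truncated input */
  in_pos++; out_pos++;  /* step over the nul itself; buf is already zeroed */
}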

That was pulled out of my head, and I am not familiar enough with UTF
strings to know if it would work.  I'm just guessing you're converting
from something that's 8 bits to something that's 16 bits, so it would be
length*2 to alloc.
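
An alternative to looping over g_utf8_to_utf16 is a single pass that takes
an explicit length and never treats nul as a terminator, which is roughly
what the quoted message below says was recommended in the pull request.
What follows is only a standalone sketch of that idea: the name
utf8_to_utf16_len and its signature are invented here and are not the
glib or eglib API, and it skips some validation (overlong forms and
encoded surrogates are not rejected).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the glib/eglib API. Converts `len` bytes of
 * UTF-8 into `out`, which must have room for at least `len` 16-bit units.
 * Nul bytes are copied through like any other character.
 * Returns the number of units written, or (size_t)-1 on malformed input. */
static size_t utf8_to_utf16_len(const unsigned char *in, size_t len,
                                uint16_t *out)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char c = in[i++];
        uint32_t cp;
        int extra;                       /* continuation bytes expected */
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else return (size_t)-1;          /* stray continuation or bad lead */
        if ((size_t)extra > len - i)
            return (size_t)-1;           /* sequence cut off by end of input */
        while (extra-- > 0) {
            unsigned char t = in[i++];
            if ((t & 0xC0) != 0x80)
                return (size_t)-1;       /* not a continuation byte */
            cp = (cp << 6) | (t & 0x3F);
        }
        if (cp > 0x10FFFF)
            return (size_t)-1;           /* outside the Unicode range */
        if (cp >= 0x10000) {             /* encode as a surrogate pair */
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            out[n++] = (uint16_t)cp;
        }
    }
    return n;
}
```

Because the output buffer is sized from the input length up front, there
is one allocation and no realloc, and a string such as "a\0b" with length
3 comes out as three units rather than being cut off after the first one.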

Use my code above more as a guide to what _I_ have in mind, whether or
not it is right; someone else should feel free to correct me.

I am _not_ an expert, just a newbie with a little bit of C programming
experience in my very distant past.

-Rob

On 06/24/2012 07:03 PM, Weeble wrote:
> Having diagnosed this bug (when an attribute has a string argument and
> the string contains nul, it gets truncated), I've been trying to find
> a way to fix it: https://bugzilla.xamarin.com/show_bug.cgi?id=5732
>
> My first attempt just tried to use the available functions in glib,
> but it wasn't acceptable because it involved potentially a great many
> inefficient reallocs: https://github.com/mono/mono/pull/346
>
> In that pull request, Rodrigo Kumpera recommends that since mono has
> its own implementation of glib, it would be better to introduce a new
> version of g_utf8_to_utf16 that can handle embedded nuls, which will
> probably be useful in other places as well.
>
> Perhaps naively, I have had a go at implementing this. However, when I
> tried to add tests for my new function in the eglib test suite, I
> realised that the tests are compiled and built against the native glib
> as well, so introducing new tests against a new API results in build
> failures. You can see what I've tried to do here:
> https://github.com/weeble/mono/commit/f545596052125b90ebdd0a302fa3473d768f9d52
>
> I'm willing to keep trying at this if anyone is able to give me some
> pointers. Does eglib's API already diverge from glib? If so, are there
> any conditional #defines to allow the tests for eglib-specific
> functions to run only against eglib and not glib? If not, is it
> definitely okay to introduce divergence?
>
> Regards,
>
> Weeble.



