[Mono-dev] Handling UTF8 strings containing nul

Rob Wilkens robwilkens at gmail.com
Sun Jun 24 23:57:33 UTC 2012


I just wanted to note that I have already found at least one bug in my
pseudo-code: the line bufpos+=(whatever)*2 probably shouldn't multiply
by 2 there, because bufpos is already multiplied where it's needed, I think.

There are probably other bugs in my pseudo-code. I won't correct them all;
I mention just this one to illustrate why it's not wise to copy my untested
code word for word without knowing what you're doing (because, again, I
don't know what I'm doing).
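
For what it's worth, here is a rough sketch of the pre-allocation idea
with the input position (in bytes) and the output position (in 16-bit
units) tracked separately, which avoids the bufpos mix-up above. It does
not use g_utf8_to_utf16 at all; it decodes by hand so that a nul byte is
treated like any other character. The function name
utf8_to_utf16_with_nuls is made up for illustration and is not part of
glib or eglib. It relies on the fact that n bytes of UTF-8 never produce
more than n UTF-16 units, so one malloc up front is enough. Treat it as a
guide only, not a drop-in fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (NOT a glib/eglib function).
 * Converts len bytes of UTF-8, which may contain embedded nul bytes,
 * to UTF-16 in native byte order. On success returns a malloc'd buffer
 * and stores the number of 16-bit units in *out_len. Returns NULL on
 * malformed input or allocation failure. */
uint16_t *utf8_to_utf16_with_nuls(const unsigned char *text, size_t len,
                                  size_t *out_len)
{
    /* Smallest code point allowed for each sequence length (by number of
     * continuation bytes); anything smaller is an overlong encoding. */
    static const uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    /* One allocation: at most len units, plus one for a terminator. */
    uint16_t *buf = malloc((len + 1) * sizeof *buf);
    size_t in = 0;   /* position in the input, counted in bytes */
    size_t out = 0;  /* position in the output, counted in 16-bit units */

    if (buf == NULL)
        return NULL;

    while (in < len) {
        unsigned char c = text[in];
        uint32_t cp;
        size_t need;  /* number of continuation bytes expected */

        if (c < 0x80)                { cp = c;        need = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; need = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; need = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; need = 3; }
        else goto fail;              /* invalid lead byte */

        if (need > len - in - 1)
            goto fail;               /* sequence runs past the end */

        for (size_t i = 1; i <= need; i++) {
            unsigned char cc = text[in + i];
            if ((cc & 0xC0) != 0x80)
                goto fail;           /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        in += need + 1;

        /* Reject overlong forms, surrogates and out-of-range values. */
        if (cp < min_cp[need] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            goto fail;

        if (cp < 0x10000) {
            buf[out++] = (uint16_t)cp;
        } else {
            /* Encode as a surrogate pair. */
            cp -= 0x10000;
            buf[out++] = (uint16_t)(0xD800 | (cp >> 10));
            buf[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }

    assert(out <= len);  /* the up-front size estimate held */
    buf[out] = 0;        /* terminator; not counted in *out_len */
    *out_len = out;
    return buf;

fail:
    free(buf);
    return NULL;
}
```

A caller would pass the byte length explicitly, for example
utf8_to_utf16_with_nuls(text, length, &units), and must use the returned
unit count rather than searching for a terminator, since the result can
itself contain nul units. The caller frees the buffer with free().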


On 06/24/2012 07:51 PM, Rob Wilkens wrote:
> I am not an expert, just have a suggestion, and i don't know that my
> suggestion is any better than your solution.  But i figure it couldn't
> hurt to share.
>
> From what i saw someone replied to your message here about how to do it:
> https://mail.gnome.org/archives/gtk-list/2012-June/msg00023.html
>
> I agree the reallocs may be bad. Not knowing anything else, I wonder
> if you could pre-allocate a buffer up front of length x 2 (going from
> 8 bit to 16 bit is in theory double the size, presuming that's the
> difference between utf8 and utf16, which I don't know).
>
> Something like (and this is pseudo code, untested, and probably won't
> work anywhere near as written)
>
> buf = malloc (length * 2);
> memset(buf,0,length*2);
> bufpos=0;
> while (bufpos <= length) {
>   ut = g_utf8_to_utf16(text+bufpos,length,&bytes_read,&words_written,&error);
>   if (there is an error) break;
>   memcpy(buf+(bufpos*2), ut,
>          (bytes_read<(length-bufpos)?bytes_read*2:(length-bufpos)*2));
>   bufpos+=((bytes_read+1)*2);
> }
>
> That was pulled out of my head, and I am not familiar enough with utf
> strings to know if it would work.  I'm just guessing you're converting
> from something that's 8 bits to something that's 16 bits, so it would be
> length*2 to alloc.
>
> Use my code above more as a guide to what _I_ have in mind, whether or
> not it is right. Someone else should feel free to correct me.
>
> I am _not_ an expert, just a newbie with a little bit of c programming
> experience in my very distant past.
>
> -Rob
>
> On 06/24/2012 07:03 PM, Weeble wrote:
>> Having diagnosed this bug (when an attribute has a string argument and
>> the string contains nul, it gets truncated), I've been trying to find
>> a way to fix it: https://bugzilla.xamarin.com/show_bug.cgi?id=5732
>>
>> My first attempt just tried to use the available functions in glib,
>> but it wasn't acceptable because it involved potentially a great many
>> inefficient reallocs: https://github.com/mono/mono/pull/346
>>
>> In that pull request, Rodrigo Kumpera recommends that since mono has
>> its own implementation of glib, it would be better to introduce a new
>> version of g_utf8_to_utf16 that can handle embedded nuls, which will
>> probably be useful in other places as well.
>>
>> Perhaps naively, I have had a go at implementing this. However, when I
>> tried to add tests for my new function in the eglib test suite, I
>> realised that the tests are compiled and built against the native glib
>> as well, so introducing new tests against a new API results in build
>> failures. You can see what I've tried to do here:
>> https://github.com/weeble/mono/commit/f545596052125b90ebdd0a302fa3473d768f9d52
>>
>> I'm willing to keep trying at this if anyone is able to give me some
>> pointers. Does eglib's API already diverge from glib? If so, are there
>> any conditional #defines to allow the tests for eglib-specific
>> functions to run only against eglib and not glib? If not, is it
>> definitely okay to introduce divergence?
>>
>> Regards,
>>
>> Weeble.
>> _______________________________________________
>> Mono-devel-list mailing list
>> Mono-devel-list at lists.ximian.com
>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>



