[Mono-dev] [Dotnet-runtime-dev] ASCII Strings Proposal

Thu Jul 28 14:07:23 UTC 2016

Hello,

While this is indeed possible, we would not be able to leverage the fact
that 7-bit encoded strings could be copied without conversions when going
out on a P/Invoke with "Ansi" settings (which in Mono, we have overloaded
to mean "UTF-8").

And Unix is predominantly a UTF-8 friendly world. Hence, the 7-bit ASCII
encoding is better for our purposes.
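
A minimal sketch of the point above, written in Python rather than Mono's
own C purely for illustration. The sample strings and the helper name
needs_conversion_for_utf8 are assumptions made up for this example, not
anything from the patch:

```python
# Illustrative only: the sample strings are made up, and this is not Mono code.

def needs_conversion_for_utf8(text: str, storage_encoding: str) -> bool:
    """Return True if the stored bytes differ from the UTF-8 bytes."""
    return text.encode(storage_encoding) != text.encode("utf-8")

ascii_only = "hello.txt"
accented = "caf\u00e9"  # contains U+00E9, which is outside 7-bit ASCII

# 7-bit data: the stored bytes are already valid UTF-8, so a UTF-8 marshal
# can copy them as-is.
ascii_copy_ok = not needs_conversion_for_utf8(ascii_only, "ascii")

# Latin-1 data: decoding is a plain widening of each byte to a code point...
latin1_bytes = accented.encode("latin-1")
widened = "".join(chr(b) for b in latin1_bytes)

# ...but bytes above 0x7F are not in UTF-8 form, so a UTF-8 marshal must
# convert them.
latin1_needs_conversion = needs_conversion_for_utf8(accented, "latin-1")

# Code page 437 stores the same character at a different byte value, so
# going to or from Unicode needs a lookup table rather than a cast.
cp437_byte = accented.encode("cp437")[-1]
```

Here ascii_copy_ok and latin1_needs_conversion both come out true: Latin-1
keeps the cheap cast to System.Char, but loses the copy-without-conversion
property on the UTF-8 side as soon as a byte above 0x7F appears.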

Miguel

On Thu, Jul 28, 2016 at 3:33 AM, Jonathan Gilbert <logic at deltaq.org> wrote:

> Another thought: It would make more sense for the single-byte encoding to
> be ISO-8859-1 (Latin-1) than ASCII. ASCII is either constrained to 128
> code points or, most typically on North American computers, extended by
> code page 437 (and, of course, the local encoding cannot be assumed to be
> code page 437), which requires a look-up table to convert to/from Unicode.
> Latin-1, by contrast, simply is the first 256 code points of Unicode,
> making the conversion a simple cast between System.Char/wchar_t and byte.
>
> Thanks,
>
> Jonathan Gilbert
>
> On Thu, Jul 28, 2016 at 2:15 AM, Jonathan Gilbert <logic at deltaq.org>
> wrote:
>
>> Phew :-) I must have gotten the wrong idea from this:
>> http://www.mono-project.com/docs/advanced/runtime/docs/ascii-strings/#disabling-fixed-on-strings
>>
>> Thanks,
>>
>> Jonathan Gilbert
>>
>> On Thu, Jul 28, 2016 at 12:06 AM, Miguel de Icaza <
>> miguel.de.icaza at gmail.com> wrote:
>>
>>> Hello Jonathan,
>>>
>>> I personally think it is a terrible idea to make Mono completely unable
>>> to run code that compiles and runs just fine on Microsoft's .NET framework.
>>> Could get_OffsetToStringData be made to convert the ASCII
>>> representation back to UCS-2 on-the-fly for that edge case where the code
>>> actually uses the fixed (char *ptr = str) pattern? It's not a very
>>> common pattern, so the overhead of the conversion, while defeating the
>>> purpose of using that pattern in the first place, would affect only the
>>> tiniest minority of code.
>>>
>>>
>>> If this were to become a standard part of Mono, that would have to be
>>> done.
>>>
>>> The reason it is not done in the current patch is that we needed to
>>> identify all the spots with issues so they could be adjusted to deal with
>>> the two encodings, purely a bootstrapping side effect.
>>>
>>> And we need the spots adjusted, so we do not needlessly create duplicate
>>> strings on demand, otherwise one of the benefits of this work (reduce
>>> memory pressure) would go out the window.
>>>
>>> If this were the direction taken, it might be nice also to provide a way
>>> to force an ASCII-capable string to be UCS-2 anyway, in case there are
>>> people who want the fixed (char *ptr = str) pattern to remain
>>> performant -- perhaps an environment variable? Obviously we wouldn't want
>>> the Mono runtime to scan the environment block every time it allocates a
>>> string, so perhaps it could do the check & cache the result once on
>>> startup, and then allow some innocuous method that's already doing a lot of
>>> work, such as string.IsInterned, to re-check it. This avoids adding
>>> Mono-specific API, so that code written to be aware of Mono's peculiarity
>>> still runs just fine on other frameworks.
>>>
>>>
>>> Something like that.
>>>
>>> Miguel.
>>>
>>>
>>
>