[Mono-dev] [Dotnet-runtime-dev] ASCII Strings Proposal

Jonathan Gilbert logic at deltaq.org
Thu Jul 28 07:33:31 UTC 2016


Another thought: It would make more sense for the single-byte encoding to
be ISO-8859-1 (Latin-1) than ASCII. ASCII is either constrained to 128 code
points or, most typically on North American computers, extended by code
page 437 (and, of course, the local encoding cannot be assumed to be code
page 437), which requires a look-up table to convert to/from Unicode.
Latin-1, by contrast, simply is the first 256 code points of Unicode,
making the conversion a simple cast between System.Char/wchar_t and byte.
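
To illustrate the cast, here is a rough sketch in C. It is not taken from
the patch; the function names latin1_to_utf16 and utf16_to_latin1 are
made up for this example, and I am assuming uint16_t stands in for a
UTF-16 code unit (System.Char) and uint8_t for byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen Latin-1 bytes to UTF-16 code units.
   Each byte value already equals its Unicode code point, so a plain
   integer conversion is enough and no look-up table is involved. */
static void latin1_to_utf16(const uint8_t *src, uint16_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (uint16_t)src[i];
}

/* Hypothetical helper: narrow UTF-16 code units to Latin-1.
   Returns 1 on success, or 0 if any code unit is above U+00FF and
   therefore has no single-byte representation. */
static int utf16_to_latin1(const uint16_t *src, uint8_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (src[i] > 0xFF)
            return 0;
        dst[i] = (uint8_t)src[i];
    }
    return 1;
}
```

The narrowing direction doubles as the test for whether a string can use
the compact form at all: if it returns 0, the string would have to stay
in the two-byte representation.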

Thanks,

Jonathan Gilbert

On Thu, Jul 28, 2016 at 2:15 AM, Jonathan Gilbert <logic at deltaq.org> wrote:

> Phew :-) I must have gotten the wrong idea from this:
> http://www.mono-project.com/docs/advanced/runtime/docs/ascii-strings/#disabling-fixed-on-strings
>
> Thanks,
>
> Jonathan Gilbert
>
> On Thu, Jul 28, 2016 at 12:06 AM, Miguel de Icaza <
> miguel.de.icaza at gmail.com> wrote:
>
>> Hello Jonathan,
>>
>> I personally think it is a terrible idea to make Mono completely unable
>> to run code that compiles and runs just fine on Microsoft's .NET framework.
>> Could get_OffsetToStringData be made to convert the ASCII representation
>> back to UCS-2 on-the-fly for that edge case where the code actually uses
>> the fixed (char *ptr = str) pattern? It's not a very common pattern, so
>> the overhead of the conversion, while defeating the purpose of using that
>> pattern in the first place, would affect only the tiniest minority of code.
>>
>>
>> If this were to become a standard part of Mono, that would have to be
>> done.
>>
>> The reason it is not done in the current patch is that we needed to
>> identify all the spots with issues so they could be adjusted to deal with
>> the two encodings, purely a bootstrapping side effect.
>>
>> And we need the spots adjusted, so we do not needlessly create duplicate
>> strings on demand, otherwise one of the benefits of this work (reduce
>> memory pressure) would go out the window.
>>
>> If this were the direction taken, it might be nice also to provide a way
>> to force an ASCII-capable string to be UCS-2 anyway, in case there are
>> people who want the fixed (char *ptr = str) pattern to remain performant
>> -- perhaps an environment variable?? Obviously we wouldn't want the Mono
>> runtime to scan the environment block every time it allocates a string, so
>> perhaps it could do the check & cache the result once on startup, and then
>> allow some innocuous method that's already doing a lot of work, such as
>> string.IsInterned, to re-check it. This avoids adding Mono-specific API,
>> so that code written to be aware of Mono's peculiarity still runs just fine
>> on other frameworks.
>>
>>
>> Something like that.
>>
>> Miguel.
>>
>>
>