[Mono-dev] ASCII Strings Proposal

Jonathan Gilbert logic at deltaq.org
Thu Jul 28 03:31:05 UTC 2016


I personally think it is a terrible idea to make Mono completely unable to
run code that compiles and runs just fine on Microsoft's .NET framework.
Could get_OffsetToStringData be made to convert the ASCII representation
back to UCS-2 on-the-fly for that edge case where the code actually uses
the fixed (char *ptr = str) pattern? It's not a very common pattern, so the
overhead of the conversion, while defeating the purpose of using that
pattern in the first place, would affect only the tiniest minority of code.
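
To make the idea concrete, here is a rough sketch in C of the widening step
such a conversion would need. The function name and the assumption of a
one-byte-per-character ASCII payload are mine, purely for illustration; I
have not checked how the prototype actually lays out its strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Mono: copies `len` ASCII bytes from `src`
   into `dst` as 16-bit code units. `dst` must have room for len + 1 units,
   because a terminating zero is written after the last character. */
void widen_ascii_to_utf16(const uint8_t *src, size_t len, uint16_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        assert(src[i] < 0x80);      /* only valid for pure-ASCII input */
        dst[i] = (uint16_t)src[i];  /* zero-extend each byte to 16 bits */
    }
    dst[len] = 0;                   /* terminator for pointer-based callers */
}
```

The cost is one pass over the string plus a temporary buffer twice the size
of the compact form, which is why this only seems acceptable to me if the
pattern really is rare.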

If this were the direction taken, it might also be nice to provide a way to
force an ASCII-capable string to be UCS-2 anyway, in case there are people
who want the fixed (char *ptr = str) pattern to remain performant; perhaps
an environment variable? Obviously we wouldn't want the Mono runtime to scan
the environment block every time it allocates a string, so perhaps it could
do the check and cache the result once on startup, and then allow some
innocuous method that's already doing a lot of work, such as
string.IsInterned, to re-check it. This avoids adding Mono-specific API, so
that code written to be aware of Mono's peculiarity still runs just fine on
other frameworks.
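
Something along these lines is what I have in mind for the cached check,
sketched in C. Everything here is hypothetical: the variable name
DEMO_FORCE_UCS2_STRINGS, the function names, and the choice of "1" as the
enabling value are placeholders, not anything Mono defines.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Cached answer; string allocation would read only this flag. */
static bool force_ucs2_cached = false;

/* Interprets the raw environment value. Only the exact value "1" enables
   the override (an arbitrary choice for this sketch). */
bool parse_force_ucs2(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Reads the environment and updates the cache. Intended to be called once
   at startup, and again from whichever existing method is chosen as the
   re-check hook. */
void refresh_force_ucs2(void)
{
    force_ucs2_cached = parse_force_ucs2(getenv("DEMO_FORCE_UCS2_STRINGS"));
}

/* Cheap query for the allocation path: no environment access at all. */
bool force_ucs2_strings(void)
{
    return force_ucs2_cached;
}
```

The allocation path only ever touches the cached flag, so the cost of
reading the environment is paid at startup and on explicit re-checks.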

Jonathan Gilbert

On Wed, Jul 27, 2016 at 1:45 PM, Jon Purdy <jopur at microsoft.com> wrote:

> For reference, only the following small patches were required to run
> Xamarin Studio:
>
>
>
> libgit2sharp:
> https://github.com/evincarofautumn/libgit2sharp/commit/4508aa2157448456a6a35733e0040ae2686302dd
>
> Roslyn:
> https://github.com/evincarofautumn/roslyn/commit/8945af94ece76c54525facb1a2458e5370d56a09
>
> maccore:
> https://github.com/evincarofautumn/maccore/commit/f67a77d27ae51864e38ebc1857ec58ea7ac23519
>
>
>
> These are of course experimental, but I want to give a sense of how much
> work it is to patch code that depends on the current String representation.
>
>
>
> *From: *Jonathan Purdy <jopur at microsoft.com>
> *Date: *Wednesday, July 27, 2016 at 11:35 AM
> *To: *"mono-devel-list at lists.dot.net" <mono-devel-list at lists.dot.net>
> *Cc: *"dotnet-runtime-dev at lists.dot.net" <dotnet-runtime-dev at lists.dot.net>
> *Subject: *ASCII Strings Proposal
>
>
>
> I have written a description of my prototype implementation of adaptive
> ASCII/UTF-16 strings in Mono:
>
>
>
> http://www.mono-project.com/docs/advanced/runtime/docs/ascii-strings/
>
>
>
> Introduction:
>
>
>
> > For historical reasons, System.String uses the UCS-2 character encoding,
> > that is, UTF-16 without surrogate pairs.
>
>
>
> > However, most strings in typical .NET applications consist solely of
> > ASCII characters, leading to wasted space: half of the bytes in a string
> > are likely to be null bytes!
>
>
>
> > Since strings are immutable, we can scan the character data when the
> > string is constructed, then dynamically select an encoding, thereby saving
> > 50% of string memory in most cases.
>
>
>
> I would like to solicit feedback on this proposal from runtime developers
> and users alike. In particular:
>
>
>
> - Specific objections regarding performance characteristics, compatibility
> issues, &c.
>
> - Questions about unclear or underspecified parts of the proposal
>
> - Real-world use cases that would benefit from this optimization
>
> - Suggestions for suitable real-world benchmarks
>
>
>
> Thank you!