[Mono-dev] [PATCH] Implement internal Encodings using unified code base

Thu Mar 20 11:59:18 EDT 2008

On 03/20/08 Kornél Pál wrote:
> - Unsafe code is required anyway because Encoding has public unsafe methods. 
> Handling strings and character arrays in a single function requires unsafe 
> code as well. Using a single implementation avoids accidental differences 
> between string, array and pointer based public methods.

Sure, the existing unsafe methods are perfectly fine, there is no need
to add more virtual unsafe methods (this is about 2.0). In 1.1 the
unsafe methods would simply become internal.

> - All argument checks are done in the abstract Encoding.cs and encoding 
> implementations only have to provide the actual implementation. I believe 
> that this is more secure than using different checks is each encoding 
> implementation because they can easily go out of sync.

Non-argument as the complex checks in my proposal would go into common code in
static methods.

> - Argument validation could be moved to static methods in Encoding.cs that 
> could result in better security (by not exposing unchecked unsafe code) but 
> Microsoft implementation has bad desingn. There are a lot of unnecessary 
> buffer copy operations (mostly related to string <-> char[] conversion). To 
> fix https://bugzilla.novell.com/show_bug.cgi?id=363713 while avoiding 
> unecessary buffer copy operations there has to be some internal virtual 
> methods with a check like in IsInternal property.

I don't see any need for a different infrastructure to fix that bug,
it's simply a bug in our implementation. In 2.0 all the GetByteCount ()
overloads must call the unsafe overload to do the work. There is no need
for additional virtual methods, because the method is there already.
Look at UTF8Encoding for GetByteCount(): it does already mostly the right
thing (it calls the internal implementation directly instead of going
through the virtual unsafe method, trivially fixable).

> - After a lot of brainstorming I think that having these four virtual unsafe 
> methods provides the solution. (GetStringImpl is required in .NET 1.x only 
> because a bug in MS implementation.) Only single implementation per method 
> per encoding is required, argument checks are shared between encodings, and 

Ok, let's see the different cases.
GetByteCount() can be implemented this way without infrastructure
changes. GetCharCount() looks like the same. Same with GetChars ().
And I see nothing in the GetBytes() interface that would prevent it.
This is all with the assumption that the 363713 bug report suggests that
all the overloads are expected to call the unsafe virtual variant for
the actual implementation in the end (though it is likely that, for
example, GetBytes(char[]) would call GetBytes(char[],int,int), this
doesn't change the conclusion).
Is this assumption not correct?
So please make a concrete example that would require the different
infrastructure as I don't see the need (though you're definitely right
that our Encoding classes need fixes).

> Encoding.cs can call faster code paths than MS.NET does without overriding 
> methods not overridden in MS.NET.

Concrete examples where this would apply, please.

> - Also note that I have plans to integrate Encoders/Decoders to this 
> infrastructure by adding an Encoder/Decoder parameter at the end of the 
> argument list of these four ...Impl methods that will let us use 
> Encoder/Decoder as a state store rather than having actual code 
> implementation.

Please describe why we'd need this.

> These goals could only be achieved without these four virtual unsafe methods 
> by adding some more internal virtual (not unsafe) methods and having a lot 
> of duplicate code. This would be much more difficult to maintain so I 

I don't see where the duplicate code would come from.

> Currently our implementations has weak fallback support. For example 
> UnicodeEncoding has none while MS.NET has.
> Eventually we will have to say goodby to count * 2 because that is not 
> compatible with fallbacks.
> .NET 1.x has no fallback support so there could be some performance gain by 
> avoiding GetByteCountInternal but I think having a centralized 
> implementation is worth to have little performance loss on .NET 1.x because 
> people will eventually use later versions instead.

Let's talk with a concrete example and assume that for
UnicodeEncoding.GetByteCount() we need to do something more complicated
than count * 2. What is the advantage of doing it in a separate function
instead of in the actual method?

> This is the only simple bug fix:
> >> - if (charCount < 0 || charIndex + charCount > s.Length)
> >> + if (charCount < 0 || charIndex > (s.Length - charCount))
> 
> If you approve it, I'll commit it.

Sure, this bugfix shuld be committed, both in trunk and in the 1.9
branch. Thanks.

> I tried to defend the infrastructure proposed by me. I see that the one 
> proposed by you is cleaner but I think that it does not address all the 
> issuse that mine does and extending it to address all the issues would 

Please make a concrete example of a bug that requires an infrastructure
change. 363713 I have already shown that is simply a bug in our
implementation and it doesn't require a new infrastructure.

> result in a lot of extra code. I believe that having four internal virtual 
> unsafe methods is not a big problem. If it is a big issue sorry for wasting 
> time but I didn't get your reasons against them.

With more unsafe code and unsafe code codepaths the effort required to
review the code for security issues increases. It's as simple as that.
Then of course is the issue that if we don't really need them, we
shouldn't make the code more complex by using them, even if they were
non-unsafe.
If a valid case is presented where we might need them, we will
reconsider.
Thanks.

lupus

-- 
-----------------------------------------------------------------
lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better