[Mono-dev] [PATCH] Implement internal Encodings using unified code base
Kornél Pál
kornelpal at gmail.com
Thu Mar 20 10:32:58 EDT 2008
Hi,
>From: "Paolo Molaro" <lupus at ximian.com>
...
>I would much prefer to see the argument validation close to the unsafe
>code and I don't particularly like the use of these internal virtual
>wrapper methods.
>The code duplication can be avoided by using simple static validation
>methods like (only for the cases when more that two lines of checks are
>needed):
>
>static void ValidateGetBytes (...same arguments as GetBytes ()...) {
>}
>
>Basically using virtual+unsafe should be severely limited in our managed
>assemblies or it's going to increase the cost of security checks
>significantly.
My reasons for the new Encoding infrastructure I propose:
- Unsafe code is required anyway because Encoding has public unsafe methods.
Handling strings and character arrays in a single function requires unsafe
code as well. Using a single implementation avoids accidental differences
between string, array and pointer based public methods.
- All argument checks are done in the abstract Encoding.cs and encoding
implementations only have to provide the actual implementation. I believe
that this is more secure than using different checks is each encoding
implementation because they can easily go out of sync.
- Argument validation could be moved to static methods in Encoding.cs that
could result in better security (by not exposing unchecked unsafe code) but
Microsoft implementation has bad desingn. There are a lot of unnecessary
buffer copy operations (mostly related to string <-> char[] conversion). To
fix https://bugzilla.novell.com/show_bug.cgi?id=363713 while avoiding
unecessary buffer copy operations there has to be some internal virtual
methods with a check like in IsInternal property.
- After a lot of brainstorming I think that having these four virtual unsafe
methods provides the solution. (GetStringImpl is required in .NET 1.x only
because a bug in MS implementation.) Only single implementation per method
per encoding is required, argument checks are shared between encodings, and
Encoding.cs can call faster code paths than MS.NET does without overriding
methods not overridden in MS.NET.
- Also note that I have plans to integrate Encoders/Decoders to this
infrastructure by adding an Encoder/Decoder parameter at the end of the
argument list of these four ...Impl methods that will let us use
Encoder/Decoder as a state store rather than having actual code
implementation.
These goals could only be achieved without these four virtual unsafe methods
by adding some more internal virtual (not unsafe) methods and having a lot
of duplicate code. This would be much more difficult to maintain so I
believe that this would be less secure than having four virtual unsafe
methods that lets us to use much cleaner design in Encoding implementations.
>> - if (charCount < 0 || charIndex + charCount > s.Length)
>> + if (charCount < 0 || charIndex > (s.Length - charCount))
>These fixes should be in their own separate patch instead of being
>hidden in a large change.
This is the only one. (Maybe I'm wrong but no other intentional fixes of
this kind.)
>> - return count * 2;
>> + return GetByteCountInternal (chars, index, count);
>For these simple cases I think that hiding count*2 inside
>GetByteCountInternal() doesn't help code readability or any other
>property of the code. This would look much better if it was instead like
>this:
Currently our implementations has weak fallback support. For example
UnicodeEncoding has none while MS.NET has.
Eventually we will have to say goodby to count * 2 because that is not
compatible with fallbacks.
.NET 1.x has no fallback support so there could be some performance gain by
avoiding GetByteCountInternal but I think having a centralized
implementation is worth to have little performance loss on .NET 1.x because
people will eventually use later versions instead.
The same applies for complicating other currently simple implementations.
>What I suggest is the following:
>1) a small patch with the bugfixes
This is the only simple bug fix:
>> - if (charCount < 0 || charIndex + charCount > s.Length)
>> + if (charCount < 0 || charIndex > (s.Length - charCount))
If you approve it, I'll commit it.
Other bugs can only be fixed by infrastructural changes.
>2) another patch that introduces the validation methods
>and reduces the code duplication
>3) separate patches for each encoding with the optimizations
Because of using IsInternal property each encodings can be committed
separatedly only Encoding.cs has to go first.
I tried to defend the infrastructure proposed by me. I see that the one
proposed by you is cleaner but I think that it does not address all the
issuse that mine does and extending it to address all the issues would
result in a lot of extra code. I believe that having four internal virtual
unsafe methods is not a big problem. If it is a big issue sorry for wasting
time but I didn't get your reasons against them.
Kornél
More information about the Mono-devel-list
mailing list