[Mono-list] System.Text.ASCIIEncoding patch

mkestner@speakeasy.net mkestner@speakeasy.net
Fri, 15 Mar 2002 09:26:29 -0800


On 15 Mar 2002, Piers Haken wrote:

> I'd say, first write the test cases you talked
> about, then if they pass on the original code
> then nothing really needs to change. Unless,
> of course, your code is significantly faster.

Okay, here's a GetBytes scenario:

Encoding::GetBytes calls Encoder::GetBytes which calls back to Encoding::IConvGetBytes which is an internal call to ves_icall_iconv_get_bytes which calls iconv_convert() which calls iconv() in libiconv.

vs:

a managed comparison to 0x7f, which is most likely all that iconv() is doing anyway.

I suspect I can leave the profiler in the closet.

Having taken a closer look at the iconv wrappers, we have bigger issues anyway.  If someone feeds in a char > 0x7f, it apears mono will assert in iconv_convert() and not convert any of the chars instead of just replacing the offending char with an ASCII '?' as the docs specify.  I'll confirm this with a test this weekend.   

Also, though this isn't directly ASCII related, the decoder/encoder implementations are not stateful. The docs specify that passing 2 consecutive byte[] refs to a decoder that have multibyte character overlap should result in the overlapping character being converted properly. There is no code currently that I can see that handles this. 

My general impression of the System.Text module at this point is that it has a tendancy to use a sledgehammer to drive finishing nails.  It also seems to be engineered for properly formatted input, ignoring error scenarios. I'll try to codify these concerns with a test suite to expose them this weekend. 

> I agree that GetMaxByteCount needs some work,
> but it might be best to fix this so that it
> can work correctly for all the
> Encoding-derived classes if at all possible.

This simply is not a good candidate for abstraction. There is a reason why the MS implementation overrides these methods for all the subclasses. UTF8 is probably the only standard encoding that it makes sense to use iconv for the Count related functions.

> I'll try to get the class status page to more
> accurately handle the case when a missing
> method is implemented in a base class. Sorry
> about that.

Well, I think it's good to have the tool throw red flags up where methods have been overridden.  Unless we have really good reason to ignore the hint, I think we should follow the documentation's advice.

Mike