[Mono-dev] Patch for HttpRequest.cs

Tue May 9 09:09:03 EDT 2006

Hi,

Currently mod_mono and XSP use UTF-8 to decode the query string, and UrlDecode
uses UTF-8 as well.

Do we prefer to use UTF-8 for the query string anyway, or should we use
ContentEncoding for the query string as well, just like MS.NET does?

Note that if we want to use ContentEncoding for the query string as well, XSP
and mod_mono should be modified to return the original byte array from
GetQueryStringRawBytes().
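Why the original bytes matter can be shown with a small sketch (a hypothetical helper, not XSP or mod_mono code). It assumes the browser sent the query string q=ä with "ä" as the raw Latin-1 byte 0xE4: once the server has turned those bytes into a string with UTF-8, that byte becomes U+FFFD, and no later choice of ContentEncoding can bring it back.

```csharp
using System.Text;

// Hypothetical helper, not part of XSP, mod_mono or System.Web.
public static class RawQueryDemo
{
    // Assumed raw bytes: "q=" followed by the Latin-1 byte for "ä".
    public static readonly byte[] RawQuery = { 0x71, 0x3D, 0xE4 };

    // Decoding up front with UTF-8: 0xE4 is an incomplete UTF-8 sequence,
    // so it is replaced by U+FFFD and the original byte is lost.
    public static string DecodedByServer() => Encoding.UTF8.GetString(RawQuery);

    // Decoding the untouched bytes with whatever ContentEncoding is in effect.
    public static string DecodedWith(Encoding contentEncoding) =>
        contentEncoding.GetString(RawQuery);
}
```

With Latin-1 the same bytes give "q=ä", while the UTF-8 result contains the replacement character, so the lossy step is the early UTF-8 decode rather than the later URL decoding.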

Kornél

----- Original Message ----- 
From: "Kornél Pál" <kornelpal at gmail.com>
To: "Juraj Skripsky" <js at hotfeet.ch>
Cc: "Miguel de Icaza" <miguel at ximian.com>; 
<mono-devel-list at lists.ximian.com>
Sent: Monday, May 08, 2006 5:16 PM
Subject: Re: [Mono-dev] Patch for HttpRequest.cs

> Hi,
>
> The difference you missed (actually I missed it as well, since I don't use
> this constructor :) is that MS.NET's public HttpRequest(string filename,
> string url, string queryString) constructor seems to initialize QueryString
> to a NameValueCollection using Encoding.Default as a hard-coded encoding.
> When I changed the default code page in Control Panel, the encoding used
> changed too, so it is definitely Encoding.Default and no configuration is
> involved. When HttpRequest is created from an HttpWorkerRequest, the query
> string is processed only when the QueryString property is first accessed.
>
> Since the query string is processed only once, changing
> HttpRequest.ContentEncoding (programmatically, in configuration or through
> HTTP headers) has an effect only if it happens before QueryString is first
> accessed.
>
> So query string parsing should be moved to a separate method that takes the
> encoding as a parameter. The public constructor should call it with
> Encoding.Default, and the QueryString getter with HttpRequest.ContentEncoding
> (only if the query string has not been parsed yet, of course).
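A minimal sketch of that refactoring, assuming HttpUtility.ParseQueryString(string, Encoding) as the parsing routine. QueryStringHolder, FromPublicConstructor and the UTF-8 default for ContentEncoding are hypothetical and do not reflect Mono's or MS.NET's actual code. The worker-request path parses lazily with the ContentEncoding in effect at first access; the public-constructor path parses eagerly with Encoding.Default.

```csharp
using System.Collections.Specialized;
using System.Text;
using System.Web;

// Hypothetical stand-in for HttpRequest's query-string handling;
// names are illustrative, not Mono's or MS.NET's internals.
public class QueryStringHolder
{
    private readonly string rawQuery;
    private NameValueCollection parsed; // null until the query string is parsed

    // Worker-request path: parsing is deferred until QueryString is read.
    public QueryStringHolder(string rawQuery)
    {
        this.rawQuery = rawQuery;
    }

    // Public-constructor path: parsed immediately with Encoding.Default,
    // so later ContentEncoding changes have no effect.
    public static QueryStringHolder FromPublicConstructor(string rawQuery)
    {
        var holder = new QueryStringHolder(rawQuery);
        holder.parsed = Parse(rawQuery, Encoding.Default);
        return holder;
    }

    // Assumed default; the real default would come from configuration.
    public Encoding ContentEncoding { get; set; } = Encoding.UTF8;

    // The separate parsing method that takes the encoding as a parameter.
    public static NameValueCollection Parse(string query, Encoding encoding)
    {
        return HttpUtility.ParseQueryString(query, encoding);
    }

    public NameValueCollection QueryString
    {
        get
        {
            // Parse only once, using the ContentEncoding in effect at first access.
            if (parsed == null)
                parsed = Parse(rawQuery, ContentEncoding);
            return parsed;
        }
    }
}
```

Because the parsed collection is cached, setting ContentEncoding after the first QueryString access has no effect, which matches the MS.NET behaviour described above.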
>
> Kornél
>
> ----- Original Message ----- 
> From: "Juraj Skripsky" <js at hotfeet.ch>
> To: "Kornél Pál" <kornelpal at gmail.com>
> Cc: "Miguel de Icaza" <miguel at ximian.com>; 
> <mono-devel-list at lists.ximian.com>
> Sent: Monday, May 08, 2006 1:41 PM
> Subject: Re: [Mono-dev] Patch for HttpRequest.cs
>
>
>> Hello,
>>
>> I was talking only about the encoding used during URL decoding, and my
>> patch fixes that. Running the attached test program demonstrates the
>> need to call HttpUtility.UrlDecode with Latin1 encoding to match MS
>> behaviour. No matter what encoding is set in
>> HttpRequest.ContentEncoding, MS.NET always URL-decodes "%e4" to "ä", so
>> it must _always_ be calling
>>
>> HttpUtility.UrlDecode("%e4", Encoding.GetEncoding("latin1"))
>>
>> Or am I missing something? Any feedback appreciated!
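What each encoding does with the single byte produced by "%e4" can be checked with a sketch like the following. It is a hypothetical check, not the attached test program, and it assumes a .NET runtime where System.Web.HttpUtility and the iso-8859-1, utf-8 and us-ascii encodings are available.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Web;

// Hypothetical check: URL-decode "%e4" under several encodings.
public static class PercentE4Check
{
    public static Dictionary<string, string> DecodeUnder(params string[] encodingNames)
    {
        var results = new Dictionary<string, string>();
        foreach (string name in encodingNames)
        {
            // "%e4" becomes the single byte 0xE4, which is then
            // converted to text with the given encoding.
            results[name] = HttpUtility.UrlDecode("%e4", Encoding.GetEncoding(name));
        }
        return results;
    }
}
```

Only Latin-1 maps 0xE4 to "ä"; UTF-8 treats it as an incomplete multi-byte sequence and yields U+FFFD, and ASCII yields "?". This shows what each encoding produces, not which encoding MS.NET passes internally.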
>>
>> - Juraj
>>
>>
>> On Mon, 2006-05-08 at 12:57 +0200, Kornél Pál wrote:
>>> Hi,
>>>
>>> You are wrong. On MS.NET, HttpRequest.QueryString does the following:
>>>
>>> The only encoding it uses is HttpRequest.ContentEncoding. It first tries
>>> to obtain HttpWorkerRequest.GetQueryStringRawBytes(); if that fails, it
>>> falls back to HttpWorkerRequest.GetQueryString(). When it obtains the byte
>>> array, it decodes it using HttpRequest.ContentEncoding.GetString(), so the
>>> query string is decoded correctly. When no byte array is available from the
>>> HttpWorkerRequest, or the query string was set in the constructor or, for
>>> example, via HttpContext.RewritePath, the string is assumed to be decoded
>>> correctly already, so no byte-to-string decoding is done.
>>>
>>> At this point we have a string that may still be URL-encoded. MS.NET
>>> probably calls HttpUtility.UrlDecode just like we do, but it also passes
>>> HttpRequest.ContentEncoding, because the query string is assumed to be
>>> URL-encoded using that encoding.
>>>
>>> Note that obtaining the query string from HttpWorkerRequest in the
>>> constructor, as we currently do, is wrong, because
>>> HttpRequest.ContentEncoding can be changed before HttpRequest.QueryString
>>> is first accessed.
>>>
>>> We should do the following:
>>> - delay query string processing until it is needed (don't obtain the query
>>>   string in the constructor)
>>> - try HttpWorkerRequest.GetQueryStringRawBytes() as well, falling back to
>>>   GetQueryString()
>>> - use HttpRequest.ContentEncoding both to decode the byte array and for
>>>   HttpUtility.UrlDecode
>>> Kornél
>>>
>>> ----- Original Message ----- 
>>> From: "Juraj Skripsky" <js at hotfeet.ch>
>>> To: "Miguel de Icaza" <miguel at ximian.com>
>>> Cc: <mono-devel-list at lists.ximian.com>
>>> Sent: Monday, May 08, 2006 12:22 PM
>>> Subject: Re: [Mono-dev] Patch for HttpRequest.cs
>>>
>>>
>>> > Hello,
>>> >
>>> > After running more tests, I've found that on MS.NET the decoding in
>>> > HttpRequest.QueryString does _not_ depend on
>>> > HttpRequest.ContentEncoding. In fact, MS seems to always use Latin1
>>> > here; the tests fail with every other standard encoding.
>>> >
>>> > A revised patch is attached, including an NUnit test case. If no one
>>> > objects, I'll commit.
>>> >
>>> > - Juraj
>>> >
>>> >
>>> > On Sat, 2006-05-06 at 13:47 -0400, Miguel de Icaza wrote:
>>> >> Hello Juraj,
>>> >>
>>> >> > The attached patch makes sure that the GET parameters in QueryString
>>> >> > are URL-decoded using the proper encoding (when creating the
>>> >> > NameValueCollection).
>>> >> >
>>> >> > May I commit?
>>> >>
>>> >> Could you provide NUnit tests for this case?
>>> >>
>>> >> Miguel
>>> >>
>>> >
>>>
>>>
>>> --------------------------------------------------------------------------------
>>>
>>>
>>> > _______________________________________________
>>> > Mono-devel-list mailing list
>>> > Mono-devel-list at lists.ximian.com
>>> > http://lists.ximian.com/mailman/listinfo/mono-devel-list
>>> >
>>>
>>
>