[Mono-dev] Re - mono WebResponse caching
Mark Lintner
mlintner at sinenomine.net
Wed Jun 5 01:54:11 UTC 2013
Hi Greg,
Great input. I have not gotten that far yet. Also the cache level is very naïve. Everything caches except Bypass cache which doesn't cache. To check the headers it would check the WebResponse Headers property coming into TryInsert in DirectoryBackedResponseCache.cs . I would combine the result of checking the header with the conditional ShouldCache which also needs work. What we are looking for is comments like this, about what you think needs to be in a usable implementation, gotchas, things to look out for. Also if we are on the wrong track doing it this way I don't want to waste time doing something which will not be useful. Assuming I'm not on the wrong path, I will add the header check next. This is the kind of feedback we need. I only tried to make a prototype of how caching could work and then did some performance measurements. Obviously didn't add any of the details yet. If this is a viable approach I will continue to fill out the implementation guided by research and suggestions. Probably want to limit it to what we really need to have. Any other suggestions, comments, criticism would be greatly appreciated.
Thanks,
________________________________
From: Greg Young [gregoryyoung1 at gmail.com]
Sent: Tuesday, June 04, 2013 6:49 PM
To: Mark Lintner
Cc: mono-devel-list at lists.ximian.com
Subject: Re: [Mono-dev] mono WebResponse caching
I would expect this to look at http caching headers in response when caching such as expiration, whether something is cachable (public/private) etc but don't see any of that code (or maybe I missed it). Can you point to where it is in your implementation?
On Tuesday, June 4, 2013, Mark Lintner wrote:
We have noticed that parts of the Mono Web functionality are not yet implemented. We are hoping to be able to add some of these missing pieces. Caching of
HttpWebResponses is the first one we took a look at. It would appear that Microsoft takes advantage of the internet explorer caching. Since it would appear
that an HttpWebResponse is fully populated when it is returned from the cache and the only content in the urls that are cached are the bytes that come in on
the wire, Microsoft must return things from the cache at a very low level and and let the response populate the normal way. This did not seem possible to
emulate when I looked at mono. I also wanted to cache to a directory so that the urls can be inspected or deleted.
After looking at the code for awhile I realized that I could save additional information from the response, ie: headercollection, cookiecontainer, Status
code etc in the same file as the responsestream. A response can be serialized but not all of it, most importantly the responsestream is not serialized. I
wrote a prototype with some helper functions, the only change to existing mono code would be in HttpWebRequest.GetResponse. Before calling BeginGetResponse
I call a function that tries to retrieve the url from the cache. If it is not there, BeginGetResponse is called as normal, the response is grabbed from
EndGetResponse and passed to a method to insert into the cache. This insert method will open a filename in a directory designated as mono_cache with the name
produced by calling HttpUtility.Encode so it is a legal filename. Also you can only read from the responsestream once, which means that if read from it, save
it to a file, then the responsestream is no longer accessible from the WebResponse when it is returned to the calling code. If sream copy worked it could be
used here. One way to fix the problem is to call GetResponseStream(), read it to the end, which produces a string, then finally convert the string to bytes.
Then you can serialize the bytes to the open file, serialize the headercollection and ultimately individually add any field you want in the response when it
is retrieved from the cache, then close the file. Then create a new WebConnectionData instance and populate it with headers and all the relevent fields from
the old response, finally passing the bytes of data to a memory stream constructor and assign it to the WebConnectionData's stream member. Then I construct a
new HttpWebResponse, passing the WebConnectionData and other arguments which are taken from the existing request and the recently retrieved response. The
response is then returned to getResponse then finally to the calling code, good as new. Next time the url is requested the cached file is deserialized, one
member at a time then a WebConnectionData is setup and finally, a HttpWebResponse is constructed with the info read from the file, the WebConnectionData and
from the members of the new request. I concentrated on prototyping caching of HttpWebResponses. Only a simple cache algorithm, no Ftp, no File etc. No cache
aging etc. There are many other things that would have to be done for this to be usable.
Using the prototype I requested www.google.com<http://www.google.com> and it took .1601419 sec to retrieve and cache it, including all the mechanics described above. The
second time returning it from the cache, building the HttpWebResponse took .0007671. This is a 288 times speedup. That kind of performance increase, from a
prototype that is not yet optimized, is compelling. I must add that I measured the same experiment with google under the .Net 4.5 implementation and the
speedup is 769x. Obviously thats something to work toward. Even falling short of that level of speedup, some implementation of response caching for mono
would greatly benefit the users who need better performance of services, rest etc.
So far this is the only mono code that would change to support caching,
from HttpWebRequest.cs:
public override WebResponse GetResponse()
{
WebResponse response = ResponseCache.TryRetrieve (this);
if (response == null)
{
WebAsyncResult result = (WebAsyncResult) BeginGetResponse (null, null);
response = EndGetResponse (result);
response.IsFromCache = false;
HttpWebResponse response2 = ResponseCache.TryInsert ((HttpWebRequest)this, (HttpWebResponse)response);
response.Close ();
return response2;
}
return response;
}
The code can stand some improvement. It is a prototype and it shows. I have included the file ResponseCache.cs which the above
code hooks into. We were wondering whether anyone has tried to implement response caching before and whether there is any intent to in the future. What are
your thoughts about the approach described above and shown in the ResponseCache.cs file and the 3 other files I included. Bear in mind it
would be refactored so that it fit into the mono architecture a bit better, it needs to be generalized, different strategies for different concrete response
types need to be plugged in, configuration. and a plethora of things I missed. If you feel this approach has merit I can detail all that seperatly. If you
feel a different approach would work better, please let us know, in the end were looking to make mono more robust in some of the ways that are expected but
not yet implemented.
The main thing I want to show here is how it is possible, to wedge a cache strategy into mono's request pipeline without changing or destabilizing
mono itself. The cache implementation could change also, which is one way that this is code falls short is it totally violates open-closed. Is this a
viable approach? There would of course be much more to do, that I have not even mentioned here. Your comments would be appreciated.
The files can be found here.
http://pastebin.com/FCefsiDx
http://pastebin.com/8qBh8mMW
http://pastebin.com/xVKLaidG
http://pastebin.com/NMVNv30T
--
Le doute n'est pas une condition agréable, mais la certitude est absurde.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ximian.com/pipermail/mono-devel-list/attachments/20130605/f8ee5713/attachment.html>
More information about the Mono-devel-list
mailing list