Memcached with Apache - handling stale objects

I'm using memcached and Apache with the following default configuration
CacheEnable socache /
CacheSocache memcache:IP:PORT
MemcacheConnTTL 30
What will the behavior be when the 30 seconds expire and a request for the same URL comes in? Is there a way to configure the cache key, i.e. what information makes a request unique?
What if the server can't get an answer (e.g. a timeout while fetching the newly updated object)? Can it be configured to serve the old object?
Thanks

What will the behavior be when 30 seconds expire and a request for the same URL comes in
Apache would simply create a new connection to memcached; it doesn't mean anything happens to the data stored in memcached.
https://httpd.apache.org/docs/2.4/mod/mod_socache_memcache.html#memcacheconnttl
Set the time to keep idle connections with the memcache server(s)
alive (threaded platforms only).
If you need to control how long an object is stored in the cache, check out CacheDefaultExpire.
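For example, a minimal sketch building on the configuration from the question (the one-hour and one-day values are only illustrations):
CacheEnable socache /
CacheSocache memcache:IP:PORT
# Used only when the backend supplies neither an expiry nor a last-modified date
CacheDefaultExpire 3600
# Upper bound on how long anything may be considered fresh
CacheMaxExpire 86400
Explicit Expires or Cache-Control max-age headers from the backend take precedence over CacheDefaultExpire.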
Is there a way to configure the cache key
The URL is used to build the key, but you can partially configure which parts of the URL are used; check out
CacheIgnoreQueryString and CacheIgnoreURLSessionIdentifiers.
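A hedged example (the directive names are real, the values are just illustrative):
# Treat /page?a=1 and /page?a=2 as the same cached entity
CacheIgnoreQueryString On
# Strip these session identifiers from URLs before building the cache key
CacheIgnoreURLSessionIdentifiers jsessionid sid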
I.e. what are the info which make a request unique
https://httpd.apache.org/docs/2.4/mod/mod_cache.html#cacheenable
The CacheEnable directive instructs mod_cache to cache urls at or
below url-string
Notice that not all requests can be cached; there are a lot of rules on what is cacheable.
What if the server can't get an answer? Can it be configured to serve the old object
You need CacheStaleOnError
https://httpd.apache.org/docs/2.4/mod/mod_cache.html#cachestaleonerror
When the CacheStaleOnError directive is switched on, and when stale
data is available in the cache, the cache will respond to 5xx
responses from the backend by returning the stale data instead of the
5xx response
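A minimal sketch combining it with the configuration from the question (CacheStaleOnError defaults to on in 2.4, so listing it mainly documents the intent):
CacheEnable socache /
CacheSocache memcache:IP:PORT
CacheStaleOnError On
If the backend times out, mod_proxy normally turns that into a 502/504, which should also fall under this rule, so the stale copy is served instead.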

Related

How to throttle requests to sites instead of to proxy server in scrapy?

I am using a proxy and have set AUTOTHROTTLE_ENABLED to True. I had the impression that Scrapy would throttle the sites I am crawling, but instead it seems to throttle requests to the proxy itself. How do I throttle requests to the sites instead of to the proxy?
Update: I am manually setting the proxy in meta for each request instead of using the proxy middleware.
I don't think this is possible to do solely from the spider side. By looking at the throttling algorithm and at the AutoThrottle extension source code, you can see that the delay being used is the time difference between sending a request and getting back a response. Everything that happens in between is added to this delay (including the proxy delay).
To further verify this, consider the steps:
AutoThrottle uses latency information from the response, found in response.meta['download_latency'] (see here).
The latency information ('download_latency') is set in the dedicated callback once the download is completed, by subtracting the start time from the current time (see here).
The start time is set just before the download agent is instructed to download the request, which means everything that happens in between is added to the final latency (see here).
If you want to actually throttle according to target latency through a proxy, this will have to be handled by the proxy itself. I suggest using some of the managed proxy pool solutions like Crawlera.

How to forcefully flush HTTP headers in Apache httpd?

I need to periodically generate HTTP headers for clients, and those headers need to be flushed to the client directly after each one is created. I can't wait for a body or anything else: I create a header and I want Apache httpd to send it to the client.
I've already tried autoflush, manual flushes, large header data (around 8 KB), disabling the deflate module and whatever else could stand in my way, but httpd seems to ignore my wishes until all headers are created and only flushes them afterwards. Depending on how fast I generate headers, the httpd process even grows to some hundreds of megabytes of memory, so it seems to buffer all headers.
Is there any way to get httpd to flush individual headers or is it impossible?
The answer is to use NPH scripts, which bypass the web server's buffering by default. One needs to name the script nph-*, and the web server should then stop buffering headers and send them directly to the client exactly as they are printed. This works in my case, though with Apache httpd one needs to be careful:
Apache2 sends two HTTP headers with a mapped "nph-" CGI
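For reference, a sketch of how such a script might be exposed; the paths and the nph-stream.cgi name are placeholders, only the nph- prefix matters:
ScriptAlias /cgi-bin/ "/usr/local/apache2/cgi-bin/"
<Directory "/usr/local/apache2/cgi-bin">
    AllowOverride None
    Options None
    Require all granted
</Directory>
# The script itself would live at /usr/local/apache2/cgi-bin/nph-stream.cgi and,
# being an NPH script, must emit the complete response including the status line.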

Does Apache log HTTP requests for the same resource from different clients at the same point in time?

I have created a custom access log file and am logging all requests into it.
Scenario 1
When does Apache log to the access file?
When the request hits the server, or when the server has responded?
Scenario 2
Different clients request the same resource at the same time. How does Apache serve the requests to all the clients? Does the server log the same time for all the requests?
My point is: will requests from different clients be logged at the same time, i.e. does Apache write all the records with the same timestamp into the custom access log file (without even millisecond differences)?
1) Apache writes to the log when the response has finished being sent; it has to wait until this point because the response time can be configured to be written to the log file.
2) A log line is written for each individual request, so multiple clients requesting the same resource at the same time will each create an entry in the log file. Depending on how granular your time string is configured in the custom log, these may all appear with the same date/time.
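If you want to see sub-second differences, mod_log_config can include fractional seconds in the timestamp; a hedged sketch (the format string and log nickname are just examples):
LogFormat "%h %l %u [%{%d/%b/%Y:%T}t.%{usec_frac}t %{%z}t] \"%r\" %>s %b %D" withusec
CustomLog "logs/access_log" withusec
Note that %t records the time the request was received, even though the line itself is only written once the response is complete, and %D adds the time taken to serve the request in microseconds.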

Apache Mod_cache configuration with Tomcat and Max-age directives

I have the following system configured:
Tomcat -> Apache
Now, I have some URLs on which I have Max-Age, Last-Modified and ETag headers set.
My expectation is that when client 1 makes a call to the server, the page should be served from Tomcat but cached by Apache's mod_cache module, so that when the next client makes a call, the page is served from Apache and doesn't hit the Tomcat server if the page is still fresh. If the page isn't fresh, Apache should make a conditional GET to validate the content it has.
Can someone tell me if there is a fundamental mistake in this thinking? It doesn't work that way: in my case, when client 2 makes a call, it goes straight to the Tomcat server (not even a conditional GET).
Is my thinking incorrect, or is my Apache configuration incorrect? Thanks
The "What can be cached" section of the docs has a good summary of factors - such as response codes, GET request, presence of Authorization header and so on - which permit caching.
Also, set the Apache LogLevel to debug in httpd.conf, and you will get a clear view of whether or not each request got cached. Check the error logs.
You should be able to trace through what is happening based on these two.
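A minimal sketch of the kind of setup this implies, assuming mod_proxy, mod_proxy_http, mod_cache and mod_cache_disk are loaded (the Tomcat port, path and cache root are assumptions):
# Reverse proxy to Tomcat
ProxyPass        "/app" "http://localhost:8080/app"
ProxyPassReverse "/app" "http://localhost:8080/app"
# Cache the proxied responses on disk
CacheEnable disk "/app"
CacheRoot   "/var/cache/apache2/mod_cache_disk"
# 2.4 per-module logging shows the cache hit/miss/revalidate decisions
LogLevel cache:debug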

HTTP caching header settings in WebLogic

Does anyone know how to modify WebLogic settings so that the HTTP cache headers are set to a far-future date?
For example, in my current setup WebLogic sets the HTTP cache headers to expire in 5 hours (in an HTTP/1.1 304 Not Modified response).
This is the cache header value on a .gif file ... Date: Tue, 16 Mar 2010 20:39:13 GMT.
I have re-checked and it's always 5 hours. There must be some form of setting that I can tweak to change it.
Thanks for your time!
You can use this property:
<wls:container-descriptor>
    <wls:resource-reload-check-secs>-1</wls:resource-reload-check-secs>
</wls:container-descriptor>
The element is used to perform metadata caching for cached resources that are found in the resource path in the Web application scope. This parameter identifies how often WebLogic Server checks whether a resource has been modified and if so, it reloads it.
The value -1 means metadata is cached but never checked against the disk for changes. In a production environment, this value is recommended for better performance.
Static content is served by a weblogic.servlet.FileServlet that all web applications have by default, but I couldn't find any way to configure HTTP headers on it. So either replace this servlet with your own servlet or use a Filter.
But the comment above is right: using a web server to serve static content is the "right" way to go. A web server does a better job at this, and the application server has other things to do than serve static files.
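If you do put a web server in front, a far-future expiry for static files is easy to set there; a hedged Apache mod_expires sketch, assuming Apache is that front end (the one-year value is only an example):
ExpiresActive On
ExpiresByType image/gif "access plus 1 year"
ExpiresByType text/css "access plus 1 year"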