HTTP Content-type header for cached files - apache

Using Apache with mod_rewrite, when I load a .css or .js file and view the HTTP headers, the Content-type is only set correctly the first time I load it - subsequent refreshes are missing Content-type altogether and it's creating some problems for me.
I can get around this by appending a random query-string value to the end of each filename, e.g. http://www.site.com/script.js?12345
However, I don't want to have to do that, since caching is good; all I want is for the Content-type to be present. I've tried using a RewriteRule to force the type, but that didn't solve the problem. Any ideas?
Thanks, Brian

The answer depends on information you've not provided here: specifically, where are you seeing these headers?
Unless it's from sniffing the network traffic between the browser and the server, you can't be sure whether you are looking at a real request to the server or a request that has been satisfied from the cache. Indeed, changing the URL as you describe is a very simple way to force a reload from the server rather than a load from the cache.
I don't think it's as broken as you think it is. Fire up Wireshark and see for yourself - or just disable caching for these content types.
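For instance, a minimal .htaccess sketch (assuming mod_mime and mod_headers are enabled; adjust to taste) that pins down the MIME types and switches caching off for these extensions while you debug:

# make sure the MIME types are always declared
AddType text/css .css
AddType application/javascript .js

# temporarily disable caching for these files while debugging
<FilesMatch "\.(css|js)$">
    Header set Cache-Control "no-store"
</FilesMatch>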
C.

Related

MIME type conflict with TYPO3 compressed CSS and JS resources

I am rather new to TYPO3. Recently I noticed some very weird behavior in my installation: some CSS files in the directory typo3temp/assets/compressed got the MIME type text/html instead of the expected text/css. Therefore my browser received a 403 Forbidden status code from the webserver for these resources. That resulted in some parts of the backend being shown without styling.
I tried clearing all caches and deleting the typo3temp/assets/compressed directory; however, now everything in there (CSS and JS) is served with the MIME type text/html. With the backend not getting its JavaScript, I am now basically locked out of it. I can, however, still reach and use the install tool.
Do you have any ideas how this might happen and how to fix it?
Some details of my setup:
TYPO3 v10.4.13 (recently updated from 10.4.9)
Apache web server (I don't have access to its config and have to rely on .htaccess files)
I suggest setting
TYPO3_CONF_VARS/FE/compressionLevel=0
TYPO3_CONF_VARS/BE/compressionLevel=0
in order not to have this kind of problem. The problem is that this compression creates pre-compressed files but relies on the webserver configuration to deliver them as text/css and to NOT apply the webserver's default transport compression to them (otherwise they can end up double-compressed, and you might not even notice easily: some browsers can deal with that, others cannot).
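For context, this is roughly what the webserver side has to do for those pre-compressed files, assuming they are written with a .gzip suffix as in a default TYPO3 setup (a sketch only; the .htaccess shipped with your TYPO3 version is the authoritative reference):

# deliver the pre-compressed assets with the correct type and encoding
<FilesMatch "\.css\.gzip$">
    AddType text/css .gzip
</FilesMatch>
<FilesMatch "\.js\.gzip$">
    AddType text/javascript .gzip
</FilesMatch>
AddEncoding gzip .gzip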
It is the kind of micro-optimization that sounded useful back when we avoided https:// because of the processing overhead...
Here are some docs (the first statement is outdated in my opinion): https://docs.typo3.org/m/typo3/reference-skinning/master/en-us/BackendCssApi/CssCompression/Index.html
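If you would rather set this in a file than through the install tool, here is a sketch for typo3conf/AdditionalConfiguration.php (assuming you can edit files in the project):

<?php
// sketch: turn off TYPO3's own asset compression
$GLOBALS['TYPO3_CONF_VARS']['FE']['compressionLevel'] = 0;
$GLOBALS['TYPO3_CONF_VARS']['BE']['compressionLevel'] = 0;

Afterwards, clearing the caches again should regenerate the assets uncompressed.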

Fail2Ban ignore 404 of local redirect

Assume a bad actor scripts access to an Apache server to probe for vulnerabilities. With Fail2Ban we can catch some number of 404's and ban the IP. Now assume a single web page has a bad local reference to a CSS, JS, or image file. Repeated hits by the same legitimate site visitor will result in some number of 404s, and possibly an IP ban.
Is there a good way to separate these local requests from remote so that we don't ban the valued visitor?
I know all requests are remote, in that a page gets returned to a browser and the content of the page triggers more requests for assets. The thing is, how do we know the difference between that kind of page load pattern, and a script query for the same resource?
If we do know that a request is coming in based on a link that we just generated, we could do a 302 redirect rather than returning a 404, thus avoiding the banning process.
The HTTP Referer header could be used: if the Referer has the same origin as the requested page, or matches the local site's FQDN, then we should not ban. But that header can be spoofed, so is this a good tool to use?
I'm thinking cookies can be used, or a session nonce, where a request might come in for assets from a page without a current session cookie. But I don't know if something like that is a built-in feature.
The best solution is obviously to make sure that all pages generated on a site include valid references back to the site, but we all know that's not always possible. Some CMSes add version info to files, or they adjust image paths to include an image size based on the client device/size. Any of these generated references might simply be wrong until we can find and fix the code that creates them. Between the time we deploy something faulty and the time we fix it, I'm concerned about accidentally banning legitimate visitors with Fail2Ban (and other tools) that do not factor in where the request originates.
Is there another solution to this challenge? Thanks!
how do we know the difference between that kind of page load pattern
You don't, in the normal case (at least not without some kind of white- or blacklist).
But you do know which URI or path segments, file extensions, etc. would almost never be the target of such attack vectors, and those you can ignore.
Some CMS add version info to files, or they adjust image paths to include an image size based on the client device/size.
But you surely know which prefixes are correct, so a regex allowing certain path segments would be possible. For instance, this one:
# regex ignoring site and cms paths:
^<HOST> -[^"]*\"[A-Z]{3,}\s+/(?!site/|cms/)\S+ HTTP/[^"]+" 40\d\s\d+
will ignore this one:
192.0.2.1 - - [02/Mar/2021:18:01:06] "GET /site/style.css?ver=1.0 HTTP/1.1" 404 469
and match this one:
192.0.2.1 - - [02/Mar/2021:18:01:06] "GET /xampp/phpmyadmin/scripts/setup.php HTTP/1.1" 404 469
Similarly, you can write a regex with a negative lookahead to ignore certain extensions like .css or .js, or arguments like ?ver=1.0.
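For example, a sketch (untested, same log format as above) that would ignore 404s for any request whose path ends in .css or .js, with or without a query string:

# regex ignoring 404s for .css/.js assets (optionally with a query string):
^<HOST> -[^"]*\"[A-Z]{3,}\s+/(?!\S*\.(?:css|js)(?:\?\S*)?\s)\S+ HTTP/[^"]+" 40\d\s\d+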
Another possibility would be to create a special fallback location that logs completely bogus requests to a separate log file (not the access or error logs), as described in wiki :: Best practice. That way, evildoers whose URIs are definitely wrong and don't match any proper location the web server can handle are considered separately.
Or simply disable logging of 404s for locations known to be valid (paths, prefixes, extensions, whatever).
To reduce or completely avoid false positives, you can initially increase maxretry or reduce findtime and observe things for a while (so that evildoers with too many attempts still get banned, while legitimate users whose "broken" requests cause 404s, but not that many of them, remain ignored). That also lets you accumulate a list of the "valid" 404 requests of your application, in order to write a more precise regex or to filter them out in certain locations.
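As a sketch in jail.local (the jail name and values here are only examples; use whatever jail holds your 404 filter):

[apache-404]
enabled  = true
# be more tolerant while you collect the list of "valid" 404s:
maxretry = 20
findtime = 300
bantime  = 3600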

Browser cache, Last-Modified header

Let's assume there are two URLs:
http://example.com/myaccount?user=12345
http://example.com/myaccount?user=34567
As far as I understand the browser will cache them separately and will not use the Last-Modified header from the first request to revalidate the second.
Is it possible to force the browser to use the Last-Modified header in this case?
Could you please explain why it works this way?
I think that in certain situations the web server will ignore the query string. You can check this on an out-of-the-box Apache server by requesting the headers with curl -I for http://example.com/styles.css and then for http://example.com/styles.css?v=2 and grepping for Last-Modified. Assuming the file exists, you'll probably get the same Last-Modified timestamp for both.
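For instance, using the example URLs above:

curl -I http://example.com/styles.css | grep Last-Modified
curl -I "http://example.com/styles.css?v=2" | grep Last-Modified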
It might be the browsers (my guess is that they do) that consider ?v=2 a different file, or a file with updated content. Also, I think most Content Delivery Networks are configured that way, which lets them serve a fresh copy of a file when its query string differs.
It's an interesting question anyway; I'll read up on it more. I hope somebody can explain this in more detail here.

$_SERVER["CONTENT_LENGTH"] not set

Is there any php.ini setting which could cause isset($_SERVER["CONTENT_LENGTH"]) to never be true on one server but work fine on another, where the application code is the same and the php.ini upload settings are the same? In an uploader library I'm using, the content-length check always fails because of this issue. This is on PHP 5.3, CentOS, and Apache. Thanks for any help.
EDIT: I should add that the request headers do include Content-Length: 33586, but when I try to read $_SERVER["CONTENT_LENGTH"], it isn't set.
On the response side, Content-Length is sent by the server application, not taken from the HTTP request. Your application would be the one setting it, but you should not do that from within PHP, as PHP does it automatically.
If you're dealing with input from something like an upload, then you will only get the Content-Length if the HTTP request is not chunked. When a chunked request is sent, the data length is not known to the recipient until all the chunks have been received.
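A minimal sketch of a fallback (hypothetical; note that reading php://input buffers the whole body in memory and is not available for multipart/form-data uploads on older PHP versions):

<?php
// Sketch: use the header when the SAPI exposes it, otherwise measure
// the raw request body (for a chunked request the length is only
// known after the whole body has been read).
if (isset($_SERVER['CONTENT_LENGTH'])) {
    $length = (int) $_SERVER['CONTENT_LENGTH'];
} else {
    $length = strlen(file_get_contents('php://input'));
}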

AFNetworking - only load from cache if server responded http header status 304?

I have a question about using AFNetworking with SDURLCache.
In my application I can see they work together fine: for repeated queries, responses from the local cache are used correctly.
But let's say the XML on the server was updated one minute ago; in my app, the cache is still used.
This doesn't match my client's expectation, which is:
I need to check the HTTP status: if it is 200, load the file from the server; if it is 304 (Not Modified), just use the local cache from disk.
Is there any chance this is built-in behaviour, or do I have to manually hack the class somewhere so that it behaves like this?
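For reference, what is being described is standard HTTP revalidation (a conditional GET). At the protocol level the exchange looks roughly like this (illustrative only, with a made-up path and date; not specific to AFNetworking or SDURLCache):

GET /feed.xml HTTP/1.1
If-Modified-Since: Sat, 01 Dec 2012 10:00:00 GMT

HTTP/1.1 304 Not Modified   -> reuse the cached copy from disk
HTTP/1.1 200 OK             -> replace the cached copy with the fresh body

Whether the cache sends that conditional request on every load depends on the caching headers the server returned with the original response (for example, Cache-Control: no-cache or max-age=0 forces revalidation).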