What is performing Transparent Content Negotiation in Apache?

I inherited a fairly complex Java web application that exhibits a mysterious behaviour and I need to know what causes it.
The application requests a file file.css. If file.css exists it is returned. If file.css does not exist, but file.css.gz does exist, the gzipped file is returned, with the following unusual headers:
Content-Location: file.css.gz
Content-Type: application/x-gzip
TCN: choice
Vary: negotiate
The presence of the TCN header means that a request was transparently negotiated, most likely by an Apache RewriteRule, but I can't find where the rule is defined. I've located and searched every Apache config file on the server (multiple files are referenced with Include) and commented out every mention of "gzip" or ".gz". Across all config files there is only a single RewriteRule and it's for SSL. After restarting Apache I still can't disable the behaviour.
Is this the default behaviour of Apache, or does this look like the behaviour of a certain module?
The server's OS is RHEL 5.8, Apache is 2.2.

The culprit was Apache MultiViews. This was a frustrating investigation because configuring MultiViews involves no mention of RewriteRule or any of the file extensions it will automatically substitute. You have to already know about MultiViews before you can understand that it is causing this behaviour.
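In case it helps the next person: MultiViews is toggled with the Options directive, so a minimal sketch of switching it off for a document root (the path below is only an example) would be:
<Directory "/var/www/html">
    # MultiViews is what performs the implicit negotiation; without it Apache
    # no longer substitutes file.css.gz for a missing file.css
    Options -MultiViews
</Directory>
Note that "Options All" does not include MultiViews, so the thing to grep for is an explicit MultiViews in the config files or in a per-directory .htaccess.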


How to configure Apache 2.4 to use 'content negotiation' in order to serve webp images?

In my HTML file there are several <img src="images/<filename>.jpeg">
The directory "images" holds these files:
<filename>.jpeg
as well as
<filename>.webp
and
<filename>.jpeg.webp
The latter two are identical webp versions of the jpeg file.
Now I want to configure Apache 2.4 on Oracle Linux 8.6 for 'content negotiation'. I am expecting that Apache returns a .webp file instead of the requested .jpeg file, if the browser supports .webp. I don't want to use the HTML <picture> tag or 'srcset' for several reasons, but leave the code untouched.
I have found several promising configuration examples for nginx, but unfortunately only little on Apache:
https://gist.github.com/sergejmueller/5500879
https://stackoverflow.com/a/58857260/4335480
These two links outline 'rewrites' that are supposed to go into the .htaccess file in the /images directory. I tried them both as '.htaccess' in the 'images' directory and it didn't work. I also put them directly in httpd.conf and it didn't work either. And I tried these lines in the root directory's .htaccess.
'AllowOverride All' is included in all <Directory> sections. Even the 'images' directory is explicitly listed.
In Chrome Dev Tools I verified that the request headers include 'image/webp'.
Probably not necessary: In my despair I have disabled nosniff on the Apache server and verified in the response header that it isn't set.
Whatever I try, the server only returns the jpeg file. I can verify this not only by the file name but also by the content-length field in the response header.
So what can I do to have Apache serve avif, webp and (as a fallback) jpeg, in that order, whenever a jpeg file is requested?
Found the error myself. Note to self: don't just copy code snippets to use them. Read and understand them to find errors or identify necessary adaptations.
Vincent Orback's code is often cited for this problem, so I blindly trusted and used it: https://github.com/vincentorback/WebP-images-with-htaccess
It contains the following line:
RewriteCond %{DOCUMENT_ROOT}/$1.webp -f
The outcome is that .webp images are only searched for in the web server root directory. On my site, images are in a subdirectory called 'images'.
Trying to load an image in the browser would fail (deliver the jpeg, not the webp version):
https://<my domain>/images/<image name>.jpeg
But after altering the above line to
RewriteCond %{DOCUMENT_ROOT}/images/$1.webp -f
eventually everything worked!
All the other things were unnecessary. You only need one AllowOverride All for <Directory />, placed before the virtual host containers, and all servers and subdirectories will have .htaccess enabled, if present. For this problem, only one .htaccess in the images subdirectory was necessary, none in the root and no special httpd.conf entries. I turned nosniff on again. The alternative .webp files just need the extension .webp, not .jpeg.webp.
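For completeness, the .htaccess in the images subdirectory ended up looking roughly like this; it is essentially the referenced snippet with only the path in the second RewriteCond changed, so treat it as a sketch and adjust the 'images' path to your own layout:
<IfModule mod_rewrite.c>
  RewriteEngine On
  # only rewrite when the browser announces webp support
  RewriteCond %{HTTP_ACCEPT} image/webp
  # this .htaccess lives in /images, so $1 is relative to that directory
  RewriteCond %{DOCUMENT_ROOT}/images/$1.webp -f
  RewriteRule (.+)\.jpe?g$ $1.webp [T=image/webp,E=accept:1]
</IfModule>
<IfModule mod_headers.c>
  # let caches know the response depends on the Accept header
  Header append Vary Accept env=REDIRECT_accept
</IfModule>
AddType image/webp .webp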

How would you block all referring domains via .htaccess while allowing a certain one through? Not IP address. Script isn't working

I need to have a .htaccess file made to the following specifications.
Allow ONLY "exampledomain.com/sometext" to access "mydomain.com/folder"
Block all other referrers and redirect them to "google.com"
But I am not sure how to go about doing this.
So far this is what I have got. But it is not working.
<If "%{HTTP_HOST} != 'google.com'">
Redirect / http://www.yahoo.com/
</If>
Using the latest Apache. I really appreciate the help here. I have gone through a few other posts but can't seem to figure it out.
Does the 'google.com' have to include the full url or is there a way to make it a wildcard like *google.com*?
This should point you into the right direction:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^mydomain\.com$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?exampledomain\.com/sometext
RewriteRule ^ https://www.google.com [R=301,END]
It is a good idea to start out with a 302 temporary redirection and only change that to a 301 permanent redirection later, once you are certain everything is correctly set up. That prevents caching issues while trying things out...
In case you receive an internal server error (http status 500) using the rule above then chances are that you operate a very old version of the Apache http server. You will see a definite hint to an unsupported [END] flag in your http server's error log file in that case. You can either try to upgrade or use the older [L] flag; it probably will work the same in this situation, though that depends a bit on your setup.
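So while testing, the last line could for example read:
RewriteRule ^ https://www.google.com [R=302,L]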
This implementation will work likewise in the http server's host configuration or inside a distributed configuration file (".htaccess" file). Obviously the rewriting module needs to be loaded inside the http server and enabled in the http host. In case you use a distributed configuration file you need to take care that its interpretation is enabled at all in the host configuration and that it is located in the host's DOCUMENT_ROOT folder.
And a general remark: you should always prefer to place such rules in the http server's host configuration instead of using distributed configuration files (".htaccess"). Those distributed configuration files add complexity, are often a cause of unexpected behavior, are hard to debug and they really slow down the http server. They are only provided as a last option for situations where you do not have access to the real http server's host configuration (read: really cheap service providers) or for applications insisting on writing their own rules (which is an obvious security nightmare).

.htaccess file on localhost

I have a localhost on Ubuntu 16. In the root localhost directory (/var/www/html/) I put this .htaccess file.
AddDefaultCharset utf-8
RewriteEngine on
RewriteRule ^index?$ index.php
When I type localhost/index Apache tells me
The requested URL /index was not found on this server.
Is that an error in Apache configuration?
Basically I want to make redirects to index.php in the root of my site, and there I want to parse something like localhost/cart/item/1 into an array and then implement MVC. I am new to web dev and do not really understand how I can do it, please help me.
You have to enable the interpretation of such dynamic configuration files (".htaccess" style files) first. They are disabled by default, since they slow down the server considerably. Usually it is preferable to place such rules directly in the server's static configuration files.
To enable them take a look at the AllowOverride directive: https://httpd.apache.org/docs/2.4/mod/core.html#allowoverride
<Directory "/var/www/html">
AllowOverride All
</Directory>
So since you have to modify that configuration anyway... why don't you place your rewrite rules in there too? Easier, more robust and faster too...
Apart from that it sometimes is a good idea to implement rewrite rules in such way that they work in both locations, the http servers (virtual) host configuration and dynamic configuration files:
RewriteEngine on
RewriteRule ^/?index/?$ index.php [L]
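Placed directly in the static host configuration, as suggested above, that could look roughly like this (the path is taken from the question; with the rule living here no .htaccess file is needed):
<Directory "/var/www/html">
    # no dynamic configuration files needed once the rule lives here
    AllowOverride None
    RewriteEngine on
    RewriteRule ^/?index/?$ index.php [L]
</Directory>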
And a general hint: you should always prefer to place such rules inside the http server's host configuration instead of using dynamic configuration files (".htaccess"). Those files are notoriously error prone, hard to debug and they really slow down the server. They are only provided as a last option for situations where you do not have control over the host configuration (read: really cheap hosting service providers) or if you have an application that relies on writing its own rewrite rules (which is an obvious security nightmare).
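As for turning localhost/cart/item/1 into something index.php can parse: one common approach (a sketch only; the file and directory checks are an assumption about your setup) is a front controller rule that hands everything that is not an existing file or directory to index.php, which can then split REQUEST_URI into an array:
RewriteEngine on
# let requests for real files and directories through untouched
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# everything else goes to the front controller, which can parse REQUEST_URI
RewriteRule ^ index.php [L]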
Maybe you need to create a folder and then work on your application inside there. Ex:
Create /var/www/html/project;
Put your .htaccess inside;
Work with your application inside that folder.

How can I rewrite URLs in XML with Apache 2.4?

Apache 2.4 includes mod_proxy_html and that's great, it's catching all kinds of URLs inside the HTML coming back from the server and fixing them. But I've got a Seam app that sends back text/xml files to the client sometimes with fully qualified URLs that also need to be rewritten and mod_proxy_html doesn't fix them.
Apparently there was a mod_proxy_xml that used to exist separately from mod_proxy_html but Apache didn't include that. Is there a way to get mod_proxy_html configured to do the same thing? I need it to fix URLs in both the HTML and XML files coming back from a server.
Follow up:
I continue to fight with this and I've tried a few different solutions with no success, including using mod_substitute (which somehow I'm configuring incorrectly, because it never seems to substitute anything for anything) and using the force flag mod_proxy_html has, to try and force it to handle all files under a certain path.
This is an old question, but I just faced the same issue.
I tried with mod_proxy_html, compiled mod_proxy_xml, nothing worked.
@JonLin's suggestion is spot on: it works with mod_sed.
The only catch is that mod_sed is documented to work inside <Directory> sections.
If you declare a <Location> though and do a SetOutputFilter instead of an AddOutputFilter (which requires a MIME type), it works beautifully.
The config that works is:
<Location "/">
SetOutputFilter Sed
OutputSed "s,http://internal:80,https://external.com,g"
</Location>
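One thing worth checking: mod_sed is not always loaded by default, so depending on the distribution something along these lines (the module path varies) may be needed first:
LoadModule sed_module modules/mod_sed.so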

mod_wsgi and static pages (no django)

On page: http://code.google.com/p/modwsgi/wiki/FileWrapperExtension , Graham Dumpleton recommends the following:
"Do note however that for the best performance, static files should
always be served by a web server. In the case of mod_wsgi this means
by Apache itself rather than mod_wsgi or the WSGI application."
I'd like to pre-build a large number of static pages, then have a python program (running under apache/mod_wsgi 3.3/python3.1, daemon mode, no django involved) decide which of them to serve to each user. I'd like the python program to decide, for example, that this guy needs "12345.html" and have it tell Apache, "please serve static file '12345.html' to this guy", rather than having to use python to open the file, read the contents, turn it into a python string, and return it to mod_wsgi as "[output]".
Is this possible? If so, how?
If not, what's the best way to do this?
There are numerous ways one could do it.
1. X-Sendfile implemented by mod_xsendfile and Apache.
2. Location/mod_rewrite tricks using mod_wsgi daemon mode.
3. X-Accel-Redirect if also using nginx as front end to Apache.
Read up on (1) and (3) as more widely used options.
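For (1), the Apache side is usually just a couple of mod_xsendfile directives (a sketch; the whitelisted path is only an example), with the WSGI application then setting an X-Sendfile response header containing the path of the file to serve:
LoadModule xsendfile_module modules/mod_xsendfile.so
# allow backends to hand file serving off to Apache
XSendFile on
# only files under this directory may be served this way (example path)
XSendFilePath /some/path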
Update with instructions for (2).
Have the WSGI application return a 200 response with an empty body and a 'Location' response header with the URL path to a local resource hosted on the same Apache server; when daemon mode is being used, mod_wsgi will trigger an internal redirect to that URL.
Thus if your Apache has:
Alias /generated-files/ /some/path/
<Directory /some/path>
Order allow,deny
Allow from all
</Directory>
then generate your file as /some/path/foo.txt in the file system, have the 'Location' response header carry the value '/generated-files/foo.txt', and it will be served up.
Note that anything under '/generated-files' is publicly accessible. If you didn't want this and wanted it to be private, so that it is only returnable via the specific request which generated the 'Location' response header, you need to add mod_rewrite magic that blocks access to that URL except for an internally generated subrequest. From memory that needs to be something like:
RewriteCond %{IS_SUBREQ} false
RewriteRule ^/generated-files/ - [F]