Apache mod_cache: Vary cache based on cookie values

Currently, I am using mod_cache to cache the page details of a web application.
I have the cache Vary based on User-Agent and Accept-Language, since there are different payloads for those situations.
Vary: User-Agent, Accept-Language
We have plans to have region-specific information on each page, but this is where we are trying to determine our caching strategy.
We have a cookie that persists to indicate the region we geolocated for, but obviously the cache does not vary based on this cookie.
Is it possible to vary based on the value of certain cookies, or of headers in general? (Note that I say certain cookies, as we wouldn't want the session identifier to collide with this.) Something like a regex match against this:
location=(.+?);

That is possible using Apache. It can parse the cookie value and pass it to a custom request header; you then need to Vary on that header:
# Copy the lunetics_locale cookie value into the environment variable "siteLanguage"
RewriteCond %{HTTP_COOKIE} ^.*lunetics_locale.*$ [NC]
RewriteCond %{HTTP_COOKIE} (?:^|;\s*)lunetics_locale=([^;]*) [NC]
RewriteRule ^(.*)$ - [env=siteLanguage:%1]
# If no lunetics_locale cookie is present, set the "siteLanguage" environment variable to "en"
RewriteCond %{HTTP_COOKIE} !^.*lunetics_locale.*$ [NC]
RewriteRule ^(.*)$ - [env=siteLanguage:en]
# Copy the environment variable "siteLanguage" into the custom request header "X-Language"
RequestHeader set X-Language "%{siteLanguage}e" env=siteLanguage
Then add X-Language to the Vary header of your responses.
I'm not sure this is the best way. I have a related question about problems with this approach: Is it possible to vary page caches (to have cache versions) with the same url and different cookie value (language)?
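A minimal sketch of the same idea applied to the location cookie from the question, assuming mod_setenvif, mod_headers and mod_cache are loaded and that this sits in the server or virtual host config. The header name X-Region, the variable name region and the fallback value default are made up for illustration. CacheQuickHandler off is included on the assumption that the cache lookup must happen after the request header has been set; with the quick handler on, mod_cache can answer before these directives run.

```apache
# Run mod_cache as a normal handler rather than the quick handler,
# so the X-Region request header exists by the time the cache is consulted.
CacheQuickHandler off

# Capture only the "location" cookie; other cookies (e.g. the session ID) are ignored.
SetEnvIf Cookie "(?:^|;\s*)location=([^;]+)" region=$1

# Overwrite any client-supplied X-Region with the cookie value, or a fallback.
RequestHeader set X-Region "%{region}e" env=region
RequestHeader set X-Region "default" env=!region

# Tell caches that the response depends on X-Region (added to any existing Vary values).
Header append Vary X-Region
```

Because both RequestHeader lines use set, a client cannot choose its own cache variant by sending X-Region directly.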

Related

Mod_rewrite rules not working in .htaccess to change the URL

I'm trying to rewrite the below URL but the URLs just don't change, no errors.
Current URL:
https://example.com/test/news/?c=value1&s=value2&id=9876
Expected URL:
https://example.com/test/news/value1/value2
My .htaccess
RewriteEngine On
RewriteRule ^test/news/([^/]*)/([^/]*)$ /test/news/?c=$1&s=$2&id=1 [L]
but I've seen many articles where a url such as example.com/display_article.php?articleId=my-article can be rewritten as example.com/articles/my-article for example with .htaccess
But the important point here (that I think you are missing) is that the URL must already have been changed internally in your application - in all your internal links. It is a common misconception that .htaccess alone can be used to change the format of the URL. Whilst .htaccess is an important part of this, it is only part of it.
Yes, you can implement a redirect in .htaccess to redirect from the old to new URL - and this is essential to preserve SEO (see below), but it is not critical to your application working. If you don't first change the URL in your internal links then:
The "old" URL is still exposed in the HTML source. When a user hovers over or copies the link, they are seeing and copying the "old" URL.
Every time a user clicks one of your internal links they are externally redirected to the "new" URL. This is slow for your users, bad for SEO (you should never link to a URL that is redirected) and bad for your server, as it potentially doubles the number of requests hitting your server (OK, 301s are cached locally).
To quote from @IMSoP's answer to this reference question on the subject:
Rewrite rules don't make ugly URLs pretty, they make pretty URLs ugly
So, once you have changed your internal links to the "new" (expected) format, eg. /test/news/value1/value2 (or should that be /test/news/value1/value2/id or even /test/news/id/value1/value2? See below), then you can do as follows...
RewriteRule ^test/news/([^/]*)/([^/]*)$ /test/news/?c=$1&s=$2&id=1 [L]
This internally rewrites a request from /test/news/<value1>/<value2> to /test/news/?c=<value1>&s=<value2>&id=1. However, there are a couple of issues with this:
/test/news/ is not itself a valid endpoint. This requires further rewriting. Perhaps you are serving a DirectoryIndex document (eg. index.php)? This might appear seamless to you, but this requires an additional internal subrequest and makes the rule dependent on other elements of the config. You should rewrite directly to the file that handles the request. eg. /test/news/index.php?c=<value1>&s=<value2>&id=1 (remember, this is entirely hidden from the user).
You are hardcoding the id=1 parameter? Should every URL have the same id? Or should this be passed in the "new" URL (which is what I would expect)? What does the id represent? If this is critical to the routing of the URL then the id should appear earlier in the URL-path, in case the URL gets accidentally truncated when copy/pasted/shared.
If the id is required then it needs to be passed in the "new" URL. We only have the "new" URL to route the request, so the information can't be hidden.
So, if the "new" URL is now /test/news/<id>/<value1>/<value2> then the rewrite would need to be like this instead:
# Rewrite new URLs to old/actual URL
# "/test/news/<id>/<value1>/<value2>" to "/test/news/?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [L]
Then (optionally*1) you can implement an external redirect in order to preserve SEO. This is for search engines that have indexed the "old" URLs or third party inbound links that cannot be updated - these need to be corrected to inform search engines of the change and get the user on the "new" canonical URL having followed an out-of-date inbound link.
(*1 It's not "optional" if you are changing an existing URL, but optional with regards to your application being functional.)
This "redirect" goes before the above rewrite:
# Redirect old URLs to the new "canonical" URL
# "/test/news/?c=<value1>&s=<value2>&id=<id>" to "/test/news/<id>/<value1>/<value2>"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]
The $0 backreference contains the full match from the RewriteRule pattern, ie. test/news/ in this case - this simply saves repetition.
The %1, %2 and %3 backreferences contain the values captured from the preceding condition. ie. the values of the c, s and id URL parameters respectively.
Note that the URL parameters / path segments should not be optional as in your original directive (ie. ([^/]*)). If they are optional and they are omitted, then the resulting URL becomes ambiguous. eg. <value2> becomes <value1> if <value1> is omitted.
Note that the URL parameters must be in the order as stated. If you have a mismatch of "old" URLs with these params in a different order (or even intermixed with other params) then this can be accounted for with additional complexity. (It may be easier to perform this redirect in your server-side script, instead of .htaccess.)
The first condition that checks against the REDIRECT_STATUS environment variable ensures that we only redirect direct requests and not rewritten requests by the later rewrite (which would otherwise result in a redirect loop). An alternative on Apache 2.4 is to use the END flag on the RewriteRule instead.
The QSD flag (Apache 2.4) discards the original query string from the request.
You should test first with a 302 (temporary) redirect to avoid potential caching issues and only change to a 301 (permanent) redirect once you have tested that everything works as intended. 301s are cached persistently by the browser so can make testing problematic.
Summary
Your complete .htaccess file should look something like this:
Options -MultiViews +FollowSymLinks
# If relying on the DirectoryIndex to handle the request
DirectoryIndex index.php
RewriteEngine On
# Redirect old URLs to the new "canonical" URL
# "/test/news/?c=<value1>&s=<value2>&id=<id>" to "/test/news/<id>/<value1>/<value2>"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]
# Rewrite new URLs to old/actual URL
# "/test/news/<id>/<value1>/<value2>" to "/test/news/?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [L]
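As a variation on the summary above, here is a sketch that follows two of the suggestions made earlier: it rewrites directly to the handling file and uses the END flag (Apache 2.4) in place of the REDIRECT_STATUS condition. It assumes the request is handled by /test/news/index.php, which the question does not confirm, and it uses a 302 while testing.

```apache
Options -MultiViews +FollowSymLinks

RewriteEngine On

# Redirect old URLs to the new "canonical" URL (302 while testing, 301 once verified)
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=302,L]

# Rewrite new URLs straight to the script (index.php is an assumption).
# END stops all further rewrite processing, so the redirect above is not re-evaluated.
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/index.php?c=$2&s=$3&id=$1 [END]
```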

Best practice for a .htaccess internal path rewrite?

We have spent a considerable amount of time looking for a solution elsewhere. We have read and tried the recommended threads. We most likely have a core misunderstanding as to why this, or something along these lines, does not work.
We get a request for a domain:
subdomain.domain.com/embed/34acb453bc4a53abc
We want to leave the URL as it is, but need to direct this to an internal vhost:
embed.example.com/34acb453bc4a53abc
Once the request is directed to this, our system can interpret the 34acb453bc4a53abc and return the appropriate data.
We tried the following (and variations of it), but we can't get anything to work.
RewriteCond ^embed\/(.*)$ [NC]
RewriteRule ^ https://embed.example.com%{REQUEST_URI} [L,NE,P]
internal path rewrite
Just to clarify, you can't internally rewrite the request across different hosts. You need to configure a reverse proxy using mod_proxy and related modules. This is what the P flag on the RewriteRule directive is doing... it's passing the request to mod_proxy (providing this is already correctly configured in the server config).
RewriteCond ^embed\/(.*)$ [NC]
RewriteRule ^ https://embed.example.com%{REQUEST_URI} [L,NE,P]
However, this will send the request to https://embed.example.com/embed/34acb453bc4a53abc, not https://embed.example.com/34acb453bc4a53abc as you require.
You need to capture the part of the URL-path after /embed/ and use that instead. You are already capturing this in the RewriteCond directive, but you are not using it. You don't actually need the RewriteCond directive here.
Try the following instead:
RewriteCond %{HTTP_HOST} =subdomain.domain.com
RewriteRule ^embed/([a-z0-9]+)$ https://embed.example.com/$1 [P]
You state that the request is for subdomain.domain.com, so I've included that in the directive.
The L and NE flags are not required here. P implies L and there is nothing that requires the substitution to not be URL encoded. Slashes do not carry any special meaning in the regex, so do not need to be escaped.
I've also made the regex that matches the "code" more restrictive, rather than matching literally anything.
The $1 backreference then contains just the "code" that follows /embed/ in the URL-path.
Note that the order of directives is important. It needs to be before any directives that are likely to result in a conflict.
If the embed and subdomain hosts point to the same place on the filesystem then you can avoid the complexities and overhead of mod_proxy and simply "rewrite" the request on the same host.
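For reference, a minimal sketch of the server-side setup that the P flag depends on, assuming you can edit the virtual host for subdomain.domain.com and that mod_proxy, mod_proxy_http and mod_ssl are available. The module paths are typical defaults, not taken from the question, and the ProxyPassReverse line is an optional extra.

```apache
# LoadModule lines usually live in the main server config; paths vary by distribution.
# LoadModule proxy_module modules/mod_proxy.so
# LoadModule proxy_http_module modules/mod_proxy_http.so
# LoadModule ssl_module modules/mod_ssl.so

<VirtualHost *:443>
    ServerName subdomain.domain.com
    # (SSL engine and certificate directives for this host omitted)

    # Required when the proxy target is an https:// URL
    SSLProxyEngine On

    RewriteEngine On
    # In virtual host context the pattern matches the URL-path with its leading slash
    RewriteRule ^/embed/([a-z0-9]+)$ https://embed.example.com/$1 [P]

    # Map Location headers in backend redirects back to the public path
    ProxyPassReverse "/embed/" "https://embed.example.com/"
</VirtualHost>
```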

Apache 2.4 Rewrite URL matching full URL

I'm using Apache 2.4 as a reverse proxy, and I need to redirect to a URL only if credentials are passed in the URL. For example, this is my URL:
https://user:password@myserver.mydomain.com/site1.php?1
I use this rewrite condition:
RewriteCond %{HTTP_HOST} ^user:password@ohab\.marcolino7\.myds\.me$
But it does not match, I suppose because HTTP_HOST does not contain the authentication data.
Is there a way to match the URL including the authentication data so that I can then redirect?
Many Thanks
Marco
As you correctly say, that is not part of the host header, so wouldn't match like that.
You can do it using the REMOTE_USER variable. How you do it depends on the context of your rules. See the documentation and specifically the quote below.
%{LA-U:variable} can be used for look-aheads which perform an
internal (URL-based) sub-request to determine the final value of
variable. This can be used to access variable for rewriting which is
not available at the current stage, but will be set in a later phase.
For instance, to rewrite according to the REMOTE_USER variable from
within the per-server context (httpd.conf file) you must use
%{LA-U:REMOTE_USER} - this variable is set by the authorization
phases, which come after the URL translation phase (during which
mod_rewrite operates).
On the other hand, because mod_rewrite implements its per-directory
context (.htaccess file) via the Fixup phase of the API and because
the authorization phases come before this phase, you just can use
%{REMOTE_USER} in that context.
Something like this in your httpd.conf file:
RewriteCond %{LA-U:REMOTE_USER} =user
Or in .htaccess:
RewriteCond %{REMOTE_USER} =user
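Putting the .htaccess form into a complete rule might look like the sketch below. It assumes Apache itself performs HTTP Basic authentication for the request (otherwise REMOTE_USER is never set); the password file path and the redirect target are placeholders. Note that only the user name is available this way, not the password.

```apache
# Apache must authenticate the request for REMOTE_USER to be populated
AuthType Basic
AuthName "Restricted"
AuthUserFile "/path/to/.htpasswd"
Require valid-user

RewriteEngine On
# Redirect only when the authenticated user is "user"
RewriteCond %{REMOTE_USER} =user
RewriteRule ^site1\.php$ /target-for-user.php [R=302,L]
```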

Content negotiation using mod_rewrite / RewriteCond rules

My use case is to host a set of files (the same RDF content in different serialization formats such as RDF/XML, Turtle, and JSON-LD) on GitHub Pages and use a w3id URL as a permanent identifier.
Further, I want to have content negotiation on that permanent URL. This would be trivial if I hosted my files on an Apache server, but unfortunately GitHub Pages doesn't support content negotiation. So I am trying to see to what extent I can do that with URL rewriting rules.
So the idea is similar to the following.
GET http://w3id.org/foo -- redirect to --> http://foo.git.io/content.ttl
Accept: text/turtle
GET http://w3id.org/foo -- redirect to --> http://foo.git.io/content.jsonld
Accept: application/ld+json
Currently my rules look like the following.
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_ACCEPT} ^.*application/rdf\+xml.*
RewriteRule ^foaf$ http://nandana.github.io/foaf/card.rdf [R=303,L]
RewriteCond %{HTTP_ACCEPT} ^.*text/turtle.*
RewriteRule ^foaf$ http://nandana.github.io/foaf/card.ttl [R=303,L]
RewriteCond %{HTTP_ACCEPT} ^.*application/ld\+json.*
RewriteRule ^foaf$ http://nandana.github.io/foaf/card.jsonld [R=303,L]
RewriteRule ^foaf$ http://nandana.github.io/foaf/card.html [R=303,L]
Though this works for the majority of cases, it breaks for some corner cases. For example, if there is an Accept header like the following:
Accept: application/rdf+xml;q=0.9,text/turtle
This will return application/rdf+xml (because the first rule matches) though according to the content negotiation it should return turtle. Does anyone know a way to improve the rules to handle this corner case?
I think you would have to make 0-byte dummies of the actual files, e.g. foo.ttl, foo.jsonld etc and make sure their types are declared with AddType, then the regular content negotiation can work for http://example.com/foo.
But instead of serving the 0-byte files, have RewriteRule foo.ttl etc for each file so that they are redirected out to the 'real' location. Quite verbose, yes, but should then work correctly even for complex Accept headers with q= and multiple types.
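A rough, untested sketch of that idea in .htaccess form. It assumes MultiViews is allowed on the host, that empty placeholder files foaf.rdf, foaf.ttl, foaf.jsonld and foaf.html exist next to the .htaccess file, and that mod_negotiation has already resolved /foaf to one of them by the time the per-directory rewrite rules run. The placeholder file names are invented; the redirect targets are the ones from the question.

```apache
Options +MultiViews +FollowSymLinks

# Declare the media type of each placeholder file
AddType application/rdf+xml .rdf
AddType text/turtle .ttl
AddType application/ld+json .jsonld

RewriteEngine On

# Once a variant has been negotiated for /foaf, redirect to the real file
RewriteRule ^foaf\.rdf$ http://nandana.github.io/foaf/card.rdf [R=303,L]
RewriteRule ^foaf\.ttl$ http://nandana.github.io/foaf/card.ttl [R=303,L]
RewriteRule ^foaf\.jsonld$ http://nandana.github.io/foaf/card.jsonld [R=303,L]
RewriteRule ^foaf\.html$ http://nandana.github.io/foaf/card.html [R=303,L]
```

With this layout the q-value handling is left to mod_negotiation, so an Accept header such as application/rdf+xml;q=0.9,text/turtle should select the Turtle variant.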

How to change a cookie name with mod_rewrite?

I'm trying to change the name of a cookie that's set by an AWS ELB, but keep its value with a rewrite condition and rewrite rule.
Here's what I've managed so far:
RewriteCond %{HTTP_COOKIE} AWSELB=(^BD.*) [NC]
RewriteRule ^(.*) - [CO=SIMELB:%1:.amazonaws.com:lifetime:-1]
Obviously the RewriteRule is incorrect, but could someone help me with the right syntax?
Ok, following the comment thread, I think there's enough info to get started. Foremost, your
pattern doesn't work because of the (^BD.*) capture group, and in particular because of
the ^ anchor. Instead, capture (BD[^;]+) to grab everything up to the next ; (or the
end of the string if there isn't one).
To explicitly unset the previous cookie, other examples use the INVALID modifier, though I
cannot find the documentation for it.
Apache mod_rewrite documentation on Cookies
RewriteCond %{HTTP_COOKIE} AWSELB=(BD[^;]+) [NC]
# Delete the old one
RewriteRule ^ - [CO=AWSELB:INVALID:.amazonaws.com:0:/:-1]
# Add the new one
# Specify your lifetime in minutes or 0 for the browser session (60 below)...
# Also add the path
# Assuming the -1 is for insecure cookies
RewriteRule ^ - [CO=SIMELB:%1:.amazonaws.com:60:/:-1]
For the old cookie to be successfully unset, both the domain and the path will need to
exactly match those originally set by AWS. Inspect the cookies currently being set and make
sure you match the domain & path.
And really, it isn't necessary to match BD... You could just as well do AWSELB=([^;]+) because it must only match up to the following semicolon anyway.
Addendum:
If the value is being lost, it may be because a RewriteCond applies only to the RewriteRule that immediately follows it, so %1 is empty in the second rule. You can always just repeat the RewriteCond. This is ugly, unfortunately, but I tested it and found it to work correctly.
# No capture group the first time since you don't use it until later
RewriteCond %{HTTP_COOKIE} AWSELB=BD.+ [NC]
RewriteRule ^ - [CO=AWSELB:INVALID:.amazonaws.com:0:/:-1]
# This will continue to execute since the previous rule didn't have [L]
RewriteCond %{HTTP_COOKIE} AWSELB=(BD[^;]+) [NC]
RewriteRule ^ - [CO=SIMELB:%1:.amazonaws.com:60:/:-1]
(Note: you won't see the cookie value updated until a subsequent HTTP request; that is, if you tried to inspect it from your script right after setting it with Apache, the new value won't be present because the cookie header has to make a round trip back to the client)
Instead of trying to rewrite the cookie name, I tested with mod_headers directives, which seem to have addressed my issue of one Amazon ELB's cookie breaking session affinity with another Amazon ELB.
RequestHeader edit Cookie AWSELB SIMELB
RequestHeader edit Cookie APPELB AWSELB
Header always edit Set-Cookie AWSELB APPELB
Header edit Set-Cookie AWSELB APPELB
This seems to work so far, relying on the browser to maintain the state for me: after it has retrieved the value of the first AWSELB cookie, when the Set-Cookie response comes back from the second ELB, the browser sees APPELB={value} and recalls the correct request cookie obtained from the first ELB.
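A slightly stricter variant of those directives, as a sketch: anchoring each pattern to the start of a cookie name avoids rewriting AWSELB where it happens to appear inside another cookie's name or value. It assumes mod_headers on Apache 2.4 and keeps the same cookie names and ordering as above.

```apache
# Request side: rename cookies sent by the browser before they reach the backend
RequestHeader edit Cookie "(^|;\s*)AWSELB=" "$1SIMELB="
RequestHeader edit Cookie "(^|;\s*)APPELB=" "$1AWSELB="

# Response side: rename the cookie set by the second ELB before it reaches the browser
Header always edit Set-Cookie "^AWSELB=" "APPELB="
Header edit Set-Cookie "^AWSELB=" "APPELB="
```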