mod_rewrite question: how to match an empty path but preserve query params?

AEM Cloud CMS uses Apache (they call it the dispatcher), but it doesn't allow "DirectoryIndex", so you have to do your own rewriting.
We would like to map "www.mysite.com" to the index page /uk/home.html with passthrough, so the user sees "www.mysite.com" instead of "www.mysite.com/uk/home.html".
However, we also want to preserve parameters.
Something like this:
RewriteRule "^$" "/uk/home.html" [PT]
will work, but presumably only if there are no query params.
Given "www.mysite.com?hello=world", "www.mysite.com/?hello=world" or "www.mysite.com/#something",
we want the original URL (e.g. "www.mysite.com?hello=world") to show in the address bar, but the content of the page /uk/home.html to be displayed.
Maybe we should be using several rules?
# case: www.mysite.com?xxx
RewriteRule "^\?.*$" "/uk/home.html?$1" [PT]
# case: www.mysite.com
RewriteRule "^$" "/uk/home.html" [PT]
# case: www.mysite.com/
RewriteRule "^/$" "/uk/home.html" [PT]
# case: www.mysite.com/?xxx
RewriteRule "^/\?.*$" "/uk/home.html?$1" [PT]
# case: www.mysite.com#xxx
RewriteRule "^\#.*$" "/uk/home.html#$1" [PT]
# case: www.mysite.com/#xxx
RewriteRule "^/\#.*$" "/uk/home.html/#$1" [PT]
etc.
AEM Cloud takes around 2-4 hours to deploy in order to test a one-line change, so trial and error takes a very long time.
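A minimal sketch of a single-rule alternative, assuming standard mod_rewrite behaviour (not tested on AEM Cloud). The RewriteRule pattern is matched against the URL path only, never the query string, and the request's query string is carried over unchanged when the substitution contains no "?". The fragment ("#something") is never sent to the server by the browser, so no rule can or needs to match it.

```apache
# Matches the site root in virtual-host context (path "/") as well as in
# per-directory context (path ""); the query string is not part of the match.
RewriteRule "^/?$" "/uk/home.html" [PT]
# A request for www.mysite.com/?hello=world is then handled as
# /uk/home.html?hello=world. There is no [R] flag, so no redirect is sent
# and the address bar keeps the original URL.
```

With this, the separate "?" and "#" cases in the attempt above should not be needed.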

Related

htaccess rewrite rule from folder to folder with querystring

I am looking to include a rewrite rule for the following but can't seem to get it to work. I don't want to pass any query string in but I need to add one to the rule.
I want this URL:
https://example.co.uk/vehicles/
to point to:
https://example.co.uk/search-results/?category=1
but keep the first URL in the address bar.
I need to pass in a variable called category with a value.
I tried the following but it didn't work for me:
rewriterule ^vehicles/$ search-results/?category=1 [NC, L]
Any help would be appreciated.
RewriteEngine On
RewriteRule .* ? [F,L]
RewriteRule ^(.*)$ https://example.co.uk/$1 [R,L]
RewriteRule ^ad/(.*/)?([0-9]+)$ view-ad/?ad=$2 [NC,L]
RewriteRule ^vehicles/$ search-results/?category=1 [NC,L]
I managed to solve it. It was due to an Ajax load on the page.
Glad you solved your initial query, however, the following two directives in your posted .htaccess file will break your site, so presumably, these have already been removed?
RewriteRule .* ? [F,L]
RewriteRule ^(.*)$ https://example.co.uk/$1 [R,L]
The first directive simply blocks all access to your site, returning a 403 Forbidden response. And the second directive will result in a redirect loop.
Now that it works, is it possible to have another rewrite rule that takes https://example.co.uk/vehicles/?something=1 and rewrites it to https://example.co.uk/vehicles/?category=1&appendsomething=1, but only displays https://example.co.uk/vehicles/ in the address bar?
I assume you mean https://example.co.uk/search-results/?category=1&something=1 (as opposed to /vehicles/) - where something=1 is appended on the end of the query string?
You wouldn't be able to make this "display as https://example.co.uk/vehicles/" - as this would conflict with your existing (working) directive.
However, you could potentially modify your existing directive to handle requests for /?something=1 and pass this through to the substitution. This would simply require the addition of the QSA flag (Query String Append). For example:
RewriteRule ^vehicles/$ search-results/?category=1 [QSA,NC,L]
The QSA flag results in the query string from the request being appended to the end of the query string specified in the RewriteRule substitution.
UPDATE: To redirect HTTP to HTTPS, you would need something like the following instead:
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://example.co.uk/$1 [R,L]
Note the preceding RewriteCond directive - this ensures that only HTTP requests are redirected, not everything (HTTP and HTTPS), so avoiding a redirect loop. Ultimately this should also be a 301 (permanent) redirect, so you should change R to R=301, but only when you are sure it's working OK.
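Putting the points above together, here is a sketch of the posted .htaccess with the two breaking directives dealt with. It assumes the remaining rules stay as they were; the blanket [F] rule is dropped and the redirect is kept at 302 while testing.

```apache
RewriteEngine On

# Redirect plain HTTP to HTTPS only; the condition avoids a redirect loop.
# Switch R=302 to R=301 once the redirect is known to work.
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://example.co.uk/$1 [R=302,L]

RewriteRule ^ad/(.*/)?([0-9]+)$ view-ad/?ad=$2 [NC,L]

# QSA appends the request's own query string after category=1, so
# /vehicles/?something=1 is served as search-results/?category=1&something=1
RewriteRule ^vehicles/$ search-results/?category=1 [QSA,NC,L]
```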

mod_rewrite 301 redirect from old urls to new

The website has changed its URL names for SEO reasons, e.g. it was:
/category/filter1/f00/filter2/123/filter3/100-500/filter4/36.html
now:
/category/color/red/size/big/price/100-500/style/classic.html
I know the old and new names; they're fixed. Please help me build a rewrite rule which will result in a 301 redirect from the old URLs to the new ones. I did some research and I see that I cannot do it using RewriteMap, for example, so I ended up making something like RewriteRule (.*)filter1(.*) $1color$2 [L] etc. Not only do I not like the way it looks, but it also doesn't give me a 301 redirect.
UPDATE: Note that at the moment I have several rules, one per filter name/value, e.g.:
RewriteEngine on
# make sure it's a catalog URL, not anything else
RewriteCond %{REQUEST_URI} !^/(category1|category2|category3|category4)
RewriteRule .* - [L]
# rewrite filter names
RewriteRule (.*)filter1(.*) $1color$2 [L]
RewriteRule (.*)filter2(.*) $1price$2 [L]
...etc...
It works as expected, changing all the names in the URL, but setting the R flag makes processing stop at the first rule and redirect to a URL like:
/var/www/vhosts/site/htdocs/category/color/red/filter2/123/ etc...
I separated the rules because any of the filters may or may not exist in the URL. I would greatly appreciate a better solution.
Here is my own answer: it is possible to do this with environment variables. We need to replace the old filter names and values with new ones, and then make only one 301 redirect to the new URL. Here is what I've done using mod_rewrite and environment variables:
RewriteEngine on
RewriteRule /filter1/ - [E=filters:/color/]
RewriteRule /f00[.\/] - [E=filters:%{ENV:filters}red]
RewriteRule /0f0[.\/] - [E=filters:%{ENV:filters}green]
RewriteRule /00f[.\/] - [E=filters:%{ENV:filters}blue]
RewriteRule /filter2/ - [E=filters:%{ENV:filters}/size/]
RewriteRule /123[.\/] - [E=filters:%{ENV:filters}big]
RewriteRule /32[.\/] - [E=filters:%{ENV:filters}small]
RewriteRule /filter3/([^/^\.]+) - [E=filters:%{ENV:filters}/price/$1]
RewriteRule /filter4/ - [E=filters:%{ENV:filters}/style/]
RewriteRule /36[.\/] - [E=filters:%{ENV:filters}classic]
RewriteRule /37[.\/] - [E=filters:%{ENV:filters}urban]
RewriteCond %{REQUEST_URI} ^/(category1|category2|category3|category4)/
RewriteCond %{ENV:filters} !^$
RewriteRule ^([^/]+)/ /$1%{ENV:filters}.html [L,R=301]
Basically, I've rebuilt the whole URL in the environment variable filters, then checked that it's a category and not some other part of the website, and finally redirected to this category + filters variable, with .html appended at the end.
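To make the accumulation concrete, here is a hedged trace of a subset of the rules above for a hypothetical request that uses only the first two filters. The request path is an assumption chosen for illustration, and rules that do not match it are left out.

```apache
# Request (hypothetical): /category1/filter1/f00/filter2/123.html

# matches "/filter1/"   -> filters = /color/
RewriteRule /filter1/ - [E=filters:/color/]
# matches "/f00/"       -> filters = /color/red
RewriteRule /f00[.\/] - [E=filters:%{ENV:filters}red]
# matches "/filter2/"   -> filters = /color/red/size/
RewriteRule /filter2/ - [E=filters:%{ENV:filters}/size/]
# matches "/123."       -> filters = /color/red/size/big
RewriteRule /123[.\/] - [E=filters:%{ENV:filters}big]

# filters is non-empty and the URI starts with a known category, so a single
# 301 is issued to /category1/color/red/size/big.html
RewriteCond %{REQUEST_URI} ^/(category1|category2|category3|category4)/
RewriteCond %{ENV:filters} !^$
RewriteRule ^([^/]+)/ /$1%{ENV:filters}.html [L,R=301]
```

The new URL contains none of the old filter names or values, so on the follow-up request filters stays empty and no second redirect is sent.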
Even though the new URL looks prettier to a human, I'm not sure if there's a need to change the existing URL for SEO reasons.
To get a redirect instead of a rewrite, you must use the R|redirect flag. So your rule would look like
RewriteRule (.*)filter1(.*) $1color$2 [R,L]
But if you have multiple redirects, this might impact your SEO results negatively; see "Chained 301 redirects should be avoided for SEO, but Google will follow 2 or 3 stacked redirects":
Remember that ideally you shouldn’t have any stacked redirects or even a single redirect if you can help it, but if required Google will follow chained redirects.
But every additional redirect will make it more likely that Google won’t follow the redirects and pass PageRank.
For Google keep it to two and at a maximum three redirects if you have to.
Bing may not support chained redirects at all.
This means you should try to replace multiple filters at once:
RewriteRule ^(.*)/filter1/(.*)/filter2/(.*)$ $1/color/$2/size/$3 [R,L]
and so on.
When the filters may come in an arbitrary order, you may use several rules and do a redirect at the end
RewriteRule ^(.*)filter1(.*)$ $1color$2 [L]
RewriteRule ^(.*)filter2(.*)$ $1price$2 [L]
RewriteRule ^(.*)filter3(.*)$ $1size$2 [L]
RewriteCond %{ENV:REDIRECT_STATUS} 200
RewriteRule ^ %{REQUEST_URI} [R,L]
RewriteCond with REDIRECT_STATUS is there to prevent an endless loop.
When it works as it should, you may replace R with R=301. Never test with R=301.
A final note, be very careful with these experiments. I managed to kill my machine twice (it became unresponsive and I had to switch off) during tests.

Looking for a RewriteBase equivalent using RewriteRule

I'm facing an Apache configuration issue which can be summarized as follows.
On a single hosting system I have a lot of different test sites, each one in its own subdirectory, so they are accessible through a URL like myhostname.fr/sitename.
Hence in the corresponding .htaccess, the common practice is to have a RewriteBase /sitename before any of the RewriteCond+RewriteRule sets, and it works fine.
Now for one of these sites (say in the specialsite subdirectory) I had to create a dedicated domain so the url looks like domainname.myhostname.fr.
Then for this site to work the .htaccess now needs RewriteBase / instead of RewriteBase /specialsite, and it works fine too.
Here is the trick: not being so familiar with Apache, I decided to experiment and wanted to keep this site accessible through the common URL myhostname.fr/specialsite as well.
So I had to find a way to conditionally use one of the above RewriteBase values, depending on the current URL.
The first way I tried was to work like this:
<If "%(HTTP_HOST) =~ domainname\.myhostname\.fr">
RewriteBase /
</If>
<If "%(HTTP_HOST) =~ myhostname\.fr/specialsite">
RewriteBase /specialsite
</If>
But I got an HTTP 500 error, and it took me a long time to understand that the <If> directive is only available as of Apache 2.4, while my hosting only offers Apache 1.3!
So (thanks to some other SO answers) I thought of another way, which is to first do:
RewriteCond %{HTTP_HOST} domainname\.myhostname\.fr
RewriteRule ^ - [E=VirtualRewriteBase:/]
RewriteCond %{HTTP_HOST} myhostname\.fr/specialsite
RewriteRule ^ - [E=VirtualRewriteBase:/specialsite/]
Then prepend the given VirtualRewriteBase to every subsequent RewriteRule substitution, as in this one:
RewriteRule ^ %{ENV:VirtualRewriteBase}index.php [L]
But while it works fine for the domain-access version, it gives me an HTTP 404 error for the subdirectory-access version.
So in order to see how the substitution was applied, I changed the above rule to:
RewriteRule ^ %{ENV:VirtualRewriteBase}index.php [R,L]
And I observed that the redirected url looked like this:
http://myhostname.fr/kunden/homepages/7/d265580839/htdocs/specialsite/index.php
where kunden/homepages/7/d265580839/htdocs/ is the full document-root of my hosting.
You can notice that the document root has been inserted between the two parts of the original URL.
Moreover, the result is exactly the same whatever I put in place of /specialsite/ in my VirtualRewriteBase!
So here is my main question: why and how does this happen?
I'm also obviously interested in a possible alternative solution to achieve the double-access availability.
But above all I would like to understand...
"But while it works fine for the domain-access version, it gives me an HTTP 404 error for the subdirectory-access version."
That's because your second condition is never matched. Indeed, HTTP_HOST only contains the HTTP host! The /specialsite part belongs to REQUEST_URI (or can also be matched in the RewriteRule directly).
This code should work (I don't know if it will totally solve your problem, but it's a first step):
RewriteCond %{HTTP_HOST} ^domainname\.myhostname\.fr$ [NC]
RewriteRule ^ - [E=VirtualRewriteBase:/]
RewriteCond %{HTTP_HOST} ^myhostname\.fr$ [NC]
RewriteCond %{REQUEST_URI} ^/specialsite(?:/|$) [NC]
RewriteRule ^ - [E=VirtualRewriteBase:/specialsite/]
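A sketch of how the two pieces might fit together, assuming the front controller is index.php as in the question. As far as I understand it, when neither condition matches, VirtualRewriteBase stays empty and the substitution becomes the relative path index.php; in .htaccess context a relative substitution is resolved against the physical directory unless a RewriteBase says otherwise. That would explain both the document root showing up in the redirected URL and the result not depending on the value given after E=. The !-f condition is an addition not present in the question, and the plain (/|$) group is used instead of (?:/|$) on the assumption that the regex engine of a very old Apache may not support non-capturing groups.

```apache
RewriteEngine on

# Domain access: domainname.myhostname.fr/...
RewriteCond %{HTTP_HOST} ^domainname\.myhostname\.fr$ [NC]
RewriteRule ^ - [E=VirtualRewriteBase:/]

# Subdirectory access: myhostname.fr/specialsite/...
RewriteCond %{HTTP_HOST} ^myhostname\.fr$ [NC]
RewriteCond %{REQUEST_URI} ^/specialsite(/|$) [NC]
RewriteRule ^ - [E=VirtualRewriteBase:/specialsite/]

# Front controller. The !-f test keeps the rewritten request for index.php
# (an existing file) from being rewritten again on the next pass.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ %{ENV:VirtualRewriteBase}index.php [L]
```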

mod_rewrite this - multiple conditions or can I just pass all vars over together?

Due to a reverse proxy setup I have to pass an extra query var, which the proxy can't add itself, using mod_rewrite. The proxy is at /search; however, I'm using /find on all pages and rewriting it to /search with mod_rewrite in order to pass the query var s=gsacollection.
See example:
# Direct link to search which passes collection var
# eg http://www.domain.com/find
RewriteRule ^find$ /search?s=gsacollection [NC]
#Rewrite all query vars
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^find(.*)$ /search?%1 [NC,L]
I'm trying to capture multiple variables that are being sent. The issue is that I don't always know which ones are being sent over. This is an attempt at a blanket capture. Suggestions?
I want to pass the whole query string after /find? to /search?
E.g. here are some sample URIs coming in:
find?q=test&sort=date:D:L:d1&num=10&s=gsacollection&l=en&start=10
find?q=tfsa&sort=date:D:L:d1&num=10&s=gsacollection&l=en&filter=0
find?q=tfsa&filter=0&num=10&s=gsacollection&l=en&sort=date%3AD%3AS%3Ad1
If a blanket capture won't work then I will have to look at setting up multiple RewriteCond rules. Is there a way to combine these so that I can pass vars from each condition to build the rewrite rule (e.g. as groups)?
# Grab everything after /find and replace with /search if these query vars exist
RewriteCond %{QUERY_STRING} q=(.*) [AND]
RewriteCond %{QUERY_STRING} s=(.*)
RewriteRule ^find(.*)$ /search$1
Try using this code in your .htaccess file under $DOCUMENT_ROOT:
Options +FollowSymLinks -MultiViews
RewriteEngine on
RewriteOptions MaxRedirects=10
RewriteRule ^find/?$ search?s=gsacollection [QSA,L,NC]
Make sure you don't have any other conflicting mod_rewrite rule here. The QSA flag will make sure all query parameters of the request are appended to, and merged with, the s=gsacollection parameter.
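One thing to watch: the sample URIs in the question already contain s=gsacollection, so with QSA the parameter would appear twice in the rewritten query string. A hedged variant, assuming the same .htaccess location, that only adds the parameter when the request does not already carry one:

```apache
RewriteEngine on

# The request already has an "s" parameter: rewrite the path only. With no "?"
# in the substitution, the original query string is passed through unchanged.
RewriteCond %{QUERY_STRING} (^|&)s= [NC]
RewriteRule ^find/?$ search [L,NC]

# Otherwise add s=gsacollection and append the original query string to it.
RewriteRule ^find/?$ search?s=gsacollection [QSA,L,NC]
```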

Hidden features of mod_rewrite

There seem to be a decent number of mod_rewrite threads floating around lately with a bit of confusion over how certain aspects of it work. As a result I've compiled a few notes on common functionality, and perhaps a few annoying nuances.
What other features / common issues have you run across using mod_rewrite?
Where to place mod_rewrite rules
mod_rewrite rules may be placed within the httpd.conf file, or within the .htaccess file. If you have access to httpd.conf, placing rules there will offer a performance benefit (as the rules are processed once, as opposed to each time the .htaccess file is read).
Logging mod_rewrite requests
Logging may be enabled from within the httpd.conf file (including <VirtualHost>):
# logs can't be enabled from .htaccess
# loglevel > 2 is really spammy!
RewriteLog /path/to/rewrite.log
RewriteLogLevel 2
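Note that the two directives above apply to Apache 2.2 and earlier. In Apache 2.4, RewriteLog and RewriteLogLevel were removed and rewrite tracing is configured through the per-module LogLevel instead. A minimal sketch (the base level "warn" is just an example):

```apache
# Apache 2.4+, in httpd.conf or a <VirtualHost>: mod_rewrite trace output
# goes to the regular ErrorLog. trace1..trace8 are available; higher levels
# are increasingly verbose.
LogLevel warn rewrite:trace3
```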
Common use cases
To funnel all requests to a single point:
RewriteEngine on
# ignore existing files
RewriteCond %{REQUEST_FILENAME} !-f
# ignore existing directories
RewriteCond %{REQUEST_FILENAME} !-d
# map requests to index.php and append as a query string
RewriteRule ^(.*)$ index.php?query=$1
Since Apache 2.2.16 you can also use FallbackResource.
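A sketch of the FallbackResource equivalent, assuming index.php works out the requested path from REQUEST_URI itself (an assumption, since the rule above passes it as ?query= instead):

```apache
# Any request that does not map to an existing file is handled by /index.php.
# No RewriteEngine, RewriteCond or RewriteRule is needed for this.
FallbackResource /index.php
```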
Handling 301/302 redirects:
RewriteEngine on
# 302 Temporary Redirect (302 is the default, but can be specified for clarity)
RewriteRule ^oldpage\.html$ /newpage.html [R=302]
# 301 Permanent Redirect
RewriteRule ^oldpage2\.html$ /newpage.html [R=301]
Note: external redirects are implicitly 302 redirects:
# this rule:
RewriteRule ^somepage\.html$ http://google.com
# is equivalent to:
RewriteRule ^somepage\.html$ http://google.com [R]
# and:
RewriteRule ^somepage\.html$ http://google.com [R=302]
Forcing SSL
RewriteEngine on
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://example.com/$1 [R,L]
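A variant of the same idea that does not hard-code the host name, as a sketch. It reuses whatever Host header the client sent, which may or may not be what you want on a server with several aliases.

```apache
RewriteEngine on
RewriteCond %{HTTPS} off
# REQUEST_URI already starts with "/", and the query string is carried over
# automatically because the substitution contains no "?".
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=302,L]
```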
Common flags:
[R] or [redirect] - force a redirect (defaults to a 302 temporary redirect)
[R=301] or [redirect=301] - force a 301 permanent redirect
[L] or [last] - stop rewriting process (see note below in common pitfalls)
[NC] or [nocase] - specify that matching should be case insensitive
Using the long-form of flags is often more readable and will help others who come to read your code later.
You can separate multiple flags with a comma:
RewriteRule ^olddir(.*)$ /newdir$1 [L,NC]
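For comparison, the same kind of rule written with long-form flags; both spellings are accepted by mod_rewrite and can be mixed freely:

```apache
# Equivalent to [L,NC]
RewriteRule ^olddir(.*)$ /newdir$1 [last,nocase]
# Equivalent to [R=301,L]
RewriteRule ^oldpage\.html$ /newpage.html [redirect=301,last]
```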
Common pitfalls
Mixing mod_alias style redirects with mod_rewrite
# Bad
Redirect 302 /somepage.html http://example.com/otherpage.html
RewriteEngine on
RewriteRule ^(.*)$ index.php?query=$1
# Good (use mod_rewrite for both)
RewriteEngine on
# 302 redirect and stop processing
RewriteRule ^somepage\.html$ /otherpage.html [R=302,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# route all other requests to index.php
RewriteRule ^(.*)$ index.php?query=$1
Note: you can mix mod_alias with mod_rewrite, but it involves more work than just handling basic redirects as above.
Context affects syntax
Within .htaccess files, a leading slash is not used in the RewriteRule pattern:
# given: GET /directory/file.html
# .htaccess
# result: /newdirectory/file.html
RewriteRule ^directory(.*)$ /newdirectory$1
# .htaccess
# result: no match!
RewriteRule ^/directory(.*)$ /newdirectory$1
# httpd.conf
# result: /newdirectory/file.html
RewriteRule ^/directory(.*)$ /newdirectory$1
# Putting a "?" after the slash will allow it to work in both contexts:
RewriteRule ^/?directory(.*)$ /newdirectory$1
[L] is not last! (sometimes)
The [L] flag stops processing any further rewrite rules for that pass through the rule set. However, if the URL was modified in that pass and you're in the .htaccess context or the <Directory> section, then your modified request is going to be passed back through the URL parsing engine again. And on the next pass, it may match a different rule this time. If you don't understand this, it often looks like your [L] flag had no effect.
# processing does not stop here
RewriteRule ^dirA$ /dirB [L]
# /dirC will be the final result
RewriteRule ^dirB$ /dirC
Our rewrite log shows that the rules are run twice and the URL is updated twice:
rewrite 'dirA' -> '/dirB'
internal redirect with /dirB [INTERNAL REDIRECT]
rewrite 'dirB' -> '/dirC'
The best way around this is to use the [END] flag (see Apache docs) instead of the [L] flag, if you truly want to stop all further processing of rules (and subsequent passes). However, the [END] flag is only available for Apache v2.3.9+, so if you have v2.2 or lower, you're stuck with just the [L] flag.
For earlier versions, you must rely on RewriteCond statements to prevent matching of rules on subsequent passes of the URL parsing engine.
# Only process the following RewriteRule if on the first pass
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ...
Or you must ensure that your RewriteRule's are in a context (i.e. httpd.conf) that will not cause your request to be re-parsed.
If you need to 'block' internal redirects / rewrites from happening in the .htaccess, take a look at the following condition:
RewriteCond %{ENV:REDIRECT_STATUS} ^$
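For Apache 2.3.9 and later, the dirA/dirB example from above can be sketched with [END] instead of [L]:

```apache
# [END] stops this pass and also prevents any further rewrite processing of
# the rewritten request in .htaccess / <Directory> context.
RewriteRule ^dirA$ /dirB [END]
# Not applied to a request that came in as /dirA; a direct request for /dirB
# is still rewritten to /dirC.
RewriteRule ^dirB$ /dirC
```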
The deal with RewriteBase:
You almost always need to set RewriteBase in .htaccess files. If you don't, Apache guesses that your base is the physical disk path to your directory. So start with this:
RewriteBase /
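For a site that lives in a subdirectory, the base is the URL path of that directory rather than "/". A minimal sketch, where /sitename/ and index.php are placeholder names:

```apache
# .htaccess in the directory that is served as /sitename/
RewriteEngine on
RewriteBase /sitename/
RewriteCond %{REQUEST_FILENAME} !-f
# The relative substitution is resolved against RewriteBase,
# i.e. the request is handled by /sitename/index.php
RewriteRule ^ index.php [L]
```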
Other Pitfalls:
1- Sometimes it's a good idea to disable MultiViews
Options -MultiViews
I'm not well versed in all of MultiViews' capabilities, but I know that it messes up my mod_rewrite rules when active, because one of its properties is to try and 'guess' an extension to a file that it thinks I'm looking for.
I'll explain:
Suppose you have 2 php files in your web dir, file1.php and file2.php and you add these conditions and rule to your .htaccess :
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ file1.php/$1
You assume that all urls that do not match a file or a directory will be grabbed by file1.php. Surprise! This rule is not being honored for the url http://myhost/file2/somepath. Instead you're taken inside file2.php.
What's going on is that MultiViews automagically guessed that the url that you actually wanted was http://myhost/file2.php/somepath and gladly took you there.
Now, you have no clue what just happened and you're at that point questioning everything that you thought you knew about mod_rewrite. You then start playing around with rules to try to make sense of the logic behind this new situation, but the more you're testing the less sense it makes.
OK, in short: if you want mod_rewrite to work in a way that approximates logic, turning off MultiViews is a step in the right direction.
2- Enable FollowSymLinks
Options +FollowSymLinks
mod_rewrite in .htaccess files or <Directory> sections only works when FollowSymLinks (or SymLinksIfOwnerMatch) is enabled for that directory, so if your host has not already turned it on, do it yourself.
Comparing two values for equality can be done as in the following example:
RewriteCond %{REQUEST_URI} ^/(server0|server1).*$ [NC]
# %1 is the string that was found above
# %1<>%{HTTP_COOKIE} concatenates the first match with the mod_rewrite variable -> "server0<>foo=bar;"
# the RewriteCond then searches for (.*) in the first part -> \1 is a back-reference to that (.*)
# <> is used as a string separator/indicator and can be replaced by any other character
RewriteCond %1<>%{HTTP_COOKIE} !^(.*)<>.*stickysession=\1.*$ [NC]
RewriteRule ^(.*)$ https://notmatch.domain.com/ [R=301,L]
Dynamic Load Balancing:
If you use mod_proxy to balance your system, it's possible to add a dynamic range of worker servers.
RewriteCond %{HTTP_COOKIE} ^.*stickysession=route\.server([0-9]{1,2}).*$ [NC]
RewriteRule (.*) https://worker%1.internal.com/$1 [P,L]
A better understanding of the [L] flag is in order. The [L] flag is last, you just have to understand what will cause your request to be routed through the URL parsing engine again. From the docs (http://httpd.apache.org/docs/2.2/rewrite/flags.html#flag_l) (emphasis mine):
The [L] flag causes mod_rewrite to stop processing the rule set. In most contexts, this means that if the rule matches, no further rules will be processed. This corresponds to the last command in Perl, or the break command in C. Use this flag to indicate that the current rule should be applied immediately without considering further rules.
If you are using RewriteRule in either .htaccess files or in <Directory> sections, it is important to have some understanding of how the rules are processed. The simplified form of this is that once the rules have been processed, the rewritten request is handed back to the URL parsing engine to do what it may with it. It is possible that as the rewritten request is handled, the .htaccess file or <Directory> section may be encountered again, and thus the ruleset may be run again from the start. Most commonly this will happen if one of the rules causes a redirect - either internal or external - causing the request process to start over.
So the [L] flag does stop processing any further rewrite rules for that pass through the rule set. However, if your rule marked with [L] modified the request, and you're in the .htaccess context or the <Directory> section, then your modified request is going to be passed back through the URL parsing engine again. And on the next pass, it may match a different rule this time. If you don't understand what happened, it looks like your first rewrite rule with the [L] flag had no effect.
The best way around this is to use the [END] flag (http://httpd.apache.org/docs/current/rewrite/flags.html#flag_end) instead of the [L] flag, if you truly want to stop all further processing of rules (and subsequent reparsing). However, the [END] flag is only available for Apache v2.3.9+, so if you have v2.2 or lower, you're stuck with just the [L] flag. In this case, you must rely on RewriteCond statements to prevent matching of rules on subsequent passes of the URL parsing engine. Or you must ensure that your RewriteRule's are in a context (i.e. httpd.conf) that will not cause your request to be re-parsed.
Another great feature is rewrite map expansion. It's especially useful if you have a massive amount of hosts / rewrites to handle.
A map works like a key-value replacement:
RewriteMap examplemap txt:/path/to/file/map.txt
Then you can use a mapping in your rules like:
RewriteRule ^/ex/(.*) ${examplemap:$1}
More information on this topic can be found here:
http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html#mapfunc
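A slightly fuller sketch of the same idea, with a default value for keys that are missing from the map. The map entries, /notfound.html and the 302 are made-up examples.

```apache
# httpd.conf or <VirtualHost>: RewriteMap cannot be declared in .htaccess.
RewriteEngine on
RewriteMap examplemap "txt:/path/to/file/map.txt"

# map.txt holds one whitespace-separated "key value" pair per line, e.g.
#   oldshoes   /products/shoes.html
#   oldhats    /products/hats.html

# Look up the captured key; "|/notfound.html" is used when the key is not
# present in the map.
RewriteRule "^/ex/(.*)" "${examplemap:$1|/notfound.html}" [R=302,L]
```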
mod_rewrite can modify aspects of request handling without altering the URL, e.g. setting environment variables, setting cookies, etc. This is incredibly useful.
Conditionally set an environment variable:
RewriteCond %{HTTP_COOKIE} myCookie=(a|b) [NC]
RewriteRule .* - [E=MY_ENV_VAR:%1]
Return a 503 response:
RewriteRule's [R] flag can take a non-3xx value and return a non-redirecting response, e.g. for managed downtime/maintenance:
RewriteRule .* - [R=503,L]
will return a 503 response (not a redirect per se).
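For a maintenance window you usually still want to reach the site yourself. A sketch, where 192.0.2.10 is a placeholder for your own address:

```apache
RewriteEngine on
# Everyone except the listed address gets the 503
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.10$
# With a status outside the 3xx range the substitution is dropped and
# rewriting stops as if [L] had been given.
RewriteRule ^ - [R=503,L]
```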
Also, mod_rewrite can act like a super-powered interface to mod_proxy, so you can do this instead of writing ProxyPass directives:
RewriteRule ^/(.*)$ balancer://cluster%{REQUEST_URI} [P,QSA,L]
Opinion:
Using RewriteRules and RewriteConds to route requests to different applications or load balancers based on virtually any conceivable aspect of the request is just immensely powerful. Controlling requests on their way to the backend, and being able to modify the responses on their way back out, makes mod_rewrite the ideal place to centralize all routing-related config.
Take the time to learn it, it's well worth it! :)