apache redirect / Rewrite-Engine - apache

Is the following possible?
A user requests the url http://example1.com/example.php and the apache opens http:// example1.com/example.php?id=1
A user requests the url http://example2.com/example.php and the apache opens http:// example2.com/example.php?id=2
But the user should not see the id in his browser adress bar (the user should only see http://example1.com/example.php or http://example2.com/example.php).
You can say the id is invisible for the user but transfered to the example.php.
How can I implement this?

Is that the correct solution?
RewriteEngine On
RewriteRule ^/example.php http://example1.com/example.php$1 [P]
ProxyPassReverse /example.php?id=1 http:// example1.com/example.php
RewriteEngine On
RewriteRule ^/example.php http://example2.com/example.php$1 [P]
ProxyPassReverse /example.php?id=2 http:// example2.com/example.php

You have to understand several concept.
Once the server received the user requested url he can do several things
Take the requested path from the url and use it without modifications. That's the default solution
Map the requested path to any other physical path, things that can be done via Alias, AliasMatch or RewriteRules.
Map the requested path to another website while hiding the fact thtat another website is requested. That's the proxy solution, thta mod_proxy or mod_rewrite could handle (but you do not need that)
Redirect the user to another path, sending him a new url to use, making another client/server roundtrip, with Redirect instructions or mod_rewrite (the swiss knife). But you do no need that.
So you want a server-side only remapping of the requested path.
Let,s say we will use mod rewrite to make this mapping. If you check all tags available in RewriteRule (summary here) the interesting ones are:
passthrough|PT : Forces the resulting URI to be passed back to the URL mapping engine for processing of other URI-to-filename translators, such as Alias or Redirect.
qsappend|QSA: Appends any query string from the original request URL to any query string created in the rewrite target
last|L: Stop the rewriting process immediately and don't apply any more rules. Especially note caveats for per-directory and .htaccess context (see also the END flag)
nocase|NC: Makes the pattern comparison case-insensitive.
details on the PT flag shows that:
The target (or substitution string) in a RewriteRule is assumed to be a file path, by default.
Well, that,s maybe enough for you. But using PT is a good thing, if you have other apache configusation elements you should try to let them apply after mod_rewrite job.
So... assuming you may need to handle some query strings arguments and that this id argument is based on the domain name in the request, and that only the example.php script needs this behavior; you should start your research with such rules (untested):
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example1.com$ [nocase]
RewriteRule ^example\.php$ example.php?id=1 [passthrough,qsappend,last]
RewriteCond %{HTTP_HOST} ^example2.com$ [nocase]
RewriteRule ^example\.php$ example.php?id=2 [passthrough,qsappend,last]

Related

Mod_rewrite rules not working in .htaccess to change the URL

I'm trying to rewrite the below URL but the URLs just don't change, no errors.
Current URL:
https://example.com/test/news/?c=value1&s=value2&id=9876
Expected URL:
https://example.com/test/news/value1/value2
My .htaccess
RewriteEngine On
RewriteRule ^test/news/([^/]*)/([^/]*)$ /test/news/?c=$1&s=$2&id=1 [L]
but I've seen many articles where a url such as example.com/display_article.php?articleId=my-article can be rewritten as example.com/articles/my-article for example with .htaccess
But the important point here (that I think you are missing) is that the URL must already have been changed internally in your application - in all your internal links. It is a common misconception that .htaccess alone can be used to change the format of the URL. Whilst .htaccess is an important part of this, it is only part of it.
Yes, you can implement a redirect in .htaccess to redirect from the old to new URL - and this is essential to preserve SEO (see below), but it is not critical to your application working. If you don't first change the URL in your internal links then:
The "old" URL is still exposed in the HTML source. When a user hovers over or copies the link, they are seeing and copying the "old" URL.
Every time a user clicks one of your internal links they are externally redirected to the "new" URL. This is slow for your users, bad for SEO (you should never link to a URL that is redirected) and bad for your server, as it potentially doubles the number of requests hitting your server (OK, 301s are cached locally).
To quote from #IMSoP's answer to this reference question on the subject:
Rewrite rules don't make ugly URLs pretty, they make pretty URLs ugly
So, once you have changed your internal links to the "new" (expected) format, eg. /test/news/value1/value2 (or should that be /test/news/value1/value2/id or even /test/news/id/value1/value2? See below), then you can do as follows...
RewriteRule ^test/news/([^/]*)/([^/]*)$ /test/news/?c=$1&s=$2&id=1 [L]
This internally rewrites a request from /test/news/<value1>/<value2> to /test/news/?c=<value1>&s=<value2>&id=1. However, there are a couple of issues with this:
/test/news/ is not itself a valid endpoint. This requires further rewriting. Perhaps you are serving a DirectoryIndex document (eg. index.php)? This might appear seamless to you, but this requires an additional internal subrequest and makes the rule dependent on other elements of the config. You should rewrite directly to the file that handles the request. eg. /test/news/index.php?c=<value1>&s=<value2>&id=1 (remember, this is entirely hidden from the user).
You are hardcoding the id=1 parameter? Should every URL have the same id? Or should this be passed in the "new" URL (which is what I would expect)? What does the id represent? If this is critical to the routing of the URL then the id should appear earlier in the URL-path, in case the URL gets accidentally truncated when copy/pasted/shared.
If the id is required then it needs to be passed in the "new" URL. We only have the "new" URL to route the request, so the information can't be hidden.
So, if the "new" URL is now /test/news/<id>/<value1>/<value2> then the rewrite would need to be like this instead:
# Rewrite new URLs to old/actual URL
# "/test/news/<id>/<value1>/<value2>" to "/test/news/?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [L]
Then (optionally*1) you can implement an external redirect in order to preserve SEO. This is for search engines that have indexed the "old" URLs or third party inbound links that cannot be updated - these need to be corrected to inform search engines of the change and get the user on the "new" canonical URL having followed an out-of-date inbound link.
(*1 It's not "optional" if you are changing an existing URL, but optional with regards to your application being functional.)
This "redirect" goes before the above rewrite:
# Redirect old URLs to the new "canonical" URL
# "/test/news/?c=<value1>&s=<value2>&id=<id>" to "/test/news/<id>/<value1>/<value2>"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]
The $0 backreference contains the full match from the RewriteRule pattern, ie. test/news/ in this case - this simply saves repetition.
The %1, %2 and %3 backreferences contain the values captured from the preceding condition. ie. the values of the c, s and id URL parameters respectively.
Note that the URL parameters / path segments should not be optional as in your original directive (ie. ([^/]*)). If they are optional and they are omitted, then the resulting URL becomes ambiguous. eg. <value2> becomes <value1> if <value1> is omitted.
Note that the URL parameters must be in the order as stated. If you have a mismatch of "old" URLs with these params in a different order (or even intermixed with other params) then this can be accounted for with additional complexity. (It may be easier to perform this redirect in your server-side script, instead of .htaccess.)
The first condition that checks against the REDIRECT_STATUS environment variable ensures that we only redirect direct requests and not rewritten requests by the later rewrite (which would otherwise result in a redirect loop). An alternative on Apache 2.4 is to use the END flag on the RewriteRule instead.
The QSD flag (Apache 2.4) discards the original query string from the request.
You should test first with a 302 (temporary) redirect to avoid potential caching issues and only change to a 301 (permanent) redirect once you have tested that everything works as intended. 301s are cached persistently by the browser so can make testing problematic.
Summary
Your complete .htaccess file should look something like this:
Options -MultiViews +FollowSymLinks
# If relying on the DirectoryIndex to handle the request
DirectoryIndex index.php
RewriteEngine On
# Redirect old URLs to the new "canonical" URL
# "/test/news/?c=<value1>&s=<value2>&id=<id>" to "/test/news/<id>/<value1>/<value2>"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]
# Rewrite new URLs to old/actual URL
# "/test/news/<id>/<value1>/<value2>" to "/test/news/?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [L]

Rewrite URL in XAMPP VirtualHost

Been trying to play with mod_rewrite through .htaccess in one Xampp virtualhost, but I am not getting the results that I am looking for.
What I am tring to do is to rewrite the following: www.example.com/name/billy.html to: www.example.com/billy
Without trying to rewrite the URLs the virtualhost is working fine, I have access to all pages. However, when I add the .htaccess with the corresponding rewrite rule I get a 404 page not found. The regex is working as expected though. I see that the request to www.example.com/name/billy.html it's been rewritten to www.example.com/billy, but the page doesn't load.
The name folder exists in the file structure and the .htaccess is inside the example folder.
Currently, my vh configuration looks like this:
<VirtualHost *:80>
DocumentRoot "/Applications/XAMPP/xamppfiles/htdocs/example"
ServerName www.example.com
<Directory "/Applications/XAMPP/xamppfiles/htdocs/example">
Options Indexes FollowSymLinks ExecCGI Includes
AllowOverride All
Require all granted
</Directory>
</VirtualHost>
And this is the content of the .htaccess file:
RewriteEngine On
RewriteRule ^name\/([a-z]+).html$ /$1 [L,NC,R]
What is missing?
What I am tring to do is to rewrite the following: www.example.com/name/billy.html to: www.example.com/billy
You seem to have the process the wrong way round and possibly mixing up "rewrites" and "redirects"?
You should be internally rewriting from the visible "friendly" URL to the underlying file-path that actually handles the request. You are trying to do the opposite here. How is your system expected to handle a request for /billy? (It doesn't, and generates a 404.)
You may be thinking you can change the URL using .htaccess (mod_rewrite) alone? But no, that is not how this works.
RewriteRule ^name\/([a-z]+).html$ /$1 [L,NC,R]
You mention "rewrite", but this directive is in fact a "redirect" (as indicated by the R flag). Specifically, a 302 (temporary) redirect in this instance.
(You might actually want to implement a redirect like this, if you are changing an existing URL structure, but more on that later*1)
A URL "rewrite" is entirely internal to the server. The user only sees the public URL, they do not see the URL that it might be rewritten to. You (the user) can't "see" a rewrite.
A "redirect" on the other hand, usually refers to an external redirect, ie. a 3xx response sent back to the client with an instruction to make a new request to a different URL. The URL being redirected to is visible to the user. This is used when content has moved to a different URL.
So, following your example, you should be requesting/linking to (in your HTML source) the short/friendly URL /billy and internally rewriting the request to /name/billy.html that actually handles the request.
For example:
RewriteEngine On
# Rewrite from "billy" to "name/billy.html"
RewriteRule ^[a-z]+$ name/$0.html [NC,L]
The $0 backreference contains the entire URL-path that is matched by the RewriteRule pattern.
Only use the NC flag if you do need to match uppercase letters as well. But a request for /Billy won't serve /name/billy.html on a case-sensitive OS.
And that's really it, with regards to the URL-rewritting, you can stop reading here.
*1 Redirect from old to new
Regarding the external "redirect" mentioned above. You might choose to implement a redirect (in the opposite direction) if you are changing an existing URL structure and the old URLs have been indexed by search engines and/or linked (or bookmarked) to by external third parties - in order to preserve SEO and keep users happy.
For example, say your original URLs were of the form /name/billy.html and you later decided to change your URLs to /billy instead. You first change the URLs in the HTML source and implement the "rewrite" as mentioned above so the new URLs now work. You then might implement an external redirect from the old /name/billy.html URL to the new /billy URL.
For this, you would use a directive like you had initially, except you have to be careful of redirect-loops because you are already rewriting the request in the opposite directive. You only want to redirect "direct/initial" requests and not rewritten requests by the earlier rewrite (that is actually later in the file). An easy way to check for "direct" requests is to check against the REDIRECT_STATUS environment variable, which is empty on the initial request and set to 200 (as in 200 OK status) when the request is rewritten.
For example, the following "redirect" would go before the above "rewrite", immediately after the RewriteEngine directive:
# Redirect from "name/billy.html" to "/billy"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^name/([a-z]+)\.html$ /$1 [R=301,NC,L]
This should ultimately be a 301 (permanent) redirect since the URL has presumably changed permanently. However, always test with 302 (temporary) redirects to avoid potential caching issues.
Further reading:
Reference: mod_rewrite, URL rewriting and "pretty links" explained

Apache 301 redirect with get parameters

I am trying to do a 301 redirect with lightspeed webserver htaccess with no luck.
I need to do a url to url redirect without any related parameters.
for example:
from: http://www.example.com/?cat=123
to: http://www.example.com/some_url
I have tried:
RewriteRule http://www.example.com/?cat=123 http://www.example.com/some_url/ [R=301,L,NC]
Any help will be appreciated.
Thanks for adding your code to your question. Once more we see how important that is:
your issue is that a RewriteRule does not operate on URLs, but on paths. So you need something like that instead:
RewriteEngine on
RewriteRule ^/?$ /some_url/ [R=301,L,NC,QSD]
From your question it is not clear if you want to ignore any GET parameters or if you only want to redirect if certain parameters are set. So here is a variant that will only get applied if some parameter is actually set in the request:
RewriteEngine on
RewriteCond %{QUERY_STRING} (?:^|&)cat=123(?:&|$)
RewriteRule ^/?$ /some_url/ [R=301,L,NC,QSD]
Another thing that does not really get clear is if you want all URLs below http://www.example.com/ (so below the path /) to be rewritten, or only that exact URL. If you want to keep any potential further path component of a request and still rewrite (for example http://www.example.com/foo => http://www.example.com/some_url/foo), then you need to add a capture in your regular expression and reuse the captured path components:
RewriteEngine on
RewriteRule ^/?(.*)$ /some_url/$1 [R=301,L,NC,QSD]
For either of this to work you need to have the interpretation of .htaccess style files enabled by means of the AllowOverride command. See the official documentation of the rewriting module for details. And you have to take care that that -htaccess style file is actually readable by the http server process and that it is located right inside the http hosts DOCUMENT_ROOT folder in the local file system.
And a general hint: you should always prefer to place such rules inside the http servers host configuration instead of using .htaccess style files. Those files are notoriously error prone, hard to debug and they really slow down the server. They are only provided as a last option for situations where you do not have control over the host configuration (read: really cheap hosting service providers) or if you have an application that relies on writing its own rewrite rules (which is an obvious security nightmare).

Does REQUEST_URI hide or ignore some filenames in .htaccess?

I'm having some difficulty with a super simple htaccess redirect.
All I want to do is rewrite absolutely everything, except a couple files.
htaccess looks like this:
RewriteEngine On
RewriteCond %{REQUEST_URI} !sitemap
RewriteCond %{REQUEST_URI} !robots
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
The part that works is that everything gets redirected to new domain as it should be. And I can also access robots.txt without being forwarded, but not with sitemap.xml. If I try to go to sitemap.xml, the domain forwards along anyway and opens the sitemap file on the new domain.
I have this exact same issue when trying to "ignore" index.html. I can ignore robots, I can ignore alternate html or php files, but if I want to ignore index.html, the regex fails.
Since I can't actually SEE what is in the REQUEST_URI variable, my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI? I know this because of a stupid test. If I choose to ignore index.html like this:
RewriteCond %{REQUEST_URI} !index.html
Then if I type example.com/index.html I will be forwarded. But if I just type example.com/ the ignore actually works and it shows the content of index.html without forwarding!
How is it that when I choose to ignore the regex "index.html", it only works when "index.html" is not actually typed in the address bar!?!
And it gets even weirder! Should I type something like example.com/index.html?option=value, then the ignore rule works and I do NOT get forwarded when there are attributes like this. But index.html by itself doesn't work, and then just having the slash root, the rule works again.
I'm completely confused! Why does it seem like REQUEST_URI is not able to see some filenames like index.html and sitemap.xml? I've been Googling for 2 days and not only can I not find out if this is true, but I can't seem to find any websites which actually give examples of what these htaccess server variables actually contain!
Thanks!
my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI?
This is not true. There is no such special treatment of any requested URL. The REQUEST_URI server variable contains the URL-path (only) of the request. This notably excludes the scheme + hostname and any query string (which are available in their own variables).
However, if there are any other mod_rewrite directives that precede this (including the server config) that rewrite the URL then the REQUEST_URI server variable is also updated to reflect the rewritten URL.
index.html (Directory Index)
index.html is possibly a special case. Although, if you are explicitly requesting index.html as part of the URL itself (as you appear to be doing) then this does not apply.
If, on the other hand, you are requesting a directory, eg. http://example.com/subdir/ and relying on mod_dir issuing an internal subrequest for the directory index (ie. index.html), then the REQUEST_URI variable may or may not contain index.html - depending on the version of Apache (2.2 vs 2.4) you are on. On Apache 2.2 mod_dir executes first, so you would need to check for /subdir/index.html. However, on Apache 2.4, mod_rewrite executes first, so you simply check for the requested URL: /subdir/. It's safer to check for both, particularly if you have other rewrites and there is possibility of a second pass through the rewrite engine.
Caching problems
However, the most probable cause in this scenario is simply a caching issue. If the 301 redirect has previously been in place without these exceptions then it's possible these redirections have been cached by the browser. 301 (permanent) redirects are cached persistently by the browser and can cause issues with testing (as well as your users that also have these redirects cached - there is little you can do about that unfortunately).
RewriteCond %{REQUEST_URI} !(sitemap|index|alternate|alt) [NC]
RewriteRule .* alternate.html [R,L]
The example you presented in comments further suggests a caching issue, since you are now getting different results for sitemap than those posted in your question. (It appears to be working as intended in your second example).
Examining Apache server variables
#zzzaaabbb mentioned one method to examine the value of the Apache server variable. (Note that the Apache server variable REQUEST_URI is different to the PHP variable of the same name.) You can also assign the value of an Apache server variable to an environment variable, which is then readable in your application code.
For example:
RewriteRule ^ - [E=APACHE_REQUEST_URI:%{REQUEST_URI}]
You can then examine the value of the APACHE_REQUEST_URI environment variable in your server-side code. Note that if you have any other rewrites that result in the rewritting process to start over then you could get multiple env vars, each prefixed with REDIRECT_.
With the index.html problem, you probably just need to escape the dot (index\.html). You are in the regex pattern-matching area on the right-hand side of RewriteCond. With the un-escaped dot in there, there would need to be a character at that spot in the request, to match, and there isn't, so you're not matching and are getting the unwanted forward.
For the sitemap not matching problem, you could check to see what REQUEST_URI actually contains, by just creating an empty dummy file (to avoid 404 throwing) and then do a redirect at top of .htaccess. Then, in browser URL, type in anything you want to see the REQUEST_URI for -- it will show in address bar.
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^ /test.php?var=%{REQUEST_URI} [NE,R,L]
Credit MrWhite with that easy test method.
Hopefully that will show that sitemap in URL ends up as something else, so will at least partially explain why it's not pattern-matching and preventing redirect, when it should be pattern-matching and preventing redirect.
I would also test by being sure that the server isn't stepping in front of things with custom 301 directive that for whatever reason makes sitemap behave unexpectedly. Put this at the top of your .htaccess for that test.
ErrorDocument 301 default

apache rewrite map redirect to 404

My Situation:
I implemented an apache Rewrite Map to redirect incoming requests based on a database
RewriteEngine On
RewriteMap dbapp prg:/usr/local/somewhere/dbapp.rb
RewriteRule ^/(pattern)$ ${dbapp:$1} [R]
So far everything works fine, but I want to decide in the dbapp.rb script weather to redirect or give the client a http-status-code-404. I could just deliver a local page that doesn't exist but that doesn't seem right. I also want this to be usable on any server, and redirecting to "localhost" is also not an option ;-)
You could return -, which essentially means: 'no rewrite', but I don't know whether that's supported in a maps/[R] combination. Better may be to check with RewriteCond ${dbapp:$1} !^$ or something that it doesn't contain an empty string.