Rewrite rule and the_request - apache

How do I rewrite search/2 to index.php?search=x&search_by=y&page_no=2?
If I am not wrong, %{REQUEST_URI} is search/2, right? Also, what is %{THE_REQUEST} in this case?
The page where search/2 link is located is rewritten as just home_page.

%{REQUEST_URI} and %{THE_REQUEST} are variables in mod_rewrite. These variables contain the following:
%{REQUEST_URI} will contain everything behind the hostname and before the query string. In the url http://www.example.com/its/a/scary/polarbear?truth=false, %{REQUEST_URI} would contain /its/a/scary/polarbear. This variable updates after every rewrite.
%{THE_REQUEST} is a variable that contains the entire request as it was made to the server. This is something in the form of GET /its/a/scary/polarbear?truth=false HTTP/1.1. Since the request that was made to the server is static in the lifespan of one such request, this variable does not change when a rewrite is made. It is therefore helpful in certain situations where you only want to rewrite if an external request contained something. It is often used to prevent infinite loops from happening.
A complete list of variables can be found in the Apache mod_rewrite documentation.
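As a rough illustration of the loop-prevention use of %{THE_REQUEST}, here is a sketch for a .htaccess file in the document root. It assumes a hypothetical old-style URL /index.php?page_no=N, with page_no as the only parameter, that should be redirected to /search/N. The URL shapes and the 302 status are assumptions for illustration, and the QSD flag needs Apache 2.4.

```apache
RewriteEngine On

# Redirect only if the client itself requested /index.php?page_no=N.
# THE_REQUEST looks like: GET /index.php?page_no=2 HTTP/1.1
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.php\?page_no=([0-9]+)\s
RewriteRule ^index\.php$ /search/%1 [QSD,R=302,L]

# Internal rewrite: REQUEST_URI changes, THE_REQUEST does not,
# so the condition above fails on the next pass and no loop occurs.
RewriteRule ^search/([0-9]+)$ /index.php?page_no=$1 [QSA,L]
```

Checking %{REQUEST_URI} or %{QUERY_STRING} in the first rule instead would also match the internally rewritten request, which is what would cause the loop.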
In your case you will have a link to search/2?search=x&search_by=y. You want to internally rewrite this to index.php?search=x&search_by=y&page_no=2. You can do this with the following rule:
RewriteRule ^search/([0-9]+)$ /index.php?page_no=$1 [QSA,L]
The first argument matches the external request that comes in. It is then rewritten to /index.php?page_no=2. The QSA (query string append) flag appends the existing query string to the rewritten query string. You end up with /index.php?page_no=2&search=x&search_by=y. The L flag stops this 'round' of rewriting. It's just an optimization.

Related

htaccess url redirect with get parameters ID and reduce value

I want to redirect a URL to a new domain by retrieving the ID parameter but only taking the first 4 characters. Does anyone know how to do this?
For example, an original url:
http://www.original.example/see/news/actualite.php?newsId=be9e836&newsTitle="blablabla"
To :
https://www.new.example/actualites/be9e
I have tested :
RewriteCond %{QUERY_STRING} ^newsId=(.*)$ [NC]
RewriteRule ^$ https://www.new.example/actualites/%1? [NC,L,R]
There are a couple of problems with this:
The regex ^$ in the RewriteRule pattern only matches the document root. The URL in your example is /see/news/actualite.php - so this rule will never match (and the conditions are never processed).
The regex ^newsId=(.*)$ is capturing everything after newsId=, including any additional URL parameters. You only need the first 4 characters of this particular URL param.
As an aside, your existing condition is dependent on newsId being the first URL parameter. Maybe this is always the case, maybe not. But it is relatively trivial to check for this URL parameter, regardless of order.
Also, do you need a case-insensitive match? Or is it always newsId as stated in your example. Only use the NC flag if this is necessary, not as a default.
Try the following instead:
RewriteCond %{QUERY_STRING} (?:^|&)newsId=([^&]{4})
RewriteRule ^see/news/actualite\.php$ https://www.new.example/actualites/%1 [QSD,R,L]
The %1 backreference now contains just the first 4 characters of the newsId URL parameter value (ie. non & characters), as denoted by the regex ([^&]{4}).
The QSD flag (Apache 2.4) discards the original query string from the redirect response. There is no need to end the substitution string with ? (an empty query string), as would have been required in earlier versions of Apache.
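For comparison, a sketch of the same redirect for servers older than Apache 2.4, where QSD is not available. The only change is the trailing ? on the substitution, which replaces the original query string with an empty one:

```apache
# Apache 2.2 and earlier: no QSD flag, so end the substitution with "?"
RewriteCond %{QUERY_STRING} (?:^|&)newsId=([^&]{4})
RewriteRule ^see/news/actualite\.php$ https://www.new.example/actualites/%1? [R,L]
```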
UPDATE:
I have an anchor link (#) which is added at the end of the link, is there a possibility of deleting it to make a clean link? Example, currently I have: https://www.new.example/news/4565/#title Ideally : https://www.new.example/news/4565
The "problem" here is that the browser manages the "fragment identifier" (fragid) (ie. the "anchor link (#)") and preserves this through the redirect. In other words, the browser re-appends the fragid to the redirect response from the server. The fragid is never sent to the server, so we cannot detect this server side prior to issuing the HTTP redirect.
The only thing we can do is to append an empty fragid (ie. a trailing #) in the hope that the browser discards the original fragment. Unfortunately, you will likely end up with a trailing # on your redirected URLs (browser dependent).
For example (simplified):
:
RewriteRule .... https://example.com/# [R=301,NE,L]
Note that you will need the NE flag here to prevent Apache from URL-encoding the # in the redirect response.
As noted above, browsers might handle this differently.
Further reading:
URL Fragment and 302 redirects
redirect is keeping hash
How to clear fragment identifier on 302 redirect?

Mod_rewrite rules not working in .htaccess to change the URL

I'm trying to rewrite the below URL but the URLs just don't change, no errors.
Current URL:
https://example.com/test/news/?c=value1&s=value2&id=9876
Expected URL:
https://example.com/test/news/value1/value2
My .htaccess
RewriteEngine On
RewriteRule ^test/news/([^/]*)/([^/]*)$ /test/news/?c=$1&s=$2&id=1 [L]
but I've seen many articles where a url such as example.com/display_article.php?articleId=my-article can be rewritten as example.com/articles/my-article for example with .htaccess
But the important point here (that I think you are missing) is that the URL must already have been changed internally in your application - in all your internal links. It is a common misconception that .htaccess alone can be used to change the format of the URL. Whilst .htaccess is an important part of this, it is only part of it.
Yes, you can implement a redirect in .htaccess to redirect from the old to new URL - and this is essential to preserve SEO (see below), but it is not critical to your application working. If you don't first change the URL in your internal links then:
The "old" URL is still exposed in the HTML source. When a user hovers over or copies the link, they are seeing and copying the "old" URL.
Every time a user clicks one of your internal links they are externally redirected to the "new" URL. This is slow for your users, bad for SEO (you should never link to a URL that is redirected) and bad for your server, as it potentially doubles the number of requests hitting your server (OK, 301s are cached locally).
To quote from @IMSoP's answer to this reference question on the subject:
Rewrite rules don't make ugly URLs pretty, they make pretty URLs ugly
So, once you have changed your internal links to the "new" (expected) format, eg. /test/news/value1/value2 (or should that be /test/news/value1/value2/id or even /test/news/id/value1/value2? See below), then you can do as follows...
RewriteRule ^test/news/([^/]*)/([^/]*)$ /test/news/?c=$1&s=$2&id=1 [L]
This internally rewrites a request from /test/news/<value1>/<value2> to /test/news/?c=<value1>&s=<value2>&id=1. However, there are a couple of issues with this:
/test/news/ is not itself a valid endpoint. This requires further rewriting. Perhaps you are serving a DirectoryIndex document (eg. index.php)? This might appear seamless to you, but this requires an additional internal subrequest and makes the rule dependent on other elements of the config. You should rewrite directly to the file that handles the request. eg. /test/news/index.php?c=<value1>&s=<value2>&id=1 (remember, this is entirely hidden from the user).
You are hardcoding the id=1 parameter? Should every URL have the same id? Or should this be passed in the "new" URL (which is what I would expect)? What does the id represent? If this is critical to the routing of the URL then the id should appear earlier in the URL-path, in case the URL gets accidentally truncated when copy/pasted/shared.
If the id is required then it needs to be passed in the "new" URL. We only have the "new" URL to route the request, so the information can't be hidden.
So, if the "new" URL is now /test/news/<id>/<value1>/<value2> then the rewrite would need to be like this instead:
# Rewrite new URLs to old/actual URL
# "/test/news/<id>/<value1>/<value2>" to "/test/news/?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [L]
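If you follow the earlier advice and rewrite directly to the file that handles the request, the rule might look like the sketch below. The filename index.php is an assumption (it is only suggested as an example above), so substitute whatever script actually serves /test/news/.

```apache
# Assumes /test/news/index.php is the script that handles the request
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/index.php?c=$2&s=$3&id=$1 [L]
```

This avoids relying on DirectoryIndex to resolve /test/news/ to the script.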
Then (optionally*1) you can implement an external redirect in order to preserve SEO. This is for search engines that have indexed the "old" URLs or third party inbound links that cannot be updated - these need to be corrected to inform search engines of the change and get the user on the "new" canonical URL having followed an out-of-date inbound link.
(*1 It's not "optional" if you are changing an existing URL, but optional with regards to your application being functional.)
This "redirect" goes before the above rewrite:
# Redirect old URLs to the new "canonical" URL
# "/test/news/?c=<value1>&s=<value2>&id=<id>" to "/test/news/<id>/<value1>/<value2>"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]
The $0 backreference contains the full match from the RewriteRule pattern, ie. test/news/ in this case - this simply saves repetition.
The %1, %2 and %3 backreferences contain the values captured from the preceding condition. ie. the values of the c, s and id URL parameters respectively.
Note that the URL parameters / path segments should not be optional as in your original directive (ie. ([^/]*)). If they are optional and they are omitted, then the resulting URL becomes ambiguous. eg. <value2> becomes <value1> if <value1> is omitted.
Note that the URL parameters must be in the order as stated. If you have a mismatch of "old" URLs with these params in a different order (or even intermixed with other params) then this can be accounted for with additional complexity. (It may be easier to perform this redirect in your server-side script, instead of .htaccess.)
The first condition that checks against the REDIRECT_STATUS environment variable ensures that we only redirect direct requests and not rewritten requests by the later rewrite (which would otherwise result in a redirect loop). An alternative on Apache 2.4 is to use the END flag on the RewriteRule instead.
The QSD flag (Apache 2.4) discards the original query string from the request.
You should test first with a 302 (temporary) redirect to avoid potential caching issues and only change to a 301 (permanent) redirect once you have tested that everything works as intended. 301s are cached persistently by the browser so can make testing problematic.
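The END flag alternative mentioned above could be sketched as follows, assuming Apache 2.4 and a .htaccess file in the document root. The REDIRECT_STATUS condition is dropped and the internal rewrite uses END instead of L, so the rewritten request is not passed through the rules again:

```apache
# Redirect old URLs to the new "canonical" URL (no REDIRECT_STATUS check)
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]

# Rewrite new URLs to the old/actual URL; END stops any further rewriting pass
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [END]
```

As with the original version, it is worth testing this with R=302 before switching to R=301.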
Summary
Your complete .htaccess file should look something like this:
Options -MultiViews +FollowSymLinks
# If relying on the DirectoryIndex to handle the request
DirectoryIndex index.php
RewriteEngine On
# Redirect old URLs to the new "canonical" URL
# "/test/news/?c=<value1>&s=<value2>&id=<id>" to "/test/news/<id>/<value1>/<value2>"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]
# Rewrite new URLs to old/actual URL
# "/test/news/<id>/<value1>/<value2>" to "/test/news/?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [L]

Best practice for a .htaccess internal path rewrite?

We have spent a considerable amount of time looking for a solution elsewhere. We have read and tried the recommended threads. We most likely have a core misunderstanding as to why this, or something along these lines, does not work.
We get a request for a domain:
subdomain.domain.com/embed/34acb453bc4a53abc
We want to leave the URL as it is, but need to direct this to an internal vhost:
embed.example.com/34acb453bc4a53abc
Once the request is directed to this, our system can interpret the 34acb453bc4a53abc and return the appropriate data.
We tried the following (and variations of it), but we cannot get anything to work.
RewriteCond ^embed\/(.*)$ [NC]
RewriteRule ^ https://embed.example.com%{REQUEST_URI} [L,NE,P]
internal path rewrite
Just to clarify, you can't internally rewrite the request across different hosts. You need to configure a reverse proxy using mod_proxy and related modules. This is what the P flag on the RewriteRule directive is doing... it's passing the request to mod_proxy (providing this is already correctly configured in the server config).
RewriteCond ^embed\/(.*)$ [NC]
RewriteRule ^ https://embed.example.com%{REQUEST_URI} [L,NE,P]
However, this will send the request to https://embed.example.com/embed/34acb453bc4a53abc, not https://embed.example.com/34acb453bc4a53abc as you require.
You need to capture the part of the URL-path after /embed/ and use that instead. You are already capturing this in the RewriteCond directive, but you are not using it. You don't actually need the RewriteCond directive here.
Try the following instead:
RewriteCond %{HTTP_HOST} =subdomain.domain.com
RewriteRule ^embed/([a-z0-9]+)$ https://embed.example.com/$1 [P]
You state that the request is for subdomain.domain.com, so I've included that in the directive.
The L and NE flags are not required here. P implies L and there is nothing that requires the substitution to not be URL encoded. Slashes do not carry any special meaning in the regex, so do not need to be escaped.
I've also made the regex that matches the "code" more restrictive, rather than matching literally anything.
The $1 backreference then contains just the "code" that follows /embed/ in the URL-path.
Note that the order of directives is important. It needs to be before any directives that are likely to result in a conflict.
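The P flag only works if the proxy modules are available. A minimal sketch of what that might involve in the main server config (not .htaccess) is shown below. The module file paths are an assumption and vary by installation, and many distributions enable modules through their own tooling instead. Because the target is https://, mod_ssl must also be loaded.

```apache
# Server config (not .htaccess). Module file paths are an assumption.
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

# Needed because the proxied target uses https://
SSLProxyEngine On
```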
If the embed and subdomain hosts point to the same place on the filesystem then you can avoid the complexities and overhead of mod_proxy and simply "rewrite" the request on the same host.
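A same-host rewrite might then look something like the sketch below. This assumes the application will serve /<code> when requested on subdomain.domain.com, regardless of the Host header, which may not hold for your setup.

```apache
# Internal rewrite on the same host, no mod_proxy involved
RewriteCond %{HTTP_HOST} =subdomain.domain.com
RewriteRule ^embed/([a-z0-9]+)$ /$1 [L]
```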

What is "RewriteRule ^.* /sitename:::144.html? [L,R=301]" actually doing?

What is the effect of the following code?
RewriteRule ^.* /site:::144.html? [L,R=301]
I couldn't find any matching entries on Google explaining it.
It matches any request ^.* (“starts with any number of any arbitrary character”), and redirects it to /site:::144.html.
The question mark at the end of the target means any existing query string of the original request will be discarded.
The L flag means this will be the last rule interpreted in the current round of rewriting (when configured in .htaccess, the rewriting process “loops”, until no more rules match the current internal request).
R=301 means it will use status code 301 for a permanent redirect.
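Note that, taken on its own with no preceding RewriteCond, this rule also matches a request for /site:::144.html itself, so the redirect would point back at the URL that triggered it. A hedged sketch of an Apache 2.4 equivalent that excludes the target and uses QSD instead of the trailing ? (the exclusion is an addition, not part of the original rule):

```apache
# Redirect everything except the target itself; QSD drops the query string
RewriteRule !^site:::144\.html$ /site:::144.html [QSD,R=301,L]
```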

Apache rewrite rule for optional query parameter

Even after lengthy perusal, the Apache rewriterule documentation continues to confound me.
Currently I am using the following .htaccess URL rewrite rule:
RewriteRule ^([_0-9a-zA-Z-]+)\.html$ index.php?a=p&p=$1 [nc]
This rewrites something like
http://www.website.com/thepagename.html
into
http://www.website.com/index.php?a=p&p=thepagename
which works fine.
Now I need to modify this to allow for optional query parameters that may or may not be tacked on to the original (unrewritten) URL. E.g.:
http://www.website.com/thepagename.html?req=login
or even
http://www.website.com/thepagename.html?req=login&usr=johndoe
must be rewritten into:
http://www.website.com/index.php?a=p&p=thepagename&req=login
and
http://www.website.com/index.php?a=p&p=thepagename&req=login&usr=johndoe
respectively, without breaking the original rewrite (i.e. without the optional query parameters tacked onto the unrewritten URL).
Try as I might, I cannot work out the correct syntax. Can anyone point me in the right direction?
Tnx!
// FvW
You only have to add the QSA flag (Query String Append):
RewriteRule ^([_0-9a-zA-Z-]+)\.html$ index.php?a=p&p=$1 [L,QSA,NC]
More info (and examples) can be found in the Apache documentation on RewriteRule flags.