mod-rewrite to remove tracking code from the end of urls - apache

How do I write a mod-rewrite to remove an old tracking code after image urls. I would like to send requests for
www.myurl.com/blah/image.jpg%12345
to
www.myurl.com/blah/image.jpg
The %12345 tracking code is always the same.

The %12345 tracking code is always the same.
At the start of that string %12 is urlencoded as an unprintable character, but mod_rewrite treats it like a _. So you would have to inspect the REQUEST_URI for _345 and strip it out accordingly.
%3F345 is used in the URL.
If the tracking code is %3F345, then the %3F is urlencoded as a ? and should be detected as a query string. However, the mod_rewrite doesn't catch this it seems, so I used two checks for your case - one for ? and one for %3F. This will work if the ? is encoded or not:
RewriteEngine On
RewriteCond %{QUERY_STRING} ^345 [OR]
RewriteCond %{THE_REQUEST} \%3F345 [NC]
RewriteRule ^(.*)$ %{REQUEST_URI}? [R=301,L]
Inputs:
http://www.myurl.com/blah/image.jpg?345
http://www.myurl.com/blah/image.jpg%3F345
http://www.myurl.com/blah/image.jpg%3F345&param=value
Rewrite:
http://www.myurl.com/blah/image.jpg
NOTE: You cannot experiment with it here because %{THE_REQUEST} is not supported. I tested this on one of my live servers to verify it works.

Related

htaccess Rewrite rule from "example.com/de/foo" to "example.de/foo"

RewriteRules can be such a pain for me, I cannot get this one to work.
I have to redirect urls like example.com/de/anypath to example.de/anypath.
[anypath] can be really any path, as I have to get it work for
example.com/de/articles/programming/hello-world (would be redirected to example.de/articles/programming/hello-world)
as well as for example.com/de/events/pic-nic (would be redirected to example.de/events/pic-nic).
This is what I wrote so far :
RewriteRule "^/de/(.*)$" "http://example.de/$1" [R=301,NC,L]
I also tried with RewriteCond with no more luck
RewriteCond %{HTTP_HOST} http://example.com/de
I am working with xampp, but tested on my web server with same
result.
I know this .htaccess file is working (get error if I enter a
typo)
I got some result when testing with something like :
RewriteRule "^(.*)$" "https://google.com" [R=301,NC,L]
Any help would be appreciated !
RewriteRule "^/de/(.*)$" "http://example.de/$1" [R=301,NC,L]
In .htaccess, the URL-path matched by the RewriteRule pattern does not start with a slash. ie. It should be "^de/(.*)$", not "^/de/(.*)$".
You don't need the double quotes and the NC flag is probably redundant, unless you also need to match dE, Ed or DE.
For example (near the top of the root .htaccess file):
RewriteRule ^de/(.*) http://example.de/$1 [R=301,L]
(HTTP, not HTTPS?!)
The trailing $ on the RewriteRule pattern was also redundant.
Test first with 302 (temp) redirect to avoid potential caching issues.
RewriteCond %{HTTP_HOST} http://example.com/de
The Host HTTP request header (ie. the value of the HTTP_HOST server variable) contains the hostname only. eg. example.com only in your example.
Any server variable that is prefixed with HTTP_ refers to the HTTP request header of the same name.
I got some result when testing with something like :
RewriteRule "^(.*)$" "https://google.com" [R=301,NC,L]
Careful with testing 301s since they are cached persistently by the browser. You will need to clear your browser cache before testing!
I added two conditions (the first is to apply according to local or live site, the second to leave the node paths unchanged) :
RewriteCond %{HTTP_HOST} local.example.com
RewriteCond %{REQUEST_URI} !^.*/de/node/.*$
And it is working as expected. Thanks again.

htaccess Redirect URL with GET Parameters

I have a URL that is in the format http://www.example.com/?s=query
I want to redirect this URL to http://www.example.com/search/query
I have the following .htaccess but I wanted to check if there is anything wrong with this. My RewriteRule looks a little wonky and I don't know if it will cause problems for other URLs.
RewriteEngine on
RewriteCond %{QUERY_STRING} ^s=(.*)$ [NC]
RewriteRule ^$ /search/%1? [NC,L,R]
I ran a test Here and it seems to redirect to the correct URL.
RewriteCond %{QUERY_STRING} ^s=(.*)$ [NC]
RewriteRule ^$ /search/%1? [NC,L,R]
You will likely need the NE (noescape) flag on the RewriteRule directive if you are receiving a %-encoded URL parameter value, otherwise the target URL will be doubly-encoded. The QUERY_STRING server variable is not decoded by Apache.
It also depends on how you are rewriting /search/query back to /?s=query (or presumably more like /index.php?s=query?) - presumably you are already doing this later in the config? You only want this redirect to apply to direct requests and not rewritten requests (otherwise you'll get a redirect loop). An easy way to ensure this is to check that the REDIRECT_STATUS env var is empty.
For example:
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^s=(.*) [NC]
RewriteRule ^$ /search/%1 [NE,QSD,R,L]
Other points:
The QSD flag would be preferable (on Apache 2.4) to appending ? to the end of the susbtitution string in order to remove the query string.
The regex ^s=(.*) (the trailing $ was superfluous) does assume that s is the only URL parameter at the start of the query string. As it stands, everything is assumed to be part of this value. eg. s=foo&bar=1 will result in /search/foo&bar=1.
The NC flag on the RewriteRule directive is superfluous.
Should you also be checking for /index.php?s=<query>? (Or whatever file/DirectoryIndex is handling the request.)

htaccess: Can match one slash, but not double slashes

I am unable to write a rule that matches double slashes.
In my .htacess file:
#RULE 1:
RewriteCond %{REQUEST_URI} ^.*hi1.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
#RULE 2:
RewriteCond %{REQUEST_URI} ^.*hi2/.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
#RULE 3:
RewriteCond %{REQUEST_URI} ^.*hi3//.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
RESULTS:
https://www.example.com/hi1//
successfully redirects to google
https://www.example.com/hi2//
successfully redirects to google
https://www.example.com/hi3//
fails to redirect to google
Third url yields the following:
Sorry, this page doesn't exist.
Please check the URL or go back a page.
404 Error. Page Not Found.
EDIT # 1:
Interestingly:
#RULE 4:
RewriteCond %{REQUEST_URI} ^.*hi4/.*/.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
RESULTS:
https://www.example.com/hi4/abc/
successfully redirects to google
https://www.example.com/hi4//
fails to redirect to google
EDIT # 2:
My original post seems to have created confusion. I will try to be clearer: I need a rule that will match a url ending in double slash, and will not match a url that does not end in double slash. Currently, my .htaccess file contains only the following:
RewriteEngine on
RewriteRule yoyo https://www.cnn.com/ [R=301,L]
RewriteCond %{THE_REQUEST} //$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
Results:
https://www.example.com/about-us//
fails to redirect to google, and yields 404 error
(The first rule (yoyo) is only to ensure no caching.)
EDIT # 3:
I see that the confusion continues. So, my .htaccess file contains only:
RewriteEngine on
RewriteCond %{THE_REQUEST} //$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
Results:
https://www.example.com/about-us//
fails to redirect to google, and yields 404 error
This time, I think we can rule out caching, because I used the .htaccss on a website of mine that previously had no .htaccess file.
Simply, my efforts to match a url ending with double-slash are failing.
You need not to write 3 rules when you could catch similar kind of URIs with regex patterns so that we need not to write multiple patterns, this also takes cares of multiple occurrences of / coming in the end. Could you please try following, please make sure you clear your browser cache after placing these rules into your htaccess file.
RewriteEngine ON
RewriteCond %{REQUEST_URI} ^/hi[0-9]+/{2,}?$ [NC]
RewriteRule ^(.*)$ https://www.google.com/ [R=301,L]
EDIT:
OK now I get it. Only match paths ending with two slashes.
I updated the answer. The request URI inside THE_REQUEST is not on the end, but is followed by a space and more after that, so matching //\s should work for you
AmitVerma mentioned the correct answer in his comment, but it is being snowed in by other comments. For all the other people like me who did not know about the THE_REQUEST parameter (thank you Amit) a more complete answer here.
The problem with the original rule is the use of the REQUEST_URI parameter. The value of this parameter will probably already have been cleaned by the webserver or other modules. Double slashes would have been removed.
The THE_REQUEST parameter contains the original unmodified request. Therefore the following will work as requested:
RewriteCond %{THE_REQUEST} //\s.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
Regarding your updated question:
... I need a rule that will match a url ending in double slash
RewriteCond %{THE_REQUEST} //$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
Aside: Your previous rules matched a URL containing a double slash anywhere in the URL-path (which would naturally catch a double slash at the end as well).
However, the above will not match a URL that ends with a double slash. In fact, it will never match anything because THE_REQUEST does not only contain the URL. THE_REQUEST server variable contains the first line of the HTTP request headers. For example, when you request https://example.com/about-us//, THE_REQUEST will contain a string of the form:
GET /about-us// HTTP/1.1
So, you can see from the above that a regex like //$ will never match. You will need to use a condition of the form:
RewriteCond %{THE_REQUEST} //\s
To match two slashes followed by a space. Which could only occur at the end of URL. (Although it could also occur at the end of the query string, but cross that bridge when we come to it.)
However, since the other suggestions (eg. ^.*hi3//.*$) don't appear to have worked, then this is not going to work either.
You need to clear your browser cache before testing and please test with 302 (temporary) redirects, otherwise, you can easily go round in circles chasing caching issues. You should also test this with the Browser "Inspector" open on the "Network" tab and check the "Disable cache" option. For example, in Chrome:
(UPDATE) Debugging...
This does not seem to be a question about regex, as the earlier answers/comments (and code snippets in the question itself) should already have produced the desired results. So "something else" would seem to be going on here.
To debug and see the value of THE_REQUEST, you can do something like the following (at the very top of your .htaccess file):
RewriteCond %{QUERY_STRING} !^the-request=
RewriteRule ^ /?the-request=%{THE_REQUEST} [R,L]
And then request /about-us//. You should then be redirected to a URL of the form:
/?the-request=GET%20/about-us//%20HTTP/1.1
(Where the %20 are naturally the URL encoded spaces.)
Please report back exactly what you are seeing.
Here's what finally worked to match double slashes (nothing else worked for me):
RewriteEngine on
RewriteCond %{THE_REQUEST} //
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
(And, as I wrote, I was careful to prevent caching, so caching never was an issue.)
PLOT TWIST:
Even this solution, which is the only solution that works on one of my websites, does not work on the website I have been testing on for most of this discussion. In other words, there is not one single solution for matching double-slash on that server!

mod_rewrite: encode only certain matches in URL

I am trying to rewrite a URL using mod_rewrite and encode a substring in my URL which is between brackets. My URL:
http://localhost/something?var_a=A&var_b=(B&2/3&)&var_c=C
and my .htaccess:
RewriteEngine On
RewriteCond %{QUERY_STRING} (.+)\((.*?)\)(.+)
RewriteRule ^.*$ somedir/%1%2%3? [R,B]
so I capture three strings, anything before the brackets, anything within, and anything after.
Result is:
somedir/var_a%3dA%26var_b%3dB%262%2f3%26%26var_c%3dC
but I would like to encode only the text which was within the brackets of my initial URL, such that
somedir/var_a=A&var_b=dB%262%2f3%26&var_c=C
The problem seems to be that the [B] option decodes the whole string. Is there a way to do this selectively? Also, my solution could only capture an occurrence of brackets once, it would be nice to have this more generic; could someone give me a hint?
Note that this question is related to my previous one, where I was trying to capture text between brackets.
This is extremely tricky for mod_rewrite but I took a shot at it. Solution is not pretty as it involves 2 redirects and use of cookies, but it works.
RewriteEngine On
# store non bracket query string in cookie while redirecting value in brackets using B flag
RewriteCond %{QUERY_STRING} ^(.+?&)?(var_b=)\(([^)]*)\)(.*)$ [NC]
RewriteRule !^somedir/ /somedir/%3? [L,CO=QS:%1-%2-%4:%{HTTP_HOST},B,R]
# retrieve value from cookie and use it to construct full URL
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{HTTP_COOKIE} QS=([^-]*)-([^-]+)-([^;]+) [NC]
RewriteRule ^(somedir)/(.*)$ /$1/%1%2$2%3? [L,R,NE]
Using these rules when I visit this URL: http://localhost/something?var_a=A&var_b=(B&2/3&)&var_c=C
it gets redirected to: http://localhost/somedir/var_a=A&var_b=B%262%2f3%26&var_c=C

How do I redirect a specific URL pattern when Drupal Clean URLs are on?

I have a Drupal 5.23 installation using clean URLs with Apache and the mod_rewrite module. I am using an .htaccess file for the clean URLs functionality with the following configuration:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
</IfModule>
I am going to be disabling the Localization/Internationalization plugins on the website, which is going to change every single page's URL on the website from http://www.example.com/en/url-to-a-page to http://www.example.com/url-to-a-page (the /en portion is being stripped out).
I would like to add a mod_rewrite rule to give an HTTP 301 Redirect response for any incoming URLs with the /en portion in the URL so they are directed to the correct page.
I've tried adding the following lines to my .htaccess file both above and below the existing rules, but in both cases visiting a page with /en results in an HTTP 404 Not Found response:
RewriteRule ^en/(.+)$ http://www.example.com/$1 [R=301]
If I comment out the existing rules, my rule works just fine. I've also tried to add a condition to the rule, but this doesn't appear to have an effect either:
RewriteCond %{REQUEST_URI} =/en/*
This came up for me when writing all of my custom redirects, and it turns out the solution was to add an "L" to the redirect line. Give the following at try:
RewriteRule ^en/(.+)$ http://www.example.com/$1 [L,R=301]
Note the "L" near the end of the line. That, according to the Apache RewriteRule docs, means "Stop the rewriting process here and don't apply any more rewrite rules".
In addition to what sillygwailo suggest, I'd recommend you to make sure that your RewriteCond (needed, I think) actually matches..
from the apache docs:
=CondPattern' (lexicographically equal)
Treats the CondPattern as a plain string and compares it lexicographically to TestString. True if TestString is lexicographically equal to CondPattern (the two strings are exactly equal, character for character). If CondPattern is "" (two quotation marks) this compares TestString to the empty string.
So, It could possibly match only an URL containing an actual '*'..? Not sure, but you could also try this:
RewriteCond %{REQUEST_URI} ^/en/.*