mod_rewrite: encode only certain matches in URL - apache

I am trying to rewrite a URL using mod_rewrite and encode a substring in my URL which is between brackets. My URL:
http://localhost/something?var_a=A&var_b=(B&2/3&)&var_c=C
and my .htaccess:
RewriteEngine On
RewriteCond %{QUERY_STRING} (.+)\((.*?)\)(.+)
RewriteRule ^.*$ somedir/%1%2%3? [R,B]
So I capture three strings: anything before the brackets, anything within them, and anything after.
The result is:
somedir/var_a%3dA%26var_b%3dB%262%2f3%26%26var_c%3dC
but I would like to encode only the text that was within the brackets of my initial URL, such that:
somedir/var_a=A&var_b=B%262%2f3%26&var_c=C
The problem seems to be that the [B] flag escapes the whole string. Is there a way to do this selectively? Also, my solution only captures a single occurrence of brackets; it would be nice to make this more generic. Could someone give me a hint?
Note that this question is related to my previous one, where I was trying to capture text between brackets.

This is extremely tricky for mod_rewrite, but I took a shot at it. The solution is not pretty, as it involves two redirects and the use of a cookie, but it works.
RewriteEngine On
# store non bracket query string in cookie while redirecting value in brackets using B flag
RewriteCond %{QUERY_STRING} ^(.+?&)?(var_b=)\(([^)]*)\)(.*)$ [NC]
RewriteRule !^somedir/ /somedir/%3? [L,CO=QS:%1-%2-%4:%{HTTP_HOST},B,R]
# retrieve value from cookie and use it to construct full URL
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{HTTP_COOKIE} QS=([^-]*)-([^-]+)-([^;]+) [NC]
RewriteRule ^(somedir)/(.*)$ /$1/%1%2$2%3? [L,R,NE]
Using these rules when I visit this URL: http://localhost/something?var_a=A&var_b=(B&2/3&)&var_c=C
it gets redirected to: http://localhost/somedir/var_a=A&var_b=B%262%2f3%26&var_c=C
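To check what the two redirects are meant to produce, here is a small Python simulation. It is an illustration, not how Apache works internally: it reuses the regex from the first RewriteCond and percent-encodes only the bracket contents. Assumptions: Python's re behaves like Apache's PCRE for this pattern, and urllib.parse.quote(..., safe="") is a close approximation of the [B] flag (Apache may emit lowercase hex such as %2f). The names simulate and encode_all_bracketed are hypothetical; the latter sketches, outside mod_rewrite, the more generic "every pair of brackets" behaviour the question asks about.

```python
import re
from typing import Optional
from urllib.parse import quote

# Same regex as the first RewriteCond; [NC] becomes re.IGNORECASE.
FIRST_PASS = re.compile(r"^(.+?&)?(var_b=)\(([^)]*)\)(.*)$", re.IGNORECASE)


def simulate(query: str) -> Optional[str]:
    """Return the final path the two redirects aim for, or None if no match."""
    m = FIRST_PASS.match(query)
    if m is None:
        return None  # the first rule does not fire
    # Unmatched optional groups become "", as an unmatched %N does.
    before, name, inside, after = (g or "" for g in m.groups())
    # Pass 1: only %3 (the bracket contents) goes into the path, so only it
    # is escaped. quote(safe="") approximates [B]; hex case may differ.
    encoded = quote(inside, safe="")
    # Pass 2: %1, %2 and %4 come back from the QS cookie and are placed
    # around the already-encoded value.
    return f"somedir/{before}{name}{encoded}{after}"


def encode_all_bracketed(query: str) -> str:
    """Hypothetical generic variant: encode every (...) group, drop brackets."""
    return re.sub(r"\(([^)]*)\)", lambda m: quote(m.group(1), safe=""), query)


print(simulate("var_a=A&var_b=(B&2/3&)&var_c=C"))
# somedir/var_a=A&var_b=B%262%2F3%26&var_c=C
```

Only the value inside the brackets is encoded; the & and = separators around it stay literal, which is the selective behaviour the question is after.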


htaccess: Can match one slash, but not double slashes

I am unable to write a rule that matches double slashes.
In my .htaccess file:
#RULE 1:
RewriteCond %{REQUEST_URI} ^.*hi1.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
#RULE 2:
RewriteCond %{REQUEST_URI} ^.*hi2/.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
#RULE 3:
RewriteCond %{REQUEST_URI} ^.*hi3//.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
RESULTS:
https://www.example.com/hi1//
successfully redirects to google
https://www.example.com/hi2//
successfully redirects to google
https://www.example.com/hi3//
fails to redirect to google
The third URL yields the following:
Sorry, this page doesn't exist.
Please check the URL or go back a page.
404 Error. Page Not Found.
EDIT # 1:
Interestingly:
#RULE 4:
RewriteCond %{REQUEST_URI} ^.*hi4/.*/.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
RESULTS:
https://www.example.com/hi4/abc/
successfully redirects to google
https://www.example.com/hi4//
fails to redirect to google
EDIT # 2:
My original post seems to have created confusion. I will try to be clearer: I need a rule that will match a URL ending in a double slash, and will not match a URL that does not end in a double slash. Currently, my .htaccess file contains only the following:
RewriteEngine on
RewriteRule yoyo https://www.cnn.com/ [R=301,L]
RewriteCond %{THE_REQUEST} //$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
Results:
https://www.example.com/about-us//
fails to redirect to google, and yields 404 error
(The first rule (yoyo) is only to ensure no caching.)
EDIT # 3:
I see that the confusion continues. So, my .htaccess file contains only:
RewriteEngine on
RewriteCond %{THE_REQUEST} //$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
Results:
https://www.example.com/about-us//
fails to redirect to google, and yields 404 error
This time, I think we can rule out caching, because I used the .htaccess file on a website of mine that previously had no .htaccess file.
Simply, my efforts to match a url ending with double-slash are failing.
You don't need to write three rules when one regex pattern can catch all of these similar URIs, and this pattern also handles more than two / at the end. Could you please try the following? Please make sure you clear your browser cache after placing these rules in your .htaccess file.
RewriteEngine ON
RewriteCond %{REQUEST_URI} ^/hi[0-9]+/{2,}?$ [NC]
RewriteRule ^(.*)$ https://www.google.com/ [R=301,L]
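To sanity-check the regex on its own, here is a quick Python check of the same pattern, with re.IGNORECASE standing in for [NC] (Python's re and Apache's PCRE agree for a pattern this simple). It only tests the regex: if the server has already collapsed the double slashes in REQUEST_URI before the condition runs, the rule can still fail even though the pattern is right. The helper name would_redirect is hypothetical.

```python
import re

# Same pattern as the RewriteCond above; [NC] -> re.IGNORECASE.
HI_PATTERN = re.compile(r"^/hi[0-9]+/{2,}?$", re.IGNORECASE)


def would_redirect(request_uri: str) -> bool:
    """True if a REQUEST_URI value matches the condition."""
    return HI_PATTERN.search(request_uri) is not None


for uri in ["/hi1//", "/hi3///", "/HI42//", "/hi3/", "/hi3//x", "/about-us//"]:
    print(uri, would_redirect(uri))
```

The lazy {2,}? makes no practical difference here: the $ anchor forces it to consume every trailing slash anyway.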
EDIT:
OK, now I get it: only match paths ending with two slashes.
I updated the answer. Inside THE_REQUEST, the request URI is not at the end of the string; it is followed by a space and the protocol version, so matching //\s should work for you.
AmitVerma mentioned the correct answer in his comment, but it is being buried under other comments. For everyone else who, like me, did not know about the THE_REQUEST variable (thank you, Amit), here is a more complete answer.
The problem with the original rule is the use of the REQUEST_URI variable. Its value has probably already been cleaned up by the web server or other modules, so the double slashes would have been removed.
The THE_REQUEST variable contains the original, unmodified request line. Therefore, the following will work as requested:
RewriteCond %{THE_REQUEST} //\s.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
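The difference between the two variables can be illustrated with a Python sketch. The merge_slashes helper is a hypothetical stand-in for whatever normalization the server or other modules perform (it is not Apache's code), THE_REQUEST is modeled as a plain GET request line, and Python's re stands in for PCRE.

```python
import re


def merge_slashes(path: str) -> str:
    """Hypothetical normalization: collapse runs of '/' into a single '/'."""
    return re.sub(r"/{2,}", "/", path)


def request_uri_rule(path: str) -> bool:
    # RULE 3 from the question, tested against the *normalized* path.
    return re.search(r"^.*hi3//.*$", merge_slashes(path)) is not None


def the_request_rule(path: str) -> bool:
    # THE_REQUEST keeps the raw request line, e.g. "GET /hi3// HTTP/1.1".
    the_request = f"GET {path} HTTP/1.1"
    return re.search(r"//\s.*$", the_request) is not None


print(request_uri_rule("/hi3//"), the_request_rule("/hi3//"))  # False True
```

Under that assumption, no pattern tested against the normalized path can ever see two slashes, while the raw request line still contains them.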
Regarding your updated question:
... I need a rule that will match a url ending in double slash
RewriteCond %{THE_REQUEST} //$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
Aside: Your previous rules matched a URL containing a double slash anywhere in the URL-path (which would naturally catch a double slash at the end as well).
However, the above will not match a URL that ends with a double slash. In fact, it will never match anything, because THE_REQUEST does not contain only the URL. The THE_REQUEST server variable contains the first line of the HTTP request (the request line). For example, when you request https://example.com/about-us//, THE_REQUEST will contain a string of the form:
GET /about-us// HTTP/1.1
So, you can see from the above that a regex like //$ will never match. You will need to use a condition of the form:
RewriteCond %{THE_REQUEST} //\s
This matches two slashes followed by a space, which can only occur at the end of the URL-path. (It could also occur at the end of the query string, but we can cross that bridge when we come to it.)
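The effect of the anchoring can be checked quickly in Python against a few example request lines. The request lines below are made up for illustration, the check helper is hypothetical, and Python's re stands in for PCRE (they agree for these patterns).

```python
import re


def check(the_request: str) -> dict:
    """Which of the three conditions would match this THE_REQUEST value?"""
    return {
        "end_anchor": re.search(r"//$", the_request) is not None,    # //$
        "before_space": re.search(r"//\s", the_request) is not None,  # //\s
        "anywhere": re.search(r"//", the_request) is not None,        # //
    }


# Hypothetical request lines, in the form THE_REQUEST takes.
for line in [
    "GET /about-us// HTTP/1.1",
    "GET /about-us/ HTTP/1.1",
    "GET /about//us HTTP/1.1",
    "GET /about-us//?page=2 HTTP/1.1",
    "GET /about-us?next=// HTTP/1.1",
]:
    print(line, check(line))
```

So //$ never fires, //\s catches a trailing double slash on the path (but misses it when a query string follows, and fires on a query string ending in //), while a bare // matches a double slash anywhere in the request line.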
However, since the other suggestions (e.g. ^.*hi3//.*$) don't appear to have worked, this is not going to work either.
You need to clear your browser cache before testing, and please test with 302 (temporary) redirects; otherwise, you can easily go round in circles chasing caching issues. You should also test this with the browser's developer tools open on the "Network" tab and the "Disable cache" option checked (for example, in Chrome's DevTools).
(UPDATE) Debugging...
This does not seem to be a question about regex, as the earlier answers/comments (and code snippets in the question itself) should already have produced the desired results. So "something else" would seem to be going on here.
To debug and see the value of THE_REQUEST, you can do something like the following (at the very top of your .htaccess file):
RewriteCond %{QUERY_STRING} !^the-request=
RewriteRule ^ /?the-request=%{THE_REQUEST} [R,L]
And then request /about-us//. You should then be redirected to a URL of the form:
/?the-request=GET%20/about-us//%20HTTP/1.1
(The %20 sequences are, naturally, the URL-encoded spaces.)
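If you want to read the debugging value programmatically rather than by eye, Python's standard library can decode the query string of the redirected URL. The URL below is the example form given above; parse_qs turns the %20 sequences back into spaces so you can see exactly what THE_REQUEST contained.

```python
from urllib.parse import parse_qs, urlsplit

# The redirect target produced by the debugging rule (example form above).
redirected = "/?the-request=GET%20/about-us//%20HTTP/1.1"

query = urlsplit(redirected).query
the_request = parse_qs(query)["the-request"][0]
print(repr(the_request))  # 'GET /about-us// HTTP/1.1'
```

If the decoded value still shows the two slashes followed by a space, the //\s condition has something to match; if the slashes are already merged here, the problem lies upstream of the regex.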
Please report back exactly what you are seeing.
Here's what finally worked to match double slashes (nothing else worked for me):
RewriteEngine on
RewriteCond %{THE_REQUEST} //
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
(And, as I wrote, I was careful to prevent caching, so caching never was an issue.)
PLOT TWIST:
Even this solution, which is the only one that works on one of my websites, does not work on the website I have been testing on for most of this discussion. In other words, there is not a single solution for matching double slashes on that server!

mod-rewrite to remove tracking code from the end of urls

How do I write a mod_rewrite rule to remove an old tracking code from the end of image URLs? I would like to send requests for
www.myurl.com/blah/image.jpg%12345
to
www.myurl.com/blah/image.jpg
The %12345 tracking code is always the same.
At the start of that string, %12 is the URL encoding of an unprintable character, but mod_rewrite treats it like a _. So you would have to inspect REQUEST_URI for _345 and strip it out accordingly.
Correction: the URL actually uses %3F345.
If the tracking code is %3F345, then %3F is the URL-encoded form of ?. Because it is encoded, it is treated as part of the path rather than as the start of a query string, so mod_rewrite does not see it in %{QUERY_STRING}. I therefore used two checks for your case, one for a literal ? and one for %3F, so this works whether or not the ? is encoded:
RewriteEngine On
RewriteCond %{QUERY_STRING} ^345 [OR]
RewriteCond %{THE_REQUEST} \%3F345 [NC]
RewriteRule ^(.*)$ %{REQUEST_URI}? [R=301,L]
Inputs:
http://www.myurl.com/blah/image.jpg?345
http://www.myurl.com/blah/image.jpg%3F345
http://www.myurl.com/blah/image.jpg%3F345&param=value
Rewrite:
http://www.myurl.com/blah/image.jpg
NOTE: You cannot experiment with it here because %{THE_REQUEST} is not supported. I tested this on one of my live servers to verify it works.
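The logic of the two conditions can be modeled in Python to see which of the inputs above trigger the redirect. This is a sketch under assumptions: THE_REQUEST is modeled as a GET request line, QUERY_STRING as the text after the first literal ?, Python's re stands in for PCRE, and should_strip is a hypothetical helper name.

```python
import re


def should_strip(request_target: str) -> bool:
    """Model the two RewriteConds, joined by [OR], for a raw request target."""
    the_request = f"GET {request_target} HTTP/1.1"
    # QUERY_STRING is whatever follows the first *literal* '?'.
    _, _, query_string = request_target.partition("?")
    first = re.search(r"^345", query_string) is not None
    second = re.search(r"\%3F345", the_request, re.IGNORECASE) is not None
    return first or second


for target in [
    "/blah/image.jpg?345",
    "/blah/image.jpg%3F345",
    "/blah/image.jpg%3f345&param=value",
    "/blah/image.jpg",
    "/blah/image.jpg?size=345",
]:
    print(target, should_strip(target))
```

The encoded variant only ever shows up in THE_REQUEST, which is why the second condition is needed alongside the QUERY_STRING check.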

Redirect rewrite rule, Adding parameter at the end of url

If /c/ is part of the URL path, I want to add parameters at the end of the query string, because the number of existing parameters may increase or decrease.
http://example.com/c/file.php?par1=val1&par2=val2
I need to add two parameters, &addpar1=val&addpar2=val, at the end of the URL, like this:
http://example.com/c/file.php?par1=val1&par2=val2&addpar1=val&addpar2=val
Here is what I am trying:
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/c/(.*)$ [NC]
RewriteRule /c/ /%1 [QSA]
Please suggest what I should write in the RewriteRule.
Your rule is close, but you're not actually adding anything to the query string. Try:
RewriteEngine On
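# Only add the parameters if they are not already there (prevents a loop)
RewriteCond %{QUERY_STRING} !&addpar1=val&addpar2=val
# Keep the /c/ path and append the new parameters to the existing query string
RewriteRule ^/?c/(.*)$ /c/$1?%{QUERY_STRING}&addpar1=val&addpar2=val [L]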
Here, you need to check that the parameters haven't already been added, then add them to the end of the query string. You don't want the QSA flag here, because you're doing the appending manually.
If you want to redirect the browser so that the user sees the new query string, you need an R or R=301 flag in the square brackets (separated by a comma).
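Here is a Python model of what the rule is meant to do, which makes the role of the negated condition visible. Assumptions: the target keeps the /c/ prefix shown in the desired URL, Python's re stands in for PCRE, and the names add_params and EXTRA are hypothetical.

```python
import re
from typing import Optional

EXTRA = "addpar1=val&addpar2=val"


def add_params(path: str, query: str) -> Optional[str]:
    """Return the rewritten URL, or None if the rule would not fire."""
    # Negated RewriteCond: skip if the parameters are already present.
    if re.search(r"&addpar1=val&addpar2=val", query):
        return None
    # ^/?c/ matches with or without the leading slash.
    m = re.match(r"^/?c/(.*)$", path)
    if m is None:
        return None
    return f"/c/{m.group(1)}?{query}&{EXTRA}"


first = add_params("/c/file.php", "par1=val1&par2=val2")
print(first)
# /c/file.php?par1=val1&par2=val2&addpar1=val&addpar2=val
print(add_params("/c/file.php", first.split("?", 1)[1]))  # None
```

The second call returning None is the point of the condition: rewritten requests can be processed by the rules again, and without that check the parameters could keep being appended.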

Remove Page Number from URL with .htaccess

I need to remove page numbers from URLs of the form:
http://mydomain.com/index.php?showtopic=XXXX&page=XXXX&#entryXXXX
so they become
http://mydomain.com/index.php?showtopic=XXXX&#entryXXXX
where XXXX are integers
I've previously tried:
RewriteEngine on
RewriteRule ^(.*)showtopic=([0-9]+)&page=([0-9]+)(.*) http://mydomain.com/index.php?showtopic=$1$3 [QSA,L,R=301]
but to no avail. So I shortened it to:
RewriteEngine on
RewriteRule ^(.*)&page=([0-9]+)(.*)$ $1&$3 [QSA,L,R=301]
but still nowt. Is there anything wrong with the regex at all?
You can't match against the query string in a RewriteRule; you need to match against the %{QUERY_STRING} variable inside a RewriteCond:
RewriteEngine On
RewriteCond %{QUERY_STRING} ^showtopic=([^&]+)&page=([^&]+)(&.*)?$
RewriteRule ^index\.php$ /index.php?showtopic=%1%3 [L,R=301]
The #entryXXXX part of the URL is a fragment, and the server actually never sees that. It's a client/browser-side only thing. Hopefully, the browser is smart enough to re-append the fragment after getting redirected.
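To see what the condition captures and what the redirect target becomes, here is a Python model of the rule. It assumes Python's re behaves like PCRE for this pattern; strip_page is a hypothetical name. The input is the query string only, since the fragment never reaches the server.

```python
import re
from typing import Optional

# Same pattern as the RewriteCond above.
COND = re.compile(r"^showtopic=([^&]+)&page=([^&]+)(&.*)?$")


def strip_page(query: str) -> Optional[str]:
    """Return the redirect target, or None if the condition doesn't match."""
    m = COND.match(query)
    if m is None:
        return None
    # %3 is optional; an unmatched group contributes nothing.
    return "/index.php?showtopic=" + m.group(1) + (m.group(3) or "")


print(strip_page("showtopic=12&page=3"))         # /index.php?showtopic=12
print(strip_page("showtopic=12&page=3&st=20"))   # /index.php?showtopic=12&st=20
```

Note that the pattern requires showtopic to come first and page second; a query string in another order is left alone.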

What RewriteRule would redirect based on query string parameters?

If page1.html is requested and the uin query string parameter is anything other than 12 or 13, let the user see page1.html; otherwise, redirect them to page2.html.
Update: BTW, there are also other parameters in the query string. They should be passed along to whichever page is served.
The RewriteCond variable you're looking for is %{QUERY_STRING}.
Here's another SO question doing something similar: Redirecting URLs (with specific GET parameters)
This will redirect to page2.html if uin=12 or uin=13. The entire query string will be sent to the page2.html page:
# EDIT: Doesn't properly handle all cases
RewriteCond %{QUERY_STRING} [\&]+uin=1[23][&]+ [OR]
RewriteCond %{QUERY_STRING} ^uin=1[23][&]+
RewriteRule ^/page1\.html /page2.html [R]
EDIT: This is a lot better. It handles the parameter in any position in the query string, beginning or end, and it also filters out cases where the string appears inside another parameter, like suin=123:
RewriteCond %{QUERY_STRING} ^(.*&)*uin=1[23](&.*)*$
RewriteRule ^/page1\.html /page2.html [R]
I tested on the following cases:
Redirected:
http://local.sandbox.com/page1.html?hello=world&uin=13&test=1
http://local.sandbox.com/page1.html?uin=12&test=1
http://local.sandbox.com/page1.html?uin=12
http://local.sandbox.com/page1.html?uin=13
http://local.sandbox.com/page1.html?uin=13&t=t
http://local.sandbox.com/page1.html?t=t&r=r&uin=13&t=3
http://local.sandbox.com/page1.html?t=t&uin=13
Didn't redirect:
http://local.sandbox.com/page1.html?uin=11&test=1
http://local.sandbox.com/page1.html?hello=world&uin=1&test=1
http://local.sandbox.com/page1.html?hello=world&ui=13&test=1
http://local.sandbox.com/page1.html?t=t&&r=r&suin=13&t=3
http://local.sandbox.com/page1.html?t=t&&r=r&uin=134&t=3
http://local.sandbox.com/page1.html?suin=134&t=3
http://local.sandbox.com/page1.html?auin=13&t=t
http://local.sandbox.com/page1.html?uin=134&t=3
http://local.sandbox.com/page1.html?t=t&uin=134
http://local.sandbox.com/page1.html?t=t&auin=13
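The test cases above can be replayed with a short Python script. The query strings are copied from the lists, Python's re stands in for Apache's PCRE, and only the query string is checked (the ^/page1\.html part of the rule is left out); the helper names are hypothetical. It also shows why the first version was flagged as incomplete.

```python
import re

# First attempt (two conditions joined by [OR]) and the improved pattern.
OLD = [re.compile(r"[\&]+uin=1[23][&]+"), re.compile(r"^uin=1[23][&]+")]
NEW = re.compile(r"^(.*&)*uin=1[23](&.*)*$")

REDIRECTED = [
    "hello=world&uin=13&test=1", "uin=12&test=1", "uin=12", "uin=13",
    "uin=13&t=t", "t=t&r=r&uin=13&t=3", "t=t&uin=13",
]
NOT_REDIRECTED = [
    "uin=11&test=1", "hello=world&uin=1&test=1", "hello=world&ui=13&test=1",
    "t=t&&r=r&suin=13&t=3", "t=t&&r=r&uin=134&t=3", "suin=134&t=3",
    "auin=13&t=t", "uin=134&t=3", "t=t&uin=134", "t=t&auin=13",
]


def old_redirects(qs: str) -> bool:
    return any(p.search(qs) for p in OLD)


def new_redirects(qs: str) -> bool:
    return NEW.search(qs) is not None


for qs in REDIRECTED + NOT_REDIRECTED:
    print(f"{qs:28} old={old_redirects(qs)!s:5} new={new_redirects(qs)}")
```

The first version needs an & after the value, so it misses uin=12 and uin=13 when they are the whole query string and uin=13 at the end; the improved pattern matches every "redirected" case and none of the others.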