HTACCESS 301 : How to redirect all urls to HTTPS except spammy urls with a specific character? - apache

I posted a question a month ago that received great answers: HTACCESS 403 : How to block URL with a specific character?
The problem is that I have since migrated my website from HTTP to HTTPS, and I would like to redirect all URLs except the spammy URLs containing a specific character, which I want to block with a 410 code.
Example of what I would like:
http://www.example.com/caterory/article-name/?vn/2022-06-24fivhg585.html ==> 410 code, without 301 to HTTPS
http://www.example.com/caterory/article-name/webhook.php?tw3fpage3rjnso530724 ==> 410 code, without 301 to HTTPS
http://www.example.com/caterory/article-name/football.php?fsmkfpagefgdg456 ==> 410 code, without 301 to HTTPS
Instead, the spammy URLs currently get a 301 code and then a 410 code:
http://www.example.com/caterory/article-name/?vn/2022-06-24fivhg585.html ==> 301 to https://www.example.com/caterory/article-name/?vn/2022-06-24fivhg585.html and then ==> 410.
http://www.example.com/caterory/article-name/webhook.php?tw3fpage3rjnso530724 ==> 301 to https://www.example.com/caterory/article-name/webhook.php?tw3fpage3rjnso530724 and then ==> 410.
http://www.example.com/caterory/article-name/football.php?fsmkfpagefgdg456 ==> 301 to https://www.example.com/caterory/article-name/football.php?fsmkfpagefgdg456 and then ==> 410.
I'm using these rules :
RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^.*$ https://www.%1%{REQUEST_URI} [L,NE,R=301]
RewriteEngine On
RewriteCond %{QUERY_STRING} ^vn/ [NC]
RewriteRule ^ - [R=410]
RewriteEngine On
RewriteCond %{THE_REQUEST} /webhook.php [NC]
RewriteRule ^ - [R=410]
RewriteEngine On
RewriteCond %{THE_REQUEST} /football.php [NC]
RewriteRule ^ - [R=410]
Do you have any idea how to apply the 301 redirect to everything except URLs containing a specific character or string?

Just reverse the order of the rules, so your blocking directives are first (as they should be).
There is also no need to repeat the RewriteEngine directive.
Instead of using THE_REQUEST server variable (which is perhaps matching too much in the context you are using it), you should just use the RewriteRule pattern (or even combine the rules into one).
For example:
RewriteEngine On
# Blocking the following requests
RewriteCond %{QUERY_STRING} ^vn/ [NC]
RewriteRule ^ - [R=410]
RewriteRule /webhook\.php$ - [NC,R=410]
RewriteRule /football\.php$ - [NC,R=410]
# Canonical redirect
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [L,NE,R=301]
Note also that I simplified the regex ^.*$ in the last rule to just ^.
The 3 blocking rules can be combined into one (although there is no real benefit in doing so). For example:
# Blocking the following requests (combined rule)
RewriteCond %{QUERY_STRING} ^vn/ [OR,NC]
RewriteCond %{REQUEST_URI} /webhook\.php$ [OR,NC]
RewriteCond %{REQUEST_URI} /football\.php$ [NC]
RewriteRule ^ - [G]
# Canonical redirect
:
NB: G (gone) is just shorthand for R=410.
As a general rule, the order of your directives should be:
Blocking directives
External redirects
Internal rewrites
Regarding "Wrong, today, the spammy urls have a 301 code, and then a 410 code": this doesn't really matter, except that it potentially uses a minuscule amount of additional resources. It's still ultimately a 410.
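As a rough illustration of the ordering described above (blocking, then external redirects, then internal rewrites), the pieces could be laid out as follows. This is only a sketch: the first two sections reuse the rules from this answer, while the third section is a hypothetical front-controller rewrite to index.php that is an assumption and not part of the question.

```apache
RewriteEngine On

# 1. Blocking directives: answer 410 Gone before anything else runs
RewriteCond %{QUERY_STRING} ^vn/ [NC]
RewriteRule ^ - [G]

# 2. External redirects: canonical HTTPS + www
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [L,NE,R=301]

# 3. Internal rewrites (assumption: a front controller named index.php)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.php [L]
```

With this layout a blocked request never reaches the redirect, and a redirected request never reaches the internal rewrite.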

Related

Redirect url with ids range to another url using htaccess

I am trying to redirect users away from Joomla plugin links that have specific IDs to the default admin page, as follows:
When a user logs in to the Joomla backend, they can reach this plugins page:
https://www.example.com/administrator/index.php?option=com_plugins
Then, to open a plugin with an ID such as 422 for editing, they click on this link:
https://www.example.com/administrator/index.php?option=com_plugins&task=plugin.edit&extension_id=422
But instead of opening the plugin, I want the user to get redirected to this page:
https://www.example.com/administrator/index.php
To achieve this, I created a .htaccess file in the administrator folder and placed the code at the end. I set ranges of plugin IDs that the user cannot edit and is redirected away from instead.
Here is the full content of the .htaccess file:
# Canonical https/www
<IfModule mod_rewrite.c>
RewriteCond %{HTTPS} off
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule (.*) https://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
</IfModule>
# Redirect plug id from 350 to 423:
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteCond %{QUERY_STRING} (^|&)option\=com_plugins($|&)
RewriteCond %{QUERY_STRING} (^|&)extension_id=\b(3[5-8][0-9]|39[0-9]|4[01][0-9]|42[0-3])\b($|&)
RewriteRule ^administrator/index\.php$ https://www.example.com/administrator/index.php? [L,R=302]
# Redirect plug id from 425 to 10864:
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteCond %{QUERY_STRING} (^|&)option\=com_plugins($|&)
RewriteCond %{QUERY_STRING} (^|&)extension_id=\b(42[5-9]|4[3-9][0-9]|[5-9][0-9]{2}|[1-8][0-9]{3}|9[0-8][0-9]{2}|99[0-8][0-9]|999[0-9]|10[0-7][0-9]{2}|108[0-5][0-9]|1086[0-4])\b($|&)
RewriteRule ^administrator/index\.php$ https://www.example.com/administrator/index.php? [L,R=302]
But it does not work.
I create a .htaccess in the folder administrator and placed the code at the end.
# Redirect plug id from 350 to 423:
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteCond %{QUERY_STRING} (^|&)option\=com_plugins($|&)
RewriteCond %{QUERY_STRING} (^|&)extension_id=\b(3[5-8][0-9]|39[0-9]|4[01][0-9]|42[0-3])\b($|&)
RewriteRule ^administrator/index\.php$ https://www.example.com/administrator/index.php? [L,R=302]
If the .htaccess file is inside the /administrator subdirectory then you need to remove administrator/ from the start of the RewriteRule pattern (1st argument), otherwise the rule will never match.
In .htaccess, the RewriteRule pattern matches against a relative URL-path to the directory that contains the .htaccess file.
In other words, it should look like this:
:
RewriteRule ^index\.php$ https://www.example.com/administrator/index.php [QSD,R=302,L]
Also, on Apache 2.4 you can use the QSD (Query String Discard) flag instead of appending an empty query string to remove the original query string.
The preceding conditions that match the query string and plugin id are OK and should match the requested URL. (Although the word boundary \b elements are unnecessary.)
Depending on what other directives you have, this rule should be near the top of the .htaccess file, not "at the end". Since you have used an absolute substitution string it would be more optimal to include these rules before your general canonical redirects (although this does assume you are not implementing HSTS).
You are also missing the RewriteEngine On directive from the rules in question.
So, it should look like this instead:
RewriteEngine On
# Redirect plug id from 350 to 423:
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteCond %{QUERY_STRING} (^|&)option\=com_plugins($|&)
RewriteCond %{QUERY_STRING} (^|&)extension_id=(3[5-8][0-9]|39[0-9]|4[01][0-9]|42[0-3])($|&)
RewriteRule ^index\.php$ https://www.example.com/administrator/index.php [QSD,R=302,L]
# Redirect plug id from 425 to 10864:
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteCond %{QUERY_STRING} (^|&)option\=com_plugins($|&)
RewriteCond %{QUERY_STRING} (^|&)extension_id=(42[5-9]|4[3-9][0-9]|[5-9][0-9]{2}|[1-8][0-9]{3}|9[0-8][0-9]{2}|99[0-8][0-9]|999[0-9]|10[0-7][0-9]{2}|108[0-5][0-9]|1086[0-4])($|&)
RewriteRule ^index\.php$ https://www.example.com/administrator/index.php [QSD,R=302,L]
# Canonical https/www
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ https://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
Additional notes:
1. I assume you have not implemented HSTS.
2. I reversed the order of your two canonical redirects to reduce the number of redirects when requesting http://example.com/ (HTTP + non-www). But this does assume #1 above.
3. Optimised the regex on the canonical redirects... no need to traverse and capture the entire URL-path when using the REQUEST_URI server variable.
4. Removed the word boundary \b from the regex as this would seem unnecessary here.
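To make the point about per-directory matching more concrete, here is a small sketch of the same redirect written for two different file locations. The single ID 422 and the root-relative target are simplifications chosen for illustration only; each file would also need its own RewriteEngine On.

```apache
# Variant A: the rule lives in /.htaccess (document root).
# The pattern is matched against "administrator/index.php".
RewriteCond %{QUERY_STRING} (^|&)extension_id=422($|&)
RewriteRule ^administrator/index\.php$ /administrator/index.php [QSD,R=302,L]

# Variant B: the rule lives in /administrator/.htaccess.
# The directory prefix is stripped, so the pattern is matched against "index.php".
RewriteCond %{QUERY_STRING} (^|&)extension_id=422($|&)
RewriteRule ^index\.php$ /administrator/index.php [QSD,R=302,L]
```

Only one variant should be used. Note that mod_rewrite directives in a subdirectory .htaccess replace those of the parent by default, so if /administrator/.htaccess enables the rewrite engine, Variant B is the one that takes effect for requests under /administrator/.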

htaccess turn on https and www BUT only when live, not locally

I would like to have .htaccess code that adds https and www to my URLs – but only when my site is live, not when I am developing locally on my computer.
I have the following
RewriteCond %{HTTP_HOST} !^localhost(?::\d+)?$ [NC]
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://%1%{REQUEST_URI} [R=301,L,NE]
But this doesn't appear to be adding the www when my site is live. So for example https://example.com doesn't have the www and all the links are broken. For example https://example.com/about just gets a 404 Not Found error message.
Thanks for any help. I don't understand .htaccess files/code.
EDIT / UPDATE
Comparing the code to other code, should it be the following?
RewriteCond %{HTTP_HOST} !^localhost(?::\d+)?$ [NC]
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
RewriteCond %{HTTPS} !on
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [R=301,L,NE]
Comparing the code to other code, should it be the following?
RewriteCond %{HTTP_HOST} !^localhost(?::\d+)?$ [NC]
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
RewriteCond %{HTTPS} !on
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [R=301,L,NE]
Yes.
Your original rule explicitly removed the www subdomain. If you had requested http://www.example.com/ (note HTTP), you would have been redirected to https://example.com/ (HTTPS + non-www). But nothing would happen if requesting https://www.example.com/ - the canonical URL (the rule is skipped because the 2nd and 3rd conditions fail).
The %1 backreference contains the match from the first capturing group in the preceding CondPattern. eg. Given the CondPattern ^(?:www\.)?(.+)$ then %1 contains whatever is matched by (.+) (the first capturing group), ie. the hostname, less the optional www. prefix.
There is no difference between testing %{HTTPS} off or %{HTTPS} !on - the result is the same.
Test first with 302 (temporary) redirects to avoid potential caching issues. You will need to clear your browser cache before testing since the erroneous redirect will have been cached by the browser.
RewriteCond %{HTTP_HOST} !^localhost(?::\d+)?$ [NC]
Checking for a port number would seem to be unnecessary here. The port group is optional, so the condition still matches when you are using localhost on the standard port (80 or 443), where the port number is omitted from the Host header, but it is more complex than it needs to be. Something like !^localhost would suffice, or perhaps !^localhost($|:) if you happen to be using a domain name that starts with localhost!
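Putting these notes together, a test version of the rule might look like the sketch below. It uses the simpler !^localhost($|:) check and a temporary 302 while testing; switching back to R=301 once the behaviour is confirmed is an assumption about the intended workflow.

```apache
RewriteEngine On

# Skip the canonical redirect when developing on localhost (with or without a port)
RewriteCond %{HTTP_HOST} !^localhost($|:) [NC]
# Redirect when the host lacks www OR the request is not HTTPS
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
RewriteCond %{HTTPS} !on
# Capture the hostname without any www. prefix as %1
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
# 302 while testing; change to 301 only after the redirect is verified
RewriteRule ^ https://www.%1%{REQUEST_URI} [R=302,L,NE]
```

The capturing condition is kept last so that %1 refers to its match when the RewriteRule substitution is built.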

301 Redirect: New Domain and URL Structure

I need to set up redirects from an old domain to a new one. The URL structure will also be different since we'll be using a different CMS. I need to force HTTPS and non-www.
Is the code below correct for that case scenario?
# Needed before any rewriting
RewriteEngine On
# Force HTTPS and non-WWW
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
RewriteRule ^ https://www.newdomain.com%{REQUEST_URI} [L,R=301]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ https://www.newdomain.com%{REQUEST_URI} [L,R=301]
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
## 301 Redirects
# Page to Page
RewriteCond %{HTTP_HOST} ^olddomain\.com$ [NC]
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{HTTPS} =on
RewriteRule ^collections/product/product-1$ https://newdomain.com/product/product-1? [R=301,NE,NC,L]
RewriteRule ^collections/product/product-2$ https://newdomain.com/product/product-2? [R=301,NE,NC,L]
What's the difference between the code above and a
Redirect 301 /collections/product/product-1 https://www.newdomain.com/product/product-1
Is one preferred over the other?
Set up your .htaccess file in the following way.
1st way: Fixing OP's attempts here.
# Needed before any rewriting
RewriteEngine On
# Force HTTPS and WWW to non-www/www URLs.
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
RewriteRule ^ https://www.newdomain.com%{REQUEST_URI} [L,NE,R=301]
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,NE,R=301]
## 301 Redirects
# Page to Page
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{HTTPS} =on
RewriteRule ^collections/product/product-1$ https://newdomain.com/product/product-1? [R=301,NE,NC,L]
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{HTTPS} =on
RewriteRule ^collections/product/product-2$ https://newdomain.com/product/product-2? [R=301,NE,NC,L]
2nd way: If collections/product/product-1 and collections/product/product-2 are the only page rules you have, you can combine them into a single rule with a capturing group and save a set of conditions and a rule:
# Needed before any rewriting
RewriteEngine On
# Force HTTPS and WWW to non-www/www URLs.
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
RewriteRule ^ https://www.newdomain.com%{REQUEST_URI} [L,NE,R=301]
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,NE,R=301]
## 301 Redirects
# Page to Page
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{HTTPS} =on
RewriteRule ^collections/product/product-([12])$ https://newdomain.com/product/product-$1? [R=301,NE,NC,L]
Important points related to OP's question:
I have removed your 2nd set of .htaccess rules, shown below. Your 1st ruleset already redirects both the www and non-www old hosts to the www URL, so you don't need to keep these:
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ https://www.newdomain.com%{REQUEST_URI} [L,R=301]
One more point: I have added the NE flag to your rules to keep the URLs in the same format (so they are not converted to hex-encoded form).
As for your question about the Redirect directive, Redirect 301 /collections/product/product-1 https://www.newdomain.com/product/product-1: if you keep only this one line, HTTPS and www will not be enforced for requests that arrive without HTTPS or www. So it depends on your requirements: if you need to enforce HTTPS and www, keep the .htaccess rules shown above (edited from your attempts). You still need a redirect rule because you are moving to a new domain, so set it up as shown above.
I have also used (www\.)? in the condition under your ## 301 Redirects comment so that it matches when www is present (which it will be, because of the earlier redirect rules).
NOTE: Use either the 1st or the 2nd ruleset, not both. The 2nd is preferred if it fits the case I mentioned, where the only URLs are product-1 and product-2.
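One thing worth checking with either ruleset: the first rule redirects every request on the old domain with [L], so the page-to-page rules that follow it and are conditioned on the old host may never be reached. A possible arrangement, sketched below, puts the specific page rules first. It assumes the target host is www.newdomain.com (the question is ambiguous on www versus non-www) and drops the %{HTTPS} =on condition so that HTTP requests are also mapped in a single hop.

```apache
RewriteEngine On

# Page to page first, for requests arriving on the old domain only
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^collections/product/product-([12])$ https://www.newdomain.com/product/product-$1 [R=301,NE,NC,L]

# Everything else on the old domain keeps its path on the new domain
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
RewriteRule ^ https://www.newdomain.com%{REQUEST_URI} [L,NE,R=301]

# Remaining HTTP requests (for example on the new domain) go to HTTPS
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,NE,R=301]
```

On the Redirect directive: it belongs to mod_alias, matches by URL-path prefix and is processed independently of the mod_rewrite rules, which is one reason to keep all of the redirects in mod_rewrite when conditions on the host or query string are needed.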

HTACCESS 301 Redirect Rule to point all URL Variations to the Live URL

I am trying to achieve something which is working 99%, but there is a tiny issue.
Let's say my live URL is https://www.example.com/sample-page/
I want all the following URL variations to redirect to the live URL with a 301 status.
http://example.com/sample-page/
http://www.example.com/sample-page/
https://example.com/sample-page/
All of the above should redirect to https://www.example.com/sample-page/
I managed to get this working by using the htaccess rule displayed below.
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
RewriteCond %{HTTP_HOST} !^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
The problem with the above rule is this: http://example.com/sample-page/ does a double redirect.
http://eyeacademy.com/expert-eye-examination/
301 Moved Permanently
https://eyeacademy.com/expert-eye-examination/
301 Moved Permanently
https://www.eyeacademy.com/expert-eye-examination/
200 OK
As you can see, http redirects to https and then https non-www redirects to https www. I have been trying a few tweaks to this rule and reading up, but I am sure someone here would have a quicker and more robust solution?
You can use this single rule to redirect http -> https and add www and there is no need to hardcode host name in the rule:
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [L,R=301,NE]
You can also reorder your existing rules and avoid multiple redirects like this:
# first add www and make sure it is https://
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ https://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301,NE]
# http -> https
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301,NE]
Use the OR flag on your RewriteCond directive. Replace everything with the following:
RewriteCond %{HTTPS} off [NC,OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ https://www.example.com%{REQUEST_URI} [L,R=301]

Avoiding multiple Apache redirects for multiple rules

I have a need to put some redirects in place to enforce some rules and handle some URL changes.
Specifically as follows:
If the protocol is http, enforce https
If there is no subdomain or the subdomain is not www, enforce www
If the domain is www.domainA.com and the path does not begin with /en/, /fr/, /de/ or /es/, enforce /en/
If the domain is www.domainB.com and the path does not begin with /b_en/, /b_fr/, /b_de/ or /b_es/, enforce /b_en/
I've been trying to get this working so that only one 301 happens at any one time and we don't end up with a chain of 301s. For example a request to http://domainA.com could potentially be redirected 3 times:
http://domainA.com 301 to...
https://domainA.com 301 to...
https://www.domainA.com 301 to...
https://www.domainA.com/en/
However, I've not been able to come up with a solution.
This would live in a .htaccess file.
You can use in your .htaccess:
# domainA
RewriteCond %{HTTP_HOST} domainA\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/(?:en|fr|de|es) [NC]
RewriteRule ^ https://www.domainA.com/en%{REQUEST_URI} [NE,L,R=301]
# domainB
RewriteCond %{HTTP_HOST} domainB\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/b_(?:en|fr|de|es) [NC]
RewriteRule ^ https://www.domainB.com/b_en%{REQUEST_URI} [NE,L,R=301]
# https & www
# (the capturing condition comes last so that %1 refers to its match)
RewriteCond %{HTTP_HOST} !^www\. [NC,OR]
RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} (?:^|\.)(domainA\.com|domainB\.com)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [NE,L,R=301]
Never more than one redirection.