How to 404 urls in .htaccess? - apache

I'm trying to make all urls for https://www.example.com/page/ redirect to 404.
The problem I'm having is every number after /page/ is redirecting to the homepage, and is getting indexed - causing a duplicate content penalty.
So far I have around 30 pages indexed, which all redirect to the homepage.
I'd like to have any URL which has a number after /page/ to 404 (with the 404 header), so I can deindex all of these pages.
So far I've tried:
Redirect 404 ^page/(.*)$
or
Redirect 404 /page/*
Unsurprisingly these haven't worked - where am I going wrong?

Redirect 404 ^page/(.*)$ or Redirect 404 /page/*
You are close, except that the Redirect directive uses simple prefix-matching; it does not use a regular expression (regex) or wildcard match.
to make all urls for https://www.example.com/page/ redirect to 404.
To make all requests for /page/ result in a 404, regardless of whether a number follows it or not, you would use the following:
Redirect 404 /page/
As noted above, the Redirect directive uses simple prefix-matching, so the above serves a 404 for any URL that starts with /page/.
However ....
any URL which has a number after /page/ to 404
To serve a 404 only for URLs of the form /page/<number>, and not for /page/ itself, you would need to use a RedirectMatch directive instead, which matches using a regex rather than prefix-matching.
For example:
RedirectMatch 404 ^/page/\d+$
\d+$ matches 1 or more digits to the end of the URL-path. So the above will match /page/1 and /page/123456 but not /page/ or /page/abc or /page/123z etc.
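If the paginated URLs can also be requested with a trailing slash (for example /page/2/, which is an assumption, as the question does not say), the pattern could be loosened slightly. A minimal sketch:

```apache
# Assumption: /page/<number> may or may not end with a slash.
# Matches /page/1, /page/1/ and /page/123456/, but not /page/ or /page/abc.
RedirectMatch 404 ^/page/\d+/?$
```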
If instead you wanted to match /page/<something>, where <something> is literally anything, but not match /page/ on its own, then you could use the following:
RedirectMatch 404 ^/page/.
The above matches /page/ at the start of the URL-path followed by at least 1 other character.
Note that whilst we are using the Redirect and RedirectMatch directives here, there is no external redirect (3xx response). The 404 is served by Apache as an internal subrequest. Apache sends the 404 Not Found header.
However, if you specifically want to "deindex" these pages quicker then consider sending a "410 Gone" instead. This is a stronger signal for search engines that the page is not coming back. In this case you can also use the gone keyword in place of the 410 status code.
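As a sketch of the 410 variant, either of the following forms should work; use one or the other, not both. The /page/<number> pattern is simply carried over from the earlier example.

```apache
# Using the numeric status code...
RedirectMatch 410 ^/page/\d+$

# ...or, equivalently, the "gone" keyword (commented out here)
# RedirectMatch gone ^/page/\d+$
```

As with 404, no target URL is given, because the response is not a redirect.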

Related

Apache htaccess return 404 code even if page exists without using rewrite mod:

To protect my server from bots, I want to return a 404 error page if certain files are requested EVEN IF THEY EXIST but without using the rewrite mod. Is it possible?
You can use a mod_alias Redirect (or RedirectMatch) directive.
For example:
Redirect 404 /file1.html
Redirect 404 /file2.html
This doesn't actually trigger a "redirect", as in an external 3xx redirect. It sends the stated 404 response if one of the URLs is requested (regardless of whether that file exists or not).
Reference:
https://httpd.apache.org/docs/current/mod/mod_alias.html#redirect
If there is a pattern to these URLs then use a RedirectMatch directive instead, which matches against a regex rather than using simple prefix-matching. For example, to serve a 404 for all requests to the /secret subdirectory (and all files within) then use the following:
RedirectMatch 404 ^/secret($|/)
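Similarly, the two Redirect directives for file1.html and file2.html could be collapsed into a single rule using regex alternation. This is only a sketch, using the same placeholder filenames:

```apache
# Serves a 404 for exactly /file1.html and /file2.html (placeholder names)
RedirectMatch 404 ^/(file1|file2)\.html$
```

One difference to be aware of: because the regex is anchored at both ends, it does not match URLs with extra path-info such as /file1.html/extra, whereas the prefix-matching Redirect does.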
To customise the 404 response use an ErrorDocument directive:
ErrorDocument 404 /error-docs/404.html

RedirectMatch without last part of URL

I have this RedirectMatch
RedirectMatch 301 ^/en/products/(.*)/(.*)/(.*)$ https://www.example.com/en/collections/$2/
If I visit
https://www.example.com/en/products/sofas/greyson/greyson-sofa
I'm redirected to
https://www.example.com/en/collections/greyson/greyson-sofa
What I want is
https://www.example.com/en/collections/greyson/
How do I accomplish this?
There's nothing obvious in what you have posted that would produce the specific output you are seeing. However, there are other errors in the directives, and you may be seeing a cached response. 301s are cached persistently by the browser, so any erroneous redirects are cached too.
The Redirect directive is prefix-matching and everything after the match is copied onto the end of the target URL. So, the redirect you are seeing would be produced by a directive something like this:
Redirect 301 /en/products/sofas/greyson https://www.example.com/en/collections/sofas/greyson
When you request /en/products/sofas/greyson/greyson-sofa, the part after the match, ie. /greyson-sofa, is copied onto the end of the target URL to produce /en/collections/sofas/greyson/greyson-sofa
You can resolve most of these issues by reordering your rules (but also watch the trailing slashes). You need to have the most specific redirects first. RedirectMatch before Redirect. For example, take the following two redirects:
Redirect 301 /en/products/accessories https://www.example.com/en/products/complements/
Redirect 301 /en/products/accessories/bush/ https://www.example.com/en/collections/bush-on/
Since the Redirect directive is prefix-matching, a request for /en/products/accessories/bush/ will actually be caught by the first rule, not the second, and end up redirecting to /en/products/complements//bush/. Note the erroneous double slash, which results from the mismatched trailing slashes on the source and target URLs.
You need to reverse these two rules. (But also watch the trailing slash.)
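With the two rules reversed and the trailing slashes made consistent, the pair might look like the sketch below. The trailing slash added to the source of the second rule is an assumption about the intended behaviour; with it, a request for /en/products/accessories (no trailing slash) is no longer matched by that rule.

```apache
# Most specific rule first
Redirect 301 /en/products/accessories/bush/ https://www.example.com/en/collections/bush-on/

# More general rule second; source and target both end in a slash,
# so the copied remainder does not produce a double slash
Redirect 301 /en/products/accessories/ https://www.example.com/en/products/complements/
```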
The same applies to the Redirect directives that follow. You also have some duplication, i.e. you have two rules for /en/products/chairs-and-bar-stools/piper/.

Intelligent redirection

I am a newbie using Apache 2.4.18.
I have URLs with the following form that I'd like to redirect.
Current URL: https://www.example.org/page/10/
Desired URL: https://www.example.org/index.php/page/10/
If I use the following rule, I can modify a request, eg for page 2:
Redirect permanent /page/2/ /index.php/page/2/
However, I want to redirect without having to hardcode all the pages on my site. I have tried the following, but my browser fails after many redirects:
RedirectMatch /page/(.*)/$ /index.php/page/$1/
Using the following also fails, and I don't know why:
RedirectMatch "https://www.example.org/page/(.*)/$" "https://www.example.org/index.php/page/$1/"
What am I doing wrong?
In your line
RedirectMatch /page/(.*)/$ /index.php/page/$1/
/page/2/ will in fact redirect to /index.php/page/2/, but this new URL still matches your RedirectMatch’s regex and produces another redirect. That’s why it redirects endlessly until the browser gives up.
I’d try RedirectMatch ^/page/(.*)/$ /index.php/page/$1/ instead. Because the regex is now anchored to the start of the URL-path, /index.php/page/2/ no longer matches ^/page/(.*)/$ after the redirect, so there are no recursive redirects. Your second attempt fails for a different reason: RedirectMatch compares its regex against the URL-path only, so a pattern that includes the scheme and host name never matches.
As an afterthought, why not use RewriteRule instead? An internal rewrite would save the client an additional HTTP request, as redirects are usually used when you need to send the client to a different server.
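A minimal sketch of the RewriteRule alternative, assuming mod_rewrite is enabled and the rule lives in the .htaccess file in the document root (neither is stated in the question):

```apache
RewriteEngine On

# Internally rewrite /page/<anything>/ to /index.php/page/<anything>/.
# In .htaccess the pattern is matched without the leading slash, and the
# rewritten path starts with "index.php/", so the rule does not match again.
RewriteRule ^page/(.*)/$ /index.php/page/$1/ [L]
```

Because this is an internal rewrite, the address shown in the browser stays as /page/10/. If a visible redirect is what you want, the R=301 flag could be added to the flag list.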

Rewrite language code into url if missing

I want to change an existing Magento store to add store/lang codes to the url i.e.
http://mystore/en/PRODUCTXYZ.html
http://mystore/de/PRODUCTXYZ.html
Old links to http://mystore/PRODUCTXYZ.html will now throw a 404 error.
How can I create an Apache url rewrite rule to add a language code if it is missing i.e. rewrite
http://mystore/PRODUCTXYZ.html
to
http://mystore/de/PRODUCTXYZ.html
So that old links 301 redirect to the correct product.
I have worked around this with
Redirect 301 /PRODUCTXYZ http://mystore/de/PRODUCTXYZ.html
But obviously for thousands of products this might not be practical.
You can redirect multiple Product.html URLs with just one line using RedirectMatch:
RedirectMatch 302 ^/([^/.]+)\.html$ http://example.com/de/$1.html
I used 302 for testing purposes and to avoid the browser's cache; change 302 to 301 (permanent redirect) once you are sure the redirect is working.

301 Redirect A URL Pattern Using .htaccess

I want to redirect my URLs to a new pattern. So far I have used a separate 301 redirect for every single URL, but that takes a huge amount of time and my .htaccess file keeps growing, as I have thousands of URLs.
Someone suggested that I use pattern-based 301 redirects or the rewrite engine in .htaccess instead, but I am new to pattern redirects. First of all, is it possible to use patterns in a 301 redirect? If so, can the URLs below be redirected by pattern?
/search/label/XXXXXXXXXX to /category/XXXXXXXXXX
/year/month/XXXXXXXXXX.html/?m=0 to /year/month/XXXXXXXXXX.html
/year/month/XXXXXXXXXX.html/?m=1 to /year/month/XXXXXXXXXX.html
/search to /
/feed to /
XXXXXXXXXX stands for some text or number that is dynamic and changeable. year and month are numbers only, and are also dynamic and changeable. / means the site homepage. The rest is fixed text.
Please keep in mind that a URL can sometimes end with several variables, always starting with ?variable=value&variable=value, and we want to drop those as well.
After asking here, I kept trying myself and got it working on my side. I added the code below to my .htaccess file, and after that all of the URLs above redirect without any 404 error.
Redirect 301 /search/label http://www.example.com/category
Redirect 301 /search http://www.example.com
Redirect 301 /feed http://www.example.com
For URL patterns 2 and 3 I did nothing because, after checking, they do not produce a 404 error: they differ only in the variable appended to the URL, so there is no need to change them.
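Should the ?m=0 and ?m=1 parameters in patterns 2 and 3 ever need to be stripped with an external redirect, the Redirect directive cannot do it, because it matches the URL-path only and passes the query string through. A rough mod_rewrite sketch follows. It assumes Apache 2.4 or later (for the QSD flag), an .htaccess file in the document root, and purely numeric year and month segments:

```apache
RewriteEngine On

# Only act when the query string is exactly m=0 or m=1
RewriteCond %{QUERY_STRING} ^m=[01]$
# Redirect /<year>/<month>/<name>.html (with or without a trailing slash)
# to the same path without the slash; QSD discards the query string
RewriteRule ^(\d+/\d+/[^/]+\.html)/?$ /$1 [R=301,L,QSD]
```

It is probably worth testing with R=302 first, since 301 responses are cached by browsers.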