RedirectMatch without last part of URL - apache

I have this RedirectMatch:
RedirectMatch 301 ^/en/products/(.*)/(.*)/(.*)$ https://www.example.com/en/collections/$2/
If I visit
https://www.example.com/en/products/sofas/greyson/greyson-sofa
I'm redirected to
https://www.example.com/en/collections/greyson/greyson-sofa
What I want is
https://www.example.com/en/collections/greyson/
How do I accomplish this?

There's nothing obvious in what you have posted that would produce the specific output you are seeing. However, there are other errors in the directives, and you may be seeing a cached response. 301s are cached persistently by the browser, so any errors are also cached.
The Redirect directive is prefix-matching and everything after the match is copied onto the end of the target URL. So, the redirect you are seeing would be produced by a directive something like this:
Redirect 301 /en/products/sofas/greyson https://www.example.com/en/collections/sofas/greyson
When you request /en/products/sofas/greyson/greyson-sofa, the part after the match, ie. /greyson-sofa, is copied onto the end of the target URL to produce /en/collections/sofas/greyson/greyson-sofa
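The prefix-matching behaviour described above can be sketched in Python. This is only a rough model of how Redirect treats the URL-path, not Apache's actual implementation, and the function name redirect_prefix is hypothetical:

```python
from typing import Optional


def redirect_prefix(path: str, prefix: str, target: str) -> Optional[str]:
    """Rough model of mod_alias Redirect: prefix match, then append the remainder."""
    if not path.startswith(prefix):
        return None
    rest = path[len(prefix):]
    # Assumption: the prefix must end on a path-segment boundary,
    # so "/foo" matches "/foo/bar" but not "/foobar".
    if not (prefix.endswith("/") or rest == "" or rest.startswith("/")):
        return None
    # Everything after the matched prefix is copied onto the target.
    return target + rest


result = redirect_prefix(
    "/en/products/sofas/greyson/greyson-sofa",
    "/en/products/sofas/greyson",
    "https://www.example.com/en/collections/sofas/greyson",
)
print(result)
# https://www.example.com/en/collections/sofas/greyson/greyson-sofa
```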
You can resolve most of these issues by reordering your rules (but also watch the trailing slashes). You need to have the most specific redirects first, with RedirectMatch before Redirect. For example, take the following two redirects:
Redirect 301 /en/products/accessories https://www.example.com/en/products/complements/
Redirect 301 /en/products/accessories/bush/ https://www.example.com/en/collections/bush-on/
Since the Redirect directive is prefix-matching, a request for /en/products/accessories/bush/ will actually be caught by the first rule, not the second, and end up redirecting to /en/products/complements//bush/. Note the erroneous double slash, caused by the mismatch of trailing slashes on the source and target URLs.
You need to reverse these two rules. (But also watch the trailing slash.)
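To see the effect of rule order, here is a small Python sketch that assumes rules are tried in the order they are listed and the first match wins. It is a simplified model, and first_redirect is a hypothetical helper, not an Apache API:

```python
def first_redirect(path, rules):
    """Return the target for the first rule whose prefix matches, else None."""
    for prefix, target in rules:
        if path.startswith(prefix):
            rest = path[len(prefix):]
            # Assumption: the prefix must end on a path-segment boundary.
            if prefix.endswith("/") or rest == "" or rest.startswith("/"):
                return target + rest
    return None


general = ("/en/products/accessories",
           "https://www.example.com/en/products/complements/")
specific = ("/en/products/accessories/bush/",
            "https://www.example.com/en/collections/bush-on/")
request = "/en/products/accessories/bush/"

# General rule first: the specific rule is never reached.
print(first_redirect(request, [general, specific]))
# https://www.example.com/en/products/complements//bush/

# Specific rule first: the intended target is used.
print(first_redirect(request, [specific, general]))
# https://www.example.com/en/collections/bush-on/
```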
The same applies to the Redirect directives that follow. You also have some duplication, i.e. you have two rules for /en/products/chairs-and-bar-stools/piper/.

Related

Prevent wildcard in htaccess 301 redirect

I have set up an htaccess 301 redirect to redirect an indexed page to a new page, however this is also acting as a wildcard redirect for child pages, which I do not want to happen.
Old structure
example.com/faq
example.com/faq/question-1
example.com/faq/question-2
etc etc
New structure
example.com/faqs
example.com/faq/question-1
example.com/faq/question-2
etc etc
htaccess redirect in place :
Redirect 301 /faq/ https://example.com/faqs/
This is working with no issues to send /faq to /faqs however it is also sending /faq/* to /faqs/* which I do not want to happen.
For example, going to example.com/faq/question-1 causes a too many redirects error and finally lands on example.com/faqs/question-1
Is there anything I can add to the single redirect line to prevent this happening, or is there a more complex use of RewriteRule I could use instead? Research into the matter initially seems to confirm that this should/would happen, and if it does, what can be added to prevent it? After a pointer to the Apache docs I can see why they would redirect as a wildcard.
After a suggestion from CBroe, an implementation using RedirectMatch worked.
RedirectMatch 301 ^/faq/$ https://example.com/faqs
This now redirects /faq to /faqs, but no longer redirects /faq/question-1 to /faqs/question-1 etc.
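As a quick check of what an anchored pattern catches, the Python sketch below tests a few paths. Note that the optional trailing slash (/?) is an assumption added here so that both /faq and /faq/ match; it is not part of the pattern above:

```python
import re

# Assumed variant of the pattern: "/?" makes the trailing slash optional.
pattern = re.compile(r"^/faq/?$")

paths = ["/faq", "/faq/", "/faq/question-1"]
matches = {path: bool(pattern.search(path)) for path in paths}

for path, matched in matches.items():
    print(path, matched)
# /faq True
# /faq/ True
# /faq/question-1 False
```

Because the pattern is anchored at both ends, child pages such as /faq/question-1 are left alone.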

Intelligent redirection

I am a newbie using Apache 2.4.18.
I have URLs with the following form that I'd like to redirect.
Current URL: https://www.example.org/page/10/
Desired URL: https://www.example.org/index.php/page/10/
If I use the following rule, I can modify a request, eg for page 2:
Redirect permanent /page/2/ /index.php/page/2/
However, I want to redirect without having to hardcode all the pages on my site. I have tried the following, but my browser fails after many redirects:
RedirectMatch /page/(.*)/$ /index.php/page/$1/
And using the following fails, I don't know why:
RedirectMatch "https://www.example.org/page/(.*)/$" "https://www.example.org/index.php/page/$1/"
What am I doing wrong?
In your line
RedirectMatch /page/(.*)/$ /index.php/page/$1/
/page/2/ will in fact redirect to /index.php/page/2/, but this new URL will still match your RedirectMatch’s regex and will produce another redirect. That’s why it redirects endlessly until the browser gives up (see this example).
I’d try with RedirectMatch ^/page/(.*)/$ /index.php/page/$1/, so when redirected, /index.php/page/2/ will no longer match ^/page/(.*)/$ and will not produce the recursive redirects.
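The difference the ^ anchor makes can be checked with a short Python sketch. It uses re.search, which, like RedirectMatch, looks for the pattern anywhere in the path unless the pattern is anchored:

```python
import re

unanchored = re.compile(r"/page/(.*)/$")
anchored = re.compile(r"^/page/(.*)/$")

original = "/page/2/"
redirected = "/index.php/page/2/"

# The unanchored pattern still matches the already-redirected URL,
# which is what keeps the redirect loop going.
print(bool(unanchored.search(redirected)))  # True

# The anchored pattern matches the original URL only.
print(bool(anchored.search(original)))      # True
print(bool(anchored.search(redirected)))    # False
```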
As an afterthought, why not use RewriteRule instead? It'd save the client an additional HTTP request, as redirects are usually used when you need to send the client to a different server.

Redirect and append GET-Parameter

I'm having trouble with redirecting from one URL to another, while appending one GET-parameter to the NEW URL.
The first redirect (from the root of the old domain to the root of the new domain) works perfectly fine. With the second one (further down in the .htaccess), the GET parameter appears, seemingly at random, in the middle of the new URL.
Obviously that leads to some nasty 404 situations...
RewriteEngine on
Redirect 301 / https://foo.bar?redirect=1
Redirect 301 /foo/bar/ https://foo.bar/foo/bar?redirect=1
To give further information:
The idea is to redirect including this parameter, to trigger a popup giving information about the recent redirect, so the user doesn't lose confidence about visiting foo.bar.
The first redirect works just the right way.
The second, though, turns out like:
https://foo.bar/foo/?redirect=1bar/
Please and Thank you :)
Your rules work exactly as you configured them. Why?
With a mod_alias Redirect, this part
Redirect 301 / https://foo.bar?redirect=1
will match /, /foo/, /foo/bar/ and so on, so the second rule never takes effect because every request is captured by the first rule. The result of redirecting /foo/bar/ with the first rule is https://foo.bar?redirect=1foo/bar/, because with Redirect everything after the match is appended to the new target.
To avoid that, use RedirectMatch:
RewriteEngine On
RedirectMatch 301 ^/$ https://foo.bar/?redirect=1
RedirectMatch 301 ^/foo/bar/$ https://foo.bar/foo/bar?redirect=1
This way, using regex, the first rule matches the root only and the second rule matches /foo/bar/.
Note: clear your browser cache, then test.
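To check which paths a given pattern would catch, a small Python sketch can help. The anchored patterns ^/$ and ^/foo/bar/$ are assumed here as the "root only" and "/foo/bar/ only" forms; an unanchored pattern such as /?$ is included for contrast, since the optional slash followed by end-of-string can match at the end of any path:

```python
import re

patterns = {
    "root_only": re.compile(r"^/$"),
    "foo_bar": re.compile(r"^/foo/bar/$"),
    "unanchored": re.compile(r"/?$"),
}
paths = ["/", "/foo/", "/foo/bar/"]

# For each pattern, list the paths it matches.
matched = {
    name: [p for p in paths if rx.search(p)]
    for name, rx in patterns.items()
}

for name, hits in matched.items():
    print(name, hits)
# root_only ['/']
# foo_bar ['/foo/bar/']
# unanchored ['/', '/foo/', '/foo/bar/']
```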

Strip parent categories from url

I'm struggling to fix an issue with 301 redirects and .htaccess. I have moved a site from an old domain to a new domain. And I have successfully managed to do this with a 301 redirect. Like so:
Redirect 301 / https://newdomain.com
On the old site child category URLs are like this:
olddomain.com/product-category/parent-cat1/parent-cat2/child-cat
or
olddomain.com/product-category/parent-cat1/child-cat
or
olddomain.com/product-category/child-cat
Whereas on the new site they are:
newdomain.com/product-category/child-cat
Unfortunately, this is resulting in 404s from the redirects. Is there any way to remove the parent categories (which can vary by name and amount of them) from the URL?
Try including the following RedirectMatch directive before your existing Redirect directive:
RedirectMatch 302 ^/([\w-]+)/(?:[\w-]+/)+([\w-]+)$ https://newdomain.com/$1/$2
The RedirectMatch directive is complementary to the Redirect directive; both are part of mod_alias. The difference is that RedirectMatch uses a regex to match the URL-path, whereas Redirect uses simple prefix-matching.
This assumes that the path segments (ie. "product-category", "parent-cat" and "child-cat") consist of just the characters a-z, A-Z, 0-9, _ and - (hyphen). This needs to be as specific as possible so as not to match "too much". One or more "parent-cat" are required.
$1 is a backreference to the first captured group in the pattern. ie. ([\w-]+), the product-category. And $2 is a backreference to the second captured group, ie. ([\w-]+) at the end of the pattern, the child-cat. The (?:....) "group" in the middle is a non-capturing group, so there is no backreference that applies to this.
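The pattern can be tried out against the three URL shapes from the question with a short Python sketch. Python's re module is close enough to Apache's regex engine for this pattern, but treat it as an approximation; strip_parents is a hypothetical helper name:

```python
import re
from typing import Optional

PATTERN = re.compile(r"^/([\w-]+)/(?:[\w-]+/)+([\w-]+)$")


def strip_parents(path: str) -> Optional[str]:
    """Return the new URL if the pattern matches, else None."""
    m = PATTERN.search(path)
    if m is None:
        return None
    # group(1) and group(2) play the role of $1 and $2.
    return f"https://newdomain.com/{m.group(1)}/{m.group(2)}"


print(strip_parents("/product-category/parent-cat1/parent-cat2/child-cat"))
# https://newdomain.com/product-category/child-cat
print(strip_parents("/product-category/parent-cat1/child-cat"))
# https://newdomain.com/product-category/child-cat
print(strip_parents("/product-category/child-cat"))
# None
```

The last URL has no parent category, so the pattern does not match and the request falls through to the existing Redirect directive.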
This is a 302 (temporary) redirect. Change it to a 301 only when it is working OK. It is easier to test with 302s since they are not cached by the browser. Since your earlier 301 may already be cached, you'll need to make sure your browser cache is clear before testing.

Does REQUEST_URI hide or ignore some filenames in .htaccess?

I'm having some difficulty with a super simple htaccess redirect.
All I want to do is rewrite absolutely everything, except a couple files.
htaccess looks like this:
RewriteEngine On
RewriteCond %{REQUEST_URI} !sitemap
RewriteCond %{REQUEST_URI} !robots
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
The part that works is that everything gets redirected to new domain as it should be. And I can also access robots.txt without being forwarded, but not with sitemap.xml. If I try to go to sitemap.xml, the domain forwards along anyway and opens the sitemap file on the new domain.
I have this exact same issue when trying to "ignore" index.html. I can ignore robots, I can ignore alternate html or php files, but if I want to ignore index.html, the regex fails.
Since I can't actually SEE what is in the REQUEST_URI variable, my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI? I know this because of a stupid test. If I choose to ignore index.html like this:
RewriteCond %{REQUEST_URI} !index.html
Then if I type example.com/index.html I will be forwarded. But if I just type example.com/ the ignore actually works and it shows the content of index.html without forwarding!
How is it that when I choose to ignore the regex "index.html", it only works when "index.html" is not actually typed in the address bar!?!
And it gets even weirder! Should I type something like example.com/index.html?option=value, then the ignore rule works and I do NOT get forwarded when there are attributes like this. But index.html by itself doesn't work, and then just having the slash root, the rule works again.
I'm completely confused! Why does it seem like REQUEST_URI is not able to see some filenames like index.html and sitemap.xml? I've been Googling for 2 days and not only can I not find out if this is true, but I can't seem to find any websites which actually give examples of what these htaccess server variables actually contain!
Thanks!
my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI?
This is not true. There is no such special treatment of any requested URL. The REQUEST_URI server variable contains the URL-path (only) of the request. This notably excludes the scheme + hostname and any query string (which are available in their own variables).
However, if there are any other mod_rewrite directives that precede this (including the server config) that rewrite the URL then the REQUEST_URI server variable is also updated to reflect the rewritten URL.
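As a rough illustration, the Python sketch below models the two negated conditions from the question against the URL-path only. Using urlsplit(...).path as a stand-in for REQUEST_URI is an assumption made for illustration, and should_redirect is a hypothetical helper:

```python
import re
from urllib.parse import urlsplit

# Patterns from the two negated RewriteCond lines.
EXCLUDE = ["sitemap", "robots"]


def should_redirect(url: str) -> bool:
    """True if neither exclusion pattern matches the URL-path."""
    path = urlsplit(url).path  # URL-path only: no scheme, host or query string
    return not any(re.search(pattern, path) for pattern in EXCLUDE)


print(should_redirect("http://example.com/sitemap.xml"))           # False
print(should_redirect("http://example.com/robots.txt"))            # False
print(should_redirect("http://example.com/about.html"))            # True
# The query string is not part of the path, so it does not count.
print(should_redirect("http://example.com/about.html?x=sitemap"))  # True
```

Under this model a request for /sitemap.xml is not redirected, so if it is being redirected in practice, something other than the conditions themselves is likely responsible.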
index.html (Directory Index)
index.html is possibly a special case. Although, if you are explicitly requesting index.html as part of the URL itself (as you appear to be doing) then this does not apply.
If, on the other hand, you are requesting a directory, eg. http://example.com/subdir/ and relying on mod_dir issuing an internal subrequest for the directory index (ie. index.html), then the REQUEST_URI variable may or may not contain index.html - depending on the version of Apache (2.2 vs 2.4) you are on. On Apache 2.2 mod_dir executes first, so you would need to check for /subdir/index.html. However, on Apache 2.4, mod_rewrite executes first, so you simply check for the requested URL: /subdir/. It's safer to check for both, particularly if you have other rewrites and there is a possibility of a second pass through the rewrite engine.
Caching problems
However, the most probable cause in this scenario is simply a caching issue. If the 301 redirect has previously been in place without these exceptions then it's possible these redirections have been cached by the browser. 301 (permanent) redirects are cached persistently by the browser and can cause issues with testing (as well as your users that also have these redirects cached - there is little you can do about that unfortunately).
RewriteCond %{REQUEST_URI} !(sitemap|index|alternate|alt) [NC]
RewriteRule .* alternate.html [R,L]
The example you presented in comments further suggests a caching issue, since you are now getting different results for sitemap than those posted in your question. (It appears to be working as intended in your second example).
Examining Apache server variables
@zzzaaabbb mentioned one method to examine the value of the Apache server variable. (Note that the Apache server variable REQUEST_URI is different to the PHP variable of the same name.) You can also assign the value of an Apache server variable to an environment variable, which is then readable in your application code.
For example:
RewriteRule ^ - [E=APACHE_REQUEST_URI:%{REQUEST_URI}]
You can then examine the value of the APACHE_REQUEST_URI environment variable in your server-side code. Note that if you have any other rewrites that cause the rewriting process to start over then you could get multiple env vars, each prefixed with REDIRECT_.
With the index.html problem, it is worth escaping the dot (index\.html), since you are in the regex pattern-matching area on the right-hand side of RewriteCond. Note, though, that an unescaped dot matches any single character, including a literal dot, so index.html as a pattern still matches a request for /index.html. Escaping makes the pattern stricter, but the unescaped dot is unlikely to be the cause of the unwanted forward on its own.
For the sitemap not matching problem, you could check what REQUEST_URI actually contains by creating an empty dummy file (to avoid a 404) and then adding a redirect at the top of .htaccess. Then, in the browser, request any URL you want to see the REQUEST_URI for; it will show in the address bar.
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^ /test.php?var=%{REQUEST_URI} [NE,R,L]
Credit MrWhite with that easy test method.
Hopefully that will show that sitemap in URL ends up as something else, so will at least partially explain why it's not pattern-matching and preventing redirect, when it should be pattern-matching and preventing redirect.
I would also test by being sure that the server isn't stepping in front of things with custom 301 directive that for whatever reason makes sitemap behave unexpectedly. Put this at the top of your .htaccess for that test.
ErrorDocument 301 default