How can I create a redirect with .htaccess to correct path instead of page access - apache

I am making a multilingual dynamic site that creates a virtual path per language.
So French pages go to domain.com/fr/, English pages to URLs such as domain.com/en/page, and nested pages to URLs such as domain.com/fr/some/page. In reality these pages are in the base folder and the /fr/ prefix is converted to a query string.
This is all working with the following .htaccess:
RewriteEngine on
# Fixes the issue where a page and folder can have the same name. See https://stackoverflow.com/questions/2017748
DirectorySlash Off
# Return 404 if original request is /foo/bar.php
RewriteCond %{THE_REQUEST} "^[^ ]* .*?\.php[? ].*$"
RewriteRule .* - [L,R=404]
# Remove virtual language/locale component
RewriteRule ^(en|fr)/(.*)$ $2?lang=$1 [L,QSA]
RewriteRule ^(en|fr)/$ index.php?lang=$1 [L,QSA]
# Rewrite /foo/bar to /foo/bar.php
RewriteRule ^([^.?]+)$ %{REQUEST_URI}.php [L]
My problem is that some sites (like a LinkedIn post) automatically remove the trailing / from the index page URL. So if I put a link to domain.com/fr/ in my post, the link they generate points to domain.com/fr even though the displayed text is domain.com/fr/, and that returns a 404 because domain.com/fr doesn't exist.
So how can I redirect domain.com/fr to domain.com/fr/, or localhost/mypath/fr to localhost/mypath/fr/ (there are many sites on my local workstation)?
I tried something like:
RewriteRule ^(.*)/(en|fr)$ $1/$2/ [L,QSA,R=301]
RewriteRule ^(en|fr)$ $1/ [L,QSA,R=301]
But that somehow ended up adding the full filesystem path to the URL:
localhost/mypath/fr becomes localhost/thepathofthewebserverinmypc/mypath/fr/
I would very much appreciate some help as I have yet to find the right rule.
Thank you

RewriteRule ^(en|fr)$ $1/ [L,QSA,R=301]
You are just missing the slash prefix on the substitution string. Consequently, Apache applies the directory-prefix to the relative URL, which results in the malformed redirect.
For example:
RewriteRule ^(en|fr)$ /$1/ [L,R=301]
The substitution is now a root-relative URL path and Apache just prefixes the scheme + hostname to the external redirect. (The QSA flag is unnecessary here, since any query string is appended by default.)
This needs to go before the existing rewrites (and after the blocking rule for .php requests).
Note that the "internal rewrite" directives are correct to not have the slash prefix.
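To make the ordering concrete, here is a sketch of how the complete file might look with the redirect in place. Every rule other than the new redirect is copied from the question unchanged, and the sketch assumes the .htaccess file is in the document root:

```apache
# Comments must be on their own line in Apache config files
DirectorySlash Off

RewriteEngine On

# Return 404 if the original request is /foo/bar.php
RewriteCond %{THE_REQUEST} "^[^ ]* .*?\.php[? ].*$"
RewriteRule .* - [L,R=404]

# NEW: append the missing trailing slash, /fr -> /fr/ (external redirect)
RewriteRule ^(en|fr)$ /$1/ [L,R=301]

# Remove virtual language/locale component (internal rewrites, no slash prefix)
RewriteRule ^(en|fr)/(.*)$ $2?lang=$1 [L,QSA]
RewriteRule ^(en|fr)/$ index.php?lang=$1 [L,QSA]

# Rewrite /foo/bar to /foo/bar.php
RewriteRule ^([^.?]+)$ %{REQUEST_URI}.php [L]
```

With this order, a request for /fr is redirected before any internal rewrite can touch it, and the redirected request for /fr/ then falls through to the language rules.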
Aside:
DirectorySlash Off
Note that if you disable the directory slash, you must ensure that auto-generated directory listings (mod_autoindex) are also disabled. Otherwise, a request for a directory without a trailing slash will generate a directory listing (exposing your file structure), even if there is a DirectoryIndex document in that directory.
For example, include the following at the top of the .htaccess file:
# Disable auto-generated directory listings (mod_autoindex)
Options -Indexes
UPDATE:
This worked on the production server, as the site is in the server root. Would you know how I can also "catch" this on my localhost? RewriteRule ^(.*)/(en|fr)$ /$1/$2/ [L,R=301] doesn't match, but with only RewriteRule ^(en|fr)$ /$1/ [L,R=301], localhost/mypath/fr becomes localhost/fr/
From that I assume the .htaccess file is inside the /mypath subdirectory on your local development server.
The RewriteRule pattern (first argument) matches the URL-path relative to the location of the .htaccess file (so it does not match /mypath). You can then make use of the REQUEST_URI server variable in the substitution that contains the entire (root-relative) URL-path.
For example:
RewriteRule ^(en|fr)$ %{REQUEST_URI}/ [L,R=301]
The REQUEST_URI server variable already includes the slash prefix.
This rule would work OK on both development (in a subdirectory) and in production (root directory), so it should replace the rule above if you need to support both environments with a single .htaccess file.

Related

Apache2 mod_rewrite difficulty with GET variables

On the website.conf file I have:
<VirtualHost *:80>
DocumentRoot /srv/http/website/cgi-bin
ServerName website
ServerAlias www.website
RewriteEngine on
RewriteRule ^$ ""
RewriteRule ^([a-z]+)$ /?tab=repo
...
My goal is to have http://localhost/ redirect to localhost and http://localhost/word redirect to http://localhost/?tab=word.
With the current directives I get a 404 error, because it's trying to open the file repo in the DocumentRoot. All I need is to rewrite the URL to make the word be a GET variable.
A directive like the following works:
RewriteRule /word$ http://localhost/?tab=word
This is obviously somewhat simplistic because I would then have to do it for every possibility.
I experimented with those directives on this website https://htaccess.madewithlove.com/, which I found from another thread on SO. The results are what I expect them to be, i.e. http://localhost/word is transformed to http://localhost/?tab=word.
Extra info: The website does not have any PHP.
# Virtual Host
RewriteRule ^$ ""
RewriteRule ^([a-z]+)$ /?tab=repo
A directive like the following works:
RewriteRule /word$ http://localhost/?tab=word
The difference with the "working directive" is that you've included a slash prefix. The regex ^([a-z]+)$ does not allow for a slash prefix, so never matches.
You are also failing to use the captured backreference (ie. $1) in the substitution string, so it would always rewrite to /?tab=repo regardless of the URL requested.
Consequently, the first rule, that matches against ^$ will never match either - but this rule is not required. You are not performing a redirect when requesting the root - you just don't want to do anything and instead allow mod_dir to serve the directory index.
In a virtualhost context the URL-path matched by the RewriteRule pattern is a root-relative URL-path, starting with a slash.
So, your rule(s) should be like this instead:
RewriteEngine On
RewriteRule ^/([a-z]+)$ /?tab=$1 [L]
(Or, make the slash prefix optional, ie. ^/?([a-z]+)$)
However, /?tab=<word> is not strictly a valid end-point. What is the actual file that is handling the request? This should be included in the rewrite (and not rely on the DirectoryIndex). You state you are not using PHP, so how are you reading the URL parameter?
I experimented with those directives on this website https://htaccess.madewithlove.com/,
You are not using .htaccess in your example. mod_rewrite behaves slightly differently depending on context (.htaccess, directory, virtualhost and server).
Reference:
https://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule
What is matched?
In VirtualHost context, The Pattern will initially be matched against
the part of the URL after the hostname and port, and before the query
string (e.g. "/app1/index.html"). This is the (%-decoded) URL-path.
In per-directory context (Directory and .htaccess), the Pattern is
matched against only a partial path, for example a request of
"/app1/index.html" may result in comparison against "app1/index.html"
or "index.html" depending on where the RewriteRule is defined.
After tinkering a bit more I got down to this:
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ %{REQUEST_FILENAME} [PT,L]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} -d
RewriteRule ^(.*)$ %{REQUEST_FILENAME} [PT,L]
RewriteRule ^/([^index.cgi]{1}.+)$ /index.cgi?tab=$1 [L]
It opens files if they exist and sends requests that don't exist to my C++ cgicc program. The only thing I don't understand is why the -d condition isn't opening directories the same way the -f one opens files.
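One thing worth flagging in the last rule: [^index.cgi] is a negated character class, not a negated string. It matches a single character that is none of i, n, d, e, x, ., c or g, so a request such as /contact would not be rewritten because it starts with "c". A possible alternative, sketched here on the assumption that /index.cgi is the only URL that needs to be excluded, is to move the exclusion into a condition. This only addresses the character class and is not claimed to explain the -d behaviour:

```apache
RewriteEngine On

# Leave requests for existing files and directories untouched
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
RewriteRule ^ - [L]

# Send everything else, except /index.cgi itself, to the CGI program
RewriteCond %{REQUEST_URI} !^/index\.cgi$
RewriteRule ^/(.+)$ /index.cgi?tab=$1 [L]
```

As in the rest of this thread, the patterns include the slash prefix because the rules are in a virtualhost context.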

htaccess not working but check successful localhost

I have a simple .htaccess file:
DirectoryIndex index.php
RewriteEngine On
RewriteRule ^v4r.info/(.*)/(.*) v4r.info/NGOplus/index.php?NGO=$1&page=$2 [L,QSA]
I tested the file in htaccess.madewithlove.com, it gives a correct result and copy&pasting the result works flawlessly. (http://localhost/v4r.info/NGOplus/index.php?NGO=action-for-woman&page=board.list.php&ff=710;;;;;&startdate=2017-11-11)
But htaccess fails on localhost with an error:
File does not exist:
/var/www/html/public_html/v4r.info/action-for-woman/board.list.php
The test URL is
localhost/v4r.info/NGOplus/index.php?NGO=action-for-woman&page=board.list.php&ff=710;;;;;&startdate=2017-11-11
.htaccess is active (a rubbish line gives an "internal server error").
In another directory .htaccess is working fine.
apache.conf seems OK (AllowOverride All).
Added:
The .htaccess file is not in the base directory but in the first subdirectory (v4r.info).
What works is a .htaccess file in v4r.info/NGOplus with a symlink 'action-for-woman' to NGOplus:
RewriteRule ^(.+?)/?$ index.php?page=$1 [L,QSA]
Here, Apache does a «local» rewrite, i.e. it matches just the last part of the URL (the directory name 'action-for-woman' I have to extract from $_SERVER ...)
my .htaccess file is in the v4r.info directory, which is not the root directory.
In that case, your rule will never match. The RewriteRule pattern matches a URL-path relative to the directory that contains the .htaccess file.
But anyhow, rewriting is not recursive afaik.
Yes, it is "recursive" in a directory context (i.e. .htaccess), in that the rewrite engine "loops" repeatedly until the URL passes through unchanged, or until you explicitly stop it with the END flag (Apache 2.4).
Try the following instead:
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{REQUEST_URI} !index\.php$
RewriteRule ^([^/]+)/([^/]+)$ /v4r.info/NGOplus/index.php?NGO=$1&page=$2 [L,QSA]
The check against the REDIRECT_STATUS environment variable is to ensure that only direct requests are rewritten and not already rewritten requests.
However, this pattern is still far too generic as it matches any two path segments. I put the 2nd condition that checks index.php just so you can request /v4r.info/NGOplus/index.php directly (as you were doing in your tests). However, this could be avoided by making the regex more specific.
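As a sketch of what "more specific" might look like, the following assumes that NGO directory names consist only of lowercase letters and hyphens (like action-for-woman) and that every page name ends in .php. Both are assumptions based on the single test URL in the question, so adjust the character classes to match the real names:

```apache
RewriteEngine On

# Only rewrite direct requests, not requests that have already been rewritten
RewriteCond %{ENV:REDIRECT_STATUS} ^$
# First segment: lowercase letters and hyphens only (assumed NGO naming)
# Second segment: a file name ending in .php (assumed page naming)
RewriteRule ^([a-z-]+)/([\w.-]+\.php)$ /v4r.info/NGOplus/index.php?NGO=$1&page=$2 [L,QSA]
```

Because the first segment no longer matches uppercase letters (and the NC flag is not used), a direct request for NGOplus/index.php is not matched, so the separate index.php condition is no longer needed.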

how do I rewrite a URL and maintain the file name

I have a rewrite in my .htaccess file. I am trying to redirect
https://olddomain.com/folder/file.pdf to https://newdomain.com/folder/file.pdf. The file name can change (it could be file.pdf, file1.pdf, etc.), so I need to change the domain but keep the folder and file name exactly as requested.
I have this code in my .htaccess file
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/folder/(.*)$
RewriteRule ^(.*) https://newdomain.com/folder/%1 [R=301,NC]
If the file.pdf exists on the old server then the redirect works but if the file does not exist on the old server the redirect does not work.
Any help fixing this would be appreciated.
If the file.pdf exists on the old server then the redirect works but if the file does not exist on the old server the redirect does not work.
That sounds like you've put the rule/redirect in the wrong place. If you have other directives before this redirect that implement a front-controller pattern then you will experience this same behaviour since any request for a non-existent file would be routed to the front-controller (and request for an existing file is ignored) before your redirect is triggered - so no redirect occurs.
If this is the case then you need to move your rule to the top of the file, before any existing rewrites.
RewriteCond %{REQUEST_URI} ^/folder/(.*)$
RewriteRule ^(.*) https://newdomain.com/folder/%1 [R=301,NC]
However, your existing rule is not quite correct. Importantly, you are missing the L flag on the RewriteRule directive and the preceding RewriteCond directive is not required. For example, try the following instead:
RewriteRule ^folder/.* https://newdomain.com/$0 [NC,R=301,L]
This does assume your .htaccess file is located in the document root of the site.
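To illustrate the ordering point, here is a sketch of a root .htaccess file with the redirect placed first. The front-controller block is hypothetical (index.php is an assumed name, since the question does not show the other directives); it only stands in for whatever existing rewrites the file contains:

```apache
Options +FollowSymLinks
RewriteEngine On

# 1. External redirect first, so it runs whether or not the file exists locally
RewriteRule ^folder/.* https://newdomain.com/$0 [NC,R=301,L]

# 2. Hypothetical front-controller (assumed, not from the question):
#    routes requests for non-existent files and directories to index.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.php [L]
```

If the two blocks were swapped, a request for a missing /folder/file.pdf would be rewritten to index.php before the redirect pattern was ever tested.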
Alternatively, you could create an additional .htaccess file inside the /folder directory with the following:
RewriteEngine On
RewriteRule ^ https://newdomain.com%{REQUEST_URI} [R=301,L]
The REQUEST_URI server variable contains the full URL-path of the request (including the slash prefix).
By default, the mod_rewrite directives in the /folder/.htaccess file will completely override any directives in the parent (root) .htaccess file (the mod_rewrite directives in the parent are not even processed).

Apache htaccess rewrite root and all root folders to subfolder without redirecting

Options +FollowSymLinks -MultiViews
# Turn mod_rewrite on
RewriteEngine On
RewriteBase /
RewriteRule ^$ /subdir/ [L,NC]
I want to rewrite the root domain to subfolder without changing the URL in the browser. The above code works just for the root domain but not any folders and files.
For example, I have https://example.com/ and https://example.com/subdir/.
With the above code in .htaccess file, when I go to https://example.com/ I see the contents of https://example.com/subdir/ which is good.
But when I go to https://example.com/test.txt I should see https://example.com/subdir/test.txt but I get "The requested URL was not found on this server."
Same happens when I go to https://example.com/abc expecting to see contents of https://example.com/subdir/abc
Any idea?
RewriteRule ^$ /subdir/ [L,NC]
Change this to read:
RewriteRule !^subdir/ subdir%{REQUEST_URI} [L]
Any request that does not start /subdir/ is internally rewritten to /subdir/<url>. The REQUEST_URI server variable contains the full URL-path (including the slash prefix).
I removed the slash prefix from the substitution string since you have defined a RewriteBase /. (Although neither is strictly necessary here.)
UPDATE:
...when I go to example.com/s I am being redirected to example.com/subdir/s/
s is a subfolder within subdir, does that make any difference?
Ah yes, if /s is a subdirectory then mod_dir will append the trailing slash (to "fix" the URL) with an external 301 redirect. This redirect occurs after the URL has been rewritten to /subdir/s - thus exposing the /subdir subdirectory.
To handle this situation we can add another rule (a redirect) before the existing rewrite that first checks whether the request would map to a directory within the /subdir subdirectory and append a slash if it is omitted (before mod_dir would append the slash to the rewritten URL).
For example:
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{DOCUMENT_ROOT}/subdir%{REQUEST_URI} -d
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/ [R=301,L]
This states... for any request that:
!\.\w{2,4}$ - does not contain (what looks like) a file extension of between 2 and 4 characters (assuming your directories aren't named this way)
!/$ - and does not currently end in a slash.
-d - and exists as a physical directory in the /subdir subdirectory.
THEN redirect to append the trailing slash on the original request
Whilst this probably should be a 301 (permanent) redirect, you should first test with a 302 (temporary) redirect to avoid potential caching issues.
You will need to clear your browser cache before testing, since the erroneous 301 redirect from /s to /subdir/s/ will have been cached by the browser.
A potential optimisation is to remove the filesystem check and simply assume that any request that does not contain a file extension should map to a directory. (But this depends on whether you are handling these URLs in any other way.)
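A sketch of that optimisation is shown below. It is the same redirect with the -d condition removed, so it assumes that every extensionless URL on the site is meant to be a directory. It uses a 302 while testing, as suggested above:

```apache
# Append a trailing slash to any request without a file extension
# (no filesystem check: assumes all such URLs are directories)
RewriteCond %{REQUEST_URI} !/$
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/ [R=302,L]
```

Change the 302 to a 301 only once the behaviour has been confirmed.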
Summary
Options +FollowSymLinks -MultiViews
# Turn mod_rewrite on
RewriteEngine On
# If the requested URL exists as a directory in "/subdir" then append a slash
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{DOCUMENT_ROOT}/subdir%{REQUEST_URI} -d
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/ [R=301,L]
# Rewrite everything to "/subdir"
RewriteRule !^subdir/ subdir%{REQUEST_URI} [L]

.htaccess to show a directory index.html without a trailing slash

I've got a Jekyll generated site running on an Apache server and I'm having some trouble getting my .htaccess file set up correctly. Jekyll places index.html files into folders which represent each page so my URLs currently look like domain.com/foo/
I'd like to remove that trailing slash from the URL so that it exactly matches what I had set up previously (and also because I think it looks better).
Currently the section of my .htaccess file dealing with rewrites looks like:
<IfModule mod_rewrite.c>
RewriteCond %{HTTPS} !=on
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
</IfModule>
Options -Indexes
DirectoryIndex index.xml index.html
I have tried following the advice here but that puts me into a redirect loop.
Can anybody help me out? In brief, what I want is for a domain.com/foo URL to show the index.html file from the /foo directory and for domain.com/foo/ and domain.com/foo/index.html to redirect to domain.com/foo.
You should be able to use this to turn off the addition of slashes.
DirectorySlash Off
Note that the trailing slash is added for a good reason. Having the trailing slash in the directory name will make relative URLs point at the same thing regardless of whether the URL ends with "foo/bar/index.html" or just "foo/bar/". Without the trailing slash, relative URLs would reference something up one level from what they normally point at. (eg: "baz.jpg" would give the user "/foo/baz.jpg" instead of "/foo/bar/baz.jpg", as the trailing "bar" will get removed if it isn't protected by a trailing slash.) So if you do this, you probably want to avoid relative URLs.
To then rewrite the directory name to return the index.html you could probably do something like this:
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule ^(.*)$ /$1/index.html [L]
This checks whether REQUEST_URI/index.html exists and, if it does, performs an internal rewrite to it (the URL shown in the browser is unchanged).