.htaccess removing file extension is conflicting with folders of the same name - apache

I'm removing file extensions (like .html) from the URL with a .htaccess file. The code in the file is working fine, but as soon as there is a folder with the same name as the file without its extension, it redirects to the folder instead of the file. For example, if I have a demo.html file and a demo folder in the same directory and I type www.example.com/demo in the browser's address bar, it redirects to the folder instead of the file. If I delete the folder and type the same thing again, it works perfectly! Any help would be appreciated :)
Here's the code in the .htaccess file:
RewriteCond %{THE_REQUEST} /([^.]+)\.html [NC]
RewriteRule ^ /%1 [NC,L,R]
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^ %{REQUEST_URI}.html [NC,L]

This is caused by a conflict with mod_dir. When you request a directory without a trailing slash, mod_dir will "fix" the URL by appending a trailing slash with a 301 redirect, after which it will attempt to serve a DirectoryIndex document. This takes priority over your internal rewrite.
To resolve this you need to disable this behaviour with DirectorySlash Off.
For example:
# Ensure that directory listings are disabled
Options -Indexes
# Prevent mod_dir appending a slash to physical directories
DirectorySlash Off
# Redirect to remove the ".html" extension
RewriteCond %{THE_REQUEST} /([^.?]+)\.html [NC]
RewriteRule ^ /%1 [NC,L,R=301]
# Rewrite request to append ".html" extension if it exists
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule (.*) $1.html [L]
Directory listings (mod_autoindex) need to be disabled when you disable DirectorySlash. If mod_autoindex is enabled and you request the directory without a slash, a directory listing will be generated, regardless of whether you have a DirectoryIndex document (e.g. index.html) in that directory that would ordinarily prevent the directory listing being generated.
Also, I've "fixed" your existing rules that remove and append the .html extension. The first rule that removes the .html extension could have potentially matched an instance of .html that appeared in the query string. And the second rule that appends the .html extension would have resulted in a rewrite-loop (500 error) if requesting /demo/<does-not-exist> - where demo is a directory and a file basename (as in your example).
See my answer to a related question on ServerFault for more information on this potential rewrite-loop:
https://serverfault.com/questions/989333/using-apache-rewrite-rules-in-htaccess-to-remove-html-causing-a-500-error
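To make the query-string case concrete, here is a small illustration of how the two patterns treat the same request line. The URL /demo?next=page.html is a made-up example, not something from the question.

```apache
# Hypothetical request (illustrative only):
#   THE_REQUEST = "GET /demo?next=page.html HTTP/1.1"

# Original condition: [^.]+ may run through the "?" into the query string,
# so it matches here with %1 = "demo?next=page" and the redirect fires
# even though the URL-path itself has no ".html" extension.
#RewriteCond %{THE_REQUEST} /([^.]+)\.html [NC]

# Revised condition: excluding "?" stops the match at the end of the URL-path,
# so this request is not matched and is left alone.
RewriteCond %{THE_REQUEST} /([^.?]+)\.html [NC]
RewriteRule ^ /%1 [NC,L,R=301]
```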

Related

How can I create a redirect with .htaccess to correct path instead of page access

I am making a multilingual dynamic site that creates a virtual path per language.
So French pages go to domain.com/fr/ and English pages to domain.com/en/ (e.g. domain.com/en/page, domain.com/fr/some/page), but in reality these pages are in the base folder and /fr/ is converted to a query string.
This is all working with the following .htaccess:
RewriteEngine on
# Fixes the issue where a page and folder can have the same name. See https://stackoverflow.com/questions/2017748
DirectorySlash Off
# Return 404 if original request is /foo/bar.php
RewriteCond %{THE_REQUEST} "^[^ ]* .*?\.php[? ].*$"
RewriteRule .* - [L,R=404]
# Remove virtual language/locale component
RewriteRule ^(en|fr)/(.*)$ $2?lang=$1 [L,QSA]
RewriteRule ^(en|fr)/$ index.php?lang=$1 [L,QSA]
# Rewrite /foo/bar to /foo/bar.php
RewriteRule ^([^.?]+)$ %{REQUEST_URI}.php [L]
My problem is that some sites (like a LinkedIn post) somehow remove the trailing / from the index page URL automatically. So if I put a link to domain.com/fr/ in my post, they make the link domain.com/fr even though it displays as domain.com/fr/, and that returns a 404 because domain.com/fr doesn't exist.
So how can I redirect domain.com/fr to domain.com/fr/, or localhost/mypath/fr to localhost/mypath/fr/ (there are many sites on my local workstation)?
I tried something like:
RewriteRule ^(.*)/(en|fr)$ $1/$2/ [L,QSA,R=301]
RewriteRule ^(en|fr)$ $1/ [L,QSA,R=301]
But that ended up somehow adding the full real computer path in the url:
localhost/mypath/fr becomes localhost/thepathofthewebserverinmypc/mypath/fr/
I would very much appreciate some help as I have yet to find the right rule.
Thank you
RewriteRule ^(en|fr)$ $1/ [L,QSA,R=301]
You are just missing the slash prefix on the substitution string. Consequently, Apache applies the directory-prefix to the relative URL, which results in the malformed redirect.
For example:
RewriteRule ^(en|fr)$ /$1/ [L,R=301]
The substitution is now a root-relative URL path and Apache just prefixes the scheme + hostname to the external redirect. (The QSA flag is unnecessary here, since any query string is appended by default.)
This needs to go before the existing rewrites (and after the blocking rule for .php requests).
Note that the "internal rewrite" directives are correct to not have the slash prefix.
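Putting the pieces in order, the rewrite section of the file might look something like the following. This is only a sketch: it reuses the rules from the question unchanged, adds the corrected redirect in the position described above, and assumes the .htaccess file sits in the document root.

```apache
RewriteEngine on

# Return 404 if original request is /foo/bar.php
RewriteCond %{THE_REQUEST} "^[^ ]* .*?\.php[? ].*$"
RewriteRule .* - [L,R=404]

# External redirect: append the missing trailing slash to /en or /fr
# (root-relative substitution, so no directory-prefix is added)
RewriteRule ^(en|fr)$ /$1/ [L,R=301]

# Remove virtual language/locale component (internal rewrites, no slash prefix)
RewriteRule ^(en|fr)/(.*)$ $2?lang=$1 [L,QSA]
RewriteRule ^(en|fr)/$ index.php?lang=$1 [L,QSA]

# Rewrite /foo/bar to /foo/bar.php
RewriteRule ^([^.?]+)$ %{REQUEST_URI}.php [L]
```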
Aside:
DirectorySlash Off
Note that if you disable the directory slash, you must ensure that auto-generated directory listings (mod_autoindex) are also disabled, otherwise if a directory without a trailing slash is requested then a directory listing will be generated (exposing your file structure), even though there might be a DirectoryIndex document in that directory.
For example, include the following at the top of the .htaccess file:
# Disable auto-generated directory listings (mod_autoindex)
Options -Indexes
UPDATE:
This worked on the production server, as the site is in the server root. Would you know how I can also "catch" this on my localhost? RewriteRule ^(.*)/(en|fr)$ /$1/$2/ [L,R=301] doesn't catch it, and with only RewriteRule ^(en|fr)$ /$1/ [L,R=301], localhost/mypath/fr becomes localhost/fr/
From that I assume the .htaccess file is inside the /mypath subdirectory on your local development server.
The RewriteRule pattern (first argument) matches the URL-path relative to the location of the .htaccess file (so it does not match /mypath). You can then make use of the REQUEST_URI server variable in the substitution that contains the entire (root-relative) URL-path.
For example:
RewriteRule ^(en|fr)$ %{REQUEST_URI}/ [L,R=301]
The REQUEST_URI server variable already includes the slash prefix.
This rule would work OK on both development (in a subdirectory) and in production (root directory), so it should replace the rule above if you need to support both environments with a single .htaccess file.

how do I rewrite a URL and maintain the file name

I have a rewrite in my .htaccess file. I am trying to redirect the following:
https://olddomain.com/folder/file.pdf to https://newdomain.com/folder/file.pdf. The file name can change, so I need to change the domain but keep the folder and file name as they are. It could be file.pdf or file1.pdf, etc.
I have this code in my .htaccess file
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/folder/(.*)$
RewriteRule ^(.*) https://newdomain.com/folder/%1 [R=301,NC]
If the file.pdf exists on the old server then the redirect works but if the file does not exist on the old server the redirect does not work.
Any help fixing this would be appreciated.
If the file.pdf exists on the old server then the redirect works but if the file does not exist on the old server the redirect does not work.
That sounds like you've put the rule/redirect in the wrong place. If you have other directives before this redirect that implement a front-controller pattern then you will experience this same behaviour since any request for a non-existent file would be routed to the front-controller (and request for an existing file is ignored) before your redirect is triggered - so no redirect occurs.
If this is the case then you need to move your rule to the top of the file, before any existing rewrites.
RewriteCond %{REQUEST_URI} ^/folder/(.*)$
RewriteRule ^(.*) https://newdomain.com/folder/%1 [R=301,NC]
However, your existing rule is not quite correct. Importantly, you are missing the L flag on the RewriteRule directive and the preceding RewriteCond directive is not required. For example, try the following instead:
RewriteRule ^folder/.* https://newdomain.com/$0 [NC,R=301,L]
This does assume your .htaccess file is located in the document root of the site.
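As a rough sketch of the ordering issue described above, the redirect would sit before any front-controller rules. The front-controller block shown here is an assumption for illustration (the question does not show one), and index.php is a placeholder name.

```apache
RewriteEngine On

# 1. External redirect first, so it fires whether or not the file exists locally
RewriteRule ^folder/.* https://newdomain.com/$0 [NC,R=301,L]

# 2. Hypothetical front-controller (assumed, not taken from the question):
#    routes requests for non-existent files and directories to index.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . index.php [L]
```

If the two blocks were the other way round, a request for a missing /folder/file.pdf would already have been rewritten to index.php by the time the redirect rule was evaluated, so the pattern ^folder/.* would no longer match.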
Alternatively, you could create an additional .htaccess file inside the /folder with the following:
RewriteEngine On
RewriteRule ^ https://newdomain.com%{REQUEST_URI} [R=301,L]
The REQUEST_URI server variable contains the full URL-path of the request (including the slash prefix).
By default, the mod_rewrite directives in the /folder/.htaccess file will completely override any directives in the parent (root) .htaccess file (the mod_rewrite directives in the parent are not even processed).

Apache htaccess rewrite root and all root folders to subfolder without redirecting

Options +FollowSymLinks -MultiViews
# Turn mod_rewrite on
RewriteEngine On
RewriteBase /
RewriteRule ^$ /subdir/ [L,NC]
I want to rewrite the root domain to subfolder without changing the URL in the browser. The above code works just for the root domain but not any folders and files.
For example, I have https://example.com/ and https://example.com/subdir/.
With the above code in .htaccess file, when I go to https://example.com/ I see the contents of https://example.com/subdir/ which is good.
But when I go to https://example.com/test.txt I should see https://example.com/subdir/test.txt but I get "The requested URL was not found on this server."
Same happens when I go to https://example.com/abc expecting to see contents of https://example.com/subdir/abc
Any idea?
RewriteRule ^$ /subdir/ [L,NC]
Change this to read:
RewriteRule !^subdir/ subdir%{REQUEST_URI} [L]
Any request that does not start /subdir/ is internally rewritten to /subdir/<url>. The REQUEST_URI server variable contains the full URL-path (including the slash prefix).
I removed the slash prefix from the substitution string since you have defined a RewriteBase /. (Although neither are strictly necessary here.)
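For comparison, a version without RewriteBase might look like the following, using a root-relative substitution instead. This is a sketch that assumes the .htaccess file is in the document root.

```apache
RewriteEngine On

# Same internal rewrite, written with a root-relative substitution
# so that no RewriteBase directive is needed
RewriteRule !^subdir/ /subdir%{REQUEST_URI} [L]
```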
UPDATE:
...when I go to example.com/s I am being redirected to example.com/subdir/s/
s is a subfolder within subdir, does that make any difference?
Ah yes, if /s is a subdirectory then mod_dir will append the trailing slash (to "fix" the URL) with an external 301 redirect. This redirect occurs after the URL has been rewritten to /subdir/s - thus exposing the /subdir subdirectory.
To handle this situation we can add another rule (a redirect) before the existing rewrite that first checks whether the request would map to a directory within the /subdir subdirectory and append a slash if it is omitted (before mod_dir would append the slash to the rewritten URL).
For example:
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{DOCUMENT_ROOT}/subdir%{REQUEST_URI} -d
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/ [R=301,L]
This states... for any request that:
!\.\w{2,4}$ - does not end in (what looks like) a file extension of between 2 and 4 characters (assuming your directories aren't named this way)
!/$ - and does not currently end in a slash.
-d - and exists as a physical directory in the /subdir subdirectory.
THEN redirect to append the trailing slash on the original request
Whilst this probably should be a 301 (permanent) redirect, you should first test with a 302 (temporary) redirect to avoid potential caching issues.
You will need to clear your browser cache before testing, since the erroneous 301 redirect from /s to /subdir/s/ will have been cached by the browser.
A potential optimisation is to remove the filesystem check and simply assume that any request that does not contain a file extension should map to a directory. (But this depends on whether you are handling these URLs in any other way.)
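A minimal sketch of that optimisation, under the assumption that every extensionless URL on the site is meant to be a directory, might look like this. It uses a 302 for testing, as suggested above.

```apache
# Append a trailing slash to any request that has no file extension and
# does not already end in a slash (no filesystem check).
# 302 while testing; change to 301 once the behaviour is confirmed.
RewriteCond %{REQUEST_URI} !/$
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/ [R=302,L]
```

The trade-off is that an extensionless file (for example, a hypothetical /README) would also be redirected to /README/, so this only suits sites where no such URLs exist.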
Summary
Options +FollowSymLinks -MultiViews
# Turn mod_rewrite on
RewriteEngine On
# If the requested URL exists as a directory in "/subdir" then append a slash
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{DOCUMENT_ROOT}/subdir%{REQUEST_URI} -d
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/ [R=301,L]
# Rewrite everything to "/subdir"
RewriteRule !^subdir/ subdir%{REQUEST_URI} [L]

.htaccess setting for handling index files, trailing slashes, and file extensions

My website is pretty standard containing a bunch of index files inside folders. ie:
folder1
> index.php
> some-file.php
sub-folder1
> index.php
sub-folder2
> index.php
folder2
> some-file.html
index.html
I recently made some .htaccess changes that were supposed to allow a user to enter a file name without the .html extension but, still be directed to the correct file on the server. It was also supposed to remove trailing slashes. This is the code in my root .htaccess file:
DirectoryIndex index.html
RewriteEngine On
# remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{THE_REQUEST} \s(.+?)/+[?\s]
#RewriteRule ^(.+?)/$ /$1 [R,L]
RewriteRule ^(.+?)/$ /$1 [R=301,L]
# To internally forward /dir/file to /dir/file.html
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{DOCUMENT_ROOT}/$1.html -f [NC]
RewriteRule ^(.+?)/?$ /$1.html [L]
This seems to work fine with HTML files, such as when including a link to:
folder2/some-file
However, when I link to folder1/ where the index file is a php file, I'm directed to a list of files and folders inside folder1. I would expect the behavior to pull up index.php not the file list.
If I include the entire path in a link such as: folder1/index.php it works just fine.
This behavior all changed when I added the .htaccess settings above. Before this I didn't even have a .htaccess file on my site. I'm assuming it has to do with the code inside the .htaccess file, but I have no idea how to fix it, as the code is something I found on a help forum.
I also noticed that before I changed my .htaccess setting I could link to my root index.html from any page and the browser would just display www.domain.com. now it always shows www.domain.com/index
Wondering if anyone knows the correct setting I should be using inside my .htaccess file?
You set DirectoryIndex index.html, so if there is no index.html in a directory, a listing will be generated even if there is an index.php there.
Add it like this:
DirectoryIndex index.html index.php
So, if there is no index.html, a request for the directory will go to index.php.
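As a small sketch of how this could sit at the top of the .htaccess file: the Options line is an optional addition (not part of the answer above) that stops a file listing being shown when a directory contains neither index file.

```apache
# Try index.html first, then fall back to index.php
DirectoryIndex index.html index.php

# Optional: return 403 rather than a file listing when neither index file exists
Options -Indexes
```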

.htaccess DirectorySlash Off causes "/" to be inaccessible

I have A.php, B.php, /A, and /B, so I want the URL to be rewritten to A.php when /A is requested; that is why I have (A|B) in the rule.
I am currently using the following code in my .htaccess:
RewriteEngine On
DirectorySlash Off
RewriteCond %{REQUEST_FILENAME} (A|B)/$
RewriteRule ^(.*)/$ $1
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.*)$ $1.php
However, with
DirectorySlash Off
I cannot access my root directory without a trailing slash. For example:
If I have the files under the directory /site, I can access /site/, /site/index, and /site/index.php, but I cannot access /site as it gives me a 403 error. I have seen on other SO questions that with DirectorySlash Off it will skip /site.
Is there a way to write a rule such that it applies to all files/directories apart from the root /site?
If not, is there a way to remove trailing slashes for URLs where a .php file is involved?
I am asking this because when DirectorySlash is On, I may try /site/A?id=824 and the URL will become /site/A/?id=824.
It's probably because you have autoindexes turned off, so if you try to access a directory without the trailing slash, it tries to list the contents of the directory and returns a 403 instead because it's not allowed to auto index.
Turning DirectorySlash off means that if someone goes to /site without the trailing slash, they get a listing of the contents of the /site directory, not the index file (e.g. /site/index.php). This is why DirectorySlash exists.
Instead of stripping off the slash, you'll have to check for the slash and remove it at the same time you're checking for the php file:
RewriteCond %{REQUEST_URI} ^/(.*?)/?$
RewriteCond %{DOCUMENT_ROOT}/%1.php -f
RewriteRule ^ /%1.php [L]
And don't turn off DirectorySlash.