Apache rewrite URLs: remove .html and return 404 if .html is present

I am adding a directory to a website served by Apache 2, and I want incoming requests to that directory to work without the .html extension. In /new-directory I have a .htaccess file containing:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ $1.html [NC,L]
So with this rule /new-directory/page works, but /new-directory/page.html also works, which I don't want. I want all pages in new-directory/ and sub-directories to only serve pages without .html, and return a 404 not found if a page.html request comes in.
These are new pages so I don't care about redirecting.
Edit:
Forgot to mention that there is only one file in /new-directory (/new-directory/dhandler) - a Perl script that parses the incoming url if there is a matching database entry. There are no files to match so I can drop that condition.

Figured it out. It was my fault for not explaining that there is only one default file handler in /new-directory, which led to some confusion (see the edit above). Anyway, this worked:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^\.]+)$ $1.html [NC,L]
RewriteCond %{THE_REQUEST} .*\.html[\s\?]{1}
RewriteRule .*\.html$ - [R=404,L]
What messed me up was that these URLs sometimes have query strings, so I have to check for both a space and a ? after .html in %{THE_REQUEST}.
EDIT: If someone has actual .html files in a directory (instead of one file that dynamically handles all requests, as I do), they should probably add:
RewriteCond %{REQUEST_FILENAME} -f
right after 'RewriteEngine On' to make sure that the incoming request matches an existing file in /new-directory.
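To see why the character after .html matters, here is a rough sketch in Python that runs the condition pattern against a few request lines. It assumes Python's `re` module treats this particular pattern the same way Apache's regex engine does; the sample request lines and the function name are made up for illustration. %{THE_REQUEST} holds the raw request line sent by the client, so it is not affected by the internal rewrite to page.html.

```python
import re

# The condition pattern from the rule set above, as a Python regex.
HTML_IN_REQUEST = re.compile(r".*\.html[\s\?]{1}")
# A variant that only accepts whitespace after ".html" (no query string).
HTML_THEN_SPACE = re.compile(r".*\.html\s")


def client_asked_for_html(request_line, pattern=HTML_IN_REQUEST):
    """Return True if the raw request line names a .html resource."""
    # RewriteCond patterns are unanchored, so search() is the closest analogue.
    return pattern.search(request_line) is not None


if __name__ == "__main__":
    # Hypothetical request lines in the "METHOD path PROTOCOL" form.
    for line in (
        "GET /new-directory/page HTTP/1.1",
        "GET /new-directory/page.html HTTP/1.1",
        "GET /new-directory/page.html?id=3 HTTP/1.1",
    ):
        print(line, "->", client_asked_for_html(line))
```

With the whitespace-only variant, the third request line would not be detected, which is the query-string case described above.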

You need a couple of rules to do this:
RewriteBase /
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule (.*) $1.html [NC,L]
RewriteCond %{THE_REQUEST} .*\.html\s
RewriteRule .*\.html$ - [R=404,L]
The first rule checks that an .html file corresponding to the request actually exists. If it does, the request is internally rewritten to that file.
The second rule returns a 404 Not Found for any request that the client made with a path ending in .html. This is an error response, not a redirect.

Place this code in /new-directory/.htaccess:
RewriteEngine On
RewriteBase /new-directory/
RewriteCond %{THE_REQUEST} /(?:index)?(.*?)\.html[\s?] [NC]
RewriteRule ^ %1 [R=301,L,NE]
RewriteCond %{REQUEST_FILENAME} !-d
#RewriteCond %{DOCUMENT_ROOT}/new-directory/$1\.html -f [NC]
RewriteRule ^(.+?)/?$ $1.html [L]

Related

Rewrite URL using .htaccess by replacing /? with ?

How can I rewrite URLs (i.e. remove the last / after test) using .htaccess on a PHP page? For example:
from www.example.com/test/?sku=23456&qty=3 to www.example.com/test?sku=23456&qty=3
from www.example.com/page2/?page=3&emp=543 to www.example.com/page2?page=3&emp=543
from www.example.com/stream/?start=4&id=tdfcs45s&q=sat to www.example.com/stream?start=4&id=tdfcs45s&q=sat
I tried this, but it doesn't work:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^\.]+)$ $1.php [NC,L]
Based on your samples and attempts, please try the following .htaccess rules. We need to use Apache's THE_REQUEST variable here. Clear your browser cache before testing your URLs.
RewriteEngine ON
RewriteCond %{THE_REQUEST} \s(/test)/(\?sku=(\d+)&qty=\d+)\s [NC]
RewriteRule ^ %1%2 [L]
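As a quick check of what %1 and %2 capture, the condition can be tried in Python. This is only a sketch: it assumes Python's `re` behaves like Apache's regex engine for this pattern, and the function name and request lines are made up for illustration.

```python
import re

# The RewriteCond pattern above; [NC] corresponds to re.IGNORECASE.
RULE = re.compile(r"\s(/test)/(\?sku=(\d+)&qty=\d+)\s", re.IGNORECASE)


def rewritten_target(request_line):
    """Return the %1%2 substitution, or None if the condition does not match."""
    match = RULE.search(request_line)
    if match is None:
        return None
    # %1 is the path without the trailing slash, %2 is "?" plus the query string.
    return match.group(1) + match.group(2)


if __name__ == "__main__":
    print(rewritten_target("GET /test/?sku=23456&qty=3 HTTP/1.1"))
    print(rewritten_target("GET /page2/?page=3&emp=543 HTTP/1.1"))
```

The first call gives /test?sku=23456&qty=3. The second returns None, because the pattern is written specifically for /test with sku and qty parameters; the other sample URLs would need their own conditions or a more general pattern.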

htaccess remove folder redirect

I have a problem removing folders from a URL. I don't want Google results or old links to break. The old site had several sections with a structure like this:
example.com/news/items/entry1.html
example.com/news/items/entry2.html
example.com/blog/items/foo.html
The new page has the urls like this:
example.com/news/entry1
example.com/news/entry2
example.com/blog/foo
Removing .html was rather straightforward:
<IfModule mod_rewrite.c>
RewriteEngine On
# Send would-be 404 requests to Craft
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/(favicon\.ico|apple-touch-icon.*\.png)$ [NC]
RewriteRule (.+) index.php [QSA,L]
RewriteCond %{THE_REQUEST} /([^.]+)\.html[\s?] [NC]
RewriteRule ^ /%1 [R=302,L,NE]
</IfModule>
The part I'm struggling with is removing the 'items' part. The rules I found only worked for request paths like 'example.com/items/subfolder1/...'
Any help would be greatly appreciated.
To remove /items/ from your URLs you can use the following in your .htaccess file:
RewriteEngine On
RewriteRule ^(.*)/items/(.*)$ /$1/$2 [L,R=301]
So for example, this will take the URL: http://example.com/news/items/entry1 and turn it into http://example.com/news/entry1
Make sure you clear your cache before testing this.
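Before deploying a rule like this, it can help to check the path mapping offline. The sketch below uses Python's `re` as a stand-in for mod_rewrite. The pattern captures the segments on both sides of /items/ and is my own assumption for illustration, as is the function name. In a .htaccess file, the path that a RewriteRule sees has no leading slash, so the samples are written that way.

```python
import re

# Illustrative pattern: capture what comes before and after "/items/".
ITEMS = re.compile(r"^(.*)/items/(.*)$")


def strip_items(path):
    """Return the redirect target for a per-directory path, or None if no match."""
    match = ITEMS.match(path)
    if match is None:
        return None
    # Equivalent of a "/$1/$2" substitution.
    return match.expand(r"/\1/\2")


if __name__ == "__main__":
    for old in ("news/items/entry1", "blog/items/foo", "news/entry1"):
        print(old, "->", strip_items(old))
```

Because a RewriteRule substitution replaces the whole matched URL path, anything that should survive the redirect (here the entry name) has to be captured and written back explicitly.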

Hiding file extension and redirecting to https

First off, I know there are many questions similar to this one. I've read everything I can find, but the solutions I see elsewhere don't seem to work for me. I'm really hoping someone can give me some insight here.
I am trying to use Apache's .htaccess directives to force specific pages on my server to use ssl. In addition to those directives, I'm also using some rewrites to mask .php and .html extensions.
I created a page, https-test.html. I want that page specifically to always get redirected so it uses https and so that .html gets stripped off, like https://www.example.com/https-test
However, I seem to always end up with a loop. Reading the Apache docs for 6 hours got me closer, but I'm still missing something.
Below is my annotated htaccess file.
RewriteEngine on
# If port is insecure...
RewriteCond %{SERVER_PORT} ^80$
# And requested URI is /https-test...
RewriteCond %{REQUEST_URI} ^(.*/)https-test$ [NC]
# Then point the server to the secure url:
RewriteRule . "https://www.example.com/https-test" [L,R]
# The next few lines try matching extensionless requests to .php files
# If the requested file is not a directory...
RewriteCond %{REQUEST_FILENAME} !-d
# And we CAN find a .php file matching that name...
RewriteCond %{REQUEST_FILENAME}\.php -f
# Then point us to that .php file and append the query string.
RewriteRule ^(.+)$ $1.php [L,QSA]
# These next few lines were added by the previous project owner
# They're supposed to redirect requests like /foo.html to /foo,
# But I suspect these might be the culprit
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)/$ /$1 [R=301,NE,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ /$1.html [NE,L]
# Next few lines are legacy SEO stuff, some pages were linked to as
# php but now are html
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} .php$
RewriteRule ^(.*).php$ /$1.html [L,NE]
So that's the code I have in my htaccess. If I go to http://www.example.com/https-test in Chrome, I get the error "www.mysite.com redirected you too many times."
You should probably restructure the rules a bit. You are trying to map extensionless requests to both .php and .html files, and it doesn't look like you're accounting for each case. You should add conditions to make sure the rules are not trying to do the same thing.
Back up your code, replace it with this, and give it a try. Clear your cache before testing.
RewriteEngine on
# If port is insecure... redirect for a specific page
RewriteCond %{HTTPS} !^on [OR]
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^https-test/?$ https://www.example.com%{REQUEST_URI} [R=301,L]
# Next few lines are legacy SEO stuff, some pages were linked to as
# php but now are html
RewriteCond %{THE_REQUEST} ^GET\ /(.+)\.php
RewriteRule ^ /%1? [R=301,L]
# The next few lines try matching extensionless requests to .php files
# If the requested file is not a directory and php file exists
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.+)$ $1.php [L,QSA]
# remove the trailing slash if the request is not a directory and has no matching php file
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.php !-f
RewriteRule ^([^\.]+)/$ /$1 [R=301,NE,L]
# finally, internally rewrite the extensionless URI to html
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ /$1.html [NE,L]
Note I haven't tested this fully.

mod_rewrite redirect url if no file found to main page

I want to be able to access my scripts without the .php, .html etc. extensions, so I already wrote
RewriteEngine on
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}\.php -f
RewriteRule ^(.*)$ $1.php
##same for other extensions
in my .htaccess file (note: this file is not in the root path), but I also want to redirect every incorrect request to my main page, so that www.mysite.com/dir/incorrect will be redirected to www.mysite.com/dir/.
But my first try (RewriteRule ^ / [R] after RewriteCond) redirected me to www.mysite.com/, my experiments with RewriteBase (RewriteBase . and RewriteBase /) didn't work, and I also noticed that many similar scripts redirect to www.mysite.com/dir/index.php (www.mysite.com/dir/index in my case), but I really want to redirect to www.mysite.com/dir/. Is there any way to achieve this?
Have it this way:
RewriteEngine on
# see if .php is found
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}\.php -f
RewriteRule ^(.*)$ $1.php [L]
# determine DIR_BASE dynamically
RewriteCond %{REQUEST_URI}::$1 ^(.*?/)(.*)::\2$
RewriteRule ^(.*)$ - [E=DIR_BASE:%1]
# if not found redirect to %{ENV:DIR_BASE}
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ %{ENV:DIR_BASE} [L,R]
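The DIR_BASE condition is the least obvious part, so here is a rough Python sketch of what it does. It assumes Python's `re` handles the backreference the same way Apache's regex engine does; the function name and sample paths are hypothetical. The test string joins the full request URI and the path relative to the .htaccess directory ($1) with "::", and the backreference \2 forces the second capture group to equal that relative path, which leaves the directory prefix in the first group.

```python
import re

# Same pattern as the RewriteCond: "<REQUEST_URI>::<relative path>".
DIR_BASE = re.compile(r"^(.*?/)(.*)::\2$")


def dir_base(request_uri, relative_path):
    """Return the directory prefix that the .htaccess file lives in, or None."""
    match = DIR_BASE.match(f"{request_uri}::{relative_path}")
    return match.group(1) if match else None


if __name__ == "__main__":
    # .htaccess in /dir/, request for /dir/incorrect
    print(dir_base("/dir/incorrect", "incorrect"))
    # .htaccess in /a/b/, request for /a/b/c/page
    print(dir_base("/a/b/c/page", "c/page"))
```

The first call yields /dir/ and the second /a/b/, which is the value the rule stores in the DIR_BASE environment variable and later uses as the redirect target.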

Multiple RewriteConds and RewriteRule Stacked Together

I have this apache rewrite rule:
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} mycompany.com
RewriteRule ^$ http://mycompany.com/login [L]
# we check if the .html version is here (caching)
RewriteRule ^$ index.html [QSA]
RewriteRule ^([^.]+)$ $1.html [QSA]
RewriteCond %{REQUEST_FILENAME} !-f
# no, so we redirect to our front web controller
RewriteRule ^(.*)$ index.php [QSA,L]
The only thing I can make sense of is that if the host is mycompany.com, the script redirects to http://mycompany.com/login. If not, then ...
I can't figure out the rest.
Any idea what the above script does?
Something quite interesting, not easy to understand.
A google search on the comment texts inside the code gave interesting results: http://www.google.com/search?q=%22%23+we+check+if+the+.html+version+is+here+%28caching%29%22
Edit: looking at the last lines, and knowing that Symfony uses caching (it creates local files with a .html extension in the same directories as the URL shows), I can try to explain the lines here:
If the requested url is something like http://yoursite.com/blabla/ we try to open an index.html file in that directory. If the file is not there, another cycle of rewriting will happen and the last Cond will be hit (where the file does not exist)
RewriteRule ^$ index.html [QSA]
If something more is in the url, like http://yoursite.com/blabla/blblbl, try to find a file blblbl.html
RewriteRule ^([^.]+)$ $1.html [QSA]
This is the collector of all urls that did not match any of the previous rules or the cached file did not exist:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php [QSA,L]
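The cache-then-controller flow described above can be modelled roughly in Python. This is a simplified sketch, not how mod_rewrite is implemented: it ignores the host-based redirect, the query string handling, and repeated rewrite passes, and the function name and file sets are hypothetical.

```python
import re


def resolve(path, existing_files):
    """Return the file that would serve `path` under the three rules above."""
    # RewriteRule ^$ index.html
    if path == "":
        path = "index.html"
    # RewriteRule ^([^.]+)$ $1.html  (only paths without a dot)
    if re.fullmatch(r"[^.]+", path):
        path = path + ".html"
    # RewriteCond %{REQUEST_FILENAME} !-f  ->  RewriteRule ^(.*)$ index.php
    if path not in existing_files:
        path = "index.php"
    return path


if __name__ == "__main__":
    cached = {"index.html", "blabla/blblbl.html", "css/main.css"}
    for request in ("", "blabla/blblbl", "css/main.css", "not/cached"):
        print(repr(request), "->", resolve(request, cached))
```

In this model a cached .html file or an existing static file is served directly, and everything else falls through to index.php, the front controller.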