Redirecting .html internally on Apache

I want to redirect all requests for .html pages to a Perl script for processing. However, requests for URLs without .html, such as the site root / and directories like /images/, are also being redirected.
RewriteCond %{REQUEST_URI} \.(html)$ [NC]
RewriteRule .* /page.cgi [L]
How can I make it redirect only requests ending in .html? What am I missing?

Requests with no file specified default to index.html in that directory, via a different Apache directive (DirectoryIndex, provided by mod_dir), so by the time your condition is checked the URI ends in .html. Maybe try putting a blank index.php file in those directories (assuming your server also runs PHP). You'd have to add index.php as a recognized index file in httpd.conf. That way, requests for a directory with no file specified would default to index.php, which does not trigger your rewrite rule because it doesn't end in .html.
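A rough Python model of the answer's explanation may help. It assumes Python's `re` behaves like Apache's PCRE for this simple pattern, and `apply_directory_index` is a hypothetical stand-in for DirectoryIndex that always appends the index file (real mod_dir only does so if the file exists). The point is that "/" itself never matches `\.(html)$`; only the index-resolved URI does.

```python
import re

# RewriteCond %{REQUEST_URI} \.(html)$ [NC]   ([NC] -> re.IGNORECASE)
HTML_COND = re.compile(r'\.(html)$', re.IGNORECASE)

def apply_directory_index(uri, index_file="index.html"):
    """Hypothetical stand-in for DirectoryIndex: a URI ending in '/'
    is mapped to the index file inside that directory."""
    return uri + index_file if uri.endswith("/") else uri

def rewritten(uri, index_file="index.html"):
    """True if the .html condition matches once the index file is applied
    (simplified model; ignores whether the index file actually exists)."""
    return HTML_COND.search(apply_directory_index(uri, index_file)) is not None

print(rewritten("/"), rewritten("/images/"), rewritten("/images/", "index.php"))
```

With `index.php` as the index file, the directory request no longer ends in .html, which is why the answer's workaround avoids the rule.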

Related

Resolve content in subdirectory but keep the path in the browser

My goal is this:
We have a site whose webroot is /web.
It contains the .htaccess file, but we want to serve the content from /web/content without /content appearing in the URL the user sees; the URL should stay as the path they originally requested.
Example:
The user makes a request to a URL:
example.com/color/cool/blue
This request goes to:
/webroot/color/cool/blue (which does not exist)
The content is in
/webroot/content/color/cool/blue/index.htm
We would like the user to see example.com/color/cool/blue in the browser, but be served the content of example.com/content/color/cool/blue/index.htm.
We would also like some directories to be directly accessible, for example:
example.com/exeption/foo.pdf
We are doing this as part of converting a dynamic site to a static one, so simply moving everything to the root or changing the webroot is not an option.
Assumptions:
Directory paths do not contain dots.
In the root .htaccess file try the following:
# Disable directory listings (mod_autoindex) since "DirectorySlash Off"
Options -Indexes -MultiViews
# Prevent trailing slash being appended to directories
DirectorySlash Off
# File to serve from requested directory
DirectoryIndex index.htm
RewriteEngine On
# Remove trailing slash on any URL that is requested directly (excludes rewritten URLs)
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule (.*)/$ /$1 [R=301,L]
# Rewrite root
RewriteRule ^$ content/ [L]
# If request maps to a directory in "/content" then rewrite and append trailing slash
RewriteCond %{DOCUMENT_ROOT}/content/$1 -d
RewriteRule ^([^.]+)$ content/$1/ [L]
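To make the order of the rules above easier to follow, here is a simplified Python model. It is a sketch, not Apache itself: `rewrite` is a hypothetical helper, `content_dirs` stands in for the `-d` filesystem check, `direct_request` stands in for the empty REDIRECT_STATUS test, and paths are given without a leading slash, as mod_rewrite sees them in the root .htaccess.

```python
import re

def rewrite(path, content_dirs, direct_request=True):
    """Simplified model of the root .htaccess rules (hypothetical helper).
    Returns (action, target)."""
    # RewriteCond %{ENV:REDIRECT_STATUS} ^$ + RewriteRule (.*)/$ /$1 [R=301,L]
    m = re.search(r'(.*)/$', path)
    if direct_request and m:
        return ("redirect 301", "/" + m.group(1))
    # RewriteRule ^$ content/ [L]
    if path == "":
        return ("rewrite", "content/")
    # RewriteCond %{DOCUMENT_ROOT}/content/$1 -d + RewriteRule ^([^.]+)$ content/$1/ [L]
    m = re.search(r'^([^.]+)$', path)
    if m and m.group(1) in content_dirs:
        return ("rewrite", "content/" + m.group(1) + "/")
    # No rule matched: the request is served as-is (e.g. exeption/foo.pdf)
    return ("pass", path)

dirs = {"color/cool/blue"}
print(rewrite("color/cool/blue", dirs))
```

The last case in the tests models the second pass after an internal rewrite: REDIRECT_STATUS is then set, and /content/content/... does not exist, so no further rewriting happens.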
We also would like some directories to be directly accessed like: example/exeption/foo.pdf
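You don't necessarily need to add anything in this respect: a URL such as /exeption/foo.pdf contains a dot, so it doesn't match the ^([^.]+)$ pattern and is served directly. I'm assuming you mean "files", not "directories".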

How does Apache handle index.php/some_text webpage requests? It returns HTTP status 200 instead of the expected 404

I have a website on a shared server with some very basic php pages in the public_html directory, as well as some sub-directories with other pages in:
index.php
test.php
subdir1/index.php
subdir2/index.php
Looking at my visitor logs, I'm getting visits to index.php/some_text, index.php/some_other_text and so on. Naively, I would expect those to receive an HTTP 404 status, because (a) there is no directory called index.php and (b) no files called some_text or some_other_text exist. However, Apache returns the file index.php with an HTTP 200 status.
Is there something I can set in .htaccess that will return a 404 status in these cases, without restricting the valid subdirectories?
I found some suggestions to set "DirectorySlash Off", but that made no difference. I've also tried:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ - [R=404,L]
But that too made no difference.
Thanks.
I'm getting visits to index.php/some_text and index.php/some_other_text and so on.
The part of the URL that starts with a slash and follows a physical file is called additional pathname information (or path-info). So, /some_text (in your example) is path-info.
In this case, index.php receives the request and /some_text is passed to the script via the PATH_INFO environment variable (in PHP this is available in the $_SERVER['PATH_INFO'] superglobal).
By default, whether path-info is allowed on the URL depends on the handler responsible for the request. PHP files allow path-info by default, but .html files do not. So, by default, /index.html/some-text will result in a 404.
You can disable path-info by setting AcceptPathInfo Off in your Apache config / .htaccess file. By doing this, a request for /index.php/some-text will now result in a 404.
Conversely, if you set AcceptPathInfo On then /index.html/some-text will also be permitted.
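The interaction between the file lookup and AcceptPathInfo can be sketched in Python. This is a deliberately simplified model: `resolve` is a hypothetical helper, it ignores directories and DirectoryIndex, and it assumes that under the "Default" setting only `.php` files (the PHP handler) accept path-info, as described above.

```python
def resolve(url_path, files, accept_path_info="Default"):
    """Split a URL-path into (status, file, path_info) (hypothetical helper)."""
    parts = url_path.lstrip("/").split("/")
    for i in range(1, len(parts) + 1):
        candidate = "/".join(parts[:i])
        if candidate in files:
            rest = parts[i:]
            path_info = "/" + "/".join(rest) if rest else ""
            if not path_info:
                return (200, candidate, "")
            if accept_path_info == "On":
                allowed = True
            elif accept_path_info == "Off":
                allowed = False
            else:
                # "Default": the handler decides. Assumed here: PHP accepts
                # path-info, the static-file handler (.html etc.) rejects it.
                allowed = candidate.endswith(".php")
            return (200 if allowed else 404, candidate, path_info)
    return (404, None, "")

files = {"index.php", "index.html", "test.php", "subdir1/index.php"}
print(resolve("/index.php/some_text", files))
```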
Alternatively, you can use mod_rewrite in .htaccess to explicitly trigger a 404 for such URLs. For example, to target .php files (anywhere) only:
RewriteEngine On
RewriteRule \.php/ - [R=404]
Or, just .php files in the document root:
RewriteRule ^[^/]+\.php/ - [R=404]
Or, you can explicitly check the PATH_INFO server variable to block any URL that includes path-info. For example:
RewriteCond %{PATH_INFO} .
RewriteRule . - [R=404]
Note that some frameworks use path-info to route requests in a front-controller pattern (as opposed to using a query string or parsing the requested URI directly).
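To check which URLs the two RewriteRule patterns above would catch, they can be tried out with Python's `re`, which behaves like PCRE for patterns this simple. Note that in a root .htaccess the RewriteRule pattern is matched against the URL-path without its leading slash; `returns_404` is a hypothetical helper name.

```python
import re

ANY_PHP = re.compile(r'\.php/')         # RewriteRule \.php/ - [R=404]
ROOT_PHP = re.compile(r'^[^/]+\.php/')  # RewriteRule ^[^/]+\.php/ - [R=404]

def returns_404(pattern, url_path):
    """True if the rule would fire (and send a 404) for this URL-path."""
    return pattern.search(url_path) is not None

for p in ["index.php/some_text", "subdir1/index.php/some_text", "index.php", "subdir1/"]:
    print(p, returns_404(ANY_PHP, p), returns_404(ROOT_PHP, p))
```

Only the root-level pattern leaves subdir1/index.php/some_text alone, because `[^/]+` cannot span the slash after subdir1.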
Reference:
https://httpd.apache.org/docs/2.4/mod/core.html#acceptpathinfo
I found some suggestions to set "DirectorySlash Off"
That has nothing to do with this issue. Setting DirectorySlash Off prevents mod_dir from appending trailing slashes to requests for directories.
I have since tried
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/[^/]+\.php/.*$
RewriteRule ^(.*)$ - [R=404,L]
This should then only affect *.php files in the root directory, leaving any subdirectories alone (I think). It produces the behaviour I want, but it doesn't feel like a good solution.
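As a quick sanity check, a Python sketch can compare this REQUEST_URI condition with the root-only RewriteRule pattern from the answer, assuming the .htaccess sits in the document root (so the RewriteRule sees the same path minus its leading slash) and that Python's `re` matches PCRE here. On these sample URLs the two agree, and the trailing `.*$` does not change which URIs match.

```python
import re

ASKER = re.compile(r'^/[^/]+\.php/.*$')  # tested against %{REQUEST_URI} (leading slash)
ROOT_PHP = re.compile(r'^[^/]+\.php/')   # tested against the .htaccess URL-path (no leading slash)

def same_result(uri):
    """True if both patterns give the same verdict for this URI."""
    return (ASKER.search(uri) is not None) == (ROOT_PHP.search(uri[1:]) is not None)

samples = ["/index.php/some_text", "/test.php/a/b", "/subdir1/index.php/x", "/index.php", "/subdir2/"]
print([same_result(u) for u in samples])
```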

htaccess - rewrite rule not working when requested URL is a folder on my system

All requests to my site should be rewritten to index.php?page=blah, where blah is the page that's requested (except for css, js, jp(e)g, gif and png files).
This is what my .htaccess file looks like:
RewriteEngine On
RewriteCond %{REQUEST_URI} !\.(?:css|js|jpe?g|gif|png)$ [NC]
RewriteRule ^(.*)$ index.php?page=$1 [L,QSA]
The .htaccess is in this directory: localhost:8080/example/, so when I go to localhost:8080/example/abc, it is (internally) rewritten to localhost:8080/example/index.php?page=abc.
However, when I go to localhost:8080/example/res, I get redirected to localhost:8080/example/res/?page=res. I found out that this only happens with directories: when I go to localhost:8080/example/core (also a folder on my file system), I get redirected to localhost:8080/example/core/?page=core, while it should be internally rewritten to localhost:8080/example/index.php?page=core and the URL visible to the user should stay localhost:8080/example/core/
EDIT:
Thanks to @w3dk, who solved the problem stated above. However, I found another problem, which may be related:
When I go to:
localhost:8080/example/index/a, it's internally rewritten to localhost:8080/example/index.php?page=index.php/a, while it should be rewritten to localhost:8080/example/index.php?page=index/a.
I found out that this happens when index is a file, because when I go to localhost:8080/example/exampleFile/abc, it's redirected to localhost:8080/example/index.php?page=exampleFile.php/abc, which shouldn't happen.
The two files in my directory are:
index.php (everything should be directed to this file)
example.php
Apache seems to ignore the .php file extension, because the same thing happens with exampleFile.txt.
This is probably happening because of a conflict with mod_dir. By default (DirectorySlash On), mod_dir automatically "fixes" the URL when you request a physical directory without a trailing slash. It does this with an external 301 redirect, before your rule is processed. Your rule then fires and modifies the target URL; a Location header is returned to the client and the browser redirects.
This won't happen if you include the trailing slash on the original request, e.g. localhost:8080/example/core/. mod_dir then does not need to "fix" the URL and issue a redirect, although this may not be desirable for you.
Since you want to internally rewrite all directories, the simple fix is to disable this behaviour in .htaccess:
DirectorySlash Off
You will need to clear your browser cache before testing, as the earlier 301s by mod_dir will have been cached locally.
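The trailing-slash behaviour described above can be summarised in a few lines of Python. This is only a model of the decision, not mod_dir's code: `mod_dir_fixup` is a hypothetical helper and `directories` stands in for the filesystem check.

```python
def mod_dir_fixup(url_path, directories, directory_slash=True):
    """Simplified model of mod_dir's trailing-slash fix (hypothetical helper).
    Returns ("redirect", location) when a 301 would be sent,
    otherwise ("continue", url_path)."""
    is_dir = url_path.rstrip("/") in directories
    if directory_slash and is_dir and not url_path.endswith("/"):
        return ("redirect", url_path + "/")
    return ("continue", url_path)

dirs = {"/example/core", "/example/res"}
print(mod_dir_fixup("/example/core", dirs))
```

Requesting the directory with a trailing slash, or turning DirectorySlash off, both avoid the redirect, which matches the two options given above.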
Reference (note the security warning):
https://httpd.apache.org/docs/current/mod/mod_dir.html#directoryslash
You can use this .htaccess file:
Note: The directory folder1 must be unique in the URL. It won't work for http://domain.com/folder1/folder1.html. The directory folder1 must exist and have content in it.
RewriteEngine On
RewriteCond %{HTTP_HOST} domain.com$ [NC]
RewriteCond %{REQUEST_URI} !folder1
RewriteRule ^(.*)$ http://domain.com/folder1/$1 [R=301,L]
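Why the !folder1 condition has to be tested against the requested URI rather than the host name can be seen with a small Python model that follows the redirects. The helpers `next_location` and `follow` are hypothetical, `max_hops` simply caps the loop, and `$1` is approximated as the URI without its leading slash.

```python
import re

HOST_COND = re.compile(r'domain.com$', re.IGNORECASE)  # RewriteCond %{HTTP_HOST} domain.com$ [NC]

def next_location(host, uri, checked="REQUEST_URI"):
    """Redirect target for one request, or None if the rule does not fire.
    `checked` selects which variable the !folder1 condition is tested against."""
    if not HOST_COND.search(host):
        return None
    subject = uri if checked == "REQUEST_URI" else host
    if "folder1" in subject:  # !folder1: skip if "folder1" appears anywhere
        return None
    return "/folder1/" + uri.lstrip("/")  # RewriteRule ^(.*)$ http://domain.com/folder1/$1

def follow(host, uri, checked="REQUEST_URI", max_hops=5):
    """List the successive redirect targets, stopping after max_hops."""
    hops = []
    while len(hops) < max_hops:
        target = next_location(host, uri, checked)
        if target is None:
            break
        hops.append(target)
        uri = target
    return hops

print(follow("domain.com", "/page.html"))
print(follow("domain.com", "/page.html", checked="HTTP_HOST"))
```

Tested against the URI, the rule fires once and stops; tested against the host name, "folder1" never appears, so the redirect keeps prepending /folder1/ until the hop limit.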

RewriteRule for a file in a parent directory

I've got a site set up on an Apache server with the desktop site in /public_html and a mobile site in /public_html/mob.
I'm trying to set up an .htaccess RewriteRule to send users to an index.php file in /public_html if they visit the /mob folder. My current rewrite rule, in the mob subfolder, is:
RewriteRule ^(/)?$ ../index.php
I can load up the same file in the mob subdirectory with:
RewriteRule ^(/)?$ index.php
However I can't seem to get the site to load the index.php file from the parent directory (public_html).
When attempting to load http://www.domain.com/mob in a browser I receive:
Bad Request
Your browser sent a request that this server could not understand.
This same rewriterule worked fine on our development server, but doesn't work in our live environment.
The .htaccess file from the /public_html/mob folder is as follows:
Options +FollowSymLinks
RewriteEngine on
RewriteRule ^(/)?$ ../index.php [L,QSA]
When index.php is reached a mobile device detect script decides whether to load the content from the desktop or mobile site.
Check the DOCUMENT_ROOT of your m.domain.com.
If the DOCUMENT_ROOT for your m.domain.com is /public_html/mob, then you cannot load /public_html/index.php without doing a full redirect (or proxy) to http://domain.com/index.php.
Just to clarify: a website cannot access files above its DOCUMENT_ROOT folder level.
If your rule's target starts with a /, that makes it an absolute URI starting from the document root, which I assume would be the public_html folder:
RewriteRule ^(/)?$ /index.php

Redirects without .htaccess file and "No input file specified" after uploading .htaccess file

I don't have access to the Apache config files on the server. The only way I can change the configuration or add rewrite rules is through an .htaccess file.
To keep it simple, the website (e.g. website.com) has only the following pages:
index.php
page.php
projects.php
There is no .htaccess file.
When I access website.com/page/ or website.com/page, it redirects me to page.php.
But after I upload the .htaccess file (see below) and access website.com/page (without a trailing slash),
it displays "No input file specified".
Could there be a default setting that has caused this redirection?
The .htaccess file that I uploaded later:
Options +FollowSymLinks
AddDefaultCharset utf-8
RewriteEngine On
RewriteRule ^([^/]+)$ /$1/ [L]
RewriteRule ^page/([a-zA-Z0-9_-\s&().+/-:%]+)$ page.php?url=$1
I want the following redirection to work:
page/about-us
redirected to
page.php?id=about-us
But this doesn't work unless I use a word other than "page" on the left-hand side of the rule.