.htaccess RewriteRule behavior with existing subdirectories

I've been through many similar questions but couldn't find this particular case.
Given this structure:
public_html/
q/
.htaccess
index.php
/dirnofixedname1
/dirnofixedname2
/dirnofixedname3
The dirnofixednameN entries are folders containing files that index.php uses; they should not be directly accessible. (They are named like that because I cannot enumerate them all in the .htaccess file; it would be impractical.)
index.php should process incoming requests
The intention is to process requests like:
http://domain/q/dirnofixedname2 as http://domain/q/index.php?q=dirnofixedname2 while still showing http://domain/q/dirnofixedname2 in the address bar. A popular and already solved case indeed.
So that .htaccess file is:
RewriteEngine On
RewriteCond $1 !^(index\.php)
RewriteRule ^(.*)$ index.php?q=$1 [L]
The problem is that when the request matches one of those existing directories (which is what I want), it works as intended (index.php executes and gets q), but only after a redirect to:
http://domain/q/dirnofixedname2/?q=dirnofixedname2
(and showing that in the URL bar), instead of the intended:
http://domain/q/dirnofixedname2
In particular, if the directory does not exist, a request such as
http://domain/q/dirthatdoesnotexist
gets processed correctly by index.php with q as dirthatdoesnotexist (the script obviously dismisses that and returns nothing).
Do you have any ideas about how to avoid that redirect in cases where the subdir exists? (It's practical to have the same dir name as the parameter)

This happens because of the DirectorySlash directive, which is On by default: since you are requesting an actual directory without a trailing slash, mod_dir redirects to the URL with the slash appended.
You can turn it off by using:
DirectorySlash Off
Also, to prevent directory listings, use:
Options -Indexes
at the top of your .htaccess.
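Putting the pieces together, the complete file for the q/ directory might look like the following. This is only a sketch that combines the asker's rules with the two directives above; it assumes the host's AllowOverride setting permits Options and Indexes overrides in .htaccess.

```apache
# public_html/q/.htaccess

# Do not expose the contents of the data directories
Options -Indexes

# Stop mod_dir from redirecting /q/dirname to /q/dirname/
DirectorySlash Off

RewriteEngine On
# Skip requests that already target index.php (avoids a rewrite loop)
RewriteCond $1 !^index\.php
# Hand everything else to index.php internally; the visible URL is unchanged
RewriteRule ^(.*)$ index.php?q=$1 [L]
```

If the browser has already seen the redirect to the trailing-slash URL, it may have cached it, so test in a fresh session or after clearing the cache.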

Related

Proper .htaccess config for Next.js SSG

NextJS exports a static site with the following structure:
|-- index.html
|-- article.html
|-- tag.html
|-- article
| |-- somearticle.html
| \-- anotherarticle.html
\-- tag
|-- tag1.html
\-- tag2.html
I'm using an .htaccess file to hide the .html extensions:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
Everything works flawlessly, EXCEPT:
If I follow a link to domain/article it displays the article.html page, but my address bar shows domain/article <--Good.
If I refresh, I get sent to address: domain/article/ (note trailing slash) which lists the contents of the article directory <--Bad (same thing with Tag)
Similarly, manually typing in domain/article takes me to domain/article/ instead of showing article.html without the .html extension.
So...
How do I fix this?
Is this an .htaccess issue?
A nextjs config issue?
(Wouldn't it be better for NextJS to create a article\index.html instead of a file in the root directory?)
exportTrailingSlash
I tried playing around with exportTrailingSlash which seems related, but this created other problems like always having a trailing slash at the end of all my links:
Eg: if I go to domain/article/somearticle and hit refresh, something (.htaccess?) is adding a / to the end to give me domain/article/somearticle/. Not horrible, just not very clean, and inconsistent...
Edit: Actually, it's a little more horrible, because sometimes we get a trailing slash, sometimes we don't on the nextjs links... must be something about how I'm using <Link /> but I can't figure that out.
Regardless, NONE of the .htaccess rules I've tried successfully remove the trailing slash all the time every time...
More details:
In my next app, I have folder:
/articles/
[slug].js
index.js
In various pages, I use nextJS Link component:
import Link from 'next/link';
<Link href="/articles" as="/articles">
<a>Articles</a>
</Link>
If you request /article and /article exists as a physical directory then Apache's mod_dir will (by default) append the trailing slash in order to "fix" the URL. This is achieved with a 301 permanent redirect - so it will be cached by the browser.
However, having a physical directory with the same basename as a file while using extensionless URLs creates an ambiguity, e.g. is /article supposed to access the directory /article/ or the file /article.html? You don't seem to want to allow direct access to directories anyway, so that would seem to resolve the ambiguity.
To prevent mod_dir appending the trailing slash to directories, we need to turn DirectorySlash off. For example:
DirectorySlash Off
But as mentioned, if you have previously visited /article then the redirect to /article/ will have been cached by the browser - so you'll need to clear the browser cache before this will be effective.
Since you are removing the file extension you also need to ensure that MultiViews is disabled, otherwise, mod_negotiation will issue an internal subrequest for the underlying file, and potentially conflict with mod_rewrite. MultiViews is disabled by default, although some shared hosts do enable it for some reason. From the output you are getting it doesn't look like MultiViews is enabled, but better to be sure...
# Ensure that MultiViews is disabled
Options -MultiViews
However, if you need to be able to access the directory itself then you will need to manually append the trailing slash with an internal rewrite, although this does not seem to be a requirement here. You should, however, ensure that directory listings are disabled:
# Disable directory listings
Options -Indexes
Attempting to access any directory that does not ultimately map to a file (see below) and does not contain a DirectoryIndex document will return a 403 Forbidden response, instead of a directory listing.
Note that the only difference that could occur between following a link to domain/article, refreshing the page and manually typing domain/article is caching... either by the browser or any intermediary proxy caches. (Unless you have JavaScript that intercepts the click event on the anchor?!)
You do still need to rewrite requests from /foo to /foo.html OR /foo to /foo/index.html (see below), depending on how you have configured your site. It would be preferable to choose one or the other, rather than both (as you seem to imply could be the case).
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
It is unclear how this is seemingly "working" for you currently - unless you are seeing a cached response? When you request /article, the first condition fails because this exists as a physical directory and the rule is not processed. Even with MultiViews enabled, mod_dir will take priority and append the trailing slash.
The second condition that checks the existence of the .html file isn't necessarily checking the same file that is being rewritten to. eg. If you request /foo/bar, where /foo.html exists, but there is no physical directory /foo then the RewriteCond directive checks for the existence of /foo.html - which is successful, but the request is internally rewritten to /foo/bar.html (from the captured RewriteRule pattern) - this results in an internal rewrite loop and a 500 error response being returned to the client. See my answer to the following ServerFault question that goes into more detail behind what is actually happening here.
We can also make a further optimisation if we assume that any URL that contains what looks like a file extension (eg. your static resources .css, .js and image files) should be ignored, otherwise we are performing filesystem checks on every request, which is relatively expensive.
So, in order to map (internally rewrite) requests of the form /article to /article.html and /article/somearticle to /article/somearticle.html you would need to modify the above rule to read something like:
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]
There is no need to backslash escape a literal dot in the RewriteCond TestString - the dot carries no special meaning here; it's not a regex.
Then, to handle requests of the form /foo that should map to /foo/index.html you can do something like the following:
# Rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/index.html [L]
Ordinarily, you would allow mod_dir to serve the DirectoryIndex (eg. index.html), but having omitted the trailing slash from the directory, this can be problematic.
Summary
Bringing the above points together, we have:
# Disable directory indexes and MultiViews
Options -Indexes -MultiViews
# Prevent mod_dir appending a slash to directory requests
DirectorySlash Off
RewriteEngine On
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]
# Otherwise, rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/index.html [L]
This could be further optimised, depending on your site structure and whether you are adding any more directives to the .htaccess file. For example:
1. You could check for file extensions on the requested URL at the top of the file to prevent any further processing. The RewriteRule regex on each subsequent rule could then be "simplified".
2. Requests that include a trailing slash could be blocked or redirected (to remove the trailing slash).
3. If the request is for a .html file then redirect to the extensionless URL. This is made slightly more complicated if you are dealing with both /foo.html and /foo/index.html. But this is only really necessary if you are changing an existing URL structure.
For example, implementing #1 and #2 above, would enable the directives to be written like so:
# Disable directory indexes and MultiViews
Options -Indexes -MultiViews
# Prevent mod_dir appending a slash to directory requests
DirectorySlash Off
RewriteEngine On
# Prevent any further processing if the URL already ends with a file extension
RewriteRule \.\w{2,4}$ - [L]
# Redirect any requests to remove a trailing slash
RewriteRule (.*)/$ /$1 [R=301,L]
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule (.*) $1.html [L]
# Otherwise, rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}/$1/index.html -f
RewriteRule (.*) $1/index.html [L]
Always test with a 302 (temporary) redirect before changing to a 301 (permanent) redirect in order to avoid caching issues.
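The third optimisation listed above (redirecting direct requests for .html files to the extensionless URL) is not implemented in the directives shown. A possible sketch follows; it assumes the .htaccess file is in the document root and that the rule is placed immediately after RewriteEngine On, before the rule that stops processing for URLs ending in a file extension. The check against THE_REQUEST is there so that only URLs the client actually asked for are redirected, not the result of the internal rewrite to .html.

```apache
# Redirect a direct client request for /foo.html to /foo
# THE_REQUEST holds the original request line, e.g. "GET /foo.html HTTP/1.1",
# and is not changed by internal rewrites, which avoids a redirect loop.
RewriteCond %{THE_REQUEST} \s/[^?\s]*\.html[?\s]
RewriteRule ^(.+)\.html$ /$1 [R=302,L]
```

Note that this does not collapse /foo/index.html to /foo; it would redirect to /foo/index instead, so sites that use the index.html layout would need an additional rule.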
(Wouldn't it be better for NextJS to create a article\index.html instead of a file in the root directory?)
Yes! And Next can do that for you:
It is possible to configure Next.js to export pages as index.html
files and require trailing slashes, /about becomes /about/index.html
and is routable via /about/. This was the default behavior prior to
Next.js 9.
To switch back and add a trailing slash, open next.config.js and
enable the exportTrailingSlash config:
module.exports = { exportTrailingSlash: true, }

htaccess - rewrite rule not working when requested URL is a folder on my system

All requests to my site should be rewritten to index.php?page=blah, where blah is the page that's requested (except for css, js, jp(e)g, gif and png files).
This is what my .htaccess file looks like:
RewriteEngine On
RewriteCond %{REQUEST_URI} !\.(?:css|js|jpe?g|gif|png)$ [NC]
RewriteRule ^(.*)$ index.php?page=$1 [L,QSA]
The .htaccess is in this directory: localhost:8080/example/, so when I go to localhost:8080/example/abc, it is (internally) rewritten to localhost:8080/example/index.php?page=abc.
However, when I go to localhost:8080/example/res, I get redirected to localhost:8080/example/res/?page=res. I found out that this only happens with directories; when I go to localhost:8080/example/core (also a folder on my file system), I get redirected to localhost:8080/example/core/?page=core, while it should be internally rewritten to localhost:8080/example/index.php?page=core and the URL visible to the user should stay localhost:8080/example/core/
EDIT:
Thanks to #w3dk, who solved the problem stated above. But I found another problem, which may be related to the problem above:
When I go to:
localhost:8080/example/index/a, it's internally rewritten to localhost:8080/example/index.php?page=index.php/a, while it should be rewritten to localhost:8080/example/index.php?page=index/a.
I found out that this happens when index is a file, because when I go to localhost:8080/example/exampleFile/abc, it's rewritten to localhost:8080/example/index.php?page=exampleFile.php/abc, which shouldn't be the case.
The 2 files in my directory are:
index.php (everything should be directed to this file)
example.php
Apache seems to ignore the file extension, not just .php, because the same thing happens for exampleFile.txt
This is probably happening because of a conflict with mod_dir. The default behaviour (DirectorySlash On) is for mod_dir to automatically "fix" the URL when you request a physical directory without a trailing slash. It does this with an external 301 redirect, before your rule is processed. Your rule then fires, which modifies the target URL, a Location header gets returned to the client and the browser redirects.
This won't happen if you include the trailing slash on the original request. eg. localhost:8080/example/core/. mod_dir then does not need to "fix" the URL and issue a redirect. Although this may not be desirable for you?
Since you are wanting to internally rewrite all directories then the simple fix is to disable this behaviour in .htaccess:
DirectorySlash Off
You will need to clear your browser cache before testing, as the earlier 301s by mod_dir will have been cached locally.
Reference (note the security warning):
https://httpd.apache.org/docs/current/mod/mod_dir.html#directoryslash
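The second problem described in the EDIT (index/a arriving as index.php/a, and the same for exampleFile.txt) is not explained by mod_dir. The symptoms look like MultiViews (mod_negotiation) resolving the extensionless name to an existing file via content negotiation; that is an assumption based on the description, not something the asker confirmed. If it is the cause, disabling MultiViews alongside DirectorySlash should help:

```apache
# Stop mod_negotiation from mapping /index to index.php (or /exampleFile to
# exampleFile.txt) before the rewrite rule captures the path
Options -MultiViews

# Stop mod_dir from redirecting /example/core to /example/core/
DirectorySlash Off
```

These lines would go at the top of the existing .htaccess, above RewriteEngine On.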
You can use this .htaccess file:
Note: The directory folder1 must be unique in the URL. It won't work for http://domain.com/folder1/folder1.html. The directory folder1 must exist and have content in it.
RewriteEngine On
RewriteCond %{HTTP_HOST} domain.com$ [NC]
# Skip requests that are already under folder1 (avoids a redirect loop)
RewriteCond %{REQUEST_URI} !^/folder1/
RewriteRule ^(.*)$ http://domain.com/folder1/$1 [R=301,L]

.htaccess - Redirect all to index.php for root folder or subfolder

I need an .htaccess file that will work whether it is put in the root folder or a subfolder without modification. The script below is the normal one that I've been trying to adapt without success. I tried the solution on htaccess rewrite index.php on root and subfolders and couldn't get it to work.
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [L]
</IfModule>
Layout
.htaccess
index.php
subfolder1
- .htaccess
- index.php
The route /blah should go to /index.php and /subfolder1/whatever should go to /subfolder1/index.php. Currently, the above script will send /subfolder1/whatever to /index.php.
[Update]
This should also work for any path under subfolder1, like /subfolder1/1/2/3/idunno.
If you are using Apache 2.2.16 or later, you can just stop using mod_rewrite, which, although extremely useful and powerful, can get messy as hell.
mod_dir gained a new directive, FallbackResource, which does just that: it maps the request to the URI of your choice if there is no hit on the file system. It is available in .htaccess files as long as AllowOverride Indexes is specified for the directories in the configuration.
As .htaccess files are evaluated depth-first, each .htaccess file only has to declare the fallback resource for its own directory, and the one in the subdirectory subfolder1 will take precedence:
subfolder1/.htaccess:
FallbackResource index.php
.htaccess:
FallbackResource index.php
They're both the same, and work just right.
It seems this directive is not well known yet even though it's been around for a few years, and its goal is precisely to solve that problem in an elegant way.
There is only one limitation with that setup. Calling urls in non-existing sub-directories of the root dir or subfolder1 will yield subrequest recursion and subsequently an error 500, because the fallback resource is local to the given directory.
The best approach is to pass absolute URIs (beginning with '/') to FallbackResource, which is why the requirement itself is somewhat odd and probably does not play well with the inner workings of Apache.
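For comparison, the absolute-URI variant could look like the sketch below. It uses the directory names from the question's layout and gives up the requirement that both files be identical, since each one has to name its own location.

```apache
# --- File 1: /.htaccess (document root) ---
# Anything that does not match a file or directory goes to the root script
FallbackResource /index.php

# --- File 2: /subfolder1/.htaccess ---
# Overrides the root setting for everything under /subfolder1/,
# including deeper paths such as /subfolder1/1/2/3/idunno
FallbackResource /subfolder1/index.php
```

The two sections are separate files; they are shown in one listing only for brevity.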

Apache does not rewrite request if file on path exists

I'm doing a rewrite with mod_rewrite on every request that does not match an existing file or directory. This is my configuration:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^.*$ /index.php [NC,L]
This is used to map URLs like /abc/foo or /abc/foo/10 to my app. And it works just fine.
To improve performance, my app now stores the result of a call to /abc/foo in a file foo in the corresponding directory /abc, so that after the first call the rewrite conditions (file does not exist) no longer apply and Apache serves the data directly without invoking the app. This works fine as well.
The problem is: requesting /abc/foo/10 now no longer causes the URL to be rewritten; instead I get a "404 File Not Found" error. The log entries state that the rewrite condition !-f is no longer true, even though the file /abc/foo/10 does not exist. /abc/foo exists, but it is a file, not a directory.
How can I get this to work?
(MultiViews is disabled)
This is because foo exists as a file, so Apache maps the request to foo and treats the trailing /10 as additional path information (PATH_INFO) rather than as a separate file. Your application would therefore need extra code for foo that checks whether the request includes an additional URL component and then handles creation of the directory "foo" and the file 10.
You must be in per-directory (.htaccess) context with AcceptPathInfo on.
REQUEST_FILENAME therefore matched the part of the path that exists, and is not the same as REQUEST_URI.
Use the REQUEST_URI variable in your RewriteCond if you don't care where the request was previously mapped.
In per-virtual-host context, these variables are always the same.
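A sketch of that suggestion applied to the rules from the question is shown below. It assumes the .htaccess file sits in the document root and that no Alias is involved, so that %{DOCUMENT_ROOT}%{REQUEST_URI} is the filesystem path the full URL would correspond to.

```apache
RewriteEngine On
# Test the full requested path rather than the partially mapped filename.
# For /abc/foo/10, where /abc/foo is a regular file, neither test finds
# an existing file or directory, so the request is rewritten.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^ /index.php [L]
```

Requests for /abc/foo itself still hit the cached file directly, because the first condition fails for them.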
The project design is a little bit wrong - others already pointed out that it's not scalable - how could you cache a request to /abc/foo/10 if there is already a /abc/foo file?
The answer to that, and to your problem, is to use subfolders instead of files.
So instead of cache structure of:
/abc/foo
/abc/bar
...?
use:
/abc/index.html
/abc/foo/index.html
/abc/bar/index.html
/abc/foo/10/index.html
and create a new directory with an index.html each time.
This way Apache would find that there is an /abc/foo folder but no /abc/foo/10 file in it, so the RewriteCond will apply.
edit
You could also try a different way - to modify url with mod_rewrite, changing urls:
/abc/foo
/abc/bar
/abc/foo/10
to something like:
/cache/abc~foo
/cache/abc~bar
/cache/abc~foo~10
htaccess rules (roughly):
# redirecting to cache folder and removing last '/'
RewriteCond %{REQUEST_URI} ^/(abc|cde)
RewriteRule ^(.*?)/?$ /cache/$1 [L]
# recursive replacing '/' with '~'
RewriteCond %{REQUEST_URI} ^/cache/.*/
RewriteRule cache/(.*)/(.*)$ /cache/$1~$2 [L]
Your standard htaccess rules should follow

How to fwd urls to existing paths AND one more path with apache's mod_rewrite?

My current .htaccess looks like this:
RewriteEngine On
# RewriteCond %{REQUEST_URI} !^/_project
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
# RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule ^.*$ index.php [QSA,L]
The uncommented lines are pretty straightforward:
The two Conds make sure the Rule isn't applied to existing files (!-f) and folders (!-d).
The Rule sends everything else to index.php
The uncommented lines I took from somewhere. I believe it's the best way to do what I require: 'pretty urls'.
Basically it works. Existing files (e.g. /css/general.css) are requestable and non-existing paths (e.g. /admin/login) are routed to index.php. Existing and non-existing paths must be able to work alongside each other: /css/all.css is sometimes a buffered existing css file and sometimes (when it doesn't exist) it's handled by PHP. /css/general.css is always a file. /css/club_N.css (N is a number) is always a PHP script.
/_project/ is an existing folder with Basic HTTP Auth protection. For instance /_project/phpinfo.php works as well. In the _project folder I have created a (valid) symlink to the backups folder: /_project/backups/. Somehow the (existing) files in the backups folder can't be reached. For instance /_project/backups/today.bz2 is routed to index.php =( The same happens with either or both commented lines uncommented.
What's wrong with the htaccess config? If I remove the Rewrite stuff entirely, I get a 403 Forbidden. Probably something with the .htaccess in the _project folder (?).
PS. Obviously I can't show you the actual website. People wouldn't like it if you could download their backups =)
.htaccess files are hierarchical in scope, any such files in parent directories apply to their children.
The Basic Auth in /_project/ will apply to its subdirectories unless you switch it off in those directories, as will the RewriteRule declarations. It is often wise to add RewriteEngine off in the .htaccess of the child directory structure to stop the rules applying there, or alternatively to add a condition that excludes that structure to the original rule set.
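As a rough illustration of the first suggestion, the .htaccess inside /_project/ could switch mod_rewrite off for that directory tree. This is a sketch only: the existing Basic Auth directives in that file are assumed to stay as they are, and whether this alone makes the symlinked backups reachable depends on the rest of the setup, which the question does not show.

```apache
# /_project/.htaccess
# (keep the existing Basic Auth directives in this file)

# Disable the rewrite rules inherited from the parent directory so that
# requests such as /_project/backups/today.bz2 are not sent to index.php
RewriteEngine Off
```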