Apache DirectorySlash Off - Site breaks - apache

If I set DirectorySlash Off in my .htaccess file and request the directory without the trailing slash, I get a 403 Forbidden from my server. If I request it with the slash, everything works fine.
Could anyone explain why? Here is my fully anonymized .htaccess:
# GLOBAL CONFIG
Options +FollowSymlinks
DirectorySlash Off
AddDefaultCharset utf-8
php_value post_max_size 256M
php_value upload_max_filesize 256M
# BEGIN WordPress
RewriteEngine On
RewriteBase /folder/
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /folder/index.php [L]
# END WordPress
# REMOVE WWW
RewriteCond %{HTTP_HOST} ^([^.]+)\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://domain.com$1 [R=301,L]

As you know per the documentation, when DirectorySlash is set to Off, requests to /folder do not have DirectoryIndex evaluated. This means that the request will not be automatically mapped to /folder/index.php.
mod_dir performs this check in the "fixup" phase of the request processing. mod_rewrite, which is responsible for your RewriteRule definitions, also performs its processing in this phase when you specify the rules in a .htaccess file.
However, it was programmed with an awareness of modules like mod_dir, and includes a check to make sure that the current directory was requested with a trailing slash. If not, it declines to handle the request, since doing so might lead to undefined behaviour.
The request then moves on to the content-generation phase, which, since the request was not mapped to a real file, is handled by mod_autoindex. Given that Indexes are disabled on your host by default, mod_autoindex returns 403 Forbidden which is what you see.
Note that since DirectoryIndex is not evaluated, even if mod_rewrite were to process the request, it would still fail, because no auto-resolution to index.php would occur, and your rule
RewriteRule . /folder/index.php [L]
wouldn't match, because the . requires a match on something (but the request would be blank).
Enabling DirectorySlash avoids all of this: mod_dir redirects /folder to /folder/, so mod_rewrite no longer declines the request. The case in the last note is also covered, because DirectoryIndex then maps the request to index.php anyway.

With Apache 2.4 you can allow rewrites in .htaccess files by setting RewriteOptions AllowNoSlash.
Changes with Apache 2.3.16
...
*) mod_rewrite: Add the AllowNoSlash RewriteOption, which makes it possible
for RewriteRules to be placed in .htaccess files that match the directory
with no trailing slash. PR 48304.
[Matthew Byng-Maddick <matthew byng-maddick bbc.co.uk>]
...
See Apache documentation of mod_rewrite
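As a rough illustration of the option described above, the .htaccess file from the question might be adapted along these lines. This is an untested sketch. It assumes Apache 2.3.16 or later, that the file lives inside /folder/, and that a request for /folder reaches the rules as an empty path once AllowNoSlash is set. The ^$ rule is an addition for illustration and is not part of the original file.

```apache
# Sketch only; assumes Apache 2.3.16+ and an .htaccess file inside /folder/
DirectorySlash Off

RewriteEngine On
# Let the rules in this .htaccess file run for /folder (no trailing slash)
RewriteOptions AllowNoSlash

# Assumption: the slashless directory request arrives as an empty path,
# so map it to the front controller explicitly
RewriteRule ^$ /folder/index.php [L]

# WordPress-style fallback from the question for everything else
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /folder/index.php [L]
```

The explicit ^$ rule is there because, as noted in the first answer, DirectoryIndex is not evaluated for the slashless request and the "." pattern cannot match an empty path.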

I think this happens because turning DirectorySlash off disables the automatic correction of the URL, so Apache tries to show the directory listing. Fortunately, you have probably disabled listings somewhere (or through file permissions), so it sends a 403 Forbidden. I guess that when you turn it on, it works normally.
From what I understand from the docs, setting DirectorySlash Off is not recommended for security reasons.
http://httpd.apache.org/docs/2.1/mod/mod_dir.html

As Tom already answered, there is a dedicated RewriteOptions setting for this, but only in Apache 2.3.16+. If, like me, you run an older version of Apache, you cannot rewrite the URL for the directory itself, because Apache doesn't know about this directory.
Example:
"GET /somedir" will point to <Directory /var/www/html/public> in the rewrite log, but(!) the requested filename (%f) in the access log will still be /var/www/html/public/somedir/ - this is crazy Apache logic. Apache will then show you either a 403 (without Options +Indexes) or a directory listing (otherwise) with wrong URLs such as /subdir/ instead of /somedir/subdir/.
So, I've found only one solution that worked for me - using aliases:
AliasMatch "/somedir$" "/var/www/html/public/somedir/index.html"
Hope this helps someone else in 2020+ :D

Related

htaccess shows 404 error rather than finding index.php

I'm having some issues with a shady URL rewriting
I want to turn http://localhost:81/es/index.php into http://localhost:81/index.php?lengua=es with my .htaccess in order to help the page SEO
This is my current .htaccess
<FilesMatch ".*\.(log|ini|htaccess)$">
deny from all
</FilesMatch>
Options -Indexes
RewriteEngine On
RewriteBase /
FallbackResource "index.php"
RewriteRule ^(en|es|pt)?/?(.*)?$ $2?idioma=$1 [QSA,L]
I have checked the rules with an htaccess tester and they work as expected, but when I browse the page it shows a "File not found." error (I do have an index.php, I do not have an es/index.php).
Since my output URL is http://localhost:81/index.php?lengua=es, I don't understand why it is not working.
I would suggest breaking this into four rewrite rules:
RewriteEngine on
# Redirect to add the trailing slash to language directory
# http://example.com/es > http://example.com/es/
RewriteRule ^/?(en|es|pt)$ /$1/ [R=301,L]
# Redirect to remove `index.php`
# http://example.com/es/index.php > http://example.com/es/
RewriteRule ^/?(en|es|pt)/index\.php$ /$1/ [R=301,L]
# Handle requests for the base language directory
# http://example.com/es/ > http://example.com/index.php?idioma=es
RewriteRule ^/?(en|es|pt)/$ /index.php?idioma=$1 [QSA,L]
# Handle requests for php files within the language directory
# http://example.com/es/foo.php > http://example.com/foo.php?idioma=es
RewriteRule ^/?(en|es|pt)/(.+\.php)$ /$2?idioma=$1 [QSA,L]
I would remove RewriteBase / because I believe that is the default in the root .htaccess file anyway.
I would remove FallbackResource "index.php" because you shouldn't need it based on the examples you have provided. If you keep it, the examples in the documentation show it starting with a slash: FallbackResource /index.php. You should also test without it because it has the potential to conflict with the rewrite rules.
I always like to start rewrite rules with an optional slash ^/? (rather than just ^) so that they can be used both in .htaccess and in httpd.conf without modifications.
The rule in the question makes everything optional including the language code. Rather than (en|es|pt)? my rules use (en|es|pt) so that they don't match if the language code isn't in the URL.
Rather than make the slash after the language directory optional, my rules do different things when it is present and when it is not.
In your rule (.*)? is exactly equivalent to the simpler (.*). I changed it to (.+\.php) so that it only matches PHP files.

How to avoid the need of typing .php on the url?

I'm on macOS Big Sur, using Apache and PHP. What I want is to not need to put .php at the end of my file names to load them.
For instance, instead of typing this on the URL:
127.0.0.1/public_html/home.php
I want just to type
127.0.0.1/public_html/home
To achieve this, I'm using this code in .htaccess:
RewriteEngine On
Options -Indexes
DirectoryIndex home.php index.php
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.+)$ $1.php [L]
The code above works on my hosting, but for some reason it does not work on my development machine. Instead, I get a 404 error.
The .htaccess file with the code is on the root of public_html folder.
What am I missing?
By typing some "nonsense" at the top of the .htaccess file and not getting an error (ordinarily you would get a 500 Internal Server Error) it would seem that .htaccess overrides were not enabled on the server. So, .htaccess files were effectively disabled - which they are by default on Apache 2.4.
To enable .htaccess overrides (to allow .htaccess to override the server config) you need to set the AllowOverride directive in the appropriate <Directory> container in the server config (or <VirtualHost> container). The default on Apache 2.4 is AllowOverride None.
With the directives as posted you would need a minimum of:
AllowOverride FileInfo Indexes Options
FileInfo for mod_rewrite, Indexes for DirectoryIndex and Options for Options and related directives.
Although it is common (and easier) to just set:
AllowOverride All
Reference:
https://httpd.apache.org/docs/2.4/mod/core.html#allowoverride
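For illustration, the server-config change described above might look something like this. It is a minimal sketch: the directory path is a placeholder rather than the asker's real path, and Require all granted assumes Apache 2.4 access-control syntax.

```apache
# Sketch only: goes in the server config (httpd.conf or a vhost), not in .htaccess
# "/path/to/public_html" is a hypothetical path; use the real document directory
<Directory "/path/to/public_html">
    # FileInfo -> mod_rewrite, Indexes -> DirectoryIndex, Options -> Options
    AllowOverride FileInfo Indexes Options
    Require all granted
</Directory>
```

Apache has to be restarted (or reloaded) after changing the server config for this to take effect.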
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.+)$ $1.php [L]
These directives are not strictly correct. Whilst they may work OK for the URLs you are testing, they would result in a rewrite-loop (500 error response) if you simply append a slash to your URLs (and there is no directory by that name), eg. /home/ (or /home/<anything>). This is because your condition that tests for the presence of the .php file is not necessarily the same as the URL-path you are rewriting to. See my answer to the following question on ServerFault for a thorough explanation of this issue: https://serverfault.com/questions/989333/using-apache-rewrite-rules-in-htaccess-to-remove-html-causing-a-500-error
Also, there's no need to check that the request does not map to a directory to then check if the request + .php extension maps to a file. If the request maps to a file then it can not also be a directory, so if the 2nd condition is true, the 1st condition must also be true and is therefore superfluous.
And there's no need to backslash-escape literal dots in the RewriteCond TestString - this is an "ordinary" string, not a regex.
So, these directives should be written like this instead:
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.php -f
RewriteRule (.+) $1.php [L]
(RewriteBase should not be used here.)
You can further optimise this by excluding requests that already contain what looks like a file extension (assuming your URLs that need rewriting do not contain a dot near the end of the URL-path). For example:
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.php -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.php [L]
(With this 2nd version, it does not matter if RewriteBase is set - it is not used.)
DirectoryIndex home.php index.php
You gave an example URL of /public_html/home (to which .php is appended). However, this DirectoryIndex directive allows home.php to also be served when simply requesting the directory /public_html/. It should be one or the other, not both.

Proper .htaccess config for Next.js SSG

NextJS exports a static site with the following structure:
|-- index.html
|-- article.html
|-- tag.html
|-- article
| |-- somearticle.html
| \-- anotherarticle.html
\-- tag
|-- tag1.html
\-- tag2.html
I'm using an .htaccess file to hide the .html extensions:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
Everything works flawlessly, EXCEPT:
If I follow a link to domain/article it displays the article.html page, but my address bar shows domain/article <--Good.
If I refresh, I get sent to address: domain/article/ (note trailing slash) which lists the contents of the article directory <--Bad (same thing with Tag)
Similarly, manually typing in domain/article takes me to domain/article/ instead of showing article.html without the .html extension.
So...
How do I fix this?
Is this an .htaccess issue?
A nextjs config issue?
(Wouldn't it be better for NextJS to create a article\index.html instead of a file in the root directory?)
exportTrailingSlash
I tried playing around with exportTrailingSlash which seems related, but this created other problems like always having a trailing slash at the end of all my links:
Eg: if I go to domain/article/somearticle and hit refresh, something (.htaccess?) is adding a / to the end to give me domain/article/somearticle/. Not horrible, just not very clean, and inconsistent...
Edit: Actually, it's a little more horrible, because sometimes we get a trailing slash, sometimes we don't on the nextjs links... must be something about how I'm using <Link /> but I can't figure that out.
Regardless, NONE of the .htaccess rules I've tried successfully remove the trailing slash all the time every time...
More details:
In my next app, I have folder:
/articles/
[slug].js
index.js
In various pages, I use nextJS Link component:
import Link from 'next/link';
<Link href="/articles" as="/articles">
<a>Articles</a>
</Link>
If you request /article and /article exists as a physical directory then Apache's mod_dir will (by default) append the trailing slash in order to "fix" the URL. This is achieved with a 301 permanent redirect - so it will be cached by the browser.
However, having a physical directory with the same basename as a file and using extensionless URLs creates an ambiguity, eg. is /article supposed to access the directory /article/ or the file /article.html? You don't seem to want to allow direct access to directories anyway, so that would seem to resolve that ambiguity.
To prevent Apache mod_dir appending the trailing slash to directories we need to disable the DirectorySlash. For example:
DirectorySlash Off
But as mentioned, if you have previously visited /article then the redirect to /article/ will have been cached by the browser - so you'll need to clear the browser cache before this will be effective.
Since you are removing the file extension you also need to ensure that MultiViews is disabled, otherwise, mod_negotiation will issue an internal subrequest for the underlying file, and potentially conflict with mod_rewrite. MultiViews is disabled by default, although some shared hosts do enable it for some reason. From the output you are getting it doesn't look like MultiViews is enabled, but better to be sure...
# Ensure that MultiViews is disabled
Options -MultiViews
However, if you need to be able to access the directory itself then you will need to manually append the trailing slash with an internal rewrite. Although this does not seem to be a requirement here. You should, however, ensure that directory listings are disabled:
# Disable directory listings
Options -Indexes
Attempting to access any directory that does not ultimately map to a file (see below) and does not contain a DirectoryIndex document will then return a 403 Forbidden response, instead of a directory listing.
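If direct access to a directory were needed, the internal rewrite that appends the trailing slash could be sketched roughly as follows. This is an assumption-laden example rather than something the site in question requires: it presumes DirectorySlash Off, a root .htaccess file, and that it is acceptable for /article to resolve to the directory rather than to article.html.

```apache
# Sketch only: internally append the slash for real directories,
# so mod_dir can serve the DirectoryIndex without an external redirect
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*[^/])$ $1/ [L]
```

Placing this before the rule that maps /foo to /foo.html would make the directory win whenever both exist, which is the opposite of what the question asks for, so rule order matters here.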
Note that the only difference that could occur between following a link to domain/article, refreshing the page and manually typing domain/article is caching... either by the browser or any intermediary proxy caches. (Unless you have JavaScript that intercepts the click event on the anchor?!)
You do still need to rewrite requests from /foo to /foo.html OR /foo to /foo/index.html (see below), depending on how you have configured your site. Although it would be preferable that you choose one or the other, rather than both (as you seem to imply could be the case).
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
It is unclear how this is seemingly "working" for you currently - unless you are seeing a cached response? When you request /article, the first condition fails because this exists as a physical directory and the rule is not processed. Even with MultiViews enabled, mod_dir will take priority and append the trailing slash.
The second condition that checks the existence of the .html file isn't necessarily checking the same file that is being rewritten to. eg. If you request /foo/bar, where /foo.html exists, but there is no physical directory /foo then the RewriteCond directive checks for the existence of /foo.html - which is successful, but the request is internally rewritten to /foo/bar.html (from the captured RewriteRule pattern) - this results in an internal rewrite loop and a 500 error response being returned to the client. My answer to a related ServerFault question goes into more detail about what is actually happening here.
We can also make a further optimisation if we assume that any URL that contains what looks like a file extension (eg. your static resources .css, .js and image files) should be ignored, otherwise we are performing filesystem checks on every request, which is relatively expensive.
So, in order to map (internally rewrite) requests of the form /article to /article.html and /article/somearticle to /article/somearticle.html you would need to modify the above rule to read something like:
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]
There is no need to backslash escape a literal dot in the RewriteCond TestString - the dot carries no special meaning here; it's not a regex.
Then, to handle requests of the form /foo that should map to /foo/index.html you can do something like the following:
# Rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/index.html [L]
Ordinarily, you would allow mod_dir to serve the DirectoryIndex (eg. index.html), but having omitted the trailing slash from the directory, this can be problematic.
Summary
Bringing the above points together, we have:
# Disable directory indexes and MultiViews
Options -Indexes -MultiViews
# Prevent mod_dir appending a slash to directory requests
DirectorySlash Off
RewriteEngine On
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]
# Otherwise, rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/index.html [L]
This could be further optimised, depending on your site structure and whether you are adding any more directives to the .htaccess file. For example:
1. You could check for file extensions on the requested URL at the top of the file to prevent any further processing. The RewriteRule regex on each subsequent rule could then be "simplified".
2. Requests that include a trailing slash could be blocked or redirected (to remove the trailing slash).
3. If the request is for a .html file then redirect to the extensionless URL. This is made slightly more complicated if you are dealing with both /foo.html and /foo/index.html. But this is only really necessary if you are changing an existing URL structure.
For example, implementing #1 and #2 above, would enable the directives to be written like so:
# Disable directory indexes and MultiViews
Options -Indexes -MultiViews
# Prevent mod_dir appending a slash to directory requests
DirectorySlash Off
RewriteEngine On
# Prevent any further processing if the URL already ends with a file extension
RewriteRule \.\w{2,4}$ - [L]
# Redirect any requests to remove a trailing slash
RewriteRule (.*)/$ /$1 [R=301,L]
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule (.*) $1.html [L]
# Otherwise, rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}/$1/index.html -f
RewriteRule (.*) $1/index.html [L]
Always test with a 302 (temporary) redirect before changing to a 301 (permanent) redirect in order to avoid caching issues.
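The last of the optimisations listed above (redirecting direct requests for .html files to the extensionless URL) is not implemented in these directives. A possible sketch for the simple /foo.html case follows. It assumes the rule is placed before the rule that stops processing for URLs ending in a file extension, and it deliberately ignores the /foo/index.html case. The condition checks THE_REQUEST (the original request line sent by the client) so that internally rewritten requests are not redirected again.

```apache
# Sketch only: redirect direct client requests for /foo.html to /foo
# THE_REQUEST looks like "GET /foo.html HTTP/1.1" and is not altered
# by internal rewrites, which avoids a redirect loop
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*\.html[?\s]
RewriteRule (.+)\.html$ /$1 [R=302,L]
```

The 302 is intentional while testing, for the caching reasons given above.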
(Wouldn't it be better for NextJS to create a article\index.html instead of a file in the root directory?)
Yes! And Next can do that for you:
It is possible to configure Next.js to export pages as index.html
files and require trailing slashes, /about becomes /about/index.html
and is routable via /about/. This was the default behavior prior to
Next.js 9.
To switch back and add a trailing slash, open next.config.js and
enable the exportTrailingSlash config:
module.exports = { exportTrailingSlash: true, }

Why does mod_rewrite ignore my [L] flag?

This is my .htaccess file. It should deliver static files from assets folder if the url matches them. Otherwise, everything should be redirected to index.php.
Note that the url doesn't contain assets as a segment here. So example.com/css/style.css directs to assets/css/style.css.
RewriteEngine on
# disable directory browsing
Options -Indexes
# static assets
RewriteCond %{DOCUMENT_ROOT}/assets/$1 -f
RewriteRule ^(.*)$ assets/$1 [L]
# other requests to index.php
RewriteRule !^asset/ index.php [L]
Unfortunately, urls like example.com/assets/css/style.css also deliver the file, since for that url none of my rules applies and Apache's default behavior is applied which delivers the file.
So I tried changing the last line to this. I thought that this would work since the [L] flag in the rule above should stop execution for asset urls and deliver them.
RewriteRule ^(.*)$ index.php [L]
Instead, now all requests are redirected to index.php, even static assets like example.com/css/style.css. Why does the flag not stop execution of the rewrite rules, and how do I fix my problem?
I found the solution on the pages of the official documentation.
If you are using RewriteRule in either .htaccess files or in
<Directory> sections, it is important to have some understanding of
how the rules are processed. The simplified form of this is that once
the rules have been processed, the rewritten request is handed back to
the URL parsing engine to do what it may with it. It is possible that
as the rewritten request is handled, the .htaccess file or
<Directory> section may be encountered again, and thus the ruleset may be run
again from the start. Most commonly this will happen if one of the
rules causes a redirect - either internal or external - causing the
request process to start over.
It is therefore important, if you are using RewriteRule directives in
one of these contexts, that you take explicit steps to avoid rules
looping, and not count solely on the [L] flag to terminate execution
of a series of rules, as shown below.
An alternative flag, [END], can be used to terminate not only the
current round of rewrite processing but prevent any subsequent rewrite
processing from occurring in per-directory (htaccess) context. This
does not apply to new requests resulting from external redirects.
To fix my problem, I changed the [L] flags to [END].
RewriteEngine on
# disable directory browsing
Options -Indexes
# static assets
RewriteCond %{DOCUMENT_ROOT}/assets/$1 -f
RewriteRule ^(.*)$ assets/$1 [END]
# other requests to index.php
RewriteRule !^asset/ index.php [END]
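The [END] flag is only available in Apache 2.3.9 and later. On older versions, one commonly used "explicit step" of the kind the documentation mentions is to stop as soon as a request has already been internally rewritten. The following is a sketch of that alternative, not part of the original answer. It relies on the REDIRECT_STATUS environment variable being empty on the initial pass and set after an internal rewrite.

```apache
RewriteEngine on

# Sketch only: skip all rules on the second pass
# (REDIRECT_STATUS is non-empty once the request has been rewritten)
RewriteCond %{ENV:REDIRECT_STATUS} .
RewriteRule ^ - [L]

# static assets
RewriteCond %{DOCUMENT_ROOT}/assets/$1 -f
RewriteRule ^(.*)$ assets/$1 [L]

# other requests to index.php
RewriteRule ^ index.php [L]
```

With the guard at the top, the final catch-all no longer needs to exclude the assets directory, because rewritten requests never reach it.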

mod-rewrite question: /test/method is rewritten to test.svg/method

I noticed an odd (to me) mod_rewrite thing happening. Fixing it is not important to me so much as figuring out what's going on. Basically, I have an svg file called test.svg in my document root, as well as an index.php. My expectation, based on my .htaccess file is that visiting http://localhost/test.svg would get me the .svg file (and it does), while visiting http://localhost/test/action would be rewritten to index.php/test/action. Instead, the latter is apparently rewritten to test.svg/action, as I receive the message
The requested URL /test.svg/action was not found on this server.
Here is my .htaccess file:
# Turn on URL rewriting
RewriteEngine On
# Protect application and system files from being viewed
# RewriteRule ^(application|modules|system) - [F,L]
# Allow any files or directories that exist to be displayed directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Rewrite all other URLs to index.php/URL
RewriteRule .* index.php/$0 [PT,L]
I am using Apache 2.2.12 on Ubuntu (installed via apt-get). I think my setup is fairly standard, but I'm not sure exactly what directives or config files would be relevant. I am by no means a sysadmin of any kind, I just use this server to test and develop things locally.
As I said, fixing this issue would be trivial, I just am often confounded by mod_rewrite and would like to understand what's going on here.
Apache's HTTP content negotiation feature is automatically translating from "/test" to "/test.svg". See http://httpd.apache.org/docs/2.0/content-negotiation.html#multiviews
You can disable content-negotiation in .htaccess with the directive:
Options -MultiViews
You can get more information about what mod_rewrite is doing by adding these directives to your Apache configuration (they won't work in .htaccess):
RewriteLog /path/to/rewrite.log
RewriteLogLevel 3
The RewriteLogLevel can be any number from 0 (disabled) to 9 (extremely verbose). 3 should give you enough to see what's going on, but don't use that on a production server.
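The two directives above apply to Apache 2.2, which is what the question uses. On Apache 2.4 they were removed, and rewrite logging is configured through the per-module LogLevel instead. A minimal sketch of the roughly equivalent setting:

```apache
# Apache 2.4: goes in the server config; output is written to the error log
# trace1 (least verbose) to trace8 (most verbose) control the rewrite detail
LogLevel alert rewrite:trace3
```

As with RewriteLogLevel, the higher trace levels slow the server down noticeably and are best kept off production machines.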