Apache appends trailing slash to my rewrite rule - apache

I have a clean path with the same name as an existing directory.
I use these .htaccess rules to support the clean path:
RewriteCond %{REQUEST_URI} ^/mydir
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [L]
Everything works correctly (I have a "mydir" clean path working and I can access existing files in the /mydir directory directly), but Apache always appends a trailing slash to requests.
When I request http://domain.com/mydir, I get a 301 redirect to http://domain.com/mydir/.
What is the reason?

The trailing slash after /mydir is added by mod_dir, an Apache module that appends a trailing slash to requests for directories. This is because the following setting is on by default:
DirectorySlash On
You can turn it off using:
DirectorySlash Off
However, this might expose some directories by showing their listings.
Security Warning
Turning off the trailing slash redirect may result in an information
disclosure. Consider a situation where mod_autoindex is active
(Options +Indexes) and DirectoryIndex is set to a valid resource (say,
index.html) and there's no other special handler defined for that URL.
In this case a request with a trailing slash would show the index.html
file. But a request without trailing slash would list the directory
contents.
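Applied to the rules from the question, this might look like the following sketch. It assumes the .htaccess file sits in the document root, that the server's AllowOverride settings permit these directives, and that the `${...}` in the question was meant as `%{...}`. Options -Indexes is included because of the warning above.

```apache
# Stop mod_dir from redirecting /mydir to /mydir/
DirectorySlash Off
# Keep directory listings off while DirectorySlash is disabled
Options -Indexes

RewriteEngine On
# Route /mydir (and anything below it that is not an existing file) to index.php
RewriteCond %{REQUEST_URI} ^/mydir
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [L]
```

If the 301 redirect was already followed once, the browser may have cached it, so testing in a fresh session is advisable.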

For a directory, Apache's canonical URL ends in a slash, because it treats URLs like disk paths, where a directory path ends in a slash. If the slash is missing, the server needs one additional step to add it (the redirect).
Plus Google (supposedly) likes the trailing slashes.
I say keep it as is.
Please read more: http://cdivilly.wordpress.com/2014/03/11/why-trailing-slashes-on-uris-are-important/
and here: http://bit.ly/1uSvbfy :)

Related

rewrite request for /folder to folder/index.php without 301 redirect with apache

So I put an index.php in /pipe/index.php
I'd like to rewrite (internal, not redirect)
https://host/pipe?token=abc to https://host/pipe/index.php?token=abc
What I tried (caveat: this assumes there is always a ? in the URL):
RewriteEngine on
RewriteRule "^([^?]*)(.*)$" "$1/$2" [PT]
My hope was to split at the ? and just insert a / there.
But it seems Apache finds out that "oh, pipe is a folder" before checking my .htaccess (?), because despite my [PT] it still redirects with a 301 to /pipe/?token=abc, when I hoped for an internal rewrite.
But it seems Apache finds out that "oh, pipe is a folder" before checking my .htaccess (?)
Yes, mod_dir will append the trailing slash with a 301 redirect, although this occurs after mod_rewrite has processed the URL (if indeed it is being processed at all; see below). (The PT flag is irrelevant in .htaccess, since the resulting rewrite is passed through as a URL-path by default.)
RewriteRule "^([^?]*)(.*)$" "$1/$2" [PT]
However, your existing rule (by itself) would result in a rewrite-loop (500 Internal Server Error) since it matches itself and repeatedly appends a slash. If you are seeing a 301 redirect as mentioned above then either this rule is not doing anything (are .htaccess overrides enabled?) or you have a conflict with other rules.
As you've stated, this rule also assumes that the query string (with leading ?) is also matched by the RewriteRule pattern. The RewriteRule directive matches against the URL-path only, not the query string. $2 in the above rule is therefore always empty (unless you have %3F in the URL-path, ie. a %-encoded ?).
The query string is contained in its own variable, QUERY_STRING. But you simply want to pass through the same query string, so you don't need to do anything special here, since that happens by default.
Solution
To prevent mod_dir appending the trailing slash, you need to set DirectorySlash Off at the top of the root .htaccess file.
Note that these directives must go in the .htaccess file in the root/parent directory, as opposed to the subdirectory that has the trailing slash omitted. This is because the mod_rewrite directives (that "fix" the URL by appending the trailing slash) would never actually be processed in the subdirectory .htaccess file. The trailing slash would seem to be required for mod_rewrite to function. (However, the mod_dir DirectorySlash Off directive would still be processed successfully, so the slash would not be appended.)
For example:
# Prevent mod_dir appending the trailing slash
DirectorySlash Off
# Must disable directory listings when "DirectorySlash Off" is set
Options -Indexes
However, you need to then manually append the trailing slash to any directory, where it is omitted, with an internal rewrite to "fix" the URL (and to correctly serve the DirectoryIndex document, ie. index.php).
# Ensure DirectoryIndex is set correctly
DirectoryIndex index.php
RewriteEngine On
# Append trailing slash to any directory where it has been omitted
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.+[^/])$ $1/ [L]
The trailing slash on the directory (via the internal rewrite) is required in order to serve the DirectoryIndex document, otherwise, you get a 403 Forbidden, even if the DirectoryIndex document is present.
If the trailing slash is omitted and directory listings (mod_autoindex) are enabled (disabled above) then a directory listing would be generated even if a DirectoryIndex document is present in that directory. (Which is why directory listings must be disabled when DirectorySlash Off is set.)
NB: You will need to make sure the browser cache is cleared since the earlier 301 redirect by mod_dir to append the trailing slash will have been cached by the browser.
This probably is what you are looking for:
RewriteEngine on
RewriteRule ^/?pipe/?$ /pipe/index.php [QSA,L]
The QSA flag is actually redundant here, since the original query string is passed through by default when the substitution has none, but it makes things clearer when compared with this variant (both work):
RewriteEngine on
RewriteRule ^/?pipe/?$ /pipe/index.php?%{QUERY_STRING} [QSD,L]
The documentation of the rewriting module, more specifically of the RewriteRule directive, clearly points out that the query string is not part of the path the rule's pattern is matched against.
If you want more control over the content of the query string, you can use a RewriteCond:
RewriteEngine on
RewriteCond %{QUERY_STRING} ^token=(.*)$
RewriteRule ^/?pipe/?$ /pipe/index.php?token=%1 [QSD,L]
Also you might want to redirect the original URL:
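RewriteEngine on
# Redirect direct requests for /pipe/index.php back to /pipe
RewriteRule ^/?pipe/index\.php$ /pipe [QSA,R=301,END]
# END (Apache 2.4+) instead of L, so the rewritten URL is not matched by the redirect above
RewriteRule ^/?pipe/?$ /pipe/index.php [QSA,END]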
And finally you might also want to take a look at the DirectoryIndex directive which might offer a solution without any rewriting at all, though this depends a bit on your setup ...

Proper .htaccess config for Next.js SSG

NextJS exports a static site with the following structure:
|-- index.html
|-- article.html
|-- tag.html
|-- article
|   |-- somearticle.html
|   \-- anotherarticle.html
\-- tag
    |-- tag1.html
    \-- tag2.html
I'm using an .htaccess file to hide the .html extensions:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
Everything works flawlessly, EXCEPT:
If I follow a link to domain/article it displays the article.html page, but my address bar shows domain/article <--Good.
If I refresh, I get sent to address: domain/article/ (note trailing slash) which lists the contents of the article directory <--Bad (same thing with Tag)
Similarly, manually typing in domain/article takes me to domain/article/ instead of showing article.html without the .html extension.
So...
How do I fix this?
Is this an .htaccess issue?
A nextjs config issue?
(Wouldn't it be better for NextJS to create a article\index.html instead of a file in the root directory?)
exportTrailingSlash
I tried playing around with exportTrailingSlash which seems related, but this created other problems like always having a trailing slash at the end of all my links:
Eg: if I go to domain/article/somearticle and hit refresh, something (.htaccess?) is adding a / to the end to give me domain/article/somearticle/. Not horrible, just not very clean, and inconsistent...
Edit: Actually, it's a little more horrible, because sometimes we get a trailing slash, sometimes we don't on the nextjs links... must be something about how I'm using <Link /> but I can't figure that out.
Regardless, NONE of the .htaccess rules I've tried successfully remove the trailing slash all the time every time...
More details:
In my Next app, I have this folder:
/articles/
  [slug].js
  index.js
In various pages, I use the Next.js Link component:
import Link from 'next/link';
<Link href="/articles" as="/articles">
  <a>Articles</a>
</Link>
If you request /article and /article exists as a physical directory then Apache's mod_dir will (by default) append the trailing slash in order to "fix" the URL. This is achieved with a 301 permanent redirect, so it will be cached by the browser.
However, having a physical directory with the same basename as a file, combined with extensionless URLs, creates an ambiguity: is /article supposed to access the directory /article/ or the file /article.html? You don't seem to want to allow direct access to directories anyway, so that would seem to resolve the ambiguity.
To prevent Apache mod_dir appending the trailing slash to directories, we need to disable DirectorySlash. For example:
DirectorySlash Off
But as mentioned, if you have previously visited /article then the redirect to /article/ will have been cached by the browser - so you'll need to clear the browser cache before this will be effective.
Since you are removing the file extension you also need to ensure that MultiViews is disabled, otherwise, mod_negotiation will issue an internal subrequest for the underlying file, and potentially conflict with mod_rewrite. MultiViews is disabled by default, although some shared hosts do enable it for some reason. From the output you are getting it doesn't look like MultiViews is enabled, but better to be sure...
# Ensure that MultiViews is disabled
Options -MultiViews
However, if you need to be able to access the directory itself then you will need to manually append the trailing slash with an internal rewrite. Although this does not seem to be a requirement here. You should, however, ensure that directory listings are disabled:
# Disable directory listings
Options -Indexes
Attempting to access any directory that does not contain a DirectoryIndex document (and does not ultimately map to a file - see below) will return a 403 Forbidden response, instead of a directory listing.
Note that the only difference that could occur between following a link to domain/article, refreshing the page and manually typing domain/article is caching... either by the browser or any intermediary proxy caches. (Unless you have JavaScript that intercepts the click event on the anchor?!)
You do still need to rewrite requests from /foo to /foo.html OR /foo to /foo/index.html (see below), depending on how you have configured your site. Although it would be preferable that you choose one or the other, rather than both (as you seem to imply could be the case).
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
It is unclear how this is seemingly "working" for you currently - unless you are seeing a cached response? When you request /article, the first condition fails because this exists as a physical directory and the rule is not processed. Even with MultiViews enabled, mod_dir will take priority and append the trailing slash.
The second condition that checks the existence of the .html file isn't necessarily checking the same file that is being rewritten to. eg. If you request /foo/bar, where /foo.html exists, but there is no physical directory /foo then the RewriteCond directive checks for the existence of /foo.html - which is successful, but the request is internally rewritten to /foo/bar.html (from the captured RewriteRule pattern) - this results in an internal rewrite loop and a 500 error response being returned to the client. See my answer to the following ServerFault question that goes into more detail behind what is actually happening here.
We can also make a further optimisation if we assume that any URL that contains what looks like a file extension (eg. your static resources .css, .js and image files) should be ignored, otherwise we are performing filesystem checks on every request, which is relatively expensive.
So, in order to map (internally rewrite) requests of the form /article to /article.html and /article/somearticle to /article/somearticle.html you would need to modify the above rule to read something like:
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]
There is no need to backslash escape a literal dot in the RewriteCond TestString - the dot carries no special meaning here; it's not a regex.
Then, to handle requests of the form /foo that should map to /foo/index.html you can do something like the following:
# Rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/index.html [L]
Ordinarily, you would allow mod_dir to serve the DirectoryIndex (eg. index.html), but having omitted the trailing slash from the directory, this can be problematic.
Summary
Bringing the above points together, we have:
# Disable directory indexes and MultiViews
Options -Indexes -MultiViews
# Prevent mod_dir appending a slash to directory requests
DirectorySlash Off
RewriteEngine On
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]
# Otherwise, rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/index.html [L]
This could be further optimised, depending on your site structure and whether you are adding any more directives to the .htaccess file. For example:
1. You could check for file extensions on the requested URL at the top of the file to prevent any further processing. The RewriteRule regex on each subsequent rule could then be "simplified".
2. Requests that include a trailing slash could be blocked or redirected (to remove the trailing slash).
3. If the request is for a .html file then redirect to the extensionless URL. This is made slightly more complicated if you are dealing with both /foo.html and /foo/index.html. But this is only really necessary if you are changing an existing URL structure.
For example, implementing #1 and #2 above, would enable the directives to be written like so:
# Disable directory indexes and MultiViews
Options -Indexes -MultiViews
# Prevent mod_dir appending a slash to directory requests
DirectorySlash Off
RewriteEngine On
# Prevent any further processing if the URL already ends with a file extension
RewriteRule \.\w{2,4}$ - [L]
# Redirect any requests to remove a trailing slash
RewriteRule (.*)/$ /$1 [R=301,L]
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule (.*) $1.html [L]
# Otherwise, rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}/$1/index.html -f
RewriteRule (.*) $1/index.html [L]
Always test with a 302 (temporary) redirect before changing to a 301 (permanent) redirect in order to avoid caching issues.
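The third optimisation listed above (redirecting direct requests for .html files to the extensionless URL) is not covered by the directives shown. A minimal sketch follows, assuming Apache 2.4, a .htaccess file in the document root, and that these rules are placed before the rule that stops processing for URLs ending in a file extension. It checks THE_REQUEST so that only the client's original request triggers the redirect, not the internal rewrite to the .html file, and it uses a 302 while testing.

```apache
RewriteEngine On

# Redirect /index.html to / and /foo/index.html to /foo
RewriteCond %{THE_REQUEST} \s/(?:[^?\s]*/)?index\.html[?\s]
RewriteRule ^(?:(.*)/)?index\.html$ /$1 [R=302,L]

# Redirect any other /foo.html to /foo
RewriteCond %{THE_REQUEST} \s/[^?\s]*\.html[?\s]
RewriteRule ^(.+)\.html$ /$1 [R=302,L]
```

The index.html case is handled first so that /foo/index.html lands on /foo rather than /foo/index. Whether both cases are needed depends on which of the two layouts the site actually uses.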
(Wouldn't it be better for NextJS to create a article\index.html instead of a file in the root directory?)
Yes! And Next can do that for you:
It is possible to configure Next.js to export pages as index.html
files and require trailing slashes, /about becomes /about/index.html
and is routable via /about/. This was the default behavior prior to
Next.js 9.
To switch back and add a trailing slash, open next.config.js and
enable the exportTrailingSlash config:
module.exports = { exportTrailingSlash: true, }

Apache Rewrite rule pointing to password protected directory

Consider the following structure of a directory, which is served by an apache webserver under the URL sample.com :
/local/path/index.html
/local/path/.htaccess
/local/path/admin
/local/path/admin/.htaccess
/local/path/admin/projects/project1/index.html
/local/path/admin/projects/project2/index.html
/local/path/admin/projects/project3/index.html
Whereas /local/path/projects is a symlink pointing to some other directory.
Thus the contents of /local/path/.htaccess is basically this rule:
Options SymLinksIfOwnerMatch
The other .htaccess ensures that the admin directory is password protected.
When http://sample.com is requested index.html is served.
When e.g. http://sample.com/admin/projects/project1 is requested, the contents of /local/path/admin/projects/project1/index.html are served, after the user has entered the correct password.
Requesting http://sample.com/admin of course leads to a 404 error.
My intention is now to make the address http://sample.com/admin serve /local/path/admin/projects/project1/index.html. But this should not be a redirection, meaning that e.g. in a browser's URL bar the URL remains the chosen one. However, redirecting to http://sample.com/admin/ would be OK, if necessary.
I tried to enhance the /local/path/admin/.htaccess file with the following:
RewriteEngine On
RewriteRule ^admin admin/projects/project1
RewriteRule ^admin/(.*) admin/projects/project1/$1
But the rules seem to have no effect. Is it maybe because it points to a password protected area?
On the other hand, it was not possible to create a rule inside the admin/.htaccess like this:
RewriteRule ^(.*)$ projects/project1/$1
What am I doing wrong here?
I'd suggest taking a closer look at the documentation:
When using the rewrite engine in .htaccess files the per-directory
prefix (that is, the URI path that lead to the directory containing
this .htaccess file) is automatically removed for the RewriteRule
pattern matching and automatically added after any relative (not
starting with a slash or protocol name) substitution encounters the
end of a rule set. See the RewriteBase directive for more information
regarding what prefix will be added back to relative substitutions.
And again:
The removed prefix always ends with a slash, meaning the matching
occurs against a string which never has a leading slash. Therefore, a
Pattern with ^/ never matches in per-directory context.
So given that the per-directory prefix is automatically removed, you should try your rewrite rules without the admin prefix when you are writing /local/path/admin/.htaccess.
Try this rule inside /local/path/admin/.htaccess:
RewriteEngine On
RewriteRule ^/?$ /admin/projects/project1/index.html [L]
RewriteRule ^(?!projects/project1/).+$ /admin/projects/project1/$0 [L,NC]

How can I get rid of trailing slash for index.html

How do I get rid of the trailing slash on my site?
For example, I have an index.html at
http://example.com/examplepage/index.html
Instead of
http://example.com/examplepage/
I would like the URL to be
http://example.com/examplepage
Could I do this with .htaccess?
The trailing slash is really important for Apache. Without it, even if you have an index.html sitting in the examplepage folder, people may be able to see the contents of your folders when directory listings are enabled. Apache deals with this by having a module (mod_dir) loaded by default that redirects the browser to include the trailing slash every time a directory/folder is accessed. You can turn that off, but the documentation notes a major security concern when you do: the contents of your folders can be listed even when an index file is present.
So you can turn this off, but you probably want to still have the trailing slash at least internally. You can do that with mod_rewrite:
# turn off the mechanism to always redirect to the trailing slash
DirectorySlash Off
# Internally add the trailing slash
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond $1 .*[^/]$
RewriteRule ^(.*)$ /$1/ [L]
That should allow you to access http://example.com/examplepage without getting redirected to http://example.com/examplepage/.

Apache rewrite slash

I want to create rewrite rule(s) that catch a couple of URLs and redirect them depending on whether the content is available at the first location. If not, a URL on the application should be called so that it regenerates the content (and next time we can access it from the hard drive).
Let me insert the code here, so it will be easier to understand:
# I need to catch more than one page (and it has to work with and without the trailing slash!)
RewriteCond %{REQUEST_FILENAME} ^(/?|/page1/?|/page2/subpage/?)$ [NC]
# If the content exists
RewriteCond "%{DOCUMENT_ROOT}%{REQUEST_FILENAME}" -f
# Go to the exported folder and try to serve the page from there
# The first slash problem is here: if I have trailing slash, it will not work, because it will try to go here: /var/www/contentstatic/export/sites/default/$1//index.html
RewriteRule ^(.*)$ /var/www/contentstatic/export/sites/default/$1/index.html
# Otherwise run this rule (regenerate the file)
# This has to be changed (to something), because this will catch anything, but I need only the paths I defined earlier: ^(/?|/page1/?|/page2/subpage/?)$ <- Also I have to make sure that the last trailing slash is not there
RewriteRule ^(.*)$ http://application1:8080/export/sites/default/$1/index.html [P]
# At the bottom of the VirtualHost, there is another application that catches all the requests by default, so that's why I shouldn't use the "^(.*)$" in the previous RewriteRule
RewriteRule ^/(.*) http://application2:8080/$1 [P]
ProxyPassReverse / http://application2:8080/
The problems I have here:
This has to work with and without the trailing slash
I have to specify exactly what URLs to be served up from the /var/www/ folder or from the /export/sites/default folder, because if I don't do that the default application tries that, but it will fail
I also tried to remove the trailing slash from the url if it is there (in the first RewriteRule), but this rule:
[^/](.*)[^/]
changed the url from this: /page2/ to this: age2, so it removed the slashes and the first and last character.
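The pattern [^/](.*)[^/] loses characters because each [^/] consumes one non-slash character outside the capture group, and without anchors the match can start anywhere in the path. Below is a sketch of an anchored alternative, assuming the rules live in the VirtualHost (so the pattern sees the leading slash). CLEAN_PATH is a hypothetical variable name used only for illustration.

```apache
RewriteEngine On
# /page2/ and /page2 both yield $1 = "page2"; / yields an empty $1.
# The lazy (.*?) leaves a trailing slash, if present, to the optional /? that follows.
RewriteRule ^/(.*?)/?$ - [E=CLEAN_PATH:$1]
```

The same idea (leading slash outside the group, optional trailing slash after it) can be applied directly in the pattern of the rule that builds the target path, so that $1 never carries a trailing slash.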
Is it possible to use the same "^(/?|/page1/?|/page2/subpage/?)$" paths in the 3rd and 4th RewriteRule without repeating them?
Thanks