Proper .htaccess config for Next.js SSG - apache

NextJS exports a static site with the following structure:
|-- index.html
|-- article.html
|-- tag.html
|-- article
| |-- somearticle.html
| \-- anotherarticle.html
\-- tag
|-- tag1.html
\-- tag2.html
I'm using an .htaccess file to hide the .html extensions:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
Everything works flawlessly, EXCEPT:
If I follow a link to domain/article it displays the article.html page, but my address bar shows domain/article <--Good.
If I refresh, I get sent to address: domain/article/ (note trailing slash) which lists the contents of the article directory <--Bad (same thing with Tag)
Similarly, manually typing in domain/article takes me to domain/article/ instead of showing article.html without the .html extension.
So...
How do I fix this?
Is this an .htaccess issue?
A nextjs config issue?
(Wouldn't it be better for NextJS to create a article\index.html instead of a file in the root directory?)
exportTrailingSlash
I tried playing around with exportTrailingSlash which seems related, but this created other problems like always having a trailing slash at the end of all my links:
Eg: if I go to domain/article/somearticle and hit refresh, something (.httaccess?) is adding a / to the end to give me domain/article/somearticle/ not horrible, just not very clean and inconsistent...
Edit: Actually, it's a little more horrible, because sometimes we get a trailing slash, sometimes we don't on the nextjs links... must be something about how I'm using <Link /> but I can't figure that out.
Regardless, NONE of the .htaccess rules I've tried successfully remove the trailing slash all the time every time...
More details:
In my next app, I have folder:
/articles/
[slug].js
index.js
In various pages, I use nextJS Link component:
import Link from 'next/link';
<Link href="/articles" as="/articles">
<a>Articles</a>
</Link>

If you request /article and /article exists as a physical directory then Apache's mod_dir, will (by default) append the trailing slash in order to "fix" the URL. This is achieved with a 301 permanent redirect - so it will be cached by the browser.
Although having a physical directory with the same basename as a file and using extensionless URLs creates an ambiguity. eg. Is /article supposed to access the directory /article/ or the file /article.html. You don't seem to want to allow direct access to directories anyway, so that would seem to resolve that ambiguity.
To prevent Apache mod_dir appending the trailing slash to directories we need to disable the DirectorySlash. For example:
DirectorySlash Off
But as mentioned, if you have previously visited /article then the redirect to /article/ will have been cached by the browser - so you'll need to clear the browser cache before this will be effective.
Since you are removing the file extension you also need to ensure that MultiViews is disabled, otherwise, mod_negotiation will issue an internal subrequest for the underlying file, and potentially conflict with mod_rewrite. MultiViews is disabled by default, although some shared hosts do enable it for some reason. From the output you are getting it doesn't look like MultiViews is enabled, but better to be sure...
# Ensure that MutliViews is disabled
Options -MultiViews
However, if you need to be able to access the directory itself then you will need to manually append the trailing slash with an internal rewrite. Although this does not seem to be a requirement here. You should, however, ensure that directory listings are disabled:
# Disable directory listings
Options -Indexes
Attempting to access any directory (that does not ultimately map to a file - see below) and does not contain a DirectoryIndex document will return a 403 Forbidden response, instead of a directory listing.
Note that the only difference that could occur between following a link to domain/article, refreshing the page and manually typing domain/article is caching... either by the browser or any intermediary proxy caches. (Unless you have JavaScript that intercepts the click event on the anchor?!)
You do still need to rewrite requests from /foo to /foo.html OR /foo to /foo/index.html (see below), depending on how you have configured your site. Although it would be preferable that you choose one or the other, rather than both (as you seem to imply could be the case).
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
It is unclear how this is seemingly "working" for you currently - unless you are seeing a cached response? When you request /article, the first condition fails because this exists as a physical directory and the rule is not processed. Even with MultiViews enabled, mod_dir will take priority and append the trailing slash.
The second condition that checks the existence of the .html file isn't necessarily checking the same file that is being rewritten to. eg. If you request /foo/bar, where /foo.html exists, but there is no physical directory /foo then the RewriteCond directive checks for the existence of /foo.html - which is successful, but the request is internally rewritten to /foo/bar.html (from the captured RewriteRule pattern) - this results in an internal rewrite loop and a 500 error response being returned to the client. See my answer to the following ServerFault question that goes into more detail behind what is actually happening here.
We can also make a further optimisation if we assume that any URL that contains what looks like a file extension (eg. your static resources .css, .js and image files) should be ignored, otherwise we are performing filesystem checks on every request, which is relatively expensive.
So, in order to map (internally rewrite) requests of the form /article to /article.html and /article/somearticle to /article/somearticle.html you would need to modify the above rule to read something like:
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]
There is no need to backslash escape a literal dot in the RewriteCond TestString - the dot carries no special meaning here; it's not a regex.
Then, to handle requests of the form /foo that should map to /foo/index.html you can do something like the following:
# Rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/index.html [L]
Ordinarily, you would allow mod_dir to serve the DirectoryIndex (eg. index.html), but having omitted the trailing slash from the directory, this can be problematic.
Summary
Bringing the above points together, we have:
# Disable directory indexes and MultiViews
Options -Indexes -MultiViews
# Prevent mod_dir appending a slash to directory requests
DirectorySlash Off
RewriteEngine On
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]
# Otherwise, rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/index.html [L]
This could be further optimised, depending on your site structure and whether you are adding any more directives to the .htaccess file. For example:
you could check for file extensions on the requested URL at the top of the file to prevent any further processing. The RewriteRule regex on each subsequent rule could then be "simplified".
Requests that include a trailing slash could be blocked or redirected (to remove the trailing slash).
If the request is for a .html file then redirect to the extensionless URL. This is made slightly more complicated if you are dealing with both /foo.html and /foo/index.html. But this is only really necessary if you are changing an existing URL structure.
For example, implementing #1 and #2 above, would enable the directives to be written like so:
# Disable directory indexes and MultiViews
Options -Indexes -MultiViews
# Prevent mod_dir appending a slash to directory requests
DirectorySlash Off
RewriteEngine On
# Prevent any further processing if the URL already ends with a file extension
RewriteRule \.\w{2,4}$ - [L]
# Redirect any requests to remove a trailing slash
RewriteRule (.*)/$ /$1 [R=301,L]
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule (.*) $1.html [L]
# Otherwise, rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}/$1/index.html -f
RewriteRule (.*) $1/index.html [L]
Always test with a 302 (temporary) redirect before changing to a 301 (permanent) redirect in order to avoid caching issues.

(Wouldn't it be better for NextJS to create a article\index.html instead of a file in the root directory?)
Yes! And Next can do that for you:
It is possible to configure Next.js to export pages as index.html
files and require trailing slashes, /about becomes /about/index.html
and is routable via /about/. This was the default behavior prior to
Next.js 9.
To switch back and add a trailing slash, open next.config.js and
enable the exportTrailingSlash config:
module.exports = { exportTrailingSlash: true, }

Related

How can I create a redirect with .htaccess to correct path instead of page acess

I am making a multilingual dynamic site that creates a virtual path per language.
So french pages go to domain.com/fr/ english domain.com/en/page domain.com/fr/some/page but in reality these pages are in the base folder and /fr/ is converted to a query string.
This is all working with the following .htaccess:
RewriteEngine on
DirectorySlash Off # Fixes the issue where a page and folder can have the same name. See https://stackoverflow.com/questions/2017748
# Return 404 if original request is /foo/bar.php
RewriteCond %{THE_REQUEST} "^[^ ]* .*?\.php[? ].*$"
RewriteRule .* - [L,R=404]
# Remove virtual language/locale component
RewriteRule ^(en|fr)/(.*)$ $2?lang=$1 [L,QSA]
RewriteRule ^(en|fr)/$ index.php?lang=$1 [L,QSA]
# Rewrite /foo/bar to /foo/bar.php
RewriteRule ^([^.?]+)$ %{REQUEST_URI}.php [L]
My problem is that some sites (Like a Linkedin post) somehow remove the trailing / in the index page automatically. So if I put a link in my post of domain.com/fr/ somehow they make the link domain.com/fr even if it shows domain.com/fr/ but that 404's as domain.com/fr dosent exist.
So how can I redirect domain.com/fr to domain.com/fr/ or localhost/mypath/fr (There's many sites in my local workstation) to localhost/mypath/fr/.
I tried something like:
RewriteRule ^(.*)/(en|fr)$ $1/$2/ [L,QSA,R=301]
RewriteRule ^(en|fr)$ $1/ [L,QSA,R=301]
But that ended up somehow adding the full real computer path in the url:
localhost/mypath/fr becomes localhost/thepathofthewebserverinmypc/mypath/fr/
I would very much appreciate some help as I have yet to find the right rule.
Thank you
RewriteRule ^(en|fr)$ $1/ [L,QSA,R=301]
You are just missing the slash prefix on the substitution string. Consequently, Apache applies the directory-prefix to the relative URL, which results in the malformed redirect.
For example:
RewriteRule ^(en|fr)$ /$1/ [L,R=301]
The substitution is now a root-relative URL path and Apache just prefixes the scheme + hostname to the external redirect. (The QSA flag is unnecessary here, since any query string is appended by default.)
This needs to go before the existing rewrites (and after the blocking rule for .php requests).
Note that the "internal rewrite" directives are correct to not have the slash prefix.
Aside:
DirectorySlash Off
Note that if you disable the directory slash, you must ensure that auto-generated directory listings (mod_autoindex) are also disabled, otherwise if a directory without a trailing slash is requested then a directory listing will be generated (exposing your file structure), even though there might be a DirectoryIndex document in that directory.
For example, include the following at the top of the .htaccess file:
# Disable auto-generated directory listings (mod_autoindex)
Options -Indexes
UPDATE:
this worked on the production server. As the site is in the server root. Would your know how can I also try and "catch" this on my localhost ? RewriteRule ^(.*)/(en|fr)$ /$1/$2/ [L,R=301] dosent catch but with only RewriteRule ^(en|fr)$ /$1/ [L,R=301] localhost/mypath/fr becomes localhost/fr/
From that I assume the .htaccess file is inside the /mypath subdirectory on your local development server.
The RewriteRule pattern (first argument) matches the URL-path relative to the location of the .htaccess file (so it does not match /mypath). You can then make use of the REQUEST_URI server variable in the substitution that contains the entire (root-relative) URL-path.
For example:
RewriteRule ^(en|fr)$ %{REQUEST_URI}/ [L,R=301]
The REQUEST_URI server variable already includes the slash prefix.
This rule would work OK on both development (in a subdirectory) and in production (root directory), so it should replace the rule above if you need to support both environments with a single .htaccess file.

Resolve content in subdirectory but keep the path in the browser

My goal is this:
We have site that lives in a webroot which is /web.
It contains the .htaccess file but we want to serve up the content from /web/content but we do not want the URL the user sees to contain /content just the initial path they requested.
Example:
The user makes a request to a URL:
example.com/color/cool/blue
This request goes to:
/webroot/color/cool/blue (which does not exist)
The content is in
/webroot/content/color/cool/blue/index.htm
We would like the user to see example.com/color/cool/blue in the browser, but see the content from what is example.com/content/color/cool/blue/index.htm.
We also would like some directories to be directly accessed like:
example.com/exeption/foo.pdf
We are doing this as a conversion of a dynamic site to a static site so simply moving everything to the root or switching the webroot are not options.
Assumptions:
Directory file-paths do not contain dots.
In the root .htaccess file try the following:
# Disable directory listings (mod_autoindex) since "DirectorySlash Off"
Options -Indexes -MultiViews
# Prevent trailing slash being appended to directories
DirectorySlash Off
# File to serve from requested directory
DirectoryIndex index.htm
RewriteEngine On
# Remove trailing slash on any URL that is requested directly (excludes rewritten URLs)
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule (.*)/$ /$1 [R=301,L]
# Rewrite root
RewriteRule ^$ content/ [L]
# If request maps to a directory in "/content" then rewrite and append trailing slash
RewriteCond %{DOCUMENT_ROOT}/content/$1 -d
RewriteRule ^([^.]+)$ content/$1/ [L]
We also would like some directories to be directly accessed like: example/exeption/foo.pdf
You don't necessarily need to add anything in this respect. Although I'm assuming you mean "files", not "directories".

rewrite request for /folder to folder/index.php without 301 redirect with apache

So I put an index.php in /pipe/index.php
I'd like to rewrite (internal, not redirect)
https://host/pipe?token=abc to https://host/pipe/index.php?token=abc
what I tried (caveat, assumes there is always a ? in the url):
RewriteEngine on
RewriteRule "^([^?]*)(.*)$" "$1/$2" [PT]
my hope was to split at the ? and just insert a / there.
But it seems apache finds out that "oh, pipe is a folder" before checking my .htacces (?) Because despite my [PT] it still redirects with 301 to /pipe/?token=abc, when I hoped for internal rewrite.
But it seems apache finds out that "oh, pipe is a folder" before checking my .htacces (?)
Yes, mod_dir will append the trailing slash with a 301 redirect. Although this occurs after mod_rewrite has processed the URL (if indeed it is being processed at all - see below). (The PT flag is irrelevant in .htaccess, since the resulting rewrite is passed through as a URL-path by default.)
RewriteRule "^([^?]*)(.*)$" "$1/$2" [PT]
However, your existing rule (by itself) would result in a rewrite-loop (500 Internal Server Error) since it matches itself and repeatedly appends a slash. If you are seeing a 301 redirect as mentioned above then either this rule is not doing anything (are .htaccess overrides enabled?) or you have a conflict with other rules.
As you've stated, this rule also assumes that the query string (with leading ?) is also matched by the RewriteRule pattern. The RewriteRule directive matches against the URL-path only, not the query string. $2 in the above rule is therefore always empty (unless you have %3F in the URL-path, ie. a %-encoded ?).
The query string is contained in its own variable, QUERY_STRING. But you simply want to pass through the same query string, so you don't need to do anything special here, since that happens by default.
Solution
To prevent mod_dir appending the trailing slash, you need to set DirectorySlash Off at the top of the root .htaccess file.
Note that these directives must go in the .htaccess file in the root/parent directory, as opposed to the subdirectory that has the trailing slash omitted. This is because the mod_rewrite directives (that "fix" the URL by appending the trailing slash) would never actually be processed in the subdirectory .htaccess file. The trailing slash would seem to be required for mod_rewrite to function. (However, the mod_dir DirectorySlash Off directive would still be processed successfully, so the slash would not be appended.)
For example:
# Prevent mod_dir appending the trailing slash
DirectorySlash Off
# Must disable directory listings when "DirectorySlash Off" is set
Options -Indexes
However, you need to then manually append the trailing slash to any directory, where it is omitted, with an internal rewrite to "fix" the URL (and to correctly serve the DirectoryIndex document, ie. index.php).
# Ensure DirectoryIndex is set correctly
DirectoryIndex index.php
RewriteEngine On
# Append trailing slash to any directory where it has been omitted
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.+[^/])$ $1/ [L]
The trailing slash on the directory (via the internal rewrite) is required in order to serve the DirectoryIndex document, otherwise, you get a 403 Forbidden, even if the DirectoryIndex document is present.
If the trailing slash is omitted and directory listings (mod_autoindex) are enabled (disabled above) then a directory listing would be generated even if a DirectoryIndex document is present in that directory. (Which is why directory listings must be disabled when DirectorySlash Off is set.)
NB: You will need to make sure the browser cache is cleared since the earlier 301 redirect by mod_dir to append the trailing slash will have been cached by the browser.
This probably is what you are looking for:
RewriteEngine on
RewriteRule ^/?pipe/?$ /pipe/index.php [QSA,L]
The QSA flag is actually redundant here, it is the default, but it makes things clearer if you compare it to that variant (both work):
RewriteEngine on
RewriteRule ^/?pipe/?$ /pipe/index.php?%{QUERY_STRING} [QSD,L]
The documentation of the rewriting module, more specific of the RewriteRule directive clearly points out that the query string is not part of the path the rule's pattern is matched against.
If you want to have more control about the content of the query string you can use a RewriteCond:
RewriteEngine on
RewriteCond %{QUERY_STRING} ^token=(.*)$
RewriteRule ^/?pipe/?$ /pipe/index.php?token=%1 [QSD,L]
Also you might want to redirect the original URL:
RewriteEngine on
RewriteRule ^/?pipe/index.php /pipe [QSA,R=301,END]
RewriteRule ^/?pipe/?$ /pipe/index.php [QSA,L]
And finally you might also want to take a look at the DirectoryIndex directive which might offer a solution without any rewriting at all, though this depends a bit on your setup ...

htaccess shows 404 error rather than finding index.php

I'm having some issues with a shady URL rewriting
I want to turn http://localhost:81/es/index.php into http://localhost:81/index.php?lengua=es with my .htaccess in order to help the page SEO
This is my current .htaccess
<FilesMatch ".*\.(log|ini|htaccess)$">
deny from all
</FilesMatch>
Options -Indexes
RewriteEngine On
RewriteBase /
FallbackResource "index.php"
RewriteRule ^(en|es|pt)?/?(.*)?$ $2?idioma=$1 [QSA,L]
I have checked that they work with htaccess tester and they're working as expected but when I browse the page it shows a "File not found." error (I do have a index.php, I do not have a es/index.php)
Since my output URL is http://localhost:81/index.php?lengua=es I don't understand why is it not working
I would suggest breaking this in four rewrite rules:
RewriteEngine on
# Redirect to add the trailing slash to language directory
# http://example.com/es > http://example.com/es/
RewriteRule ^/?(en|es|pt)$ /$1/ [R=301,L]
# Redirect to remove `index.php`
# http://example.com/es/index.php > http://example.com/es/
RewriteRule ^/?(en|es|pt)/index\.php$ /$1/ [R=301,L]
# Handle requests for the base language directory
# http://example.com/es/ > http://example.com/index.php?idioma=es
RewriteRule ^/?(en|es|pt)/$ /index.php?idioma=$1 [QSA,L]
# Handle requests for php files within the language directory
# http://example.com/es/foo.php > http://example.com/foo.php?idioma=es
RewriteRule ^/?(en|es|pt)/(.+\.php)$ /$2?idioma=$1 [QSA,L]
I would remove RewriteBase / because I believe that is the default in the root .htaccess file anyway.
I would remove FallbackResource "index.php" because you shouldn't need it based on the examples you have provided. If you keep it, the examples in the documentation show it starting with a slash: FallbackResource /index.php. You should also test without it because it has the potential to conflict with the rewrite rules.
I always like to start rewrite rules with an optional slash ^/? (rather than just ^) so that they can be used both in .htaccess and in httpd.conf without modifications.
The rule in the question makes everything optional including the language code. Rather than (en|es|pt)? my rules use (en|es|pt) so that they don't match if the language code isn't in the URL.
Rather than make the slash after the language directory optional, my rules do different things when it is present and when it is not.
In your rule (.*)? is exactly equivalent to the simpler (.*). I changed it to (.*\.php) so that it only matches PHP files.

Why does mod_rewrite ignore my [L] flag?

This is my .htaccess file. It should deliver static files from assets folder if the url matches them. Otherwise, everything should be redirected to index.php.
Note that the url doesn't contain assets as segemnt here. So example.com/css/style.css directs to assets/css/style.css.
RewriteEngine on
# disable directory browsing
Options -Indexes
# static assets
RewriteCond %{DOCUMENT_ROOT}/assets/$1 -f
RewriteRule ^(.*)$ assets/$1 [L]
# other requests to index.php
RewriteRule !^asset/ index.php [L]
Unfortunately, urls like example.com/assets/css/style.css also deliver the file, since for that url none of my rules applies and Apache's default behavior is applied which delivers the file.
So I tried changing the last line to this. I thought that this would work since the [L] flag in the rule above should stop execution for asset urls and deliver them.
RewriteRule ^(.*)$ index.php [L]
Instead, not all requests are redirected to index.php, even static assets like example.com/css/style.css. Why does the flag not stop execution of rewrite rules and who to fix my problem then?
I found the solution on the pages of the official documentation.
If you are using RewriteRule in either .htaccess files or in
sections, it is important to have some understanding of
how the rules are processed. The simplified form of this is that once
the rules have been processed, the rewritten request is handed back to
the URL parsing engine to do what it may with it. It is possible that
as the rewritten request is handled, the .htaccess file or
section may be encountered again, and thus the ruleset may be run
again from the start. Most commonly this will happen if one of the
rules causes a redirect - either internal or external - causing the
request process to start over.
It is therefore important, if you are using RewriteRule directives in
one of these contexts, that you take explicit steps to avoid rules
looping, and not count solely on the [L] flag to terminate execution
of a series of rules, as shown below.
An alternative flag, [END], can be used to terminate not only the
current round of rewrite processing but prevent any subsequent rewrite
processing from occurring in per-directory (htaccess) context. This
does not apply to new requests resulting from external redirects.
To fix my problem, I changed to [L] flags to [END].
RewriteEngine on
# disable directory browsing
Options -Indexes
# static assets
RewriteCond %{DOCUMENT_ROOT}/assets/$1 -f
RewriteRule ^(.*)$ assets/$1 [END]
# other requests to index.php
RewriteRule !^asset/ index.php [END]