Resolve content in subdirectory but keep the path in the browser - apache

My goal is this:
We have a site that lives in a webroot, /web.
The webroot contains the .htaccess file, but we want to serve the content from /web/content. We do not want the URL the user sees to contain /content, just the initial path they requested.
Example:
The user makes a request to a URL:
example.com/color/cool/blue
This request goes to:
/webroot/color/cool/blue (which does not exist)
The content is in
/webroot/content/color/cool/blue/index.htm
We would like the user to see example.com/color/cool/blue in the browser, but see the content from what is example.com/content/color/cool/blue/index.htm.
We also would like some directories to be directly accessed like:
example.com/exeption/foo.pdf
We are doing this as a conversion of a dynamic site to a static site so simply moving everything to the root or switching the webroot are not options.

Assumptions:
Directory file-paths do not contain dots.
In the root .htaccess file try the following:
# Disable directory listings (mod_autoindex) since "DirectorySlash Off"
Options -Indexes -MultiViews
# Prevent trailing slash being appended to directories
DirectorySlash Off
# File to serve from requested directory
DirectoryIndex index.htm
RewriteEngine On
# Remove trailing slash on any URL that is requested directly (excludes rewritten URLs)
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule (.*)/$ /$1 [R=301,L]
# Rewrite root
RewriteRule ^$ content/ [L]
# If request maps to a directory in "/content" then rewrite and append trailing slash
RewriteCond %{DOCUMENT_ROOT}/content/$1 -d
RewriteRule ^([^.]+)$ content/$1/ [L]
We also would like some directories to be directly accessed like: example/exeption/foo.pdf
You don't necessarily need to add anything in this respect. Although I'm assuming you mean "files", not "directories".
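The rewrite above only fires when the request contains no dot and maps to a directory under /content, so a request such as /exeption/foo.pdf is already left alone. If you would rather make the exception explicit, one option is a pass-through rule placed directly after RewriteEngine On. This is only a sketch: it assumes the directory is literally named exeption (as in the question) and sits directly in the webroot.

```apache
# Assumption: "exeption" is the directly accessible directory from the question.
# Serve anything beneath it unchanged and skip the remaining rules.
RewriteRule ^exeption/ - [L]
```

Because this rule would run before the trailing-slash redirect, URLs under /exeption/ that end in a slash would no longer be redirected.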

Related

How can I create a redirect with .htaccess to correct path instead of page acess

I am making a multilingual dynamic site that creates a virtual path per language.
So French pages go to domain.com/fr/ and English pages to domain.com/en/ (for example, domain.com/en/page or domain.com/fr/some/page), but in reality these pages are in the base folder and /fr/ is converted to a query string.
This is all working with the following .htaccess:
RewriteEngine on
# Fixes the issue where a page and folder can have the same name. See https://stackoverflow.com/questions/2017748
DirectorySlash Off
# Return 404 if original request is /foo/bar.php
RewriteCond %{THE_REQUEST} "^[^ ]* .*?\.php[? ].*$"
RewriteRule .* - [L,R=404]
# Remove virtual language/locale component
RewriteRule ^(en|fr)/(.*)$ $2?lang=$1 [L,QSA]
RewriteRule ^(en|fr)/$ index.php?lang=$1 [L,QSA]
# Rewrite /foo/bar to /foo/bar.php
RewriteRule ^([^.?]+)$ %{REQUEST_URI}.php [L]
My problem is that some sites (like a LinkedIn post) somehow remove the trailing / from the index page URL automatically. So if I put a link to domain.com/fr/ in my post, they turn the link into domain.com/fr even though it displays as domain.com/fr/, and that 404s because domain.com/fr doesn't exist.
So how can I redirect domain.com/fr to domain.com/fr/, or localhost/mypath/fr to localhost/mypath/fr/ (there are many sites on my local workstation)?
I tried something like:
RewriteRule ^(.*)/(en|fr)$ $1/$2/ [L,QSA,R=301]
RewriteRule ^(en|fr)$ $1/ [L,QSA,R=301]
But that ended up somehow adding the full real computer path in the url:
localhost/mypath/fr becomes localhost/thepathofthewebserverinmypc/mypath/fr/
I would very much appreciate some help as I have yet to find the right rule.
Thank you
RewriteRule ^(en|fr)$ $1/ [L,QSA,R=301]
You are just missing the slash prefix on the substitution string. Consequently, Apache applies the directory-prefix to the relative URL, which results in the malformed redirect.
For example:
RewriteRule ^(en|fr)$ /$1/ [L,R=301]
The substitution is now a root-relative URL path and Apache just prefixes the scheme + hostname to the external redirect. (The QSA flag is unnecessary here, since any query string is appended by default.)
This needs to go before the existing rewrites (and after the blocking rule for .php requests).
Note that the "internal rewrite" directives are correct to not have the slash prefix.
Aside:
DirectorySlash Off
Note that if you disable the directory slash, you must ensure that auto-generated directory listings (mod_autoindex) are also disabled, otherwise if a directory without a trailing slash is requested then a directory listing will be generated (exposing your file structure), even though there might be a DirectoryIndex document in that directory.
For example, include the following at the top of the .htaccess file:
# Disable auto-generated directory listings (mod_autoindex)
Options -Indexes
UPDATE:
This worked on the production server, as the site is in the server root. Would you know how I can also "catch" this on my localhost? RewriteRule ^(.*)/(en|fr)$ /$1/$2/ [L,R=301] doesn't catch it, but with only RewriteRule ^(en|fr)$ /$1/ [L,R=301], localhost/mypath/fr becomes localhost/fr/
From that I assume the .htaccess file is inside the /mypath subdirectory on your local development server.
The RewriteRule pattern (first argument) matches the URL-path relative to the location of the .htaccess file (so it does not match /mypath). You can then make use of the REQUEST_URI server variable in the substitution that contains the entire (root-relative) URL-path.
For example:
RewriteRule ^(en|fr)$ %{REQUEST_URI}/ [L,R=301]
The REQUEST_URI server variable already includes the slash prefix.
This rule would work OK on both development (in a subdirectory) and in production (root directory), so it should replace the rule above if you need to support both environments with a single .htaccess file.
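Putting the pieces of this answer together, the complete file might look something like the sketch below. The blocking rule and the internal rewrites are copied unchanged from the question; only the redirect and the Options line are additions, and the comment on DirectorySlash is placed on its own line because Apache does not allow a comment on the same line as a directive.

```apache
# Disable auto-generated directory listings (needed with "DirectorySlash Off")
Options -Indexes

# Allow a page and a folder to share the same name
DirectorySlash Off

RewriteEngine On

# Return 404 if original request is /foo/bar.php
RewriteCond %{THE_REQUEST} "^[^ ]* .*?\.php[? ].*$"
RewriteRule .* - [L,R=404]

# Redirect /fr to /fr/ (works in the root and in a subdirectory)
RewriteRule ^(en|fr)$ %{REQUEST_URI}/ [L,R=301]

# Remove virtual language/locale component
RewriteRule ^(en|fr)/(.*)$ $2?lang=$1 [L,QSA]
RewriteRule ^(en|fr)/$ index.php?lang=$1 [L,QSA]

# Rewrite /foo/bar to /foo/bar.php
RewriteRule ^([^.?]+)$ %{REQUEST_URI}.php [L]
```

The order matters here: the redirect sits after the blocking rule for .php requests and before the internal rewrites.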

.htaccess removing file extension is conflicting with folders of the same name

I'm removing file extensions (like .html) from the url with a .htaccess file. The code in the file is working fine, but as soon as there is a folder with the same name as the file without extension, it redirects to the folder instead of redirecting to the file. For example, if I have a demo.html file and a demo folder in the same directory, as soon as I type in the searchbar of the browser www.example.com/demo, it redirects to the folder, instead of the file. If I delete the folder and I type the same thing again, it works perfectly! Any help would be appreciated :)
Here's the code in the .htaccess file:
RewriteCond %{THE_REQUEST} /([^.]+)\.html [NC]
RewriteRule ^ /%1 [NC,L,R]
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^ %{REQUEST_URI}.html [NC,L]
This is caused by a conflict with mod_dir. When you request a directory without a trailing slash, mod_dir will "fix" the URL and append a trailing slash with a 301 redirect. After which it will attempt to serve a DirectoryIndex document. This takes priority over your internal rewrite.
To resolve this you need to disable this behaviour with DirectorySlash Off.
For example:
# Ensure that directory listings are disabled
Options -Indexes
# Prevent mod_dir appending a slash to physical directories
DirectorySlash Off
# Redirect to remove the ".html" extension
RewriteCond %{THE_REQUEST} /([^.?]+)\.html [NC]
RewriteRule ^ /%1 [NC,L,R=301]
# Rewrite request to append ".html" extension if it exists
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule (.*) $1.html [L]
Directory listings (mod_autoindex) need to be disabled when you disable DirectorySlash because if mod_autoindex is enabled then when you request the directory without a slash, a directory listing will be generated, regardless of whether you have a DirectoryIndex document (eg. index.html) in that directory that would ordinarily prevent the directory listing being generated.
Also, I've "fixed" your existing rules that remove and append the .html extension. The first rule that removes the .html extension could have potentially matched an instance of .html that appeared in the query string. And the second rule that appends the .html extension would have resulted in a rewrite-loop (500 error) if requesting /demo/<does-not-exist> - where demo is a directory and a file basename (as in your example).
See my answer to a related question on ServerFault for more information on this potential rewrite-loop:
https://serverfault.com/questions/989333/using-apache-rewrite-rules-in-htaccess-to-remove-html-causing-a-500-error

Apache htaccess rewrite root and all root folders to subfolder without redirecting

Options +FollowSymLinks -MultiViews
# Turn mod_rewrite on
RewriteEngine On
RewriteBase /
RewriteRule ^$ /subdir/ [L,NC]
I want to rewrite the root domain to subfolder without changing the URL in the browser. The above code works just for the root domain but not any folders and files.
For example, I have https://example.com/ and https://example.com/subdir/.
With the above code in .htaccess file, when I go to https://example.com/ I see the contents of https://example.com/subdir/ which is good.
But when I go to https://example.com/test.txt I should see https://example.com/subdir/test.txt but I get The requested URL was not found on this server.
Same happens when I go to https://example.com/abc expecting to see contents of https://example.com/subdir/abc
Any idea?
RewriteRule ^$ /subdir/ [L,NC]
Change this to read:
RewriteRule !^subdir/ subdir%{REQUEST_URI} [L]
Any request that does not start /subdir/ is internally rewritten to /subdir/<url>. The REQUEST_URI server variable contains the full URL-path (including the slash prefix).
I removed the slash prefix from the substitution string since you have defined a RewriteBase /. (Although neither are strictly necessary here.)
UPDATE:
...when I go to example.com/s I am being redirected to example.com/subdir/s/
s is a subfolder within subdir, does that make any difference?
Ah yes, if /s is a subdirectory then mod_dir will append the trailing slash (to "fix" the URL) with an external 301 redirect. This redirect occurs after the URL has been rewritten to /subdir/s - thus exposing the /subdir subdirectory.
To handle this situation we can add another rule (a redirect) before the existing rewrite that first checks whether the request would map to a directory within the /subdir subdirectory and append a slash if it is omitted (before mod_dir would append the slash to the rewritten URL).
For example:
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{DOCUMENT_ROOT}/subdir%{REQUEST_URI} -d
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/ [R=301,L]
This states... for any request that:
!\.\w{2,4}$ - does not contain (what looks like) a file extension of between 2 and 4 characters (assuming your directories aren't named this way)
!/$ - and does not currently end in a slash.
-d - and exists as a physical directory in the /subdir subdirectory.
THEN redirect to append the trailing slash on the original request
Whilst this probably should be a 301 (permanent) redirect, you should first test with a 302 (temporary) redirect to avoid potential caching issues.
You will need to clear your browser cache before testing, since the erroneous 301 redirect from /s to /subdir/s/ will have been cached by the browser.
A potential optimisation is to remove the filesystem check and simply assume that any request that does not contain a file extension should map to a directory. (But this depends on whether you are handling these URLs in any other way.)
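As a rough sketch of that optimisation, the redirect could drop the -d check altogether. This assumes that every URL without a file extension really is a directory inside /subdir, which may not hold for your site.

```apache
# Assumption: any URL without a file extension maps to a directory in "/subdir".
# Append a trailing slash if it is missing (no filesystem check).
RewriteCond %{REQUEST_URI} !/$
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/ [R=302,L]
```

The 302 is deliberate while testing; change it to 301 only once you are happy with the behaviour.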
Summary
Options +FollowSymLinks -MultiViews
# Turn mod_rewrite on
RewriteEngine On
# If the requested URL exists as a directory in "/subdir" then append a slash
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{DOCUMENT_ROOT}/subdir%{REQUEST_URI} -d
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/ [R=301,L]
# Rewrite everything to "/subdir"
RewriteRule !^subdir/ subdir%{REQUEST_URI} [L]

Proper .htaccess config for Next.js SSG

NextJS exports a static site with the following structure:
|-- index.html
|-- article.html
|-- tag.html
|-- article
|   |-- somearticle.html
|   \-- anotherarticle.html
\-- tag
    |-- tag1.html
    \-- tag2.html
I'm using an .htaccess file to hide the .html extensions:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
Everything works flawlessly, EXCEPT:
If I follow a link to domain/article it displays the article.html page, but my address bar shows domain/article <--Good.
If I refresh, I get sent to address: domain/article/ (note trailing slash) which lists the contents of the article directory <--Bad (same thing with Tag)
Similarly, manually typing in domain/article takes me to domain/article/ instead of showing article.html without the .html extension.
So...
How do I fix this?
Is this an .htaccess issue?
A nextjs config issue?
(Wouldn't it be better for NextJS to create a article\index.html instead of a file in the root directory?)
exportTrailingSlash
I tried playing around with exportTrailingSlash which seems related, but this created other problems like always having a trailing slash at the end of all my links:
Eg: if I go to domain/article/somearticle and hit refresh, something (.htaccess?) is adding a / to the end to give me domain/article/somearticle/. Not horrible, just not very clean, and inconsistent...
Edit: Actually, it's a little more horrible, because sometimes we get a trailing slash, sometimes we don't on the nextjs links... must be something about how I'm using <Link /> but I can't figure that out.
Regardless, NONE of the .htaccess rules I've tried successfully remove the trailing slash all the time every time...
More details:
In my next app, I have folder:
/articles/
[slug].js
index.js
In various pages, I use nextJS Link component:
import Link from 'next/link';
<Link href="/articles" as="/articles">
<a>Articles</a>
</Link>
If you request /article and /article exists as a physical directory then Apache's mod_dir will (by default) append the trailing slash in order to "fix" the URL. This is achieved with a 301 permanent redirect, so it will be cached by the browser.
However, having a physical directory with the same basename as a file, together with extensionless URLs, creates an ambiguity: is /article supposed to access the directory /article/ or the file /article.html? You don't seem to want to allow direct access to directories anyway, so that would seem to resolve the ambiguity.
To prevent Apache mod_dir appending the trailing slash to directories we need to disable the DirectorySlash. For example:
DirectorySlash Off
But as mentioned, if you have previously visited /article then the redirect to /article/ will have been cached by the browser - so you'll need to clear the browser cache before this will be effective.
Since you are removing the file extension you also need to ensure that MultiViews is disabled, otherwise, mod_negotiation will issue an internal subrequest for the underlying file, and potentially conflict with mod_rewrite. MultiViews is disabled by default, although some shared hosts do enable it for some reason. From the output you are getting it doesn't look like MultiViews is enabled, but better to be sure...
# Ensure that MultiViews is disabled
Options -MultiViews
However, if you need to be able to access the directory itself then you will need to manually append the trailing slash with an internal rewrite. Although this does not seem to be a requirement here. You should, however, ensure that directory listings are disabled:
# Disable directory listings
Options -Indexes
Attempting to access any directory (that does not ultimately map to a file - see below) and does not contain a DirectoryIndex document will return a 403 Forbidden response, instead of a directory listing.
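If you did need the directory itself to be reachable without a trailing slash, the manual internal rewrite mentioned above might look something like this. It is a sketch only, and it assumes the directory should win only when there is no .html file of the same name.

```apache
# Request maps to a physical directory...
RewriteCond %{REQUEST_FILENAME} -d
# ...and there is no same-named .html file to serve instead
RewriteCond %{REQUEST_FILENAME}.html !-f
# Internally append the slash so mod_dir can serve the DirectoryIndex
RewriteRule [^/]$ %{REQUEST_URI}/ [L]
```

The browser URL stays as /foo because this is an internal rewrite, not a redirect.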
Note that the only difference that could occur between following a link to domain/article, refreshing the page and manually typing domain/article is caching... either by the browser or any intermediary proxy caches. (Unless you have JavaScript that intercepts the click event on the anchor?!)
You do still need to rewrite requests from /foo to /foo.html OR /foo to /foo/index.html (see below), depending on how you have configured your site. Although it would be preferable that you choose one or the other, rather than both (as you seem to imply could be the case).
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
It is unclear how this is seemingly "working" for you currently - unless you are seeing a cached response? When you request /article, the first condition fails because this exists as a physical directory and the rule is not processed. Even with MultiViews enabled, mod_dir will take priority and append the trailing slash.
The second condition that checks the existence of the .html file isn't necessarily checking the same file that is being rewritten to. eg. If you request /foo/bar, where /foo.html exists, but there is no physical directory /foo then the RewriteCond directive checks for the existence of /foo.html - which is successful, but the request is internally rewritten to /foo/bar.html (from the captured RewriteRule pattern) - this results in an internal rewrite loop and a 500 error response being returned to the client. See my answer to the following ServerFault question that goes into more detail behind what is actually happening here.
We can also make a further optimisation if we assume that any URL that contains what looks like a file extension (eg. your static resources .css, .js and image files) should be ignored, otherwise we are performing filesystem checks on every request, which is relatively expensive.
So, in order to map (internally rewrite) requests of the form /article to /article.html and /article/somearticle to /article/somearticle.html you would need to modify the above rule to read something like:
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]
There is no need to backslash escape a literal dot in the RewriteCond TestString - the dot carries no special meaning here; it's not a regex.
Then, to handle requests of the form /foo that should map to /foo/index.html you can do something like the following:
# Rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/index.html [L]
Ordinarily, you would allow mod_dir to serve the DirectoryIndex (eg. index.html), but having omitted the trailing slash from the directory, this can be problematic.
Summary
Bringing the above points together, we have:
# Disable directory indexes and MultiViews
Options -Indexes -MultiViews
# Prevent mod_dir appending a slash to directory requests
DirectorySlash Off
RewriteEngine On
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]
# Otherwise, rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}/index.html [L]
This could be further optimised, depending on your site structure and whether you are adding any more directives to the .htaccess file. For example:
1. You could check for file extensions on the requested URL at the top of the file to prevent any further processing. The RewriteRule regex on each subsequent rule could then be "simplified".
2. Requests that include a trailing slash could be blocked or redirected (to remove the trailing slash).
3. If the request is for a .html file then redirect to the extensionless URL. This is made slightly more complicated if you are dealing with both /foo.html and /foo/index.html. But this is only really necessary if you are changing an existing URL structure.
For example, implementing #1 and #2 above, would enable the directives to be written like so:
# Disable directory indexes and MultiViews
Options -Indexes -MultiViews
# Prevent mod_dir appending a slash to directory requests
DirectorySlash Off
RewriteEngine On
# Prevent any further processing if the URL already ends with a file extension
RewriteRule \.\w{2,4}$ - [L]
# Redirect any requests to remove a trailing slash
RewriteRule (.*)/$ /$1 [R=301,L]
# Rewrite /foo to /foo.html if it exists
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule (.*) $1.html [L]
# Otherwise, rewrite /foo to /foo/index.html if it exists
RewriteCond %{DOCUMENT_ROOT}/$1/index.html -f
RewriteRule (.*) $1/index.html [L]
Always test with a 302 (temporary) redirect before changing to a 301 (permanent) redirect in order to avoid caching issues.
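The remaining optimisation, redirecting direct requests for .html files to the extensionless URL, could be sketched as follows. It assumes the .htaccess file is in the document root and only handles the simple /foo.html case. It does not treat /foo/index.html specially, so that URL would redirect to /foo/index.

```apache
# Redirect direct requests for /foo.html to /foo
# THE_REQUEST holds the original request line, so rewritten URLs are not matched
RewriteCond %{THE_REQUEST} /([^.?]+)\.html [NC]
RewriteRule ^ /%1 [R=302,L]
```

This would need to go before the rule that stops processing for URLs ending in a file extension, otherwise it would never be reached.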
(Wouldn't it be better for NextJS to create a article\index.html instead of a file in the root directory?)
Yes! And Next can do that for you:
It is possible to configure Next.js to export pages as index.html
files and require trailing slashes, /about becomes /about/index.html
and is routable via /about/. This was the default behavior prior to
Next.js 9.
To switch back and add a trailing slash, open next.config.js and
enable the exportTrailingSlash config:
module.exports = { exportTrailingSlash: true, }

.htaccess to show a directory index.html without a trailing slash

I've got a Jekyll generated site running on an Apache server and I'm having some trouble getting my .htaccess file set up correctly. Jekyll places index.html files into folders which represent each page so my URLs currently look like domain.com/foo/
I'd like to remove that trailing slash from the URL so that it exactly matches what I had set up previously (and also because I think it looks better).
Currently the section of my .htaccess file dealing with rewrites looks like:
<IfModule mod_rewrite.c>
RewriteCond %{HTTPS} !=on
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
</IfModule>
Options -Indexes
DirectoryIndex index.xml index.html
I have tried following the advice here but that puts me into a redirect loop.
Can anybody help me out? In brief, what I want is for a domain.com/foo URL to show the index.html file from the /foo directory and for domain.com/foo/ and domain.com/foo/index.html to redirect to domain.com/foo.
You should be able to use this to turn off the addition of slashes.
DirectorySlash Off
Note that the trailing slash is added for a good reason. Having the trailing slash in the directory name will make relative URLs point at the same thing regardless of whether the URL ends with "foo/bar/index.html" or just "foo/bar/". Without the trailing slash, relative URLs would reference something up one level from what they normally point at. (eg: "baz.jpg" would give the user "/foo/baz.jpg" instead of "/foo/bar/baz.jpg", as the trailing "bar" will get removed if it isn't protected by a trailing slash.) So if you do this, you probably want to avoid relative URLs.
To then rewrite the directory name to return the index.html you could probably do something like this:
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule ^(.*)$ /$1/index.html [L]
This checks whether REQUEST_URI/index.html exists and, if it does, performs an internal rewrite (the URL shown in the browser does not change).
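The question also asks for domain.com/foo/ and domain.com/foo/index.html to redirect to domain.com/foo. One way that might be done is sketched below, together with the rewrite from this answer. It assumes the .htaccess file is in the document root and deliberately leaves the site root (/ and /index.html) alone.

```apache
DirectorySlash Off
RewriteEngine On

# Redirect direct requests for /foo/ or /foo/index.html to /foo
# (matching THE_REQUEST avoids a loop with the internal rewrite below)
RewriteCond %{THE_REQUEST} \s/([^?\s]+?)/(index\.html)?[?\s]
RewriteRule ^ /%1 [R=302,L]

# Serve /foo/index.html for /foo if it exists
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule ^(.*)$ /$1/index.html [L]
```

The redirect uses a 302 while testing; switch to 301 once it behaves as expected.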