Apache rewrite slash - apache

I want to create rewrite rule(s) that catch a couple of URLs and serve them from a first location if the content is available there. If it is not, the rule should call a URL on the application so that the content is regenerated (and the next time it can be served from the hard drive).
Let me insert the code here, so it will be easier to understand:
# I need to catch more than one page (and it has to work with and without the trailing slash!)
RewriteCond %{REQUEST_FILENAME} ^(/?|/page1/?|/page2/subpage/?)$ [NC]
# If the content exists
RewriteCond "%{DOCUMENT_ROOT}%{REQUEST_FILENAME}" -f
# Go to the exported folder and try to serve the page from there
# The first slash problem is here: if I have trailing slash, it will not work, because it will try to go here: /var/www/contentstatic/export/sites/default/$1//index.html
RewriteRule ^(.*)$ /var/www/contentstatic/export/sites/default/$1/index.html
# Otherwise run this rule (regenerate the file)
# This has to be changed (to something), because this will catch anything, but I need only the paths I defined earlier: ^(/?|/page1/?|/page2/subpage/?)$ <- Also I have to make sure that the last trailing slash is not there
RewriteRule ^(.*)$ http://application1:8080/export/sites/default/$1/index.html [P]
# At the bottom of the VirtualHost, there is another application that catches all the requests by default, so that's why I shouldn't use the "^(.*)$" in the previous RewriteRule
RewriteRule ^/(.*) http://application2:8080/$1 [P]
ProxyPassReverse / http://application2:8080/
The problems I have here:
This has to work with and without the trailing slash
I have to specify exactly what URLs to be served up from the /var/www/ folder or from the /export/sites/default folder, because if I don't do that the default application tries that, but it will fail
I also tried to remove the trailing slash from the url if it is there (in the first RewriteRule), but this rule:
[^/](.*)[^/]
changed the url from this: /page2/ to this: age2, so it removed the slashes and the first and last character.
Is it possible to use the same "^(/?|/page1/?|/page2/subpage/?)$" paths in the 3rd and 4th RewriteRule without repeating them?
Thanks

Related

How can I create a redirect with .htaccess to correct path instead of page access

I am making a multilingual dynamic site that creates a virtual path per language.
So French pages go to domain.com/fr/ and English pages to domain.com/en/page (or, for example, domain.com/fr/some/page), but in reality these pages are in the base folder and /fr/ is converted to a query string.
This is all working with the following .htaccess:
RewriteEngine on
# Fixes the issue where a page and folder can have the same name. See https://stackoverflow.com/questions/2017748
DirectorySlash Off
# Return 404 if original request is /foo/bar.php
RewriteCond %{THE_REQUEST} "^[^ ]* .*?\.php[? ].*$"
RewriteRule .* - [L,R=404]
# Remove virtual language/locale component
RewriteRule ^(en|fr)/(.*)$ $2?lang=$1 [L,QSA]
RewriteRule ^(en|fr)/$ index.php?lang=$1 [L,QSA]
# Rewrite /foo/bar to /foo/bar.php
RewriteRule ^([^.?]+)$ %{REQUEST_URI}.php [L]
My problem is that some sites (like a LinkedIn post) somehow remove the trailing / from the index page link automatically. So if I put a link to domain.com/fr/ in my post, the link becomes domain.com/fr even though it is displayed as domain.com/fr/, and that returns a 404 because domain.com/fr doesn't exist.
So how can I redirect domain.com/fr to domain.com/fr/, or localhost/mypath/fr (there are many sites on my local workstation) to localhost/mypath/fr/?
I tried something like:
RewriteRule ^(.*)/(en|fr)$ $1/$2/ [L,QSA,R=301]
RewriteRule ^(en|fr)$ $1/ [L,QSA,R=301]
But that ended up somehow adding the full real computer path in the url:
localhost/mypath/fr becomes localhost/thepathofthewebserverinmypc/mypath/fr/
I would very much appreciate some help as I have yet to find the right rule.
Thank you
RewriteRule ^(en|fr)$ $1/ [L,QSA,R=301]
You are just missing the slash prefix on the substitution string. Consequently, Apache applies the directory-prefix to the relative URL, which results in the malformed redirect.
For example:
RewriteRule ^(en|fr)$ /$1/ [L,R=301]
The substitution is now a root-relative URL path and Apache just prefixes the scheme + hostname to the external redirect. (The QSA flag is unnecessary here, since any query string is appended by default.)
This needs to go before the existing rewrites (and after the blocking rule for .php requests).
Note that the "internal rewrite" directives are correct to not have the slash prefix.
Aside:
DirectorySlash Off
Note that if you disable the directory slash, you must ensure that auto-generated directory listings (mod_autoindex) are also disabled, otherwise if a directory without a trailing slash is requested then a directory listing will be generated (exposing your file structure), even though there might be a DirectoryIndex document in that directory.
For example, include the following at the top of the .htaccess file:
# Disable auto-generated directory listings (mod_autoindex)
Options -Indexes
UPDATE:
This worked on the production server, as the site is in the server root. Would you know how I can also "catch" this on my localhost? RewriteRule ^(.*)/(en|fr)$ /$1/$2/ [L,R=301] doesn't catch it, but with only RewriteRule ^(en|fr)$ /$1/ [L,R=301], localhost/mypath/fr becomes localhost/fr/
From that I assume the .htaccess file is inside the /mypath subdirectory on your local development server.
The RewriteRule pattern (first argument) matches the URL-path relative to the location of the .htaccess file (so it does not match /mypath). You can then make use of the REQUEST_URI server variable in the substitution that contains the entire (root-relative) URL-path.
For example:
RewriteRule ^(en|fr)$ %{REQUEST_URI}/ [L,R=301]
The REQUEST_URI server variable already includes the slash prefix.
This rule would work OK on both development (in a subdirectory) and in production (root directory), so it should replace the rule above if you need to support both environments with a single .htaccess file.
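As a rough, untested sketch of the ordering described above, the relevant part of the .htaccess file might look like the following. It only rearranges directives already shown in the question and in this answer; nothing new is introduced apart from the numbering in the comments.

```apache
# Disable auto-generated directory listings (mod_autoindex)
Options -Indexes
DirectorySlash Off

RewriteEngine On

# 1. Blocking rule: return 404 if the original request is /foo/bar.php
RewriteCond %{THE_REQUEST} "^[^ ]* .*?\.php[? ].*$"
RewriteRule .* - [L,R=404]

# 2. External redirect: append the missing slash to /en or /fr
#    (REQUEST_URI keeps any subdirectory prefix such as /mypath)
RewriteRule ^(en|fr)$ %{REQUEST_URI}/ [L,R=301]

# 3. Existing internal rewrites from the question, unchanged
RewriteRule ^(en|fr)/(.*)$ $2?lang=$1 [L,QSA]
RewriteRule ^(en|fr)/$ index.php?lang=$1 [L,QSA]
RewriteRule ^([^.?]+)$ %{REQUEST_URI}.php [L]
```

Because 301 responses are cached by browsers, it is probably safer to test with R=302 first and switch to 301 once the redirect behaves as expected.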

rewrite request for /folder to folder/index.php without 301 redirect with apache

So I put an index.php in /pipe/index.php
I'd like to rewrite (internal, not redirect)
https://host/pipe?token=abc to https://host/pipe/index.php?token=abc
what I tried (caveat, assumes there is always a ? in the url):
RewriteEngine on
RewriteRule "^([^?]*)(.*)$" "$1/$2" [PT]
my hope was to split at the ? and just insert a / there.
But it seems Apache finds out that "oh, pipe is a folder" before checking my .htaccess (?) Because despite my [PT] it still redirects with 301 to /pipe/?token=abc, when I hoped for an internal rewrite.
But it seems Apache finds out that "oh, pipe is a folder" before checking my .htaccess (?)
Yes, mod_dir will append the trailing slash with a 301 redirect. Although this occurs after mod_rewrite has processed the URL (if indeed it is being processed at all - see below). (The PT flag is irrelevant in .htaccess, since the resulting rewrite is passed through as a URL-path by default.)
RewriteRule "^([^?]*)(.*)$" "$1/$2" [PT]
However, your existing rule (by itself) would result in a rewrite-loop (500 Internal Server Error) since it matches itself and repeatedly appends a slash. If you are seeing a 301 redirect as mentioned above then either this rule is not doing anything (are .htaccess overrides enabled?) or you have a conflict with other rules.
As you've stated, this rule also assumes that the query string (with leading ?) is also matched by the RewriteRule pattern. The RewriteRule directive matches against the URL-path only, not the query string. $2 in the above rule is therefore always empty (unless you have %3F in the URL-path, ie. a %-encoded ?).
The query string is contained in its own variable, QUERY_STRING. But you simply want to pass through the same query string, so you don't need to do anything special here, since that happens by default.
Solution
To prevent mod_dir appending the trailing slash, you need to set DirectorySlash Off at the top of the root .htaccess file.
Note that these directives must go in the .htaccess file in the root/parent directory, as opposed to the subdirectory that has the trailing slash omitted. This is because the mod_rewrite directives (that "fix" the URL by appending the trailing slash) would never actually be processed in the subdirectory .htaccess file. The trailing slash would seem to be required for mod_rewrite to function. (However, the mod_dir DirectorySlash Off directive would still be processed successfully, so the slash would not be appended.)
For example:
# Prevent mod_dir appending the trailing slash
DirectorySlash Off
# Must disable directory listings when "DirectorySlash Off" is set
Options -Indexes
However, you need to then manually append the trailing slash to any directory, where it is omitted, with an internal rewrite to "fix" the URL (and to correctly serve the DirectoryIndex document, ie. index.php).
# Ensure DirectoryIndex is set correctly
DirectoryIndex index.php
RewriteEngine On
# Append trailing slash to any directory where it has been omitted
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.+[^/])$ $1/ [L]
The trailing slash on the directory (via the internal rewrite) is required in order to serve the DirectoryIndex document, otherwise, you get a 403 Forbidden, even if the DirectoryIndex document is present.
If the trailing slash is omitted and directory listings (mod_autoindex) are enabled (disabled above) then a directory listing would be generated even if a DirectoryIndex document is present in that directory. (Which is why directory listings must be disabled when DirectorySlash Off is set.)
NB: You will need to make sure the browser cache is cleared since the earlier 301 redirect by mod_dir to append the trailing slash will have been cached by the browser.
This probably is what you are looking for:
RewriteEngine on
RewriteRule ^/?pipe/?$ /pipe/index.php [QSA,L]
The QSA flag is actually redundant here, since passing the query string through is the default, but it makes things clearer if you compare it to the following variant (both work):
RewriteEngine on
RewriteRule ^/?pipe/?$ /pipe/index.php?%{QUERY_STRING} [QSD,L]
The documentation of the rewriting module, more specifically of the RewriteRule directive, clearly points out that the query string is not part of the path the rule's pattern is matched against.
If you want to have more control about the content of the query string you can use a RewriteCond:
RewriteEngine on
RewriteCond %{QUERY_STRING} ^token=(.*)$
RewriteRule ^/?pipe/?$ /pipe/index.php?token=%1 [QSD,L]
Also you might want to redirect the original URL:
RewriteEngine on
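# Redirect direct requests for /pipe/index.php to /pipe
RewriteRule ^/?pipe/index\.php$ /pipe [QSA,R=301,END]
# END (rather than L) stops the rewritten URL from being redirected again by the rule above when used in .htaccess
RewriteRule ^/?pipe/?$ /pipe/index.php [QSA,END]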
And finally you might also want to take a look at the DirectoryIndex directive which might offer a solution without any rewriting at all, though this depends a bit on your setup ...

Simulate directory with htaccess

When my main website opens, it retrieves content from /home/parsa/public_html.
I have tried this: rewriteRule "^/(.*)$" "/ppyazi.com/$1"
I need it to retrieve the files from /home/parsa/public_html/ppyazi.com without redirecting to it on the user side.
Here are some examples:
index.php to display contents of ppyazi.com/index.php
users/index.php to display contents of ppyazi.com/users/index.php
I have tried this: rewriteRule "^/(.*)$" "/ppyazi.com/$1"
In .htaccess the URL-path that is matched by the RewriteRule pattern does not start with a slash. So, the pattern ^/(.*)$ will never match and your directive does nothing.
However, unless there is also a .htaccess file in the /ppyazi.com subdirectory with mod_rewrite directives then you need to be careful of rewrite loops.
Try the following instead:
RewriteEngine On
RewriteRule !^ppyazi\.com/ /ppyazi.com%{REQUEST_URI} [L]
The RewriteRule pattern simply checks that the URL does not already start with the directory we are rewriting to. Instead of the $1 backreference (since we are not capturing anything in the RewriteRule pattern) we use the REQUEST_URI server variable. Note that REQUEST_URI contains the full URL-path, including the slash prefix, so no additional slash is needed between /ppyazi.com and %{REQUEST_URI} in the substitution string.
The L (last) flag is required to prevent any further directives being processed that occur later in the file (in the current round of processing). If this is the last mod_rewrite directive in the file then it is superfluous. Note, however, that in .htaccess the rewriting process essentially starts over (until the URL passes through unchanged), so other directives might still process the request.
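If later rules in the same file must not touch the rewritten URL at all, one option is the END flag, which stops further rewrite processing in per-directory (.htaccess) context rather than only ending the current round. This is a minimal sketch of that variant, assuming Apache 2.4 or later (END is not available in 2.2) and the same directory layout as above:

```apache
RewriteEngine On

# Rewrite everything outside /ppyazi.com into that subdirectory.
# END (Apache 2.4+) means no further rewrite pass is made over this file
# for the rewritten URL.
RewriteRule !^ppyazi\.com/ /ppyazi.com%{REQUEST_URI} [END]
```

Whether this is preferable to L depends on whether other rules are expected to run on the rewritten URL.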

How can I get rid of trailing slash for index.html

How do I get rid of the trailing slash on my site?
For example, I have an index.html at
http://example.com/examplepage/index.html
Instead of
http://example.com/examplepage/
I would like the URL to be
http://example.com/examplepage
Could I do this with .htaccess?
The trailing slash is really important for Apache. Without it, even if you have an index.html sitting in the examplepage folder, people may be able to see the contents of your folders. Apache deals with this by having a module loaded by default (mod_dir) that redirects the browser to include the trailing slash every time a directory/folder is accessed. You can turn that off, but the documentation notes a major security concern when you do: the contents of your folders can be listed regardless of whether an index file exists.
So you can turn this off, but you probably want to still have the trailing slash at least internally. You can do that with mod_rewrite:
# turn off the mechanism to always redirect to the trailing slash
DirectorySlash Off
# Internally add the trailing slash
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond $1 .*[^/]$
RewriteRule ^(.*)$ /$1/ [L]
That should allow you to access http://example.com/examplepage without getting redirected to http://example.com/examplepage/.

.htaccess to show a directory index.html without a trailing slash

I've got a Jekyll generated site running on an Apache server and I'm having some trouble getting my .htaccess file set up correctly. Jekyll places index.html files into folders which represent each page so my URLs currently look like domain.com/foo/
I'd like to remove that trailing slash from the URL so that it exactly matches what I had set up previously (and also because I think it looks better).
Currently the section of my .htaccess file dealing with rewrites looks like:
<IfModule mod_rewrite.c>
RewriteCond %{HTTPS} !=on
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
</IfModule>
Options -Indexes
DirectoryIndex index.xml index.html
I have tried following the advice here but that puts me into a redirect loop.
Can anybody help me out? In brief, what I want is for a domain.com/foo URL to show the index.html file from the /foo directory and for domain.com/foo/ and domain.com/foo/index.html to redirect to domain.com/foo.
You should be able to use this to turn off the addition of slashes.
DirectorySlash Off
Note that the trailing slash is added for a good reason. Having the trailing slash in the directory name will make relative URLs point at the same thing regardless of whether the URL ends with "foo/bar/index.html" or just "foo/bar/". Without the trailing slash, relative URLs would reference something up one level from what they normally point at. (eg: "baz.jpg" would give the user "/foo/baz.jpg" instead of "/foo/bar/baz.jpg", as the trailing "bar" will get removed if it isn't protected by a trailing slash.) So if you do this, you probably want to avoid relative URLs.
To then rewrite the directory name to return the index.html you could probably do something like this:
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule ^(.*)$ /$1/index.html [L]
This checks if REQUEST_URI/index.html exists, and if it does performs an internal redirect.
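The rules above cover serving domain.com/foo, but not the other half of the question: redirecting domain.com/foo/ and domain.com/foo/index.html to domain.com/foo. One common way to do that without a redirect loop is to test THE_REQUEST, which holds the original request line and is not changed by internal rewrites. The following is an untested sketch of that technique, not part of the answer above. It assumes the .htaccess file is in the document root, and the regular expression is my own.

```apache
DirectorySlash Off
Options -Indexes

RewriteEngine On

# Externally redirect /foo/ and /foo/index.html to /foo.
# THE_REQUEST looks like "GET /foo/ HTTP/1.1"; %1 captures "foo".
# The root URL (/) is not matched because the capture needs at least one character.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/+([^?\s]+?)/+(?:index\.html)?[?\s]
RewriteRule ^ /%1 [R=301,L]

# Internally serve /foo/index.html when /foo is requested
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}/index.html -f
RewriteRule ^(.*)$ /$1/index.html [L]
```

Since the redirect is conditioned on the original request line, the internal rewrite to /foo/index.html does not retrigger it. Note that the sketch redirects any URL ending in a slash, including directories that have no index.html, so it may need an extra existence check depending on the site.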