RewriteRule overrides ProxyPass - apache

On a CentOS 7 machine, I'd like to run a Python server alongside an Apache server. I figured the easiest way would be to configure Apache as a reverse proxy. This is my VirtualHost configuration:
<VirtualHost *:443>
DocumentRoot /home/username/mydomain/src
ServerName mydomain.com
ErrorLog logs/mydomain-error_log
CustomLog logs/mydomain-access_log common
DirectoryIndex index.php
<Directory /home/username/mydomain/src>
Options -Indexes +FollowSymLinks
AllowOverride None
Require all granted
AddOutputFilterByType DEFLATE text/html text/plain text/xml
</Directory>
ProxyPreserveHost On
ProxyPass /mediaproxy http://127.0.0.1:9001/mediaproxy
ProxyPassReverse /mediaproxy http://127.0.0.1:9001/mediaproxy
LogLevel alert rewrite:trace6
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^/api/media/(.*) /data/$1 [L]
RewriteRule ^/api/v1/* /api/v1/index.php [L]
RewriteRule ^/assets/(.*) /site/v1/content/assets/$1 [L]
RewriteRule ^/css/(.*) /site/v1/content/css/$1 [L]
RewriteRule ^/js/(.*) /site/v1/content/js/$1 [L]
RewriteRule ^/fonts/(.*) /site/v1/content/fonts/$1 [L]
# problematic rule
RewriteRule ^/* /index.php [L]
# ... Let's Encrypt entries ...
</VirtualHost>
Now, my problem is that the rewrite rules take precedence over ProxyPass. That is, when I visit mydomain.com/mediaproxy/somepage, it serves the content of /index.php, as specified by RewriteRule ^/* /index.php [L]. The reverse proxy works correctly if I remove the problematic rule. Unfortunately, I need to keep it.
How do I tell Apache to use the ProxyPass rule first, and use RewriteRule only if there is no match?

# problematic rule
RewriteRule ^/* /index.php [L]
Your rule rewrites everything. You could just make an exception for the URL-path you want to proxy. For example:
RewriteRule !^/mediaproxy /index.php [L]
The ! prefix on the RewriteRule pattern negates the expression, so the rule is applied when the pattern does not match.
This now rewrites everything except URL-paths that start with /mediaproxy.
Note that the trailing * quantifier in the regex ^/* repeats the preceding token 0 or more times. The preceding token in this instance is the slash, so ^/* means "zero or more slashes at the start", which matches any URL-path. You are missing the preceding . (dot), i.e. ^/.*. Or omit the .* entirely (leaving just ^/), as it's superfluous and less efficient.
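A small Python sketch can make the two regex points above concrete. It only checks whether each RewriteRule pattern matches a sample URL-path (conditions and substitutions are ignored), and it uses Python's re module as a stand-in for the PCRE engine that mod_rewrite uses; for patterns this simple the two behave the same.

```python
import re

# Patterns copied from the rules above. Like mod_rewrite, we use search()
# (unanchored unless the pattern itself has ^ or $), not fullmatch().
SLASH_STAR = re.compile(r"^/*")             # zero or more slashes at the start
PROXY_PREFIX = re.compile(r"^/mediaproxy")  # used with a leading "!" in the fix


def original_rule_applies(url_path):
    """RewriteRule ^/* /index.php [L]: does the pattern match?"""
    return SLASH_STAR.search(url_path) is not None


def negated_rule_applies(url_path):
    """RewriteRule !^/mediaproxy /index.php [L]: the "!" inverts the match."""
    return PROXY_PREFIX.search(url_path) is None


for path in ["/", "/about", "/mediaproxy/somepage"]:
    print(path, original_rule_applies(path), negated_rule_applies(path))
```

The original pattern matches every path (even an empty string), while the negated pattern lets /mediaproxy/somepage fall through to ProxyPass.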
Aside:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^/api/media/(.*) /data/$1 [L]
RewriteRule ^/api/v1/* /api/v1/index.php [L]
RewriteRule ^/assets/(.*) /site/v1/content/assets/$1 [L]
RewriteRule ^/css/(.*) /site/v1/content/css/$1 [L]
RewriteRule ^/js/(.*) /site/v1/content/js/$1 [L]
RewriteRule ^/fonts/(.*) /site/v1/content/fonts/$1 [L]
# problematic rule
RewriteRule ^/* /index.php [L]
The two conditions (RewriteCond directives) are not doing anything here. When used in a virtual host context, REQUEST_FILENAME is the same as REQUEST_URI, since the directives are processed early, before the request has been mapped to the filesystem. Consequently, both (negated) conditions will always succeed and the following rule is always processed. In a virtual host context you need to use a lookahead, i.e. %{LA-U:REQUEST_FILENAME}, OR construct the file-path using the DOCUMENT_ROOT server variable, OR move the rules into a directory context.
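The following Python sketch illustrates why the conditions never fail in a virtual host context. It is only a model: a temporary directory plays the role of DOCUMENT_ROOT, the css/style.css file is a hypothetical example, and the function names are made up for illustration.

```python
import os
import tempfile


def negated_dir_and_file_conditions(test_path):
    """Models `RewriteCond <test_path> !-d` followed by `RewriteCond <test_path> !-f`.

    Consecutive RewriteCond directives are ANDed, so both negations must hold.
    """
    return not os.path.isdir(test_path) and not os.path.isfile(test_path)


def demo(request_uri="/css/style.css"):
    # A temporary directory stands in for DOCUMENT_ROOT (hypothetical layout).
    with tempfile.TemporaryDirectory() as document_root:
        os.makedirs(os.path.join(document_root, "css"))
        with open(os.path.join(document_root, "css", "style.css"), "w") as fh:
            fh.write("body { margin: 0; }\n")

        # Virtual host context: REQUEST_FILENAME still holds the URL-path,
        # which ends up being tested against the filesystem root.
        via_request_filename = negated_dir_and_file_conditions(request_uri)

        # %{DOCUMENT_ROOT}%{REQUEST_URI}: plain string concatenation.
        via_document_root = negated_dir_and_file_conditions(document_root + request_uri)

    return via_request_filename, via_document_root


print(demo())
```

On a typical machine (where no /css/style.css exists at the filesystem root) this prints (True, False): testing the bare URL-path reports the real file as missing, so the rewrite would fire anyway, whereas the %{DOCUMENT_ROOT}%{REQUEST_URI} form sees the file.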
However, those two conditions only apply to the first rule that follows. So all the remaining rules (including the last "problematic" rule) are processed unconditionally anyway. This is generally incorrect for a front-controller pattern (the last rule) and should perhaps be written like this instead:
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteRule !^/mediaproxy /index.php [L]
This now rewrites only URL-paths that do not start with /mediaproxy AND do not map to a directory AND do not map to a file.
Alternatively, if these conditions should be applied to all rules, then reverse the conditions and use a single rule that stops further processing when they match. For example:
DirectoryIndex index.php
RewriteEngine on
# Prevent further processing if root directory or "index.php" requested
RewriteRule ^/(index\.php)?$ - [L]
# Prevent further processing if the request maps to a directory or file
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f
RewriteRule ^/. - [L]
RewriteRule ^/api/media/(.*) /data/$1 [L]
# This rule is not required since the DirectoryIndex handles this case (the regex is also "incorrect").
#RewriteRule ^/api/v1/* /api/v1/index.php [L]
RewriteRule ^/assets/(.*) /site/v1/content/assets/$1 [L]
RewriteRule ^/css/(.*) /site/v1/content/css/$1 [L]
RewriteRule ^/js/(.*) /site/v1/content/js/$1 [L]
RewriteRule ^/fonts/(.*) /site/v1/content/fonts/$1 [L]
RewriteRule !^/mediaproxy /index.php [L]

Related

htaccess RewriteRule with a variable not working

This rule takes us to the error page
RewriteRule ^latest/([A-Za-z0-9]+)$ latest?auth=$1 [NC,L]
I have the following in my .htaccess file
<IfModule mod_rewrite.c>
RewriteRule ^sucuri-(.*)\.php$ - [L]
</IfModule>
# END - Allow Sucuri Services
<Files 403.shtml>
order allow,deny
allow from all
</Files>
ErrorDocument 404 /404.php
Options +FollowSymLinks
Options +MultiViews
RewriteEngine on
RewriteCond %{HTTPS} !=on
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R,L]
RewriteCond %{HTTP_HOST} !^www.xxxxx.com$ [NC]
RewriteRule ^(.*)$ https://www.xxxxx.com/$1 [L,R=301]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ ci_index.php?/$1 [L]
## Remove php extension
RewriteCond %{REQUEST_URI} !^/index.php$
RewriteRule ^([^\.]+)$ $1.php [NC,L]
RewriteRule ^latest/([A-Za-z0-9]+)$ latest?auth=$1 [NC,L]
With the following rule
RewriteRule ^latest/([A-Za-z0-9]+)$ latest?auth=$1 [NC,L]
I'm trying to achieve the following:
https://www.xxxxx.com/latest?auth=US-mobile-county
to
https://www.xxxxx.com/latest/US-mobile-county
This rule takes us to the error page
RewriteRule ^latest/([A-Za-z0-9]+)$ latest?auth=$1 [NC,L]
You've not stated precisely what "error page" you are referring to, or what is expected to handle this request. This directive is not correct by itself, so it's not immediately clear what you are trying to do. I'm assuming the intention is to rewrite to latest.php (not latest, as suggested by this rule and mentioned later in the question), since this would seem to be the only reason to implement such a rule (and your question is tagged php). By rewriting to latest only, you are dependent on other directives appending the .php extension, and therein lies a conflict.
There are a number of issues with the directives as posted that are preventing this from working. Notably, the rules are in the wrong order, and the use of MultiViews (probably in an attempt to get extensionless URLs working) is compounding matters. In fact, it doesn't look like the rule in question is actually being processed at all.
Without MultiViews, and due to the order of the directives, a request of the form /latest/something would be rewritten to /ci_index.php?/latest/something (presumably a CodeIgniter front-controller) which I would guess would result in a CI generated 404 response. However, since MultiViews has been enabled mod_negotiation first "rewrites" the request to /latest.php/something, which doesn't match any of your rules so either results in a 404 (depending on your server config) or calls latest.php but without any URL parameter, which presumably causes your script to fail?
https://www.xxxxx.com/latest/US-mobile-county
Also, note that your example URL contains hyphens (-), but the regex in your directive (ie. ^latest/([A-Za-z0-9]+)$) does not permit hyphens so it wouldn't have matched anyway.
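To make the hyphen problem concrete, here is a quick Python check of the original pattern against a version with a hyphen added, using re as a stand-in for PCRE. The rewrite helper is hypothetical and assumes the target is latest.php, as in the suggested rule below.

```python
import re

# The pattern from the question, and the same pattern with a hyphen added to
# the character class (a trailing "-" inside [...] is a literal hyphen).
ORIGINAL = re.compile(r"^latest/([A-Za-z0-9]+)$")
FIXED = re.compile(r"^latest/([A-Za-z0-9-]+)$")


def rewrite(pattern, url_path):
    """Return the rewritten target, or None if the rule would not match."""
    m = pattern.search(url_path)
    if m is None:
        return None
    return "latest.php?auth=" + m.group(1)


# In .htaccess the URL-path that RewriteRule matches has no leading slash.
path = "latest/US-mobile-county"
print(rewrite(ORIGINAL, path))
print(rewrite(FIXED, path))
```

The original pattern does not match at all, so the request falls through to the other rules; the fixed pattern yields latest.php?auth=US-mobile-county.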
Try the following instead, replacing everything after the ErrorDocument directive:
# Disable MultiViews
Options +FollowSymLinks -MultiViews
RewriteEngine on
# Redirect HTTP to HTTPS
RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
# Redirect non-www to www
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteRule ^ https://www.example.com%{REQUEST_URI} [R=301,L]
# Rewrite "/latest/something" to "/latest.php?auth=something"
RewriteRule ^latest/([A-Za-z0-9-]+)$ latest.php?auth=$1 [L]
# Allow extensionless PHP URLs to work
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^([^.]+)$ $1.php [L]
# Front-controller
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ ci_index.php?/$1 [L]
Note that I've reversed the order of the directives so that the rule in question is now first and the CI front-controller is now last. The order of the directives in .htaccess is important.
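Why the order matters can be shown with a deliberately simplified Python model in which the first matching [L] rule wins. It assumes the front-controller's conditions pass (the URL doesn't map to a real file or directory), uses the hyphen-fixed pattern so that only ordering is being tested, and ignores the fact that mod_rewrite re-runs .htaccess rules after a rewrite.

```python
import re

# Simplified model (assumption): each rule is (pattern, substitution) and the
# first matching [L] rule decides the result. Conditions are assumed to pass.
FRONT_CONTROLLER = (r"^(.*)$", r"ci_index.php?/\1")
LATEST = (r"^latest/([A-Za-z0-9-]+)$", r"latest.php?auth=\1")


def first_match(rules, url_path):
    """Apply the substitution of the first rule whose pattern matches."""
    for pattern, template in rules:
        m = re.search(pattern, url_path)
        if m is not None:
            return m.expand(template)  # the \1 in the template plays the role of $1
    return url_path  # no rule matched; the request is left unchanged


path = "latest/US-mobile-county"
print(first_match([FRONT_CONTROLLER, LATEST], path))  # original order
print(first_match([LATEST, FRONT_CONTROLLER], path))  # reordered
```

With the front-controller first, the request ends up as ci_index.php?/latest/US-mobile-county; with the specific rule first, it becomes latest.php?auth=US-mobile-county.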
Since you had enabled MultiViews (now disabled in the above), your rule to enable PHP extensionless URLs (that you had labelled "Remove php extension") was not actually being used at all (unless you had directories or files that contained dots, other than that used to delimit the file extension).

SEO URL and Rewrite Rule force on one domain

I have a rewrite rule to force users to go from example.com to www.example.com. This is for SEO reasons, so that I don't have duplicate websites and content in my Google results.
# BEGIN Spark
AddDefaultCharset UTF-8
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP:Authorization} ^(.*)
RewriteRule .* - [e=HTTP_AUTHORIZATION:%1]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [L,R=301]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.php [QSA,L]
RewriteCond %{HTTP_HOST} ^example.com
RewriteRule (.*) https://www.example.com/$1 [R=301,L]
</IfModule>
<IfModule mod_deflate.c>
<FilesMatch "\.(html|php|txt|xml|js|css|ttf|otf|ico|json|svg|)$">
SetOutputFilter DEFLATE
</FilesMatch>
</IfModule>
# END Spark
My question: if people now post a link such as example.com/news, they are always redirected to the front page (www.example.com). How can I make it so that they can still use and post short URLs like example.com/news1 or example.com/news2, which will redirect to www.example.com/news1 or www.example.com/news2 respectively?
You've put the rule in the wrong place. It needs to go before the internal rewrite to the front-controller (index.php), i.e. at the top of the file, not at the end.
By placing it last it will only correctly redirect physical directories (which includes the homepage) and static resources (images, CSS, JS, etc.). All other "short URLs" that are routed through the front-controller will be redirected to index.php and you'll see the "frontpage".
Also, to avoid a double redirect when a non-www request for example.com contains a trailing slash, use an absolute URL (with the www host) in the trailing-slash removal rule, and put your non-www to www rule immediately after it.
For example:
RewriteEngine On
# Remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ https://www.example.com/$1 [L,R=301]
# Redirect non-www to www
RewriteCond %{HTTP_HOST} ^example.com
RewriteRule (.*) https://www.example.com/$1 [R=301,L]
RewriteCond %{HTTP:Authorization} ^(.*)
RewriteRule .* - [e=HTTP_AUTHORIZATION:%1]
# Front-controller
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.php [L]
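The double redirect can be traced with a hypothetical Python model of just the two redirect rules (trailing-slash removal, then non-www to www). It assumes the requested path is not a physical directory, ignores query strings and the front-controller, and uses example.com as in the rules above.

```python
from urllib.parse import urlsplit

# Hypothetical model of the two redirect rules only (not the front-controller).


def apply_redirects(url, absolute_slash_target):
    """Return the redirect target for one request, or None if no rule fires."""
    parts = urlsplit(url)
    if parts.path not in ("", "/") and parts.path.endswith("/"):
        # RewriteRule ^(.*)/$ ... strips one trailing slash (!-d assumed to pass)
        if absolute_slash_target:
            origin = "https://www.example.com"
        else:
            origin = f"{parts.scheme}://{parts.netloc}"  # relative /$1 keeps the host
        return origin + parts.path[:-1]
    if parts.netloc == "example.com":
        # RewriteRule (.*) https://www.example.com/$1 [R=301,L]
        return "https://www.example.com" + parts.path
    return None


def redirect_chain(url, absolute_slash_target, max_hops=10):
    """Follow redirects until no rule fires; return every URL redirected to."""
    chain = []
    while len(chain) < max_hops:
        target = apply_redirects(url, absolute_slash_target)
        if target is None:
            break
        chain.append(target)
        url = target
    return chain


print(redirect_chain("https://example.com/news/", absolute_slash_target=False))
print(redirect_chain("https://example.com/news/", absolute_slash_target=True))
```

For https://example.com/news/, the relative /$1 target produces two hops (first https://example.com/news, then https://www.example.com/news), while the absolute target reaches https://www.example.com/news in one.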
You will need to clear the browser cache before testing since the erroneous 301 (permanent) redirects will have been cached by the browser. (Test with 302 - temporary - redirects to avoid caching issues.)
Aside: An additional concern is that these directives are inside a # BEGIN Spark ... # END Spark code block, which suggests they may be maintained by some automated process. If so, your changes might be overwritten.

.htaccess RewriteRule with existing file and folder

My web structure looks like this:
public_html/
/images/
/user/
/userimage1.jpg
/userimage2.jpg
/userimage3.jpg
/icons/
/index.php
/user.php
...
I have two domains, example.com and images.example.com, and I want to use .htaccess RewriteRules so that the images.example.com subdomain leads to the /images/ folder, while also allowing URLs without the file extension.
My .htaccess looks like this:
<IfModule mod_rewrite.c>
Options +FollowSymLinks +MultiViews
RewriteEngine on
RewriteCond %{HTTPS} off
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
RewriteCond %{HTTP_HOST} ^images\.example\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/images/
RewriteRule ^(.*)$ /images/$1 [NC,L]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}\.php -f
RewriteRule ^(.*)$ $1.php [L]
</IfModule>
Now, https://example.com/user/ works fine, but when I try to open https://images.example.com/user/userimage1.jpg it says that %{REQUEST_URI} is /images/redirect:/images/user.php/userimage1.jpg
Unfortunately, both the domain and the subdomain have to be set up with public_html as the root folder.
How do I have to adapt my .htaccess file so that both URLs, https://example.com/user/ and https://images.example.com/user/userimage1.jpg, work fine?
You have a conflict with MultiViews (which you've enabled at the top). The fact that "https://example.com/user/ works fine" (with a trailing slash) is because of MultiViews, not because of your mod_rewrite directives. (The mod_rewrite directives as written would only "work" with /user - no trailing slash.)
When you request https://images.example.com/user/userimage1.jpg, MultiViews triggers an internal subrequest for /user.php/userimage1.jpg (/user.php with additional path-info /userimage1.jpg), but mod_rewrite has also tried to rewrite the request (an internal "redirect") - hence the seemingly malformed rewrite.
Generally, you need to avoid using MultiViews with mod_rewrite rewrites - a common cause of conflict.
Try the following instead:
Options +FollowSymLinks -MultiViews
RewriteEngine on
RewriteCond %{HTTPS} off
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
# Rewrite images subdomain
RewriteCond %{HTTP_HOST} ^images\.example\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/images/
RewriteRule ^(.*)$ /images/$1 [L]
# Append .php file extensions
RewriteCond %{DOCUMENT_ROOT}/$1 !-d
RewriteCond %{DOCUMENT_ROOT}/$1\.php -f
RewriteRule ^(.*)/$ $1.php [L]
Note that I've included the trailing slash in the RewriteRule pattern and taken this out of the capturing subpattern - this is assuming that the trailing slash is mandatory on your URLs (as in your example).
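A hypothetical Python model of the "Append .php file extensions" rule shows the effect of putting the trailing slash outside the capturing group. The sets of existing files and directories below are made up to mirror the structure in the question; the real rule tests the filesystem under DOCUMENT_ROOT.

```python
import re

TRAILING_SLASH = re.compile(r"^(.*)/$")

# Hypothetical contents of public_html (paths relative to DOCUMENT_ROOT).
EXISTING_DIRS = {"images", "images/user", "images/icons"}
EXISTING_FILES = {"index.php", "user.php"}


def append_php(url_path):
    """Model of the "Append .php file extensions" rule; None means no rewrite."""
    m = TRAILING_SLASH.search(url_path)  # in .htaccess the URL-path has no leading slash
    if m is None:
        return None
    base = m.group(1)
    if base in EXISTING_DIRS:  # RewriteCond %{DOCUMENT_ROOT}/$1 !-d
        return None
    if base + ".php" not in EXISTING_FILES:  # RewriteCond %{DOCUMENT_ROOT}/$1\.php -f
        return None
    return base + ".php"


for path in ["user/", "user", "images/user/", "images/user/userimage1.jpg"]:
    print(path, append_php(path))
```

Only user/ is rewritten (to user.php); user without the slash, the images/user/ directory and the image file itself are left alone.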
You don't need the <IfModule> wrapper unless mod_rewrite really is optional? (It's not.)

How to remove `/index.php` from all urls, including the last trailing slash?

I've been looking around the web and especially here on stackoverflow for THE answer to this question.
I only use directories with an index.php file in it. Includables and private stuff are outside the public_html (cpanel) directory.
By using directories, anchor links and queries will look like:
http://domain.com/sub/#anchor or http://domain.com/sub/?query
And I just don't like it. People keep telling me that this is irrelevant for SEO, but I don't think that is the case. Firstly because I want consistency, and secondly because trailing slashes create duplicates, so I need 301 redirects! Consistency and duplicates are indeed an SEO problem!
This is just an example of how my website is structured:
/
·--index.php
|
·--/about-us/
|  |
|  ·--index.php
·--/contact-us/
   |
   ·--index.php
Users will never know that about-us is a directory, and they won't type the last trailing slash anyway. This creates duplicates and HTTP errors.
On the web I've found only non-working examples, and from what I've understood, I have to internally add the trailing slash and index.php. This helps to avoid security problems and makes the whole thing work!
Here, and here I asked something similar. But in that case I had to create the /public directory. Now I am in the process of changing hosts, and I will be able to use the non-public directory to store PHP files.
Since I don't need the /public directory anymore, I copied part of that code and pasted it on a new .htaccess.
Here is the .htaccess
# Options
Options +FollowSymLinks -MultiViews -Indexes
DirectoryIndex index.php index.html
DirectorySlash off
# Enable Rewrite Engine
RewriteEngine on
RewriteBase /
# www to non-www
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteCond %1##%{HTTPS}s ^(.+)##(?:on(s)|)
RewriteRule ^ http%2://%1%{REQUEST_URI} [L,R=301,NE]
# remove trailing slash from all URLs
RewriteCond %{THE_REQUEST} \s(.+?)/+[?\s]
RewriteRule ^(.+)/$ /$1 [R=301,L,NE]
# To externally redirect /dir/file.php to /dir/file
RewriteCond %{THE_REQUEST} \s/+(.+?)\.php[\s?] [NC]
RewriteRule ^ /%1 [R=301,NE,L]
# internally add trailing / to directories
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule !/$ %{REQUEST_URI}/ [L]
# To internally forward /dir/file to /dir/file.php
RewriteCond %{DOCUMENT_ROOT}/$1.php -f [NC]
RewriteRule ^(.+?)/?$ $1.php [L]
<FilesMatch ^\.>
order allow,deny
deny from all
</FilesMatch>
<Files *.inc>
order allow,deny
deny from all
</Files>
The code seems to work well in subdirectories but not in root. I guess that this is because the code above was tailored to work on the /public subdirectory.
How can I make it work on root too?
To sum up, I need 301 redirects, otherwise there will be duplicate content!
http://www.domain.com/ R--> http://domain.com/
http://domain.com/ R--> http://domain.com
http://domain.com/sub/ R--> http://domain.com/sub
http://domain.com/sub/index.php R--> http://domain.com/sub
http://domain.com/sub/index.php#anchor R--> http://domain.com/sub#anchor
http://domain.com/sub/#anchor R--> http://domain.com/sub#anchor
You can use these rules to meet all the requirements, including removal of index.php. Do remember that anchors cannot be preserved on the server side, as the server never receives #anchor in the HTTP request; i.e. only http://domain.com/sub/ will be seen in the Apache logs. (When a redirect's Location header has no fragment, browsers reapply the original #anchor to the new URL themselves.)
# Options
Options +FollowSymLinks -MultiViews -Indexes
DirectoryIndex index.php index.html
DirectorySlash off
# Enable Rewrite Engine
RewriteEngine on
RewriteBase /
# www to non-www
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteCond %1##%{HTTPS}s ^(.+)##(?:on(s)|)
RewriteRule ^ http%2://%1%{REQUEST_URI} [L,R=301,NE]
# remove index.php
RewriteCond %{THE_REQUEST} \s/*(/.*)?/index\.php[?\s] [NC]
RewriteRule ^ %1 [L,R=301,NE]
# remove trailing slash from all URLs
RewriteCond %{THE_REQUEST} \s(.+?)/+[?\s]
RewriteRule ^(.+)/$ /$1 [R=301,L,NE]
# To externally redirect /dir/file.php to /dir/file
RewriteCond %{THE_REQUEST} \s/+(.+?)\.php[\s?] [NC]
RewriteRule ^ /%1 [R=301,NE,L]
# internally add trailing / to directories
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule [^/]$ %{REQUEST_URI}/ [L]
# To internally forward /dir/file to /dir/file.php
RewriteCond %{DOCUMENT_ROOT}/$1.php -f [NC]
RewriteRule ^(.+?)/?$ $1.php [L]
<FilesMatch ^\.>
order allow,deny
deny from all
</FilesMatch>
<Files *.inc>
order allow,deny
deny from all
</Files>
Don't trim trailing slashes; you're fighting the normal directory structure of URLs. For the root, there is effectively no such URL as http://domain.com without the slash: even if your browser hides it, the slash is still sent in the request (an empty path is always requested as /) and will be added back to a URL copied from the address bar. So a redirect from http://domain.com/ to http://domain.com cannot work; it would simply loop.
Redirecting everything to a single canonical domain is good. Automatic 301 redirects (the kind Apache does) from http://domain.com/foo to http://domain.com/foo/ are good for SEO. How do index.php strings end up in your URLs in the first place? There's no need for them to be there; make sure all your own links are free of them. The easiest way to ensure only unique pages are indexed is with <link rel=canonical href="…">.
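Both points about what the server actually receives (the #anchor in the earlier answer, and the root slash here) can be illustrated with a small Python function that models the request target a client sends, following RFC 7230 section 5.3.1. It is a sketch built on urllib.parse.urlsplit, not how any particular browser is implemented.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Origin-form request target a client sends for a URL (RFC 7230, 5.3.1)."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path must be sent as "/"
    if parts.query:
        path += "?" + parts.query
    return path  # the #fragment is never part of the request


for url in ["http://domain.com", "http://domain.com/", "http://domain.com/sub/#anchor"]:
    print(url, "->", request_target(url))
```

http://domain.com and http://domain.com/ both produce /, and http://domain.com/sub/#anchor produces /sub/.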

How to prevent RewriteRules from matching subdomains

How can I limit my mod_rewrite RewriteRules so that they only apply to the www host and the bare domain (no subdomain)?
The subdomains are in different folders (or rather, folders inside the website root folder); however, all of my RewriteRules apply to all of the subdomains, which is not what I want.
I know I could precede every RewriteRule with a RewriteCond that only matches www and the bare domain, but then I would have to repeat the same condition for every RewriteRule, which I also want to avoid.
So I was wondering if there is any way to globally prevent RewriteRules from applying to other subdomains. I could also place an .htaccess file in each subdomain's folder to prevent matching, if that's a possibility.
Here is part of what I have in .htaccess right now:
Options -Indexes -MultiViews +FollowSymLinks
Header set Access-Control-Allow-Origin *
RewriteEngine On
RewriteBase /
RewriteRule ^city/([^/].+)/([^/].+)/([^/].+)/$ index.php?page=$3&city=$1 [L]
RewriteRule ^city/([^/].+)/([^/].+)/$ index.php?city=$1 [L]
RewriteRule ^flights/([^/].+)/$ index.php?page=flights&mode=$1 [L]
RewriteRule ^health/([^/].+)/$ index.php?page=health&view=$1 [L]
# so on ...
RewriteRule ^([^/].+)/$ index.php?page=$1 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule . index.php [L]
Update #1
Just to explain the problem better: right now sub.domain.com shows domain.com instead of its actual content.
You can insert this single rule below the RewriteBase line to exclude all subdomains from the rest of the rules:
# ignore all sub domains
RewriteCond %{HTTP_HOST} !^(www\.)?example\.com$ [NC]
RewriteRule ^ - [L]
Replace example.com with your actual domain.
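A quick Python check of the host pattern used in the rule above, with re.IGNORECASE standing in for the [NC] flag, shows which hosts skip the remaining rules. The function name is hypothetical.

```python
import re

# Pattern from the answer; re.IGNORECASE mirrors the [NC] flag.
MAIN_HOST = re.compile(r"^(www\.)?example\.com$", re.IGNORECASE)


def skips_remaining_rules(http_host):
    """True when the negated condition succeeds, i.e. RewriteRule ^ - [L] fires."""
    return MAIN_HOST.search(http_host) is None


for host in ["example.com", "WWW.Example.com", "sub.example.com"]:
    print(host, skips_remaining_rules(host))
```

One caveat: HTTP_HOST is taken from the Host header, which can include a port (e.g. example.com:8080); the anchored pattern would treat such a host like a subdomain.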