Virtual Hosts (Apache) with mod_rewrite issues - apache

I am trying to fix this whole day without success, so I hope someone might be able to help me. I have an app at http://localhost/, and it uses Pylons for the app I am hosting. In addition to that, I need to host a PHP/MySQL site, so I had to use Apache too.
My current setup is that I use haproxy with this config for the Apache backend:
backend apache
mode http
timeout connect 4000
timeout server 30000
timeout queue 60000
balance roundrobin
server app02-8002 localhost:8002 maxconn 1000
This is triggered by this:
acl image url_sub images
use_backend apache if image
So, when I open my IP/images, it will trigger that and open Apache then, with port 8002.
For Apache, I created virtual hosts, and this is the "image" one:
<VirtualHost *:8002>
ServerAdmin my#email.com
ServerName image
ServerAlias image
DocumentRoot /srv/www/image/public_html/
ErrorLog /srv/www/image/logs/error.log
CustomLog /srv/www/image/logs/access.log combined
</VirtualHost>
So, that all works nicely, when I type IP/images it open the /srv/www/image/public_html. But then the issues come. As I am using the image uploading script, it involves a lot of rewriting, so I had to enable that mod. This is the .htaccess which is located in the public_html/images folder (I somehow had to make this subfolder too, to "match" the URL with the actual location in the public_html.
SetEnv PHP_VER 5_3
RewriteEngine On
# You must define your installation directory and uncomment the line :
RewriteBase /images/
RewriteRule ^([a-zA-Z]+)\.(jpg|gif|png|wbmp)$ controller/Resizer.php?m=original&a=$1&e=$2 [L]
RewriteRule ^(icon|small|medium|square)\/([a-zA-Z]+)\.(jpg|gif|png|wbmp)$ controller/Resizer.php?m=$1&a=$2&e=$3 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) application.php?request=$1 [L,QSA]
So, basically, this is somethow not working. I suppose there is a conflict between this virtual host, subdirectory, rewriting or something, but I can't seem to isolate it.
It is a bit confusing that when I open the IP/images/xxxx.jpg it opens the image, which is located in the public_html/images/upload/original folder, so the rewrite is working. The the other rules seem not to be working. All of the thumbnails and smaller versions are not rendering properly (with the icon, small, medium, square), so that makes the site quite unsusable.
Here is the link of the development server: http://localhost/images/
Thanks in advance for your time and help!

The first thing you should do is determine whether mod_rewrite is in fact part of the problem by accessing one of the failing URLs directly via its rewritten form and verifying that you get the expected result.
Indeed, the problem might simply be that the PHP script for the smaller resolutions "doesn't work" while it does for the original size ones. The first of the following URLs nicely served me an image; the second one is supposed to give me a smaller version of the same image, but served me an HTTP 500:
http://106.186.21.176/images/controller/Resizer.php?m=original&a=q&e=png
http://106.186.21.176/images/controller/Resizer.php?m=small&a=q&e=png
I got the same result (HTTP 500) for any of the smaller-size format names mentioned in your post, which matches your problem description.
Once you've verified that the script works as expected, it's likely that the problem is with mod_rewrite. If so, enable rewrite logging: use the RewriteLog directive to activate it, and RewriteLogLevel to control its verbosity. Especially at the higher log levels, it can give you very detailed information about exactly what it's doing. This should make the problem readily apparent from the logs.
Also, if possible, try to avoid configuring mod_rewrite rules in .htaccess files -- move them into your main server config file instead. The reason is explained on Apache mod_rewrite Technical Details, section "API phases":
Unbelievably mod_rewrite provides URL manipulations in per-directory context, i.e., within .htaccess files, although these are reached a very long time after the URLs have been translated to filenames. It has to be this way because .htaccess files live in the filesystem, so processing has already reached this stage. In other words: According to the API phases at this time it is too late for any URL manipulations. To overcome this chicken and egg problem mod_rewrite uses a trick: When you manipulate a URL/filename in per-directory context mod_rewrite first rewrites the filename back to its corresponding URL (which is usually impossible, but see the RewriteBase directive below for the trick to achieve this) and then initiates a new internal sub-request with the new URL. This restarts processing of the API phases.
Again mod_rewrite tries hard to make this complicated step totally transparent to the user, but you should remember here: While URL manipulations in per-server context are really fast and efficient, per-directory rewrites are slow and inefficient due to this chicken and egg problem. But on the other hand this is the only way mod_rewrite can provide (locally restricted) URL manipulations to the average user.
In general, not using .htaccess at all has the added advantage that you can tell Apache to not even bother and disable the functionality all together, which save Apache from having to scan each directory level it serves from for the .htaccess files.

Related

Rewrite subdomain.domain.com to domain.com/subdomain without redirect

I've read plenty of Stackoverflows but I seem to be missing something.
I have a PHP application running on https://subdomain.example.com/page/x but for SEO reasons I want people/bots to see https://example.com/subdomain/page/x.
I can rewrite the URL by using:
RewriteEngine on
RewriteCond %{HTTP_HOST} subdomain.example.com
RewriteRule ^(.*)$ https://example.com/subdomain/$1 [L,NC,QSA]
This rewrite results in: https://example.com/subdomain/page/x, but I keep recieving a 404 error since the "main" domain doesn't know the path /subdomain/page/x of course.
What I want is to have the URL https://example.com/subdomain/page/x but run it on https://subdomain.example.com/ in the background since this is the place where the PHP application is running.
Is this possible? How should I do this?
There is no strong SEO reason not to use subdomains. See Do subdomains help/hurt SEO? I recommend using subdirectories most of the time but subdomains when they are warranted.
One place where subdomains are warranted is when your content is hosted on a separate server in a separate hosting location. While it is technically possible to serve the content from a subdirectory from the separate server, that comes with its own set of SEO problems:
It will be slow.
It will introduce duplicate content.
From a technical standpoint, you would need to use a reverse proxy to on your example.com webserver to fetch content for the /subdomain/ subdirectory from subdomain.example.com. The code for doing so in the .htaccess file of example.com would be something like:
RewriteEngine on
RewriteRule ^subdomain/(.*)$ https://subdomain.example.com/$1 [P]
The [P] flag means "reverse proxy" which will cause the server to fetch the content from the remote subdomain. This will necessarily make it slower for users. So much so that it would be better for SEO to use a subdomain.
For this to work you would also need to leave the subdomain up and running and serving content for the main server to fetch. This causes duplicate content. You could solve this issue by implementing canonical tags pointing to the subdirectory.
This requires several Apache modules to be available. On my Debian based system I needed to run sudo a2enmod ssl proxy rewrite proxy_connect proxy_http and sudo service apache2 reload. I also had to add SSLProxyEngine on in my <VirtualHost> directive for the site I wanted to use this on.

How would you block all referring domains via .htaccess while allowing a certain one through. Not IP address. Script isn't working

I need to have a .htaccess file made to the following specifications.
Allow ONLY "exampledomain.com/sometext" to access "mydomain.com/folder"
Block all other referrers and redirect them to "google.com"
But I am not sure how to go about doing this.
So far this is what I have got. But it is not working.
<If "%{HTTP_HOST} != 'google.com'">
Redirect / http://www.yahoo.com/
</If>
Using the latest Apache. I really appreciate the help here. I have gone through a few other posts but can't seem to figure it out.
Does the 'google.com' have to include the full url or is their a way to make it be a wildcard like *google.com*?
This should point you into the right direction:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^mydomain\.com$
RewriteCond %{HTTP_REFERER} !^exampledomain\.com$
RewriteRule ^ https://www.google.com [R=301,END]
It is a good idea to start out with a 302 temporary redirection and only change that to a 301 permanent redirection later, once you are certain everything is correctly set up. That prevents caching issues while trying things out...
In case you receive an internal server error (http status 500) using the rule above then chances are that you operate a very old version of the apache http server. You will see a definite hint to an unsupported [END] flag in your http servers error log file in that case. You can either try to upgrade or use the older [L] flag, it probably will work the same in this situation, though that depends a bit on your setup.
This implementation will work likewise in the http servers host configuration or inside a distributed configuration file (".htaccess" file). Obviously the rewriting module needs to be loaded inside the http server and enabled in the http host. In case you use a distributed configuration file you need to take care that it's interpretation is enabled at all in the host configuration and that it is located in the host's DOCUMENT_ROOT folder.
And a general remark: you should always prefer to place such rules in the http servers host configuration instead of using distributed configuration files (".htaccess"). Those distributed configuration files add complexity, are often a cause of unexpected behavior, hard to debug and they really slow down the http server. They are only provided as a last option for situations where you do not have access to the real http servers host configuration (read: really cheap service providers) or for applications insisting on writing their own rules (which is an obvious security nightmare).

.htaccess file on localhost

I have a localhost on ubuntu 16. In the root localhost directory (/var/www/html/) i put this htaccess file.
AddDefaultCharset utf-8
RewriteEngine on
RewriteRule ^index?$ index.php
When I type localhost/index apache says me
The requested URL /index was not found on this server.
Is that an error in Apache configuration?
Basicly I want to make redirects to index.php in the root of my site and here I want to parse something like this localhost/cart/item/1 to array and then realize MVC. I am new in web dev and do not realy understand how can I do it, please help me.
You have to enable the interpretation of such dynamic configuration files (".htaccess" style files) first. They are disabled by default, since they slow down the server considerably. Usually it is preferable to place such rules directly in the servers static configuration files.
To enable them take a look at the AllowOverride command: https://httpd.apache.org/docs/2.4/mod/core.html#allowoverride
<Directory "/var/www/html">
AllowOverride All
</Directory>
So since you have to modify that configuration anyway... why don't you place your rewrite rules in there too? Easier, more robust and faster too...
Apart from that it sometimes is a good idea to implement rewrite rules in such way that they work in both locations, the http servers (virtual) host configuration and dynamic configuration files:
RewriteEngine on
RewriteRule ^/?index/?$ index.php [L]
And a general hint: you should always prefer to place such rules inside the http servers host configuration instead of using dynamic configuration files (".htaccess"). Those files are notoriously error prone, hard to debug and they really slow down the server. They are only provided as a last option for situations where you do not have control over the host configuration (read: really cheap hosting service providers) or if you have an application that relies on writing its own rewrite rules (which is an obvious security nightmare).
Maybe you need to create a folder, and then, work in your application inside there. Ex:
Create a var/www/html/project;
Put yout .htaccess inside;
Work with your application inside that folder.

Use Apache to load a page sitting on a different server with the same URL

We have a situation where ideally we would like a user to access a page on our site at a URL such as https://example.com/path/to/page. However, the HTML to render that page is sitting on an entirely different server (S3 to be exact) that we have control over, and we would like to render that page for that URL without redirecting (i.e. changing the URL itself).
I took a brief look at the Apache mod_proxy module, but it doesn't seem to do the job as we just get 500 or 404 errors. Here is an example entry from our .htaccess:
<IfModule mod_proxy.c>
RewriteRule "/path/to/page/(.*)$" "https://bucketname.s3-website-eu-west-1.amazonaws.com/path/to/page/$1" [P]
</IfModule>
Any help or a pointer in the right direction would be appreciated.
Most likely you stumble over the fact that you are using an absolute path inside a dynamic cohnfiguration files RewriteRule. Have a try with that instead:
RewriteEngine on
RewriteRule "/?path/to/page/(.*)$" "https://bucketname.s3-website-eu-west-1.amazonaws.com/path/to/page/$1" [P]
That slightly modified will work in dynamic configuration files and in the real http servers host configuration.
But as mentioned in the comment I wonder why you should not be able to use the proxy module directly to simplify things. You'd have to do that in in http servers host configuration though, this is not possible in dynamic configuration files:
ProxyRequests off
ProxyPass "/path/to/page/" "https://bucketname.s3-website-eu-west-1.amazonaws.com/path/to/page/"
ProxyPassReverse "/path/to/page/" "https://bucketname.s3-website-eu-west-1.amazonaws.com/path/to/page/"
And a general hint: you should always prefer to place such rules inside the http servers host configuration instead of using dynamic configuration files (".htaccess"). Those files are notoriously error prone, hard to debug and they really slow down the server. They are only provided as a last option for situations where you do not have control over the host configuration (read: really cheap hosting service providers) or if you have an application that relies on writing its own rewrite rules (which is an obvious security nightmare).

How can I redirect requests to specific files above the site root?

I'm starting up a new web-site, and I'm having difficulties enforcing my desired file/folder organization:
For argument's sake, let's say that my website will be hosted at:
http://mywebsite.com/
I'd like (have set up) Apache's Virtual Host to map http://mywebsite.com/ to the /fileserver/mywebsite_com/www folder.
The problem arises when I've decided that I'd like to put a few files (favicon.ico and robots.txt) into a folder that is ABOVE the /www that Apache is mounting the http://mywebsite.com/ into
robots.txt+favicon.ico go into => /fileserver/files/mywebsite_com/stuff
So, when people go to http://mywebsite.com/robots.txt, Apache would be serving them the file from /fileserver/mywebsite_com/stuff/robots.txt
I've tried to setup a redirection via mod_rewrite, but alas:
RewriteRule ^(robots\.txt|favicon\.ico)$ ../stuff/$1 [L]
did me no good, because basically I was telling apache to serve something that is above it's mounted root.
Is it somehow possible to achieve the desired functionality by setting up Apache's (2.2.9) Virtual Hosts differently, or defining a RewriteMap of some kind that would rewrite the URLs in question not into other URLs, but into system file paths instead?
If not, what would be the preffered course of action for the desired organization (if any)?
I know that I can access the before mentioned files via PHP and then stream them - say with readfile(..), but I'd like to have Apache do as much work as necessary - it's bound to be faster than doing I/O through PHP.
Thanks a lot, this has deprived me of hours of constructive work already. Not to mention poor Apache getting restarted every few minutes. Think of the poor Apache :)
It seems you are set to using a RewriteRule. However, I suggest you use an Alias:
Alias /robots.txt /fileserver/files/mywebsite_com/stuff/robots.txt
Additionally, you will have to tell Apache about the restrictions on that file. If you have more than one file treated this way, do it for the complete directory:
<Directory /fileserver/files/mywebsite_com/stuff>
Order allow,deny
Allow from all
</Directory>
Can you use symlinks?
ln -s /fileserver/files/mywebsite_com/stuff/robots.txt /fileserver/files/mywebsite_com/stuff/favicon.ico /fileserver/mywebsite_com/www/
(ln is like cp, but creates symlinks instead of copies with -s.)