Apache mod_rewrite/mod_proxy - re-write last part of URI as query string? - apache

We have a web resource that can be accessed with a URL/URI of the form:
http://[host1]:[port1]/aaa/bbb.ccc?param1=xxx&param2=yyy...
However, we are working with an external client app (not developed by us, so we can't change it) that is attempting to access our resource with a URL that looks like:
http://[host2]/[port2]/ddd/fff/param1=xxx&param2=yyy...
In other words, the client is including the "query string" (the ?param1=xxx&param2=yyy... part) as if it's part of the URI, instead of as a proper query string.
We have a separate Apache proxy instance, and we're thinking we could use it with some RewriteCond/RewriteRule directives to take the incoming requests (the ones with the query string at the end of the "URI", without the "?"), rewrite each one to a "proper" URI with a "proper" query string, and then use that rewritten URI to access our resource via the proxy.
We'd also like to do that without having an HTTP re-direct (e.g., 30x) going back to the client, because it appears that they may not be able to handle such a re-direct.
I've been trying various things, but I'm not that familiar with Apache mod_rewrite, so I was wondering if someone could tell me (1) whether this is possible and (2) which RewriteCond/RewriteRule would accomplish it.
P.S. I have made some progress. The following rewrites the URL correctly, but when I test, I see a 302 redirect to the rewritten URL instead of Apache proxying directly to it. Is it possible to do this without the redirect (302)?
<Location /test/users/>
RewriteEngine on
RewriteCond %{REQUEST_URI} ^/(.*)/param1=
RewriteRule ^/(.*)/param1=(.*) http://192.168.0.xxx:yyyy/aaa/bbbbb.ccc?base=param1=$2
</Location>
Thanks, Jim
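One way to avoid the 302, as a minimal sketch: when the substitution is an absolute URL and no flag is given, mod_rewrite answers with an external redirect, whereas the P flag hands the rewritten URL to mod_proxy instead. The sketch below assumes mod_proxy and mod_proxy_http are loaded, and it assumes the rule sits in the server or virtual host configuration rather than inside <Location> (the mod_rewrite documentation discourages rules there). The pattern, host and port placeholders, and query string are copied unchanged from the attempt above.

```apache
# Assumption: mod_rewrite, mod_proxy and mod_proxy_http are enabled,
# and this block lives in the server or <VirtualHost> context.
RewriteEngine On

# Same pattern and target as the attempt above. The only change is the
# [P] flag, which proxies the request instead of sending a redirect.
RewriteRule ^/(.*)/param1=(.*)$ http://192.168.0.xxx:yyyy/aaa/bbbbb.ccc?base=param1=$2 [P]
```

Because the client never sees the rewritten URL with this flag, it does not need to handle a redirect at all.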

Related

301 Redirect in .htaccess for re-submitting URL-s

I want to ask how I can get search engines to take a second look at my new, freshly rewritten URLs.
So, my former URLs were structured like this:
http://www.sample.com/tutorials.php?name=something
and now they look much cleaner and better:
http://www.sample.com/tutorials/programming/something.php
So, as I said, I want Google (and other engines) to take a look at my new links, which are much more SEO friendly, so that I will be indexed better.
I was told the 301 redirect method was the best, but I don't have a clue what it is, how it works, or where to learn how to use it. So, I am asking you.
Side note: Would updating my sitemap.xml file and re-submitting it to Google Webmaster Tools help in this process?
Thanks in advance!
There are 2 kinds of redirects (in this context). When a client, be it a browser, search engine indexing bot, or whatever, requests a URI, the server can tell the client "What you are looking for exists, but it's somewhere else". In the case of a 302 or temporary redirect, it's essentially telling the client "What you are looking for exists, but it's temporarily over here at this URL". In the case of a 301 or permanent redirect, it's essentially telling the client "What you are looking for exists, but it has permanently moved over to this URL".
In the case of the latter, browsers, proxy servers, and search engine indexes know that the old URL is no longer valid, and that they should stop using it and from now on use the new URL returned by the server via the 301 redirect. In the case of a search engine like Google, it has an index of the old URL and all the data that it has accumulated over the lifetime of that URL associated with it. When one of its bots sees a 301, it knows that the old URL, and its content, isn't gone, but has just permanently moved to another URL. All of the associated data Google has collected for the old URL gets transferred to the new URL. Google can probably figure most of this stuff out without a 301 redirect, but it's a sure way to make sure Google has got it right.
You can do such a redirect via mod_rewrite:
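RewriteEngine On
# Match only the request line the client actually sent, not internally rewritten URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /tutorials\.php\?name=([^&\ ]+)
# The trailing "?" keeps the old query string (name=...) from being appended to the new URL
RewriteRule ^ /tutorials/programming/%1.php? [L,R=301]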
You should put this near the top of the htaccess file in your document root. The condition checks that an actual request has been made for /tutorials.php with a query string name="something". The "something" part gets grouped by the match and is accessed via the %1 backreference.
The 301 redirect is a response that the server can make which signals to the user (or search engine) that the page they are looking for has been permanently moved to a specified other page. It is possible to configure Apache to give a 301 for certain URLs, but it is probably easier to have whatever server-side language you are using take the request and then issue a 301.
The chances are that Google will work out what is going on fairly quickly without 301s or anything else, but submitting a sitemap to them or using the URL Parameters functionality in Google's Webmaster Tools might help.

Understanding difference between redirect and rewrite .htaccess

I'd like to understand the difference between redirecting and rewriting a URL using .htaccess.
So here's an example: Say I have a link like www.abc.com/index.php?page=product_types&cat=88 (call this the "original" url)
But when the user types in abc.com/shoes (let's call this the "desired" url), they need to see the contents of the above link. To accomplish this, I would do this:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteRule ^(.*)shoes(.*)$ index.php?page=product_types&cat=88
Nothing wrong with this code and it does the trick. However, if I type in the original url in the address bar, the content comes up, but the url does not change. So it remains as www.abc.com/index.php?page=product_types&cat=88
But what if I wanted the desired url (/shoes) to show up in the address bar if I typed in www.abc.com/index.php?page=product_types&cat=88? How would this be accomplished using .htaccess? Am I running into a potential loop?
Some of the explanation can be found here: https://stackoverflow.com/a/11711948/851273
The gist is that a rewrite happens solely on the server; the client (browser) is blind to it. The browser sends a request and gets content, and it is none the wiser about what happened on the server in order to serve the request.
A redirect is a server response to a request, that tells the client (browser) to submit a new request. The browser asks for a url, this url is what's in the location bar, the server gets that request and responds with a redirect, the browser gets the response and loads the URL in the server's response. The URL in the location bar is now the new URL and the browser sends a request for the new URL.
Simply rewriting internally on the server does absolutely nothing to URLs in the wild. If google or reddit or whatever site has a link to www.abc.com/index.php?page=product_types&cat=88, your internal server rewrite rule does absolutely nothing to that, nor to anyone who clicks on that link, or any client that happens to request that URL for any reason whatsoever. All the rewrite rule does is internally change something that contains shoes to /index.php?page=product_types&cat=88 within the server.
If you want a request for the index.php page with all of the query strings to end up at the nicer looking URL, you can tell the client (browser) to redirect to it. You need to be careful, because rewrite rules loop: your redirect will be internally rewritten, which will cause a redirect, which will be internally rewritten, and so on, causing a loop that will throw a 500 Server Error. So you can match specifically against the request itself:
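# THE_REQUEST is the raw request line, so internally rewritten URLs never match it
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?page=product_types&cat=88
# The trailing "?" drops the old query string from the redirect target
RewriteRule ^/?index\.php$ /shoes? [L,R=301]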
This should only be used to make it so links in the wild get pointed to the right place. You must ensure that your content is generating the correct links. That means everything on your site is using the /shoes link instead of the /index.php?page=product_types&cat=88 link.
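Putting the two pieces together, a minimal sketch of an .htaccess file in the document root might look like the following. The tighter ^shoes/?$ pattern is an assumption (the question's rule matches any URL that merely contains "shoes"), and the [L] flag on the last rule is added for clarity.

```apache
RewriteEngine On
RewriteBase /

# 1. External redirect: fires only when the client itself requested the
#    old URL, because THE_REQUEST is not changed by internal rewrites.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?page=product_types&cat=88
RewriteRule ^index\.php$ /shoes? [L,R=301]

# 2. Internal rewrite: the browser keeps showing /shoes while the server
#    runs the real script. (The pattern here is an assumed, stricter one.)
RewriteRule ^shoes/?$ index.php?page=product_types&cat=88 [L]
```

With this ordering, a request for the old URL is redirected once to /shoes, and the follow-up request for /shoes is rewritten internally without triggering the first rule again.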

should redirect to www cause a one second delay?

I am using Magento 1.6.2.0 on a shared host running Litespeed web server, and I have begun investigating ways to speed up page loads. Currently I am using Pingdom to look at requests and it appears that I am losing an entire second from the get-go when I type my URL without the www. The browser redirects to the www page, it's just that it takes so long. Is this something I can fix? I presume that I can change Magento's base-url to not include the www, but then I presume I'll have the same delay when going to the www url instead.
I took a look at the link you gave, and I do indeed see a delay of about 1 second before I receive a 302 redirect to the URL with www. prepended. Not entirely coincidentally, the actual page HTML also takes quite long (about 1.7 seconds) to load.
This is a fairly common issue with large web applications: to return even a simple response like a redirect, the entire application must load and run its startup code. Couple this with a not so fast shared webserver that isn't optimized for that one application, and you can get quite slow page load times. It's nothing unique to Magento; I've seen the same effect with MediaWiki myself, and I expect that it happens with other applications too.
The obvious solution is just to avoid redirects: as long as you make sure all your URLs have the right hostname, the extra delay due to wrong hostnames will not appear. Magento itself will presumably take care of this for its own URLs, but if you have any other code (or static pages) that link to your Magento URLs, make sure they use the right hostname.
You can also sign up for Google Webmaster Tools (and similar tools for other search engines) and configure your preferred domain there (it's under Site configuration → Settings) so that Google will automatically prepend www. to any links to your site it indexes.
You can (and should) also try to reduce Magento's startup time in general. This will speed up not only redirects, but all other page loads as well. I'm not familiar enough with Magento to be able to give much detailed advice on this, but the obvious first step for any massive PHP application is to make sure you're using a PHP accelerator such as APC.
Finally, the fastest way to redirect visitors to the correct hostname is to make your webserver send the redirect directly without ever invoking Magento at all. The details on how to do this depend on the server software you're using, but apparently LiteSpeed supports the same RewriteRule syntax as Apache's mod_rewrite, so you should be able to do this just by adding the following lines to your main .htaccess file:
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.mmmspeciosa\.com$ [NC]
RewriteRule ^(.*)$ http://www.mmmspeciosa.com/$1 [R=301,L]
(By the way, I'm using HTTP 301 permanent redirects here instead of the HTTP 302 temporary redirects Magento seems to be using. This is not only more appropriate according to the HTTP standard, but also works better with search engines, which treat a 301 redirect as an indication to index the target URL instead of the source of the redirect. If this redirect type is not configurable in Magento, I would consider it a bug. If it is configurable, you should set it to 301.)

Preventing a double redirect with htaccess

I'm trying to create a redirect in-between page of sorts, because the URL that I'm redirecting TO includes more information than the URL I'm redirecting FROM. I'm using a short domain (hrci.me) with an htaccess file to redirect to the full domain (currently reachchallenges.infectionist.com). An example would be:
hrci.me/ch123
The path, ch123, includes the identifier that lets me know it's a challenge link (ch), and the 123 is the challenge ID. Each challenge has a title that I like to append to the end of the URL for SEO purposes. This example URL would redirect to:
reachchallenges.infectionist.com/challenge/123/Challenge+Title
The "Challenge+Title" part is stored in the database and needs to be retrieved by the challenge id, so I wrote a simple PHP script that does just that and then handles the redirect itself. My htaccess rule looks like this:
RewriteRule ^ch([0-9]{1,4})(/)?$ redirhandler.php?chid=$1 [L]
So the request to /ch123 should redirect to redirhandler.php?chid=123, which would get the title then redirect to the other domain at /challenge/123/Challenge+Title. The problem is, the short domain is set up to forward all incoming requests to the long domain, maintaining the original path (so hrci.me/something would redirect to reachchallenges.infectionist.com/something), and I'm finding that after the htaccess handles the rewrite to redirhandler.php, it then redirects that to reachchallenges.infectionist.com/redirhandler.php...
Basically, I need it to ignore any further redirects if the path is redirhandler.php, allowing the PHP script to handle the rest of the redirect. I'm thinking a RewriteCond is how I might do this, but I can't figure it out.
It sounds like your rule that forwards all incoming requests to the long domain is higher up in the .htaccess file than the more specific rule for /ch* requests. Try putting the more specific rule before the more general one.
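As a rough sketch of that ordering, assuming the forward to the long domain is itself a rule in the same .htaccess file (the question does not show it, so its form here is hypothetical): in .htaccess context an internal rewrite sends the new URL through the rules again, so the general rule also carries a RewriteCond that leaves redirhandler.php alone.

```apache
RewriteEngine On

# Specific rule first: hand /ch123 to the PHP script, which looks up the
# title and issues the final redirect itself.
RewriteRule ^ch([0-9]{1,4})/?$ redirhandler.php?chid=$1 [L]

# General forward (hypothetical form, not shown in the question):
# send everything else to the long domain, but skip redirhandler.php.
RewriteCond %{REQUEST_URI} !^/redirhandler\.php$
RewriteRule ^(.*)$ http://reachchallenges.infectionist.com/$1 [R=301,L]
```

If the forwarding is instead configured outside .htaccess (for example in the hosting control panel), the exclusion would have to be made there.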

Prevent users from accessing files using non apache-rewritten urls

May be a noob question but I'm just starting playing around with apache and have not found a precise answer yet.
I am setting up a web app using url-rewriting massively, to show nice urls like [mywebsite.com/product/x] instead of [mywebsite.com/app/controllers/product.php?id=x].
However, I can still access the required page by typing the url [mywebsite.com/app/controllers/product.php?id=x]. I'd like to make it not possible, ie. redirect people to an error page if they do so, and allow them to access this page with the "rewritten" syntax only.
What would be the easiest way to do that? And do you think it is a necessary measure to secure an app?
In your PHP file, examine the $_SERVER['REQUEST_URI'] and ensure it is being accessed the way you want it to be.
There is no reason why this should be a security issue.
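RewriteEngine On
# REDIRECT_STATUS is empty on a direct client request and set after an internal rewrite
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} (?:^|&)id=([^&]+)
RewriteRule ^app/controllers/product\.php$ /product/%1? [R,L]
RewriteRule ^product/(.*)$ /app/controllers/product.php?id=$1 [L]
The first rule redirects any direct request for /app/controllers/product.php to the clean URL, taking the id from the query string. It tests the REDIRECT_STATUS environment variable rather than REDIRECT_URL, since REDIRECT_STATUS is the one commonly relied on inside mod_rewrite: it is empty when the client requests the file directly. The rewrite (last rule) causes this variable to be set when it calls the real page, so that internal request won't be redirected. This assumes the rules live in an .htaccess file in the document root.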