Short-urls without index files? - apache

I'm wondering how urls like these are generated: http://www.example.com/Xj7hF
This is a practice I have seen used by many url shorteners as well as other websites that supposedly don't want to display data in the url in a parameter format.
Surely they can't be placing index files in the folder destination /Xj7hF etc with a redirect to the actual url, so I'm wondering how this is done.
Any help would be very appreciated!
(I'm running on a Linux server with Apache).

Different web development frameworks and web servers do it in different ways, but the most common is probably mod_rewrite with Apache. Basically, the web server hands the request to a dynamic scripting language (e.g. PHP), rewritten in such a way that the script doesn't need to know what the original request URI looked like and the client browser doesn't need to know which script actually processed the request.
For example, you will often see:
http://something.com/123/
This is a request for /123 which Apache may rewrite as a request to /my_script.php?id=123 based on how the user configured mod_rewrite.
(.htaccess example)
# if the request is for a file or directory that
# does not actually exist, serve index.php
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php?url=$1
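To illustrate what the script behind such a rule might do with the rewritten parameter, here is a minimal sketch in Python (the answer mentions PHP; Python is used here only to keep the example short). The lookup table, the target URL and the function name resolve are all made up for this example; a real shortener would presumably read the mapping from a database.

```python
# Hypothetical in-memory table standing in for a database lookup.
SHORT_URLS = {
    "Xj7hF": "http://www.example.com/some/long/path?id=42",
}

def resolve(url_param, table=SHORT_URLS):
    """Map the rewritten 'url' parameter to an HTTP status and headers."""
    # The rewrite rule passes the original path, possibly with slashes around it.
    code = url_param.strip("/")
    target = table.get(code)
    if target is None:
        # Unknown short code: answer with "not found".
        return 404, {}
    # Known short code: send the browser on to the real URL.
    return 301, {"Location": target}
```

The browser only ever sees /Xj7hF and the redirect it gets back; no folder or index file per short code is involved.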

This is known as URL rewriting and is usually performed via proper configuration of the webserver. StackOverflow has several tags for this, so you should be able to find more information there.

Related

.htaccess redirects if the condition does not match / negative condition

I am modifying the .htaccess file of a legacy PHP web application. I am not familiar with Apache .htaccess syntax. I found this tutorial. What I am trying to do is redirect all requests to one URL/path unless the request is already for that URL/path. For example, all requests to the website should be redirected to localhost/my-custom-page unless the request URL is localhost/my-custom-page.
I know how to redirect mapping 1 to 1 as follows:
RewriteEngine on
RewriteRule ^my-old-url.html$ /my-new-url.html [R=301,L]
But what I am trying to do is redirect all requests to one specific page unless the request is already for that page. Even the home page should be redirected to that page. How can I do that?
When I tried the following solution
RewriteEngine on
RewriteCond %{REQUEST_URI} !/my-new-url\.html
RewriteRule ^ /my-new-url.html [R=301]
I get the error
I want to check using OR condition as well. For example, if the path is not path-one or path-two, redirect all the requests to path-one.
Your question is a bit vague, due to your wording. But I assume this is what you are actually looking for:
RewriteEngine on
RewriteCond %{REQUEST_URI} !/my-new-url\.html
RewriteRule ^ /my-new-url.html [R=301]
In case you receive an internal server error (HTTP status 500) with a rule that uses the [END] flag, chances are that you operate a very old version of the Apache HTTP server ([END] is only available from version 2.3.9 onwards). You will see a definite hint to an unsupported [END] flag in your HTTP server's error log file in that case. You can either try to upgrade or use the older [L] flag; it will probably work the same in this situation, though that depends a bit on your setup.
It is a good idea to start out with a 302 temporary redirection and only change that to a 301 permanent redirection later, once you are certain everything is correctly set up. That prevents caching issues while trying things out...
This rule will work likewise in the HTTP server's host configuration or inside a dynamic configuration file (".htaccess" file). Obviously the rewriting module needs to be loaded inside the HTTP server and enabled in the HTTP host. In case you use a dynamic configuration file, you need to take care that its interpretation is enabled at all in the host configuration and that it is located in the host's DOCUMENT_ROOT folder.
And a general remark: you should always prefer to place such rules in the HTTP server's host configuration instead of using dynamic configuration files (".htaccess"). Those dynamic configuration files add complexity, are often a cause of unexpected behavior, are hard to debug and really slow down the HTTP server. They are only provided as a last option for situations where you do not have access to the real HTTP server's host configuration (read: really cheap service providers) or for applications insisting on writing their own rules (which is an obvious security nightmare).
RewriteCond %{REQUEST_URI} !/my-new-url\.html
RewriteRule ^ /my-new-url.html [R=301]
There are a few potential issues with this, particularly since you hint in a comment that you are perhaps using a front-controller to "route" the URL.
This redirect satisfies the conditions outlined in the question, but does assume that you have no other rewrites, have an essentially "static site" and are not linking to any static resources.
You are missing an L (last) flag, so processing will continue through the file and possibly be rewritten if you have later rewrites.
If you are rewriting the URL to a front-controller in order to route the URL (as you suggest in comments) then this redirect will break, as it will redirect away from the front-controller. You need to only redirect direct requests, ie. when the REDIRECT_STATUS environment variable is empty.
If you are linking to any static resources in the same file space then these will also be redirected. You need to create an exception for any static resources you are using, either by file extension (eg. (css|js|jpg|png)) or by location (eg. /static).
So, try the following instead:
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{REQUEST_URI} !\.(js|css|jpg|png)$
RewriteRule !^my-custom-url$ /my-custom-url [R=302,L]
You don't need a separate condition to implement the exception for the URL you are redirecting to. It is more efficient to do this directly in the RewriteRule pattern.
The first condition ensures we are only redirecting direct requests and not rewritten requests to your front-controller.
The second condition avoids any static resources also being redirected. You could alternatively check the filesystem path if all your resources are stored under a common root. Or, as a last resort, implement filesystem checks (ie. RewriteCond %{REQUEST_FILENAME} !-f) if your static resources are too varied - but note that this is less efficient.
You will need to clear your browser cache before testing, since any earlier (erroneous) 301s are cached persistently by the browser.
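To make the decision logic of these three directives easier to follow, here is a rough Python model of it. This is only a sketch for reasoning about which requests get redirected, not how Apache evaluates rules internally; the function name should_redirect and the allowed parameter are invented for this example, and it assumes the rules live in a .htaccess file in the document root.

```python
import re

# Same extension list as in the second RewriteCond.
STATIC_RESOURCE = re.compile(r"\.(js|css|jpg|png)$")

def should_redirect(request_uri, redirect_status="", allowed=("my-custom-url",)):
    """Return True if the request would be redirected by the rule set."""
    # Condition 1: only direct requests, i.e. REDIRECT_STATUS is empty.
    if redirect_status != "":
        return False
    # Condition 2: leave static resources alone.
    if STATIC_RESOURCE.search(request_uri):
        return False
    # In a root .htaccess the RewriteRule pattern sees the path
    # without its leading slash.
    relative = request_uri[1:] if request_uri.startswith("/") else request_uri
    # Negated pattern: redirect everything except the allowed path(s).
    return relative not in allowed
```

Passing allowed=("path-one", "path-two") models the "OR" case from the question; in the rule itself that would presumably correspond to a pattern such as !^(path-one|path-two)$.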

Apache 301 redirect with get parameters

I am trying to do a 301 redirect in the .htaccess file of a LiteSpeed web server, with no luck.
I need a URL-to-URL redirect where the target carries none of the original parameters.
For example:
from: http://www.example.com/?cat=123
to: http://www.example.com/some_url
I have tried:
RewriteRule http://www.example.com/?cat=123 http://www.example.com/some_url/ [R=301,L,NC]
Any help will be appreciated.
Thanks for adding your code to your question. Once more we see how important that is:
Your issue is that a RewriteRule does not operate on URLs, but on paths. So you need something like this instead:
RewriteEngine on
RewriteRule ^/?$ /some_url/ [R=301,L,NC,QSD]
From your question it is not clear if you want to ignore any GET parameters or if you only want to redirect if certain parameters are set. So here is a variant that will only get applied if some parameter is actually set in the request:
RewriteEngine on
RewriteCond %{QUERY_STRING} (?:^|&)cat=123(?:&|$)
RewriteRule ^/?$ /some_url/ [R=301,L,NC,QSD]
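If you want to check which query strings that condition accepts, the same pattern can be tried out in Python. This is only a quick sketch; Python's re module is not the PCRE engine Apache uses, but for these simple constructs the two should behave the same. The helper name has_cat_123 is made up.

```python
import re

# Same pattern as in the RewriteCond: "cat=123" as a complete parameter,
# either at the start of the query string or after an "&".
CAT_123 = re.compile(r"(?:^|&)cat=123(?:&|$)")

def has_cat_123(query_string):
    """Return True if the query string contains the exact parameter cat=123."""
    return CAT_123.search(query_string) is not None
```

So has_cat_123("a=1&cat=123&b=2") is true, while "cat=1234" and "mycat=123" are rejected because the anchors on both sides require a whole parameter.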
Another thing that does not really get clear is if you want all URLs below http://www.example.com/ (so below the path /) to be rewritten, or only that exact URL. If you want to keep any potential further path component of a request and still rewrite (for example http://www.example.com/foo => http://www.example.com/some_url/foo), then you need to add a capture in your regular expression and reuse the captured path components:
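RewriteEngine on
# skip requests already below /some_url/ to avoid a redirect loop
RewriteCond %{REQUEST_URI} !^/some_url/
RewriteRule ^/?(.*)$ /some_url/$1 [R=301,L,NC,QSD]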
For any of these to work you need to have the interpretation of .htaccess style files enabled by means of the AllowOverride directive. See the official documentation of the rewriting module for details. And you have to take care that the .htaccess style file is actually readable by the HTTP server process and that it is located right inside the HTTP host's DOCUMENT_ROOT folder in the local file system.
And a general hint: you should always prefer to place such rules inside the HTTP server's host configuration instead of using .htaccess style files. Those files are notoriously error prone, hard to debug and they really slow down the server. They are only provided as a last option for situations where you do not have control over the host configuration (read: really cheap hosting service providers) or if you have an application that relies on writing its own rewrite rules (which is an obvious security nightmare).

SEO URLs with ColdFusion controller?

quick ref: area = portal type page.
I would like old urls http://domain.com/long/rubbish/url/blah/blah/index.cfm?id=12345
to redirect to http://domain.com/area/12345-short-title
http://domain.com/area/12345-short-title should display the content.
So far I have worked out that I could use Apache to rewrite all URLs to
http://domain.com/index.cfm/long/rubbish/url/blah/blah/index.cfm?id=12345
and
http://domain.com/index.cfm/area/12345-short-title
The index.cfm will either serve the content or apply a permanent redirect, but it will need to get the title and area information from the database first.
There are 50,000 pages on this website. I also have other ideas for subdomain redirects, and permanent subdomains and controlling how they act through the index.cfm.
Infrastructure are keen to do as much through Apache rewrite as possible, we suspect it would be faster. However I'm not sure we have that choice if we need to get the area and title information for each page.
Has anyone got some experience with this that can provide input?
--
Something to note, I'm assuming we'll have to keep all the internal URLs used on the website in the old format. It would be a mega job to change them all.
This means all internal URLs will have to use a permanent redirect every time.
Rather than redirecting both groups of URLs to the same script, why not simply send them to two distinct scripts?
Simply like this:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^\w+/\d+-[\w-]+$ /content.cfm/$0 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.* /redirect.cfm/$0 [L,QSA]
Then, the redirect.cfm can lookup the replacement URL and do the 301 redirect, whilst content.cfm simply serves the content.
(You haven't specified how your CF is setup; you may need to update the Jrun/Tomcat/other config to support /content.cfm/* and /redirect.cfm/* - it'll be done the same as it's done for index.cfm)
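As a rough illustration of how the two rules split the traffic, here is the same pattern applied in Python. It is a sketch only: it ignores the "file does not exist" condition, and the function name target_script is made up for this example.

```python
import re

# Pattern of the first RewriteRule: "<area>/<id>-<short-title>".
# re.ASCII keeps \w and \d limited to ASCII, as in the rule's default matching.
NEW_STYLE = re.compile(r"^\w+/\d+-[\w-]+$", re.ASCII)

def target_script(path):
    """path is the URL path without the leading slash, as seen in .htaccess."""
    if NEW_STYLE.match(path):
        # New-style URL: content.cfm serves the page.
        return "/content.cfm/" + path
    # Anything else: redirect.cfm looks up the replacement URL.
    return "/redirect.cfm/" + path
```

For example, "area/12345-short-title" goes to content.cfm, while "long/rubbish/url/blah/blah/index.cfm" falls through to redirect.cfm.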
For performance reasons, you still want to avoid the database hits for redirecting if you can, and you can do that by generating rewrite rules for each page that performs the 301 redirect on the Apache side. This can be as simple as appending a line to the .htaccess file, like so:
<cfset NewLine = 'RewriteRule #ReEscape(OldUrl)# #NewUrl# [L,QSA,R=301]' />
<cffile action="append" file="./.htaccess" output=#NewLine# />
(Where OldUrl and NewUrl have been looked-up from the database.)
You might also want to investigate using mod_alias redirect instead of mod_rewrite RewriteRule, where the syntax would be Redirect permanent #OldUrl# #NewUrl# - since the OldUrl is an exact path match it would likely be faster.
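Note that these generated rules need to be checked before the catch-all redirect.cfm rule above. If both live in the same .htaccess you can't simply append; the generated rules have to be inserted above the catch-all. Also keep in mind that rules in the site's general Apache config files are evaluated before those in .htaccess, so the catch-all must not sit in the server configuration while the generated rules sit in .htaccess.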
Also, as per Sharon's comment, you should verify if your Apache will handle 50k rules - whilst I've seen it reported that "thousands" of regex-based Apache rewrites are perfectly fine, there may well be some limit (or at least the need to split across multiple files).
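The idea of generating one redirect line per page could be sketched like this in Python (the answer uses ColdFusion; this is just an illustration of the same step). The function name redirect_lines and the example paths are assumptions. Note that mod_alias Redirect matches on the path only, so old URLs that differ only in their ?id= query string would still need a RewriteCond on QUERY_STRING instead.

```python
def redirect_lines(pairs):
    """Build one mod_alias 'Redirect permanent' line per (old_path, new_path)."""
    lines = []
    for old_path, new_path in pairs:
        # Assumption: paths contain no whitespace, so no quoting is needed.
        if any(ch.isspace() for ch in old_path + new_path):
            raise ValueError("paths with whitespace are not handled in this sketch")
        lines.append(f"Redirect permanent {old_path} {new_path}")
    return lines
```

The resulting lines could then be written to a separate file, which also makes it easier to split 50,000 rules across several files if that turns out to be necessary.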
Using apache rewrites would only be faster if they were static rewrites, or if they all followed some rule that you could write in regex within the .htaccess file. If you're having to touch the database for these redirects, then it may not make sense to do it in .htaccess.
Another approach is the one used by most CMSs for handling virtual directories and redirects. An index.cfm file at the root of the site handles all incoming requests and returns the correct pages and pathing. MURA CMS uses this approach (as well as Joomla and most of the others.)
Basically you're using the CGI.path_info variable on an incoming request, searching for it in your DB, and doing a redirect to the new path. As usual, Ben Nadel has a good write-up of how to use this approach: Ben Nadel: Using IIS URL Rewriting And CGI.PATH_INFO With IIS MOD-Rewrite
You can, however, use the .htaccess to remove the "index.cfm" from the url string entirely if you want by redirecting all incoming requests to the root URL with something that looks like this in your .htaccess:
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^([a-zA-Z0-9-]{1,})/([a-zA-Z0-9/-]+)$ /$1/index.cfm/$2 [PT]
Basically this would redirect something like http://www.yourdomain.com/your-new-url/ to http://www.yourdomain.com/index.cfm/your-new-url/ where it could be processed as described by the blog post above. The user would never see the index.cfm.

How do I manage several folders in my domain?

I want to create URLs like this:
domain.com/type/city
Some examples:
domain.com/restaurants/new_york
domain.com/hotels/new_york
domain.com/restaurants/chicago
I have thousands of cities in a MySQL database.
I have thought of some options:
Thousands of folders, each with an index.php that redirects (I think this is the wrong way).
Create a sitemap with all links (domain.com?type=hotels&city=chicago) and manage them in code with the database.
Apache?
Please, which would be the best way to do this? Thanks in advance!
You can solve this with a combination of PHP and Apache configuration. That is the most common solution and is seen in popular PHP website software such as Drupal and WordPress.
The idea is to let Apache send all traffic to one index.php file and pass the rest of the path as a parameter for PHP to handle.
You will need to be careful with a few edge cases though: if a file such as ./public/styles.css is requested, you don't want to serve that through your PHP application but want Apache to serve the file directly. Existing files need to be handled by Apache, everything else by your application.
In your .htaccess:
# Rewrite URLs of the form 'x' to the form 'index.php?q=x'.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
The first condition tells Apache to serve regular files itself. The second condition does the same for existing directories. The third condition keeps browsers (most notably IE6) that request example.com/favicon.ico from hammering your PHP application.
The rule then passes everything else along to index.php and puts the rest of the path into the q parameter.
Inside index.php you can then read that parameter and act on it:
<?php
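// 'q' is set by the rewrite rule; fall back to an empty path if it is missing
$path = $_GET['q'] ?? '';
// split e.g. "restaurants/new_york" into its path segments
$params = explode('/', $path);
print $path;
print_r($params);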
?>
Thousands of folders would be the wrong way, that is for sure.
If you start by creating the sitemap with links of the type domain.com/?type=hotels&city=chicago you get a nice structure that you can manage programmatically.
First get this started and working, then look up .htaccess and mod_rewrite which you can then use to map from domain.com/type/city to your links already functioning.
This seems both to be a good strategy for getting something working fast, and for ending up with the prettiest solution.
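The mapping from domain.com/type/city to the type and city parameters could look roughly like this. It is a Python sketch of the step the PHP script would perform on the rewritten path; the function name parse_route is made up, and checking the values against the database is left out.

```python
def parse_route(q):
    """Split a path like 'restaurants/new_york' into its type and city."""
    # Ignore empty segments caused by leading, trailing or doubled slashes.
    parts = [p for p in q.split("/") if p]
    if len(parts) != 2:
        # Not of the form type/city; the caller should answer with a 404.
        return None
    listing_type, city = parts
    return {"type": listing_type, "city": city}
```

Anything that does not have exactly two segments returns None, so the application can decide how to respond instead of guessing.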

Apache mod_perl handler/dispatcher returning control to apache

Is it possible to have an apache mod_perl handler, which receives all incoming requests and decides based upon a set of rules if this request is something it wants to act upon, and if not, return control to apache which would serve the request as normal?
A use-case:
A legacy site which uses DirectoryIndex for serving index.html (or similar) and default handlers for perl scripts etc, is being given a freshened up url-scheme (django/catalyst-ish). A dispatcher will have a set of urls mapped to controllers that are dispatched based on the incoming url.
However, the tricky part is having this dispatcher within the same namespace on the same vhost as the old site. The thought is to rewrite the site piece by piece, as an "update all" migration gives no chance of testing site performance with the new system, nor is it feasible due to the sheer size of the site.
One of the many problems is that the dispatcher now receives all URLs as expected, but DirectoryIndex and static content (which is mostly, but not entirely, served by a different host) are not served properly. The dispatcher returns an Apache::Const::DECLINED for non-matching urls, but Apache does not continue to serve the request as it normally would; instead it gives the default error page. Apache does not seem to try to look for /index.html etc.
How can this be solved? Do you need to use internal redirects? Change the handler stack in the dispatcher? Use some clever directives? All of the above? Not possible at all?
All suggestions are welcome!
I have done a similar thing but a while back, so I might be a little vague:
I think you need to have the standard file handler (I believe this is done using the SetHandler directive) as well as the perl handler in the stack.
You might need to use PerlTransHandler or a similar one to hook into the filename/url mapping phase and make sure the next handler in line will pick the right file up off the filesystem.
Maybe you will have success using a mod_rewrite configuration which only rewrites URLs to your dispatcher if a requested file does not exist in the file system. That way your new application acts as an overlay to the old application and can be replaced in successive steps by just removing old parts of the application during deployment of new parts.
This can be accomplished by a combination of RewriteCond and RewriteRule. Your new application needs to sit in a private "namespace" (location) not otherwise used in the old application.
I am not a mod_perl expert, but with e.g. mod_php it could work like this:
RewriteEngine on
# do not rewrite requests into the new application(s) / namespaces
RewriteRule ^new_app/ - [L]
# do not rewrite requests to existing file system objects
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
# do the actual rewrite here
RewriteRule ^(.*)$ new_app/dispatcher.php/$1
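The overlay behaviour of these rules can be modelled in a few lines of Python, which may help when reasoning about which requests end up in the dispatcher. This is only a sketch of the decision, not of how Apache processes the rules; the function name route is made up, and path is assumed to be the URL path relative to the document root.

```python
import os

def route(path, document_root):
    """Return the path that would be served for a request, per the rules above."""
    # Requests into the new application's namespace are left untouched.
    if path.startswith("new_app/"):
        return path
    full = os.path.join(document_root, path)
    # Existing files, directories and symlinks are still served by the old site.
    if os.path.isfile(full) or os.path.isdir(full) or os.path.islink(full):
        return path
    # Everything else is handed to the dispatcher with the path appended.
    return "new_app/dispatcher.php/" + path
```

Removing an old file from the document root is then enough to let the dispatcher take over that URL, which is what makes the piece-by-piece migration possible.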