Shorten URLs with mod_rewrite - apache

I am currently trying to make a URL shortener feature for one of my projects. What I want is this: if a user visits the site with a URL that does not contain any slashes (for directories) or file extensions, it should redirect to a PHP script that will serve up the correct file. For example:
http://example.com/A123 would be rewritten as http://example.com/view.php?id=A123
but
http://example.com/A123/ would not be rewritten, and
http://example.com/A123.png would not be rewritten either. I have been messing with mod_rewrite for a few hours now and for the life of me I cannot get this to work...

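With no way to identify which URIs should be treated as short links, you need to exclude all other possibilities, which will likely mean building a lengthy list of exclusions. Below is a starting point. Each condition verifies that the requested URI does NOT match a pattern (signified by the !); the rule runs only when all of the conditions pass. The leading slash in the RewriteRule pattern assumes the rules are in the server or virtual-host configuration; in a .htaccess file that slash is stripped before matching, so drop it from the pattern.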
RewriteCond %{REQUEST_URI} !^/view\.php
RewriteCond %{REQUEST_URI} !\.html$
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^/(.*)$ http://example.com/view.php?id=$1 [QSA]
The above also requires you (as you have requested) to break with standard practice, namely letting Apache handle requests for directories that lack a trailing slash. You are likely to run into other issues, as the rules above interfere with Apache's own directory handling.
Consider rethinking the logic: if you had some way to identify the URLs that are to be shortened, this would be much easier. For example, use a prefix such as 's': http://example.com/s/A123.
RewriteCond %{REQUEST_URI} ^/s/
RewriteRule ^/s/(.*)$ http://example.com/view.php?id=$1 [QSA]

I'm definitely no guru at this, but it's similar to what I'm trying to accomplish (see my yet unanswered question).
However, if I understand correctly, this (untested) RewriteRule may work:
RewriteRule ^([^./]+)$ view.php?id=$1 [L]
The magic part is the [^./]+ which says: one or more (+) instances of a character ([]) which is not ([^ ]) a period or a slash (neither needs escaping inside a character class).
Like I said, I haven't tested this, nor am I an expert, but perhaps this will help.
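A minimal, untested sketch that combines the ideas from both answers. It assumes the rules live in a .htaccess file in the document root, that mod_rewrite is enabled, and that view.php is the script from the question. The directory check is an added assumption, so that an existing directory requested without its trailing slash is not treated as a short ID:

```apache
RewriteEngine On

# Skip names that are real directories (e.g. /images requested without a slash).
RewriteCond %{REQUEST_FILENAME} !-d
# One or more characters that are neither a dot nor a slash, e.g. "A123".
# "A123/" and "A123.png" do not match, so they are left untouched.
RewriteRule ^([^./]+)$ view.php?id=$1 [L,QSA]
```

Because the rewritten target, view.php, contains a dot, it cannot match the pattern again, so the rule does not loop.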

Related

Block direct access to .php files, less the index.php file and ajax.php file

I'm looking for help. I found a method, but it doesn't do quite what I want.
What I want is that nobody can enter a direct URL ending in .php.
For example, when I enter domain.com/buy/product.php, I want it to be forbidden.
While looking for information here, I found this code, which worked for me in .htaccess:
RewriteEngine On
RewriteCond %{THE_REQUEST} "^.+? [^?]+\.php(?:[?/# ]|$)" [NC]
RewriteRule !^index\.php$ - [F,L,NC]
It worked fine, but I have a file, /include/ajax.php, that I call via Ajax, and the Ajax call now fails with an error.
I'm trying to work out how to change that .htaccess code so that index.php and /include/ajax.php remain accessible. I've tried everything I can think of, but nothing works.
Alternatively, if you know of code I could add to my PHP (I'm on version 7.3) that would do this without breaking my existing code, that would help too.
Rather than giving you the answer straight out, I'm going to give you some hints so that you aren't copying code you don't understand.
Each RewriteRule has three parts:
the pattern to match against the URL sent by the browser
the URL to rewrite to
an optional set of flags for extra options
Before each rule, you can optionally have one or more RewriteCond lines which apply extra conditions to the rule; each has three parts:
a variable to match against
the pattern to match
an optional set of flags for extra options
The most important flag in this case is [F], short for [forbidden], which says "if the rule matches, instead of rewriting or redirecting, just serve a 403 response".
You should very rarely need to test against %{THE_REQUEST}, which is a raw version of the request line from the browser; much more often, you want %{REQUEST_URI} and/or %{QUERY_STRING}.
The patterns in both RewriteRule and RewriteCond can be negated (i.e. "must not match this pattern") by starting them with !
So, if you wanted to return a 403 for all URLs ending ".bad", except for URLs ending "not.bad" or "only-a-little.bad", you could write this (note that $ is the way to say "must end here" in the regex patterns):
RewriteCond %{REQUEST_URI} !not\.bad$
RewriteCond %{REQUEST_URI} !only-a-little\.bad$
RewriteRule \.bad$ - [F]
Hopefully it should be straightforward enough to see how to adapt that to your requirements.
The full list of options and variables available is in the Apache manual.
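Adapting the .bad example to the question might look like the following sketch. It assumes the .htaccess file sits in the document root and that index.php and include/ajax.php are the only scripts that should stay reachable. It has not been tested against the asker's setup.

```apache
RewriteEngine On

# Exempt the two scripts that must remain reachable.
RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteCond %{REQUEST_URI} !^/include/ajax\.php$
# Any other path ending in .php gets a 403 Forbidden response.
RewriteRule \.php$ - [F]
```

Note that %{REQUEST_URI} reflects the current URL path, so if other rules internally rewrite friendly URLs to further .php scripts, those scripts would be blocked as well. That is the kind of situation in which testing %{THE_REQUEST} is useful.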
After two days of searching for code, reading, and studying how .htaccess works, I found the solution, thanks to the users who guided me.
(My title is not quite accurate.)
My intention was always to block every .php file that a user tries to open directly. The code I had found above did that, but it also blocked one specific file, /include/ajax.php, which is called via Ajax, and I could not find a way to make that work.
In the end I solved it this way:
RewriteEngine on
RewriteCond %{THE_REQUEST} ajax\.php [NC]
RewriteRule ^ - [NC,L]
RewriteCond %{THE_REQUEST} .+\.php [NC]
RewriteRule ^ - [F,L]
This blocks every direct request for a .php file except /include/ajax.php, while index.php is still served when the site is requested without a .php URL (for example, via /).
This is how it worked for me.
If I've got something wrong, I'd welcome some guidance.
I'm leaving this here in case someone finds it useful in the future.
I was repeatedly advised to use routing for my PHP, so that I could forget about these problems.
I will keep that in mind as I move forward.

Htaccess Match Random 6 characters, with exceptions?

Alright, so I've been trying to wrap my head around (what I believe to be) a simple mod_rewrite case. Maybe it's not, but I'm hoping you can help me with that, Stack Overflow.
So what I want is this: there are several folders that need to be ignored (i.e., "css", "js", "bootstrap", etc.). If the URL string doesn't match those, I want to check if it's a string of exactly six letters and numbers, and redirect that to one URL. Otherwise, it gets redirected to another URL.
This is what I have:
RewriteCond %{REQUEST_URI} !^/(index\.php|images|robots\.txt|bootstrap|phpmyadmin|css|js|font|recaptchalib.php|uploads)/
RewriteRule ^(a-z0-9+){6}$ /index.php/download/index/$1 [L]
RewriteRule ^(.*)$ /index.php/$1 [L]
If I take out the middle line, it works fine except I don't get the "match 6 random characters" functionality. With the middle line, I get a 500 error on every page.
Could someone help me out please?
You need to repeat the condition. A RewriteCond only applies to the immediately following RewriteRule, so your second rule doesn't exclude the folders listed in your condition. Try:
RewriteCond %{REQUEST_URI} !^/(index\.php|images|robots\.txt|bootstrap|phpmyadmin|css|js|font|recaptchalib.php|uploads)/
RewriteRule ^([a-zA-Z0-9]{6})$ /index.php/download/index/$1 [L]
RewriteCond %{REQUEST_URI} !^/(index\.php|images|robots\.txt|bootstrap|phpmyadmin|css|js|font|recaptchalib.php|uploads)/
RewriteRule ^(.*)$ /index.php/$1 [L]
You should let CodeIgniter handle this. Keep your .htaccess for routing to the index.php front controller, and use route(s) to handle the URI and where it should go from there.
This is especially true because now ANY six-character URI is going to be treated as a download. What if you have a controller with a six-letter name, like example.com/photos? It will always be assumed to be a download item, even if it's a real controller URI.
The "easiest" option (read: option that does not conflict with existing controllers/routes) is to utilize the 404_controller to check the URI and see if it's a valid download URL. Then you can run the appropriate code.
To explain a likely reason why your .htaccess code is not working: your regular expression for matching six alpha-numeric characters is wrong. Here's what you need:
^([a-zA-Z0-9]{6})$
This regex can be used as a CodeIgniter route, also, if you go that route (heh). Just remove the ^$ beginning/end characters, as CI puts them there for you.
As mentioned by Jon Lin, you also need to duplicate RewriteCond conditionals, as they are only good for one RewriteRule. After one, the conditionals reset.

How does .htaccess work?

I'm trying to make my website display the other pages as a www.example.com/pageone/ link instead of www.example.com/pageone.html.
The problem is that I'm reading up on ways to do that using .htaccess, and it's confusing me because I don't understand the commands.
I understand that the first step, however, is to write this in the .htaccess file:
RewriteEngine On
After this step, I have absolutely no idea what !-d, %{REQUEST_FILENAME}, ^(.*) and the rest of that syntax mean. Is there documentation that I can refer to?
Or can anyone provide a simple explanation of how to configure the .htaccess file so that, to reach www.example.com/pageone.html, all I need to type into the URL is www.example.com/pageone/ (and likewise for PHP files, etc.)?
First of all, there's the Official Documentation. To solve your specific problem, I would go about this way:
# Turn on the rewrite engine
RewriteEngine on
# If the request is not for an existing file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...and not for an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# Perform this rewrite
RewriteRule ^([^/]+)/?$ /$1.html [L]
The RewriteConds only apply to the next following rule. If you were to have multiple rules, you'd need to write the conditions for each one.
Now, the Apache server matches the requested path (everything after www.example.com/) to see if it matches any of the rules you've specified. In this case, there is only one:
^([^/]+)/?$
This regular expression matches one or more characters that are not a slash (/), followed by an optional trailing slash. If a match is found, the request is rewritten to the second parameter, /$1.html, where $1 means "whatever was matched between the brackets", which in our case is all of the non-slash characters.
The [L] flag tells the rewriting engine to stop looking for rules if this rule was matched.
So to conclude, www.example.com/whatever/ will be rewritten silently at the server to www.example.com/whatever.html.
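The question also asks about PHP files. One way to extend the rule is to check which target file exists before rewriting. The sketch below assumes the pages live directly in the document root as pageone.html or pageone.php. Inside a RewriteCond, $1 refers to the group captured by the pattern of the RewriteRule that follows it:

```apache
RewriteEngine On

# /pageone or /pageone/ -> /pageone.html, if that file exists.
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^([^/.]+)/?$ /$1.html [L]

# Otherwise /pageone or /pageone/ -> /pageone.php, if that file exists.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^([^/.]+)/?$ /$1.php [L]
```

Excluding the dot from the captured name means the rewritten targets (pageone.html, pageone.php) never match the pattern again.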
RewriteEngine on
RewriteBase /
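RewriteRule ^([^/.]+)/?$ /$1.html [L]
That should be all you need for this rewrite. It basically says: "Capture a run of characters containing no slash or dot as $1, allow an optional trailing slash, and serve /$1.html instead." So /foo or /foo/ would point to /foo.html. Excluding the dot keeps the rule from matching its own output (foo.html) and looping.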
For official documentation you can look here Apache httpd mod_rewrite.
On Google you can search with keywords such as url rewriting tutorial.
The weird characters are called regular expressions. They are not easy to learn, but there are many tutorials about them.
PS: this is not a straight answer but some stuff to let you go further and understand how url rewriting works.

Dealing with multiple, optional parameters with mod_rewrite

I'm using apache's mod_rewrite to make my application's URL's pretty. I have the basics of mod_rewrite down pat - several parts of my application use simple and predictable rewrites.
However, I've written a blog function, which uses several different parameters.
http://www.somedomain.com/blog/
http://www.somedomain.com/blog/tag/
http://www.somedomain.com/blog/page/2/
I have the following rules in my .htaccess:
RewriteRule ^blog/ index.php?action=blog [NC]
RewriteRule ^blog/(.*) index.php?action=blog&tag=$1 [NC]
RewriteRule ^blog/page/(.*) index.php?action=blog&page=$1 [NC]
However, the rules do not work together. The computer matches the first rule, and then stops processing - even though to my way of thinking, it should not match. I'm telling the machine to match ^blog/ and it goes ahead and matches ^blog/tag/ and ^blog/page/2/ which seems wrong to me.
What's going wrong with my rules? Why are they not being evaluated in the way I'm intending?
Edit: The answer was to terminate the input using $, and re-order the rules, ever so slightly:
RewriteRule ^blog/$ index.php?action=blog [NC,L]
RewriteRule ^blog/page/(.*)$ index.php?action=blog&page=$1 [NC,L]
RewriteRule ^blog/(.*)$ index.php?action=blog&tag=$1 [NC,L]
These rules produced the desired effect.
If you don't want ^blog/ to match anything more than that, specify the end of the input in the match as well:
^blog/$
However, the way many apps do it is to just have a single page that all URLs redirect to, that then processes the rest of the URL internally in the page code. Usually most web languages have a way to get the URI of the original request, which can be parsed out to determine what "variables" were specified, even though Apache points all of them to the same page. Then via includes or some other framework/templating engine you can load the proper logic.
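A common form of that single-entry-point setup is sketched below. It assumes index.php is the entry script from the question and that the script reads the original path itself (in PHP, from $_SERVER['REQUEST_URI']).

```apache
RewriteEngine On

# Serve real files and directories (images, CSS, ...) directly.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Send everything else to the single entry script; the original
# URL is still available to the application for parsing.
RewriteRule ^ index.php [L]
```

With this in place, /blog/, /blog/tag/ and /blog/page/2/ all reach index.php, and the choice between tag and page is made in application code rather than in rewrite rules.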
As another note - usually the "more general" rewrite rules are put last, so that things which match a more specific redirect will be processed first. This, coupled with the [L] option after the rule, will ensure that if a more specific rule matches, more general ones won't be evaluated.
I think you need to add the L flag alongside NC (i.e. [NC,L]); otherwise it'll carry on even if it's already been matched.

Why would mod_rewrite rewrite twice?

I only recently found out about URL rewriting, so I've still got a lot to learn.
While following the Easy Mod Rewrite tutorial, the results of one of their examples is really confusing me.
RewriteBase /
RewriteRule (.*) index.php?page=$1 [QSA,L]
Rewrites /home as /index.php?page=index.php&page=home.
I thought the duplicates might have had been caused by something in my host's configs, but a clean install of XAMPP does the same.
So, does anyone know why this seems to parse twice?
And, to me this seems like, if it's going to do this, it would be an infinite loop -- why does it stop at 2 cycles?
From Example 1 on this page, which is part of the tutorial linked in your question:
Assume you are using a CMS system that rewrites requests for everything to a single index.php script.
RewriteRule ^(.*)$ index.php?PAGE=$1 [L,QSA]
Yet every time you run that, regardless of which file you request, the PAGE variable always contains "index.php".
Why? You will end up doing two rewrites. Firstly, you request test.php. This gets rewritten to index.php?PAGE=test.php. A second request is now made for index.php?PAGE=test.php. This still matches your rewrite pattern, and in turn gets rewritten to index.php?PAGE=index.php.
One solution would be to add a RewriteCond that checks if the file is already "index.php". A better solution that also allows you to keep images and CSS files in the same directory is to use a RewriteCond that checks if the file exists, using -f.
1: The link is to the Internet Archive, since the tutorial website appears to be offline.
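The file-existence check described above could look like this sketch, which reuses the rule from the question and assumes index.php exists in the document root:

```apache
RewriteEngine On
RewriteBase /

# Only rewrite when the request does not map to an existing file.
# After the first rewrite the request is for index.php, which exists,
# so the second pass leaves it alone and page keeps its original value.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) index.php?page=$1 [QSA,L]
```

The same condition also lets images, stylesheets and other real files in the directory be served without passing through index.php.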
From the Apache Module mod_rewrite documentation:
'last|L' (last rule)
[…] if the RewriteRule generates an internal redirect […] this will reinject the request and will cause processing to be repeated starting from the first RewriteRule.
To prevent this you could either use an additional RewriteCond directive:
RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteRule (.*) index.php?page=$1 [QSA,L]
Or you alter the pattern to not match index.php and use the REQUEST_URI variable, either in the redirect or later in PHP ($_SERVER['REQUEST_URI']).
RewriteRule !^index\.php$ index.php?page=%{REQUEST_URI} [QSA,L]