Redirect strange URLs with two periods in them - apache

I had a static HTML website that I recently converted to Drupal. I have been monitoring my site's 404 errors in Webmaster Tools and the Drupal reports, and have noticed that Google has indexed some strange URLs. My guess is that they came from relative links that were created improperly on the older static HTML site.
Here is an example:
www.example.com/items../item-page.html
The actual page is:
www.example.com/items/item-page.html
The new Drupal site doesn't even have .html extensions. I am using the URL redirect and Pathauto modules and have redirects set up for all of the old URLs to make sure they are 301'd to the new URL structure (e.g. www.example.com/items/item-page.html is 301'd to www.example.com/items/item-page).
I have access to the server so I am doing my redirects in the apache httpd.conf file instead of .htaccess. I tried the following code to redirect ../ to / but I am not having any luck:
RewriteRule ^\.\./(.*) /$1 [R=301,NC,L]
This rule doesn't do anything when I go to a url with the ../ in it. Is there a rewriterule that can match ../ and remove it from any url?
NOTE: I have other redirects in apache httpd.conf that are working fine...such as:
RewriteRule ^items/pdf/(.*)$ /sites/default/files/documents/items/$1 [R=301,NC,L]
So I don't think it's my server configuration.
EDIT:
I noticed that the rewrite rule above for rewriting the pdf directory works even with the .. in the URL. Example:
http://www.example.com/items../pdf/somedocument.pdf
redirects to
http://www.example.com/sites/default/files/documents/items/somedocument.pdf
So it looks like the .. is completely ignored in rewrite rules, which is why I can't get anything to work. Does anyone know a way around that?

I believe you need to escape your forward slash.
I believe rewriterule acts against the HTTP request URI, not the page link.
Consequently, I believe you will need to remove the caret (^) to find a match, and then check your legitimate URLs to see whether that would have a negative impact elsewhere.

You can use this
RewriteRule ^/items\.\.(.+)$ /items/$1 [L,R]
This will redirect /items..foobar to /items/foobar

I was not able to fix the issue using rewrite rules in apache due to the rewrite rules not finding ".." in the url for unknown reasons.
My solution was to create a custom drupal module that looks to see if ".." is in the URL. If the ".." string is found, then I have it set to redirect to the url without ".." in it using built-in drupal functions. Here is the code I used in my module.
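/**
 * Implements hook_init().
 */
function doubledot_fix_init() {
  $destination = drupal_get_destination();
  $alias = drupal_get_path_alias($destination['destination']);
  // Strip every ".." from the path; $count records how many were removed.
  $fixpath = str_replace("..", "", $alias, $count);
  if ($count > 0) {
    // Permanently redirect to the cleaned-up path.
    drupal_goto($fixpath, array(), 301);
  }
}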
I do not see any reason this fix will break anything, because ".." should never really be found in any URL. If anyone can think of a situation that this fix could cause an issue, or if you know of a better solution, please let me know.
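For anyone who would rather stay in Apache, one thing that might be worth trying is matching against %{THE_REQUEST}, which holds the raw request line as the browser sent it, instead of the path handed to RewriteRule. Note also that the rule in the question anchors \.\./ to the start of the path with ^, so it can only match a URL that begins with "../". The following is only a sketch and has not been tested against this setup; it assumes mod_rewrite is enabled and that the stray ".." always sits directly before a slash, as in the example URLs.

```apache
RewriteEngine On
# THE_REQUEST looks like: GET /items../item-page.html HTTP/1.1
# %1 captures the path before "../", %2 captures the path after it.
RewriteCond %{THE_REQUEST} \s/([^?\s]*?)\.\./([^?\s]*)
RewriteRule ^ /%1/%2 [R=301,L]
```

A URL with more than one "../" would be cleaned up one occurrence per redirect.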

Related

htaccess 301 redirect "works" but I get a 404 error at new site

Sorry for such a long title, but it pretty well describes what is happening.
Details: I have two sites on different domains. Previously, I had a temporary site in a published but not-visible directory on the older domain. Only those who had the extra directory (or the extra path) would normally see the temporary site.
Now that I have a new domain and a permanent new site, I simply want to redirect any attempts to access the old directory/pages/site. Here is the line I added to the old site's htaccess file (last line, BTW):
redirect 301 /mailscamalert.com/weather2/ http://www.mid-southweather.com/
That "works" at least in the sense that the user ends up at the new site. But that site throws up a 404 'flag' and the user ends up at my "error" page. All the new site's navigation is on that page, of course, but it is probably very confusing!
I've tried removing the trailing "/" on 'weather2/' and/or "...com/", and adding "index.html" to the new site's URL. No change: I still end up at the error page. I have also tried "meta" redirects and even a bit of PHP:
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.mid-southweather.com/index.html");
Any helpful suggestions or links, greatly appreciated!
Thanks!
Please correct your rewrite rule to the following:
RewriteEngine On
RewriteRule ^weather2/ http://www.mid-southweather.com/ [R=301,L]
If you had multiple pages in that directory and want them all to redirect to your new target domain, do this:
RewriteRule ^weather2/.* http://www.mid-southweather.com/ [R=301,L]
Your current rewrite rule appears to append the directory you are trying to redirect to the target URL. Using the Live HTTP Headers extension, I saw this response header:
Location: http://mid-southweather.com/weather2/
That weather2 directory of course doesn't exist on your new site, thus the 404.
Just dump your lines and replace them with mine and it should all work nicely. And undo any other changes you may have done in the process of trying to get it working.
Make sure you don't have other rewrite rules going on, it looks to me like you might have one more running somewhere.
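If mod_rewrite is not a requirement, mod_alias can do the same job. As a sketch, assuming the temporary site is reached at /weather2/ on the old domain: a plain Redirect appends whatever follows the matched prefix to the target URL, whereas RedirectMatch only sends what the target explicitly references, so every page under the old directory should land on the new home page.

```apache
# Goes in the old site's .htaccess.
# The captured group is not referenced in the target, so nothing is appended.
RedirectMatch 301 ^/weather2(/.*)?$ http://www.mid-southweather.com/
```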

htaccess rewrite rules are not working with urls that end with .cfm

I'm working on making all my URLs shorter with 301 redirects. I have fixed almost all of them; however, there is a URL ending with .cfm that will not rewrite.
FROM: http://www.mydomain.com/index.cfm/catlink/17/pagelink/7/sublink/34/art/41/rec/1/page.cfm
TO: http://www.mydomain.com/story/resources/health/page/168/page.html
If I change /page.cfm to /page.html then the rewrite will work.
Here is the rewrite rule that works for my other urls
RewriteRule ^index.cfm/catlink/([a-zA-Z0-9/-]+)([/])pagelink/([a-zA-Z0-9/-]+)([/])sublink/([a-zA-Z0-9/-]+)([/])art/([a-zA-Z0-9/-]+)(.*)$ http://localhost/index.cfm?page=moved&cat=$3&subcat=$5&article=$7&story=$8 [R=301]
Why does it work when the URL ends with .html but not when it ends with .cfm? What am I doing wrong?
This is current link and will not work:
http://www.mydomain.com/index.cfm/catlink/17/pagelink/7/sublink/34/art/41/rec/1/page.cfm
If I manually change the end of it to .html, I can get it to work:
http://www.mydomain.com/index.cfm/catlink/17/pagelink/7/sublink/34/art/41/rec/1/page.html
The issue is that Apache httpd is passing it off to Tomcat before Apache looks at the .htaccess. To test this, move your rewrite rules into your vhost. If they work, then that's what the problem was.
First off, change the first part of your RewriteRule to the following, more concise expression:
^index.cfm/catlink/(\d+)/pagelink/(\d+)/sublink/(\d+)/art/(\d+)/(.*)$
I believe that alone might resolve the issue. However, if it does not, and you don't care about the rest of the URL, try the following:
^index.cfm/catlink/(\d+)/pagelink/(\d+)/sublink/(\d+)/art/(\d+)/
Note: this removes the anchor ($) and therefore allows the URL to be open ended.
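One thing to watch when switching to the shorter pattern: the capture groups are renumbered, because the ([/]) groups are gone. Below is a sketch of the full rule with the backreferences adjusted. It assumes the same target URL and parameter names as in the question; the L flag is an addition. Since (\d+) captures only the number, article and story will not necessarily receive the same values as with the original pattern, so the receiving script may need a look.

```apache
RewriteEngine On
# $1 = catlink, $2 = pagelink, $3 = sublink, $4 = art, $5 = rest of the path
RewriteRule ^index\.cfm/catlink/(\d+)/pagelink/(\d+)/sublink/(\d+)/art/(\d+)/(.*)$ http://localhost/index.cfm?page=moved&cat=$2&subcat=$3&article=$4&story=$5 [R=301,L]
```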

How can I redirect people accessing my files as directories?

I have the following situation:
On my webserver I have an instance of websvn running, where specific repositories and revisions can be accessed by a URL like
http://www.myhost.com/listing.php?repname=repository1&path=%2Ftrunk%2Fbackend
Somehow, out there in the wild, a wrong URL is being used to access this
http://www.myhost.com/listing.php/?repname=repository1&path=%2Ftrunk%2Fbackend
(Notice the slash after listing.php)
Now, although the URL works and websvn still shows the webpage, images and stylesheets do not get loaded correctly, since they are referenced with relative paths.
I tried to add an .htaccess file to the webroot to redirect people accessing the file as directory to the correct URL.
I have tried multiple variations and ended up with this file:
RewriteEngine on
RewriteRule ^/listing.php/ listing.php [R=301,QSA]
But, since I am writing here, you already guessed it: It doesn't work.
I also tried
RewriteEngine on
RewriteRule ^/listing.php(.*) listing.php$1 [R=301,QSA]
What am I doing wrong?
Perhaps among other things, a RewriteRule within .htaccess that starts with “^/” will never match anything at all. (Examples that include a leading slash are for the global configuration file.) Remove the leading forward slash and see if that helps.
Also, I recommend changing the 301 to a 307 until you get it working. Otherwise, your browser will cache the 301 result, redirecting on subsequent references without consulting your server at all and likely giving you very confusing results.
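Putting both suggestions together, a minimal sketch might look like this. It assumes the .htaccess file sits in the webroot next to listing.php; switch the 307 to a 301 once it behaves as expected.

```apache
RewriteEngine On
# No leading slash in .htaccess. The query string is carried over automatically
# because the substitution has none of its own, so QSA is not needed.
RewriteRule ^listing\.php/$ /listing.php [R=307,L]
```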

How do I rewrite www.sitename.com/thing/thing.php?otherthing=something-like-this to www.sitename.com/something-like-this?

How do I rewrite
www.sitename.com/thing/thing.php?otherthing=something-like-this
to
www.sitename.com/something-like-this?
please help me with this as I can't seem to succeed. My host uses apache 2.2. Many thanks for your help!
Update
No, I don't need that trailing ?. However, I used the rewrite rule you offered and it still isn't working. I also added RewriteEngine On before the rules.
I have Linux hosting with .htaccess, and the code is evidently syntactically correct, because otherwise I would get the ever-popular 500 Internal Server Error. I placed the .htaccess file in the folder thing and in the root of the site, but it still won't work.
There should be an option to display it in directory format instead of the PHP ? format. If not, you could use the .htaccess mod_rewrite rule to make that display in the /folder/ way.
The way I do it is that I just upload my files and each page name is index.html and then I create folders, and put each index.html in the folder. Like this:
/guidelines/
In that folder is index.html, so instead of it being /guidelines.html it's /guidelines/
Looks better without .html
You need to use mod_rewrite:
RewriteCond %{QUERY_STRING} ^otherthing=(.*)$
RewriteRule ^thing/thing.php$ /%1? [L]
I'm not sure whether you meant to have that trailing ? at the end of the rewritten URL; I don't think that's possible. Note that the ? at the end of the RewriteRule substitution is there to get rid of the query string; otherwise the rewritten URL will still have ?otherthing=something-like-this at the end.
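If the goal is for visitors to see the short URL while thing.php keeps doing the work, two rules are usually needed: an external redirect from the old URL and an internal rewrite back to the script. The following is a sketch for a .htaccess file in the site root. It assumes that goal, and that short names contain no slashes. Apache 2.2 has no QSD flag, hence the trailing ? to drop the query string.

```apache
RewriteEngine On

# 1) Redirect the old query-string URL to the short form. THE_REQUEST is the
#    original request line, so the internal rewrite below cannot re-trigger this.
RewriteCond %{THE_REQUEST} \s/thing/thing\.php\?otherthing=([^&\s]+)
RewriteRule ^thing/thing\.php$ /%1? [R=301,L]

# 2) Internally map the short URL back to the script, skipping real files and folders.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]+)/?$ /thing/thing.php?otherthing=$1 [L]
```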

apache .htaccess - cut a string from url and redirect

For some reason google indexed several pages of my website as:
http://myapp.com/index.php/this-can-be-enything/1234
Now, I want to redirect with apache .htaccess those pages to correct urls:
http://myapp.com/this-can-be-enything/1234
I've googled and tried many options but with no success.
Any tip will be helpful.
I've added to my .htaccess file following lines:
RewriteCond %{THE_REQUEST} ^.*index.php.*
RewriteRule ^(.*)index.php(.*)$ $1$2 [NC,R=301,L]
I don't know if this is the best solution, but it works OK for me.
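A slightly tighter variant, as a sketch: it assumes the .htaccess file sits in the document root and that only URLs of the form /index.php/... need redirecting. The condition checks the original request line, so internal rewrites to index.php (common in front-controller setups) are left alone.

```apache
RewriteEngine On
# Only act when the browser itself asked for /index.php/...
RewriteCond %{THE_REQUEST} \s/index\.php/
RewriteRule ^index\.php/(.+)$ /$1 [R=301,L]
```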
There are two parts to the problem:
1. To make Google aware that an indexed page has moved to another destination, you need to handle that at the Apache level and issue a 301 (Moved Permanently).
2. A handler to map the cached requested URL to the new URL, using the handler from #1 itself.