Massive URL Change - apache

We need to make changes to an app that will cause all of its URLs to change. We don't want to lose the value those URLs have built up, nor end up with a huge pile of 301 redirects. Specifically, I am looking to change mod_rewrite-generated URLs to non-rewritten ones.
My thoughts would be to
Leave the mod_rewrite URLs active (temporarily)
Place a canonical tag pointing at the NEW correct URL
Make sure nothing still links to the old URLs - all internal links updated, etc.
Make sure our robots.txt and sitemap submissions are up to date.
Would a massive change in URLs - even if backed up by canonical URLs and an updated sitemap.xml - have a negative effect on our listings in Google?
What are people's thoughts / experiences on this?

Thinking about it, if you're using mod_rewrite and want to switch to non-rewritten URLs, then chances are you can make the change purely by adding the R=301 flag to your existing rewrite rule, giving you something like this:
RewriteRule ^whatever/(.*)$ http://www.domain.com/$1/ [R=301,L]
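If the old rule was previously doing an internal rewrite, the change can be as small as swapping the flags. A rough before/after sketch (script.php and its page parameter are placeholders for whatever your real non-rewritten URLs look like):
RewriteEngine On
# Before: the pretty URL was silently rewritten to the underlying script
# RewriteRule ^whatever/(.*)$ /script.php?page=$1 [L]
# After: the same pattern now sends a permanent redirect to the new, non-rewritten URL
RewriteRule ^whatever/(.*)$ http://www.domain.com/script.php?page=$1 [R=301,L]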

Actually, a 301 redirect should not impact your search ranking - it's exactly how you're supposed to do this kind of thing, and it's search-engine independent. The "canonical" tag is an invention of Google and has the disadvantage that people still reaching the old URLs from outside links will not be redirected, and will thus keep using the old URLs in links and bookmarks.

Using a permanent HTTP redirect is the best solution for both your users and the search engines.

I'd also be interested to know, on this topic, how these 301s should be handled - in .htaccess or at a code level? Surely hundreds of 301s in a .htaccess file is too many?
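For what it's worth, you usually don't need hundreds of individual entries: if the old and new URLs share a pattern, one rule covers the lot, and for genuinely arbitrary one-to-one mappings a RewriteMap keeps the table in an external file. A rough sketch, assuming a map file at /etc/apache2/legacy-redirects.txt and old URLs under /old/ (both made up for the example):
# RewriteMap must be declared in the server or virtual host config, not in .htaccess
RewriteEngine On
RewriteMap legacymap txt:/etc/apache2/legacy-redirects.txt
# legacy-redirects.txt holds one "old-key new-url" pair per line
RewriteCond ${legacymap:$1} !^$
RewriteRule ^/old/(.+)$ ${legacymap:$1} [R=301,L]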

Related

How to set up a wildcard 301 redirect that will remove URL parameters from index.php

OK, I previously had some issues setting up a wildcard redirect to strip parameters from an old, non-SEO-friendly URL format and send those requests to our root. With the help of Stack Overflow we've got that corrected and working, but now I am experiencing issues with index.php.
I'm seeing tons of duplicate URLs in the engines using the format index.php?cPath=#, and Google is flagging them as duplicate index pages.
I tried setting up this redirect in our .htaccess, but no go. It won't strip out the parameters or redirect; it just serves a duplicate of the page with a valid 200 response.
RewriteRule ^index.php/.*$ /? [R=301,NE,NC,L]
I want to redirect anything with index.php?cPath=# to our root domain. Any ideas on how I can tackle this 301 redirect using mod_rewrite on Apache?
I've also just noticed that our site has another issue with this URL format:
domain.com/?cPath=#...
So now I also need to write a rewrite for domain.com/?cPath=#. I don't know where this one popped up from, but I can see it's going to cause issues, given the valid 200 response it returns. The engines are really going to love me for this one.
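One hedged sketch of how this might be tackled (untested here): the query string is never part of the RewriteRule pattern, which is why the rule above returns a 200 instead of redirecting, so the cPath parameter has to be matched with a RewriteCond instead:
RewriteEngine On
# Match any request whose query string starts with cPath=
RewriteCond %{QUERY_STRING} ^cPath= [NC]
# Covers both /index.php?cPath=# and /?cPath=#; the trailing "?" drops the query string
RewriteRule ^(index\.php)?$ /? [R=301,NC,L]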

301 Redirect in .htaccess for re-submitting URLs

I want to ask: how do I get search engines to take a second look at my new, fresh, rewritten URLs?
So, my former URLs were structured like this:
http://www.sample.com/tutorials.php?name=something
and now they look much cleaner and better:
http://www.sample.com/tutorials/programming/something.php
So, as I said, I want Google (and other engines) to look at my new links, which are much more SEO-friendly and should get me indexed better.
I was told the 301 redirect method was best, but I don't have a clue what it is, how it works, or where to learn how to use it. So I am asking you.
Side note: Would updating my sitemap.xml file and re-submitting it to Google Webmaster Tools help in this process?
Thanks in advance!
There are two kinds of redirects (in this context). When a client, be it a browser, a search engine indexing bot, or whatever, requests a URI, the server can tell the client "What you are looking for exists, but it's somewhere else". In the case of a 302 or temporary redirect, it's essentially telling the client "What you are looking for exists, but it's temporarily over here at this URL". In the case of a 301 or permanent redirect, it's essentially telling the client "What you are looking for exists, but it has permanently moved over to this URL".
In the case of the latter, browsers, proxy servers, and search engine indexes know that the old URL is no longer valid, to stop using it, and from now on to use the new URL returned by the server via the 301 redirect. In the case of a search engine like Google, it has an index of the old URL and all the data it has accumulated over the lifetime of that URL associated with it. When one of its bots sees a 301, it knows that the old URL, and its content, isn't gone; it has just permanently moved to another URL. All of the associated data Google has collected for the old URL gets transferred to the new URL. Google can probably figure most of this out without a 301 redirect, but it's a sure way to make sure Google gets it right.
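As a hypothetical illustration of the two status codes using plain mod_alias (the paths here are made up for the example):
# Temporary: clients keep requesting /promo and get bounced each time
Redirect 302 /promo http://www.sample.com/summer-sale.html
# Permanent: clients and search engines are told to forget /old-page for good
Redirect 301 /old-page http://www.sample.com/new-page.html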
You can do such a redirect via mod_rewrite:
RewriteEngine On
# Match the original request line and capture the value of the "name" parameter
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /tutorials\.php\?name=([^&\ ]+)
# Redirect permanently; the trailing "?" drops the old query string from the target
RewriteRule ^ /tutorials/programming/%1.php? [L,R=301]
You should put this near the top of the .htaccess file in your document root. The condition checks that an actual request has been made for /tutorials.php with a query string of name=something. The "something" part is captured by the match and is accessed via the %1 backreference.
The 301 redirect is a response the server can make which signals to the user (or search engine) that the page they are looking for has permanently moved to a specified other page. It is possible to configure Apache to give a 301 for certain URLs, but it is probably easier to have whatever server-side language you are using take the request and then issue the 301 itself.
The chances are that Google will work out what is going on fairly quickly without 301s or anything else, but submitting a sitemap to them or using the URL Parameters functionality in Google's Webmaster Tools might help.

How to prevent a search engine from indexing a directory for a particular domain?

I have a web hosting package with 2 domains pointing to it. I've noticed on Google that it has indexed the directory of one of the domains under the other domain. Is there a way to prevent this from happening?
You could try the Robots exclusion standard (robots.txt), but it is no guarantee.
Redirect all pages of one of your domains to the other one. You can do that with .htaccess and mod_rewrite, similar to this:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
This would perform a 301 redirect (Moved Permanently) from example.com to www.example.com.
For SEO purposes you never want duplicate content (identical pages at different URLs); there should always be exactly one URL for your content, and all other possible URLs should redirect to that one.
Updating your robots.txt will definitely solve the problem in the future, but I think the question you should be asking is: how did Google know those pages were there?
First, you should ensure that a user can't traverse your site's filesystem (if your server is *nix, .htaccess should have something like Options -Indexes). And if you had a public link anywhere that joined the two sites on a single domain, that could be how Google found it. If you are careful to keep your site clean and never point to the files in the other docroot, there should be no problem hosting one domain off the subdirectory of another domain.
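For reference, the directive mentioned above is a one-liner in the .htaccess at your document root (assuming your host permits Options overrides in .htaccess):
# Disable automatic directory listings so visitors (and bots) can't browse the filesystem
Options -Indexes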
You can clear Google's index of those pages by using their Webmaster Tools. In order to identify yourself as the site's owner, you'll need to install a unique file (they create it for you) in the root directory of your various document roots, then you can manually update the parts of your site that they've indexed. This applies only to Google.
If you've been indexed by other search engines (and you probably have been if Google indexed you), you should try to figure out how they got there, fix the problem, move the second site to another folder (causing the pages to report 404 Not Found on your main domain) and then get the search engines to reindex.
If you are using Linux, then some additions to your .htaccess file would probably work, but the specifics would depend on your site setup.

need help with 301 redirect and seo urls

OK, I used the rule below to "SEO-ize" my URLs. It works great. The only problem is that when I go to the old page it doesn't redirect to the new page, so I have a feeling I will get two pages indexed in Google. How can I permanently redirect the old pages to the new URLs?
RewriteRule ^city/([^/]+)/([^/]+) /rate-page.php?state=$1&city=$2 [NC]
New page = http://www.ratemycommunity.com/city/Kansas/Independence
and old page = http://www.ratemycommunity.com/rate-page.php?state=Kansas&city=Independence
The problem is that the ugly URLs must remain reachable, as you need them for the rewrite. Just don't make any links to the ugly URLs.
If search engines already know about the ugly URLs, you can add another query parameter, say show=yes.
In the rewrite rule, check that the request carries show=yes as its last parameter. If not, redirect to the nice URL, which in turn will be rewritten internally to the ugly URL with that parameter appended. Then never link externally to the ugly URL with the show=yes parameter.
Example:
/rate-page.php?state=Somestate&city=Somecity&show=yes
Accessing this page will show the content, but you must not make that link visible from anywhere.
/city/Somestate/Somecity
should be rewritten to /rate-page.php?state=Somestate&city=Somecity&show=yes, and
/rate-page.php?state=Somestate&city=Somecity
should be redirected to /city/Somestate/Somecity
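Put into .htaccess terms, that arrangement might look roughly like this (untested; parameter names taken from the example above):
RewriteEngine On
# Pretty URL -> script, internally, with the extra show=yes marker
RewriteRule ^city/([^/]+)/([^/]+)$ /rate-page.php?state=$1&city=$2&show=yes [NC,L]
# Anyone hitting the ugly URL directly (no show=yes) gets bounced to the pretty URL;
# the trailing "?" drops the old query string from the redirect
RewriteCond %{QUERY_STRING} ^state=([^&]+)&city=([^&]+)$ [NC]
RewriteRule ^rate-page\.php$ /city/%1/%2? [R=301,NC,L]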
The best thing to do is use canonicalization, a recently introduced page-tagging concept that tells Google and other crawlers which URL you want to be the URL of record. Check out this documentation and video by Google SEO guru Matt Cutts.
In your case, it will look like this:
<link rel="canonical" href="http://www.ratemycommunity.com/city/Kansas/Independence"/>

Using Apache mod_rewrite to remove sub-directories from URL

I'm managing an instance of Wordpress where the URLs are in the following format:
http://www.example.com/example-category/blog-post-permalink/
The blog author did an inconsistent job of adding categories to posts, so while some of them have legitimate categories in their URLs, at least half are "uncategorised".
I can easily change Wordpress to render the URL without the category name (e.g., http://www.example.com/blog-post-permalink/), but I'd like to create a mod_rewrite rule to automatically redirect any requests for the previous format to the new, cleaner one.
How can I use a mod_rewrite recipe to handle this, taking into account that I want to honor requests for the real WordPress directories that are in my webroot?
Something as simple as:
RewriteRule ^[^/]+/([^/]+)/?$ /$1 [R=301,L]
Perhaps that would do it?
It simply redirects /foo/bar/ to /bar.
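To also honor the real WordPress directories mentioned in the question, a slightly safer sketch (untested; the excluded directory names are just the standard WordPress ones) only redirects when the request doesn't map to a real file or directory, and should sit above the normal WordPress rewrite block:
RewriteEngine On
# Leave the core WordPress directories and any real files/directories alone
RewriteCond %{REQUEST_URI} !^/(wp-admin|wp-content|wp-includes)/ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Strip the leading category segment and permanently redirect to the shorter permalink
RewriteRule ^[^/]+/([^/]+)/?$ /$1/ [R=301,L]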