I changed my file extensions, but Google hasn't changed - SEO

Well, I changed my file extensions to .shtml and Google still has them indexed as .html. I've tried uploading a sitemap and other things to Google Webmaster Tools, but with no luck. Is there any other way I can force Google to crawl my site? I'm guessing that's the only way the links will update.

Have you set up redirection from the old .html filenames to the new .shtml filenames? You should do that - Google will pick up your changes more quickly (because it doesn't have to discover all your pages again) and it will transfer PageRank from the old URLs to the new ones. There are a million descriptions on the net about how to do that.
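If you're on Apache with mod_rewrite available, a minimal .htaccess sketch of that redirection might look like the following (the blanket .html-to-.shtml mapping is an assumption; adjust the pattern to your own URLs):
RewriteEngine On
# Permanently (301) redirect any .html request to the same path with a .shtml extension
RewriteRule ^(.*)\.html$ /$1.shtml [R=301,L]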
But having said that, can't you just keep the .html filenames and do some trickery with .htaccess (or equivalent; you don't say which web server you're using) to make the .html files respect SSI directives? This:
Options +Includes
AddType text/html .html
AddHandler server-parsed .html
ought to do it for Apache (mod_include needs to be enabled, and the Includes option allowed for that directory).

You can try setting your sitemap's change frequency to always, or something more frequent, to get indexed sooner. I make no guarantees, though. Otherwise you're just going to have to wait like the rest of us.
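For reference, the change frequency is the <changefreq> element of a sitemap.xml entry; a minimal sketch with a placeholder URL (Google treats it only as a hint):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/page.shtml</loc>
    <changefreq>always</changefreq>
  </url>
</urlset>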

If your website gets some traffic then there's a good chance that Googlebot has already visited. My website isn't big and I get crawled daily. You can be pretty sure that Google already knows your new URLs.
The thing is that Google only updates its index every 3-4 weeks or so. There are a few exceptions to that, such as regularly updated blogs and forums, where new threads and posts get picked up very fast. But usually it can take a few weeks until your new URLs are in the main search index.
Oh, and a tip: if you have rewrite rules to redirect your old .html addresses to your new .shtml addresses, make sure you use 301 redirects and not 302 redirects. With 301 redirects, Google will assign the "link juice" that the old URL has to the new URL. With 302 redirects it does not do that, and you could possibly lose PageRank.

Google crawls the web periodically; just wait a while and it will re-index your site.

Related

robots.txt: which folders to disallow for SEO?

I am currently writing my robots.txt file and have some trouble deciding whether I should allow or disallow some folders for SEO purposes.
Here are the folders I have:
/css/ (css)
/js/ (javascript)
/img/ (images I use for the website)
/php/ (PHP scripts that return a blank page, for example checkemail.php, which checks an email address, or register.php, which puts data into a SQL database and sends an email)
/error/ (my error 401,403,404,406,500 html pages)
/include/ (header.html and footer.html I include)
I was thinking about disallowing only the PHP pages and allowing the rest.
What do you think?
Thanks a lot
Laurent
/css and /js -- CSS and JavaScript files will probably be crawled by Googlebot whether or not you have them in robots.txt. Google uses them to render your pages for site preview. Google has asked nicely that you not put them in robots.txt.
/img -- Googlebot may crawl this even when it is in robots.txt, the same way as CSS and JavaScript. Putting your images in robots.txt generally prevents them from being indexed in Google Image Search. Google Image Search may be a source of visitors to your site, so you may wish to be indexed there.
/php -- It sounds like you don't want spiders hitting the URLs that perform actions. Good call to use robots.txt there.
/error -- If your site is set up correctly, the spiders will probably never know what directory your error pages are served from. They are generally served at the URL that has the error, and the spider never sees their actual URL. This isn't the case if you redirect to them, which isn't recommended practice anyway. As such, I would say there is no need to put them in robots.txt.
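Putting that together, a robots.txt along these lines would block only the PHP folder and leave everything else crawlable (paths assumed to match the folders listed in the question):
User-agent: *
Disallow: /php/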

301 Redirect in .htaccess for re-submitting URLs

I want to ask: how do I get search engines to take a second look at my new, fresh, rewritten URLs?
My former URLs were structured like this:
http://www.sample.com/tutorials.php?name=something
and now they look much cleaner and better:
http://www.sample.com/tutorials/programming/something.php
So, as I said, I want Google (and other engines) to take a look at my new links, which are much more SEO-friendly, so that I will be indexed better.
I was told the 301 redirect method was the best, but I don't have a clue what it is, how it works, or where to learn how to use it. So, I am asking you.
Side note: Would updating my sitemap.xml file and re-submitting it to Google Webmaster Tools help in this process?
Thanks in advance!
There are two kinds of redirects (in this context). When a client, be it a browser, a search engine indexing bot, or whatever, requests a URI, the server can tell the client "What you are looking for exists, but it's somewhere else". In the case of a 302, or temporary redirect, it's essentially telling the client "What you are looking for exists, but it's temporarily over here at this URL". In the case of a 301, or permanent redirect, it's essentially telling the client "What you are looking for exists, but it has permanently moved over to this URL".
In the case of the latter, browsers, proxy servers, and search engine indexes know that the old URL is no longer valid, that they should stop using it, and that from now on they should use the new URL returned by the server via the 301 redirect. In the case of a search engine like Google, it has an index of the old URL and all the data it has accumulated over the lifetime of that URL associated with it. When one of its bots sees a 301, it knows that the old URL, and its content, isn't gone, but has just permanently moved to another URL. All of the associated data Google has collected for the old URL gets transferred to the new URL. Google can probably figure most of this stuff out without a 301 redirect, but it's a sure way to make sure Google has got it right.
You can do such a redirect via mod_rewrite:
RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /tutorials\.php\?name=([^&\ ]+)
RewriteRule ^ /tutorials/programming/%1.php [L,R=301]
You should put this near the top of the .htaccess file in your document root. The condition checks that an actual request has been made for /tutorials.php with a query string name="something". The "something" part gets grouped by the match and is accessed via the %1 backreference.
The 301 redirect is a response that the server can send which signals to the user (or search engine) that the page they are looking for has been permanently moved to a specified other page. It is possible to configure Apache to give a 301 for certain URLs, but it is probably easier to have whatever server-side language you are using take the request and then issue a 301.
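For example, if the old tutorials.php is a PHP script, a minimal sketch of issuing the 301 from the script itself could look like this (the target URL scheme follows the example above and is an assumption):
<?php
// Hypothetical sketch: permanently redirect the old tutorials.php?name=... URL to the new clean URL.
$name = isset($_GET['name']) ? $_GET['name'] : '';
if ($name !== '') {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: /tutorials/programming/' . rawurlencode($name) . '.php');
    exit;
}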
The chances are that Google will work out what is going on fairly quickly without 301s or anything else, but submitting a sitemap to them or using the URL Parameters functionality in Google's Webmaster Tools might help.

should redirect to www cause a one second delay?

I am using Magento 1.6.2.0 on a shared host running Litespeed web server, and I have begun investigating ways to speed up page loads. Currently I am using Pingdom to look at requests and it appears that I am losing an entire second from the get-go when I type my URL without the www. The browser redirects to the www page, it's just that it takes so long. Is this something I can fix? I presume that I can change Magento's base-url to not include the www, but then I presume I'll have the same delay when going to the www url instead.
I took a look at the link you gave, and I indeed see a delay of about one second before I receive a 302 redirect to the URL with www. prepended. Not entirely coincidentally, the actual page HTML also takes quite a while (about 1.7 seconds) to load.
This is a fairly common issue with large web applications: to return even a simple response like a redirect, the entire application must load and run its startup code. Couple this with a not so fast shared webserver that isn't optimized for that one application, and you can get quite slow page load times. It's nothing unique to Magento; I've seen the same effect with MediaWiki myself, and I expect that it happens with other applications too.
The obvious solution is just to avoid redirects: as long as you make sure all your URLs have the right hostname, the extra delay due to wrong hostnames will not appear. Magento itself will presumably take care of this for its own URLs, but if you have any other code (or static pages) that link to your Magento URLs, make sure they use the right hostname.
You can also sign up for Google Webmaster Tools (and similar tools for other search engines) and configure your preferred domain there (it's under Site configuration → Settings) so that Google will automatically prepend www. to any links to your site it indexes.
You can (and should) also try to reduce Magento's startup time in general. This will speed up not only redirects, but all other page loads as well. I'm not familiar enough with Magento to be able to give much detailed advice on this, but the obvious first step for any massive PHP application is to make sure you're using a PHP accelerator such as APC.
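If your host lets you adjust PHP settings, enabling APC is typically just a couple of php.ini directives; a sketch, assuming the APC extension is installed (directive names and the size syntax can vary between APC versions and hosting setups):
; enable the APC opcode cache
extension=apc.so
apc.enabled=1
apc.shm_size=64M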
Finally, the fastest way to redirect visitors to the correct hostname is to make your webserver send the redirect directly without ever invoking Magento at all. The details on how to do this depend on the server software you're using, but apparently LiteSpeed supports the same RewriteRule syntax as Apache's mod_rewrite, so you should be able to do this just by adding the following lines to your main .htaccess file:
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.mmmspeciosa\.com$ [NC]
RewriteRule ^(.*)$ http://www.mmmspeciosa.com/$1 [R=301,L]
(By the way, I'm using HTTP 301 permanent redirects here instead of the HTTP 302 temporary redirects Magento seems to be using. This is not only more appropriate according to the HTTP standard, but also works better with search engines, which treat a 301 redirect as an indication to index the target URL instead of the source of the redirect. If this redirect type is not configurable in Magento, I would consider it a bug. If it is configurable, you should set it to 301.)

Massive URL Change

We need to make changes to an app that will cause all its URLs to change. We don't want to lose value, but we have too many URLs to set up a 301 redirect for each one. I am looking to change a mod_rewritten URL to a non-rewritten one.
My thoughts would be to
Leave the mod_rewritten URLs active (temporarily)
Place a canonical tag with the NEW correct URL (see the example after this list)
Make sure no links are currently pointing to the old URLs - all internal links updated, etc.
Make sure our robots.txt and sitemap submissions are up to date.
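For reference, the canonical tag in the second step would go in the <head> of each old page and point at its new address; a sketch with a placeholder URL:
<link rel="canonical" href="http://www.example.com/new-url/" />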
Would a massive change in URLs - even if backed up by canonical URLs and an updated sitemap.xml - have a negative effect on listings in Google?
What are people's thoughts / experiences on this?
Thinking about it, if you're using mod_rewrite and want to switch to a non-mod_rewritten URL, then the chances are you can make the changes purely by adding the 301 response to the end of your rewrite rule, to make something like this:
RewriteRule ^whatever/(.*)$ http://www.domain.com/$1/ [R=301,L]
Actually, a 301 redirect should not impact your search ranking - it's exactly how you're supposed to do that kind of thing, and it's search engine independent. The "canonical" header is an invention of Google and has the disadvantage that people still using the old URLs from outside links will not be redirected, and will thus keep using the old URLs in links and bookmarks.
Using a permanent HTTP redirect is the best solution for both your users and the search engines.
I'd also be interested to know, then, how these 301s should be handled - in .htaccess or at a code level? Surely hundreds of 301s in a .htaccess file is too many?
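One note on that: if the old URLs follow a pattern, you usually don't need hundreds of individual lines; a single rewrite rule like the one above can cover them all. A sketch with hypothetical paths:
RewriteEngine On
# One rule permanently redirects every /old-section/<slug>.html URL to /new-section/<slug>/
RewriteRule ^old-section/([^/]+)\.html$ /new-section/$1/ [R=301,L]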

Google found my backup web site. What can I do about it?

A few days ago we replaced our web site with an updated version. The original site's content was migrated to http://backup.example.com. Search engines do not know about the old site, and I do not want them to know.
While we were in the process of updating our site, Google crawled the old version.
Now when using Google to search for our web site, we get results for both the new and old sites (e.g., http://www.example.com and http://backup.example.com).
Here are my questions:
Can I update the backup site's content with the new content? Then we can get rid of all the old content. My concern is that Google will lower our page ranking due to duplicate content.
If I prevent the old site from being accessed, how long will it take for the information to clear out of Google's search results?
Can I use a robots.txt disallow to block Google from the old web site?
You should probably put a robots.txt file in your backup site and tell robots not to crawl it at all. Google will obey the restrictions though not all crawlers will. You might want to check out the options available to you at Google's WebMaster Central. Ask Google and see if they will remove the errant links for you from their data.
You can always use robots.txt on the backup.* site to disallow Google from indexing it.
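A robots.txt at the root of the backup site that blocks everything would simply be:
User-agent: *
Disallow: /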
Are the URL formats consistent enough between the backup and current site that you could redirect a given page on the backup site to its equivalent on the current one? If so, you could have the backup site send 301 Permanent Redirects to each of the equivalent pages on the site you actually want indexed. The redirecting pages should drop out of the index (after how much time, I do not know).
If not, definitely look into robots.txt as Zepplock mentioned. After setting up the robots.txt, you can expedite removal from Google's index with their Webmaster Tools.
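If the URL formats do line up, a minimal .htaccess sketch on the backup site (assuming Apache with mod_rewrite and the example hostnames from the question) could be:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^backup\.example\.com$ [NC]
# Send every backup URL to the same path on the main site with a permanent redirect
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]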
You can also add a rule in your scripts to redirect each old page to its new one with a 301 header.
Robots.txt is a good suggestion, but... Google doesn't always listen. Yeah, that's right, they don't always listen.
So, disallow all spiders but....also put this in your header
<meta name="robots" content="noindex, nofollow, noarchive" />
It's better to be safe than sorry. Meta commands are like yelling at Google "I DON'T WANT YOU TO DO THIS TO THIS PAGE". :)
Do both, save yourself some pain. :)
I suggest you either add a noindex meta tag to all the old pages or just disallow them via robots.txt. The easiest way is to block them with robots.txt. One more thing: add a sitemap for the new site and submit it in Webmaster Tools; that will improve the indexing of your new website.
Password-protect the web pages or directories that you don't want web spiders to crawl or index by putting password-protection directives in the .htaccess file (if one is present in your website's root directory on the server; otherwise create a new one and upload it).
The web spiders will never know that password and hence won't be able to index the protected directories or web pages.
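A minimal sketch of that, assuming Apache and a .htpasswd file you have already created (the file path is a placeholder):
AuthType Basic
AuthName "Backup site"
# Point this at your own .htpasswd file; the path here is hypothetical
AuthUserFile /path/to/.htpasswd
Require valid-user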
You can block any particular URLs in Webmaster Tools; check it out. You can also block them using robots.txt. Remove the sitemap for your old backup site and put a noindex, nofollow tag on all of your old backup pages. I handled this situation for one of my clients too.