So many crawl errors in Webmaster Tools - SEO

Most of these errors are due to older pages that no longer exist, so there is no place for them on my site now. Please help.

Broken links do hurt SEO. But don't worry too much about old Webmaster Tools data; care more about the broken links currently on the website. Use a tool like Xenu or the Screaming Frog broken link checker to find them, and fix any that turn up - that will improve the user experience. Webmaster Tools data is generally old, and you can sometimes ignore errors that are no longer relevant; the stale entries will be removed soon. But do check the Webmaster Tools data often to keep track of the site's improvements and errors.
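If you prefer to script the check yourself, here is a minimal sketch using only the Python standard library; the URL list is a hypothetical placeholder for pages pulled from your site or the Webmaster Tools report:

# Minimal broken-link check (illustrative sketch only)
import urllib.request
import urllib.error

# Hypothetical list of URLs to verify
urls = [
    "http://example.com/old-page.html",
    "http://example.com/still-here.html",
]

for url in urls:
    try:
        # A HEAD request keeps the check cheap; some servers may not support it
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=10) as response:
            print(url, response.status)
    except urllib.error.HTTPError as error:
        # A 404 or 410 here is a broken link worth fixing or redirecting
        print(url, "broken:", error.code)
    except urllib.error.URLError as error:
        print(url, "unreachable:", error.reason)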

Related

Yesterday, Site on Page 1; Today, Site Gone

I understand how SEO works in general. My site had been holding steadily on page 2 for months. I found time to add some contextual links, and the site went straight to page 1 yesterday. Now the site and its URL can't be found at all. I would have thought that, if anything, it would drop back to page 2.
Thanks a lot for any thoughts,
Byron
You have not provided enough information to answer your question properly.
Anyway, assuming that you are searching Google for your own website, I have one suggestion.
Check your site's health in Webmaster Tools, and read the basic SEO guides. Have patience, and consider enhancing your webpages with HTML5 markup and Schema.org structured data.
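As an illustration of the Schema.org suggestion, a page could carry a small block of JSON-LD structured data; the type, names, and dates below are placeholders, so adapt them to your content:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example article title",
  "author": { "@type": "Person", "name": "Byron" },
  "datePublished": "2013-01-15"
}
</script>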

Hiding part of a page from Google, does it hurt SEO?

We all know that showing Google's bots content that doesn't actually exist for visitors is not allowed and will hurt your search positioning, but what about the other way around: showing visitors content that is not displayed to Google's bots?
I need to do this because I have photo pages, each with a short title, the photo, and a textarea containing the embed HTML code. Googlebot is picking up the embed code and using it as the page description in its search results, which looks very ugly.
Please advise.
When you start playing with tricks like that, you need to consider several things.
... showing visitors content that is not displayed to Google's bots.
That approach is a bit tricky.
You can certainly check User-agents to see whether a visitor is Googlebot, but Google can add any number of new spiders with different User-agents, which will end up indexing your images anyway. You would have to monitor for that constantly.
Every code release of your website would have to be tested against the "images and Googlebot" scenario, which extends the testing phase and testing cost.
It can also affect future development - all changes would have to be made with the "images and Googlebot" scenario in mind, which can introduce additional constraints to your system.
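For illustration only, and with those caveats in mind, a naive User-agent check might be sketched like this (a hypothetical Python helper with a deliberately incomplete token list, not something to rely on):

# Fragile by design: Google can introduce new crawlers at any time
KNOWN_BOT_TOKENS = ("Googlebot", "Googlebot-Image")  # deliberately incomplete

def looks_like_googlebot(user_agent):
    # Naive substring check of the User-Agent header; easy to fool, easy to miss new bots
    return any(token in user_agent for token in KNOWN_BOT_TOKENS)

# Example: decide whether to render the embed-code textarea
ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
if not looks_like_googlebot(ua):
    print("render the embed-code textarea")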
Personally, I would choose a slightly different approach:
First of all, review whether you can use any of the methods recommended by Google. Google provides a few helpful pages on this problem, e.g. Blocking Google or Block or remove pages using a robots.txt file.
If that is not enough, maybe restructuring your HTML would help. Consider using JavaScript to build some of the customer-facing interface.
And whatever you do, try to keep it as simple as possible; overly complex solutions can turn around and bite you.
It is very difficult to give good advice without knowledge of your system, constraints, and strategy, but I hope my answer helps you choose a good architecture / solution for your system.
Google does not automatically judge this as cheating; it looks at intent. As long as your purpose is to improve the user experience rather than to use common cheating tactics, Google will not treat it as cheating.
Just block these pages with robots.txt and you'll be fine. It is not cheating - that's why they came up with a solution like that in the first place.
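For example, if the embed-code pages lived under a hypothetical /photos/embed/ path, a robots.txt rule along these lines would keep compliant crawlers out of them:

User-agent: *
Disallow: /photos/embed/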

Is there a more efficient way than a sitemap to add / force a recrawl of / remove your website's index entries in Google?

That is pretty much the question: is there a more efficient way than the standard sitemap.xml to [add / force a recrawl of / remove], i.e. manage, your website's index entries in Google?
I remember reading an article a few years ago by an unknown blogger who said that when he published news on his website, the URL of the news item would appear in Google's search results immediately. I think he mentioned something special - I don't remember exactly what - some automatic re-crawling system offered by Google themselves? However, I'm not sure about it. So I ask: am I fooling myself, and is there NO OTHER way to manage index content besides sitemap.xml? I just need to be sure about this.
Thank you.
I don't think you will find the magical "silver bullet" answer you're looking for, but here are some additional tips and information that may help:
Depth of crawl and rate of crawl are directly influenced by PageRank (one of the few things it does influence), so increasing the number and quality of back-links to your site's homepage and internal pages will help.
QDF - this Google algorithm factor, "Query Deserves Freshness", does have a real impact and is one of the core reasons behind the Google Caffeine infrastructure project to allow much faster finding of fresh content. This is one of the main reasons that blogs and sites like SE do well - because the content is "fresh" and matches the query.
XML sitemaps do help with indexation, but they won't result in better ranking. Use them to help search bots find content that is buried deep in your architecture.
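For reference, a minimal sitemap.xml in the sitemaps.org format looks something like this (the URL and dates are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/some-deep-page.html</loc>
    <lastmod>2013-01-15</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>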
Pinging services that monitor site changes, such as Ping-O-Matic (especially from blogs), can really help push out notification of your new content - it can also ensure the search engines become aware of it almost immediately.
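As a sketch of how such a ping can be sent, assuming the standard weblogUpdates XML-RPC interface and the Ping-O-Matic endpoint (verify both before relying on them), Python's standard library is enough:

# Illustrative weblog ping; endpoint and interface assumed, check before use
import xmlrpc.client

# Ping-O-Matic's XML-RPC endpoint (assumed); other ping services expose the same API
server = xmlrpc.client.ServerProxy("http://rpc.pingomatic.com/")

# weblogUpdates.ping(site name, site URL)
response = server.weblogUpdates.ping("My Example Blog", "http://example.com/")
print(response)  # typically a struct with 'flerror' and 'message' fields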
Crawl Budget - be mindful of wasting a search engine's time on parts of your site that don't change or don't deserve a place in the index - using robots.txt and the robots meta tags can herd the search bots to different parts of your site (use with caution so as to not remove high value content).
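For instance, a page you want crawled but kept out of the index could carry a robots meta tag like this:

<meta name="robots" content="noindex, follow">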
Many of these topics are covered online, but there are other intrinsic things like navigational structure, internal linking, site architecture etc that also contribute just as much as any "trick" or "device".
Getting many links from good sites to your website will make the Google "spiders" reach your site faster.
Links from social sites like Twitter can also help the crawlers visit your site (although Twitter links do not pass "link juice", the spiders still follow them).
One last thing: update your content regularly - think of content as "Google spider food". If the spiders come to your site and don't find new food, they won't come back soon; if there is new food each time they come, they will come often. Article directories, for example, get indexed several times a day.

How do I make sure my website blocks automation scripts and bots?

I'd like to make sure that my website blocks automation tools like Selenium and QTP. Is there a way to do that?
What settings on a website is Selenium bound to fail with?
With due consideration to the comments on the original question asking "why on earth would you do this?", you basically need to follow the same strategy that any site uses to verify that a user is actually human. Methods such as asking users to authenticate or enter text from images will probably work, but they will likely have the effect of blocking Google's crawlers and everything else too.
Doing anything based on user agent strings or anything like that is mostly useless. Those are trivial to fake.
Rate-limiting connections or similar might have limited effectiveness, but it seems like you're going to inadvertently block any web crawlers too.
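As an illustration of the rate-limiting idea, here is a minimal in-memory sketch with made-up thresholds; a real site would tune the numbers and usually enforce this at the web server, reverse proxy, or firewall rather than in application code:

# Naive per-IP rate limiter: allow at most MAX_REQUESTS per WINDOW_SECONDS
import time
from collections import defaultdict, deque

MAX_REQUESTS = 30      # made-up threshold
WINDOW_SECONDS = 60    # made-up window
requests_by_ip = defaultdict(deque)

def allow_request(ip, now=None):
    now = time.time() if now is None else now
    window = requests_by_ip[ip]
    # Drop timestamps that have fallen out of the window
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        return False  # likely a bot hammering the site (or an unlucky crawler)
    window.append(now)
    return True

# Example
print(allow_request("203.0.113.5"))  # True for the first request from this IP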
While this question seems strange, it is a fun one, so I tried to investigate the possibilities.
Besides adding a CAPTCHA, which is the best and really the only robust solution, you can block Selenium RC by adding the following JavaScript to your pages (this example redirects to the Google homepage, but you can do anything you want):
<script>
  var loc = window.parent.location.toString();
  if (loc.indexOf("RemoteRunner.html") != -1) {
    // It is run in Selenium RC, so do something
    document.location = "http://www.google.com";
  }
</script>
I do not know how you can block other automation tools, and I am not sure whether this will also block Selenium IDE.
To be 100% certain that no automated bots or scripts can be run against your website, don't have a website online. That will meet your requirement with certainty.
CAPTCHAs are easy and cheap to break, thanks to crowdsourcing and OCR methods.
Proxies can be found in the wild for free, or bought in bulk at extremely low cost, so limiting connection rates or detecting bots by IP is of little use.
One possible approach is to raise the time and cost of access in your application logic, with things like phone verification or credit card verification. But then your website will never get off the ground, because nobody will trust your site in its infancy.
Solution: do not put your website online and expect to be able to effectively eliminate bots and scripts from running against it.

How to Develop a Successful Sitemap

I have been browsing around the internet researching effective sitemap web pages. I have encountered these two sitemaps and am questioning their effectiveness.
http://www.webanswers.com/sitemap/
http://www.answerbag.com/sitemap/
Are these sitemaps effective?
Jeff Atwood (one of the guys who made this site) wrote a great article on the importance of sitemaps:
I'm a little aggravated that we have to set up this special file for the Googlebot to do its job properly; it seems to me that web crawlers should be able to spider down our simple paging URL scheme without me giving them an explicit assist.
The good news is that since we set up our sitemaps.xml, every question on Stack Overflow is eminently findable. But when 50% of your traffic comes from one source, perhaps it's best not to ask these kinds of questions.
So yeah, effective for people, or effective for Google?
I would have thought an HTML sitemap should be useful to a human, whereas these two aren't. If you're trying to target a search engine, then a sitemap.xml file that conforms to sitemaps.org would be a better approach. Whilst the HTML approach would work, it's easier to generate an XML file and have your robots.txt file point at it.
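Pointing robots.txt at the XML sitemap is a single directive; the URL below is a placeholder:

Sitemap: http://www.example.com/sitemap.xml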