robots.txt: disallow all but a select few, why not?

I've been thinking for a while about disallowing every crawler except Ask, Google, Microsoft, and Yahoo! from my site.
The reasoning behind this is that I've never seen any traffic generated by any of the other web crawlers out there.
My questions are:
Is there any reason not to?
Has anybody done this?
Did you notice any negative effects?
Update:
Up till now I have used the blacklist approach: if I do not like a crawler, I add it to the disallow list.
I'm no fan of blacklisting, however, as it is a never-ending story: there are always more crawlers out there.
I'm not so much worried about the really ugly, misbehaving crawlers; they are detected and blocked automatically (and they typically do not ask for robots.txt anyhow :).
However, many crawlers are not really misbehaving in any way; they just do not seem to generate any value for me or my customers.
There are, for example, a couple of crawlers powering websites that claim they will be The Next Google, Only Better. I've never seen any traffic coming from them, and I'm quite sceptical about them ever becoming better than any of the four search engines mentioned above.
Update 2:
I've been analysing the traffic to several sites for some time now, and it seems that reasonably small sites get about 100 unique human visitors a day (i.e. visitors I cannot identify as non-human). About 52% of the generated traffic comes from automated processes.
Of the automated visitors, 60% do not read robots.txt; the other 40% (21% of total traffic) do request robots.txt. (This group includes Ask, Google, Microsoft, and Yahoo!)
So my thinking is: if I block all the well-behaved crawlers that do not seem to generate any value for me, I could reduce bandwidth use and server load by around 12-17%.
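For reference, this is the kind of whitelist robots.txt I have in mind (the user-agent tokens are what I believe these four crawlers identify themselves as; verify against each engine's documentation before relying on them):

    # Whitelist sketch: allow only the four named crawlers, block everything else.
    User-agent: Googlebot
    Disallow:

    User-agent: Slurp
    Disallow:

    User-agent: msnbot
    Disallow:

    User-agent: Teoma
    Disallow:

    # All other crawlers that honour robots.txt: keep out.
    User-agent: *
    Disallow: /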

The internet is a publishing mechanism. If you want to whitelist your site, you're going against the grain, but that's fine.
Do you want to whitelist your site?
Bear in mind that badly behaved bots which ignore robots.txt aren't affected anyway (obviously), and well-behaved bots are probably there for a good reason; it's just that the reason is opaque to you.

Whilst the sites that crawl your site might not be sending any traffic your way, it's possible that they themselves are indexed by Google et al. and so add to your PageRank; blocking them from your site might affect this.

Is there any reason not to?
Do you want to be left out of something that could be including your site without your knowledge and indirectly bringing a lot of traffic your way?
If some strange crawlers are hammering your site and eating your bandwidth you may want to block them, but it is quite possible that such crawlers wouldn't honour your robots.txt anyway.
Examine your log files and see which crawlers you have and what proportion of your bandwidth they are eating. There may be more direct ways to block traffic that is bombarding your site.
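If you want to quantify that, here is a rough sketch of the kind of log analysis I mean, assuming a combined-format Apache/Nginx access log (the log path is a placeholder and the regex may need adjusting for your format):

    # Rough bandwidth-per-user-agent breakdown from a combined-format access log.
    import re
    from collections import defaultdict

    LOG_PATH = "access.log"  # placeholder path; point at your own log
    LINE_RE = re.compile(r'\S+ \S+ \S+ \[.*?\] ".*?" \d{3} (\d+|-) ".*?" "(.*?)"')

    bytes_by_agent = defaultdict(int)
    with open(LOG_PATH) as log:
        for line in log:
            match = LINE_RE.match(line)
            if not match:
                continue
            size, agent = match.groups()
            if size != "-":
                bytes_by_agent[agent] += int(size)

    # Show the top bandwidth consumers so you can judge which crawlers matter.
    for agent, total in sorted(bytes_by_agent.items(), key=lambda kv: -kv[1])[:20]:
        print(f"{total / 1048576:9.1f} MB  {agent}")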

This is currently a bit awkward, as the original robots.txt standard has no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory.
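A sketch of what that looks like; everything under /stuff/ is blocked, while the file left in the level above stays crawlable:

    User-agent: *
    Disallow: /stuff/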

My only worry is that you may miss the next big thing.
There was a long period when AltaVista was the search engine, possibly even more dominant than Google is now. (There was no Bing or Ask, and Yahoo! was a directory rather than a search engine as such.) Sites that blocked everything but AltaVista back then would never have seen traffic from Google, and therefore never known how popular it was getting unless they heard about it from another source, which might have put them at a considerable disadvantage for a while.
PageRank tends to be biased towards older sites. You don't want to appear newer than you are because you were blocking access via robots.txt for no reason. These guys, http://www.dotnetdotcom.org/, may be completely useless now, but maybe in five years' time the fact that you weren't in their index will count against you in the next big search engine.

Related

What impact does having multiple domain names for a site have on SEO rankings?

I have searched but not really found anything clear on the matter. From what I have read so far, what impact does having your domain name across multiple TLDs (e.g. mycompany.com, mycompany.fr, and mycompany.es) have on your rankings? I'm being told that having them all point to the same content is likely to get the site shot down by Google.
Google doesn't have a parked domain detector according to Matt Cutts, so if the domain names simply all point to one location it won't hurt you.
However, if you have duplicate content that's another story. In your example it sounds like you might have multiple sites that all have the same content, but are different domain names.
Matt Cutts, the head of Google's Webspam team, claims that duplicate content will not hurt your ranking. You can watch that video here
He gives the disclaimer that it can hurt if it's "spammy", without going into specific detail about what that actually means. In my experience (I've had about 5-6 clients that did this), Google would typically index one of the domains and ignore the duplicates, without hurting the main site. The only exception is if one of the sites that isn't your main one starts getting more backlinks or traffic; then Google sees it as more relevant and ignores your main site's content. Google is going to favor the duplicate that appears the most relevant.
I'm pretty cautious about duplicate content, though, because it has the possibility of hurting your site if Google thinks it's "spammy", and they change their algorithm so frequently now that it's hard to keep up.
My recommendation is to set up the other domain names as parked domains instead of duplicating the site. As you build up backlinks, focus on linking to just one domain name too.
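If parked domains aren't an option with your registrar, an alternative with the same effect is a permanent redirect from the extra domains to the main one; a minimal sketch, assuming Apache with mod_rewrite and placeholder domain names:

    # .htaccess served on the secondary domains: 301 everything to the main domain
    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^www\.mycompany\.com$ [NC]
    RewriteRule ^(.*)$ http://www.mycompany.com/$1 [R=301,L]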
Yes, if these domains serve the same content, it will sooner or later trigger a duplicate-content issue or some kind of manual penalty. If Google finds out you own all those domain names (or that they belong to a small network of owners), they will take action for sure. The penalty will sink you in the SERPs.
It is not natural to have many domain names sharing the same content. It does not happen by accident and there is no good reason one would need to achieve this.
I would never recommend using different ccTLDs for the same content in the same languages.
However, if the websites are localized, you can use hreflang and "connect" each version of a page with appropriate language. Check this link: https://support.google.com/webmasters/answer/189077?hl=en
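A minimal sketch of what that looks like in the <head> of each localized page, using the question's domains as placeholders; every version lists the full set of alternates, including itself:

    <link rel="alternate" hreflang="en" href="http://mycompany.com/page.html" />
    <link rel="alternate" hreflang="fr" href="http://mycompany.fr/page.html" />
    <link rel="alternate" hreflang="es" href="http://mycompany.es/page.html" />
    <link rel="alternate" hreflang="x-default" href="http://mycompany.com/page.html" />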

SEO and Internal Links: Which Is Better, if It Matters? Absolute or Relative

I am in charge of my company's SEO, which I really hate. I believe a web site with decent web design and semantic code (structure), spiced up with attractive content, is the best thing we can do. Yet we are still far from there, in my case especially. So I usually take a very close look at other sites: their design, code, etc. And I suspect I have got a bit paranoid about this.
Today I found a highly respected site which is using absolute internal links, while we are using relative links. As far as I know it does not matter, but I cannot help asking you guys to make sure about this.
If this is a ridiculous question, then I am sorry. As I said, I have become a bit paranoid.
Taken from the Search Engine Optimisation FAQ at the SitePoint Forums:
Should I use relative links or absolute links?
Absolute links. It is recommended by Google as it is possible for crawlers to miss some relative links.
If I can find the link that Google states this I'll update this post.
EDIT: This might be what the post is referring to, but I've stated my reasons as to why this might be correct in the comments.
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35156
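For clarity, the distinction in question (example.com and the path are just placeholders):

    <!-- Relative link: resolved against the URL of the page it appears on -->
    <a href="/products/widget.html">Widget</a>

    <!-- Absolute link: fully qualified, unambiguous wherever the markup ends up -->
    <a href="http://www.example.com/products/widget.html">Widget</a>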
I have never heard or seen anything that indicates it matters. All you're likely to do is complicate your development. The "highly respected" site is getting good rankings because it's popular, that's all.
It's pretty well a given that search engines store the full path at some point; it's unlikely they wouldn't perform this conversion during the crawl process to remove duplicates.
I don't really follow your logic anyway. You know good structure, relevant content and popularity are the key to ranking so what makes you think you'll gain anything by spending even a minute on random optimisations like this?
I highly doubt Google will be missing any relative links. Apparently the latest version of their crawler will even execute some JavaScript. Don't bother with absolute links; instead, create a good sitemap and submit it to Google through Webmaster Tools. Yahoo! and Microsoft also allow you to submit your sitemap, so it might be worthwhile to look into that too - google it.
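A minimal sitemap.xml sketch of the kind you would submit through those webmaster tools (the URLs and dates are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2009-06-01</lastmod>
        <changefreq>weekly</changefreq>
      </url>
      <url>
        <loc>http://www.example.com/products/widget.html</loc>
        <lastmod>2009-06-01</lastmod>
      </url>
    </urlset>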
I don't think there is a definitive answer to this question, but I will weigh in anyway. Personally,
I think that using absolute URLs is best. The web is full of crappy content scrapers. Many of the people who write these scrapers forget to change the original URLs (in absolute links) before they post the content onto their own pages. So, in that regard, absolute URLs can turn into a really dodgy way to get a couple of extra links.
If I follow that, it seems logical that absolute links would also be a great indicator of duplicate content caused by content scrapers.
A couple of years ago, I did some research into what happens to a page's search rankings when you dramatically change content/navigation (i.e. in the case of a dramatic redesign). At that point, I found that having absolute URLs seemed to spook Google a little less. But there were some problems with my research:
a) The 'absolute URL bonus' was barely quantifiable (an average of less than two positions of difference)
b) The 'absolute URL bonus' only lasted a few weeks before Google settled down and started treating both pages the same
c) The research is two years old and the Google algorithm has changed dramatically in that time
When I add a and b together, I'm left with a very unsettled feeling. Google gets a little weird from time to time, so the bonus may have been a fluke that I attributed to absolute URLs. Good old experimental bias... Either way, the difference was so slight and lasted for such a short time that I don't think it is worth spending a whole lot of extra time making links absolute!
Best of luck with your site
Greg

Does hosting overseas affect SEO, and where does this leave cloud computing?

Does hosting overseas affect SEO, and where does this leave cloud computing?
Google has said that the location of the server is one of the factors they use to determine location-specific results. You can use Google Webmaster Tools to officially declare your targeted location, and that removes all doubt and overrides the other factors. That's friendly to cloud hosting as well.
SEO is a bit of a dark art and I am unsure as to whether it is actually worth it or not. I don't know if anyone who doesn't work for the search engines themselves would be able to give a definitive answer.
Short of asking the search engines themselves (who are probably secretive about their algorithms), if you really want to pay for this sort of information, the best SEO companies to ask are probably the ones ranked top in a Google search for "SEO" ;)
Your main concern and by far the biggest influence on search placement would be content. High quality, unique and well written content will rank you highest. It sometimes takes time though for your content to soak into the wider web, so patience is also key. Keep creating good consistent unique content and it will eventually climb its way up.
Even if server location does affect placement (which doesn't make much sense to me), then as cloud computing becomes more popular and mainstream the search engines would change their algorithms to take this into account.
So nothing to worry about really. Stay away from micro optimisations and spend your efforts on content.
The location of the server is factored in by Google if and only if:
a) your TLD is not a country-code one (so it's a .com, .org, or .net, and not, for example, .co.uk or .de); and
b) you have not specified the location of your site in Webmaster Tools.
It's not really possible to determine where a server is located based on its IP. Many front-end load balancers are geographically redundant, and modern routing protocols account for this.
I doubt that Google will use this to determine web server ranking.

What are the common sense SEO practices that aren't dodgy or crap?

In SEO there are a few techniques that have been flagged and need to be avoided at all costs. These are all techniques that used to be perfectly acceptable but are now taboo:
Number 1: Spammy guest blogging. Blowing up a page with guest comments is no longer a benefit.
Number 2: Optimized anchors. These have become counterproductive; use safe anchors instead.
Number 3: Low-quality links. Sites are often flooded with hyperlinks that take you to low-quality Q&A sites; don't do this.
Number 4: Keyword-heavy content. Try to avoid cramming in keywords; use longer, well-written sections more liberally.
Number 5: Link-back overuse. Backlinks can be a great way to direct people back to your site, but over-saturation will make people feel trapped.
Content, Content, CONTENT! Create worthwhile content that other people will want to link to from their sites.
Google has the best tools for webmasters, but remember that they aren't the only search engine around. You should also look into Bing and Yahoo!'s webmaster tool offerings (here are the tools for Bing; here for Yahoo). Both of them also accept sitemap.xml files, so if you're going to make one for Google, then you may as well submit it elsewhere as well.
Google Analytics is very useful for helping you tweak this sort of thing. It makes it easy to see the effect that your changes are having.
Google and Bing both have very useful SEO blogs. Here is Google's. Here is Bing's. Read through them--they have a lot of useful information.
Meta keywords and meta descriptions may or may not be useful these days. I don't see the harm in including them if they are applicable.
If your page might be reached by more than one URL (i.e., www.mysite.com/default.aspx versus mysite.com/default.aspx versus www.mysite.com/), then be aware that that sort of thing sometimes confuses search engines, and they may penalize you for what they perceive as duplicated content. Use the link rel="canonical" element to help avoid this problem.
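For example, each of those URL variants would serve the same element in its <head>, pointing at the one address you want indexed (using the mysite.com example above):

    <link rel="canonical" href="http://www.mysite.com/" />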
Adjust your site's layout so that the main content comes as early as possible in the HTML source.
Understand and utilize your robots.txt and meta robots tags.
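The meta robots tag, for instance, lets you keep an individual page out of the index while still letting crawlers follow its links; a minimal sketch:

    <meta name="robots" content="noindex, follow" />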
When you register your domain name, go ahead and claim it for as long of a period of time as you can. If your domain name registration is set to expire ten years from now rather than one year from now, search engines will take you more seriously.
As you probably know already, having other reputable sites that link to your site is a good thing (as long as those links are legitimate).
I'm sure there are many more tips as well. Good luck!
In addition to having quality content, content should be added or updated regularly. I believe that Google (and likely others) has some bias toward the general "freshness" of the content on your site.
Also, try to make sure that the content the crawler sees is as close as possible to what the user will see (this can be tricky for localized pages). If you're careless, your site may be blacklisted for "bait-and-switch" tactics.
Don't implement important text-based sections in Flash - Google will probably not see them and, if it does, it'll screw it up.
Google can index Flash. I don't know how well, but it can. :)
A well organized, easy to navigate, hierarchical site.
There are many SEO practices that work and that people should take into consideration. But fundamentally, I think it's important to remember that Google doesn't necessarily want people to be using SEO. More and more, Google is striving to create a search engine capable of ranking websites based on how good their content is, and solely on that. It wants to be able to see what good content is in ways we can't trick. Think about it: at the very beginning of search engines, a site which had the same keyword repeated 200 times on a page was sure to rank for that keyword, just as a site with any number of backlinks, regardless of the quality or PageRank of the sites they came from, was assured Google popularity. We're past that now, but SEO is still, in a certain way, tricking a search engine into believing that your site has good content, because you buy backlinks, or comments, or such things.
I'm not saying that SEO is a bad practice, far from it. But Google is taking more and more measures to make its search results independent of the regular SEO practices we use today. That is why I can't stress this enough: write good content. Content, content, content. Make it unique, make it new, add it as often as you can. A lot of it. That's what matters. Google will always rank a site well if it sees that there is a lot of new content, and even more so if it sees content arriving on the site in other ways, especially through commenting.
Common sense is uncommon. Things that appear obvious to me or you wouldn't be so obvious to someone else.
SEO is the process of effectively creating and promoting valuable content or tools, ensuring either is totally accessible to people and robots (search engine robots).
The SEO process includes and is far from being limited to such uncommon sense principles as:
Improving page load time (through minification, including a trailing slash in URLs, eliminating unnecessary code or db calls, etc.)
Canonicalization and redirection of broken links (organizing information and ensuring people/robots find what they're looking for)
Coherent, semantic use of language (from inclusion and emphasis of targeted keywords where they semantically make sense [and earn a rankings boost from SE's] all the way through semantic permalink architecture)
Mining search data to determine what people are going to be searching for before they do, and preparing awesome tools/content to serve their needs
SEO matters when you want your content to be found/accessed by people -- especially for topics/industries where many players compete for attention.
SEO does not matter if you do not want your content to be found/accessed, and there are times when SEO is inappropriate. Motives for not wanting your content found -- the only instances when SEO doesn't matter -- might vary, and include:
Privacy
When you want to hide content from the general public for some reason, you have no incentive to optimize a site for search engines.
Exclusivity
If you're offering something you don't want the general public to have, you need not necessarily optimize that.
Security
For example, say, you're an SEO looking to improve your domain's page load time, so you serve static content through a cookieless domain. Although the cookieless domain is used to improve the SEO of another domain, the cookieless domain need not be optimized itself for search engines.
Testing In Isolation
Let's say you want to measure how many people link to a site within a year which is completely promoted with AdWords, and through no other medium.
When One's Business Doesn't Rely On The Web For Traffic, Nor Would They Want To
Many local businesses or businesses which rely on point-of-sale or earning their traffic through some other mechanism than digital marketing may not want to even consider optimizing their site for search engines because they've already optimized it for some other system, perhaps like people walking down a street after emptying out of bars or an amusement park.
When Competing Differently in a Saturated Market
Let's say you want to market entirely through social media, or internet cred & reputation here on SE. In such instances, you don't have to worry much about SEO.
Keep it real and build for users, not for robots, and you will reach success!
Thanks!

Getting Good Google PageRank

In SEO people talk a lot about Google PageRank. It's kind of a catch-22, because until your site is actually big and you don't really need search engines as much, it's unlikely that big sites will link to you and increase your PageRank!
I've been told that it's easiest to simply get a couple of high-quality links to point to a site to raise its PageRank. I've also been told that there are certain open directories, like dmoz.org, that Google pays special attention to (since they're human-managed links). Can anyone speak to the validity of this or suggest another site/technique to increase a site's PageRank?
Have great content
Nothing helps your Google rank more than having content or offering a service people are interested in. If your web site is better than the competition and solves a real need, you will naturally generate more traffic and inbound links.
Keep your content fresh
Use friendly URLs that contain keywords
Good: http://cars.com/products/cars/ford/focus/
Bad: http://cars.com/p?id=1232
Make sure the page title is relevant and well constructed
For example: Buy A House In France :. Property Purchasing in France
Use a domain name that describes your site
Good: http://cars.com/
Bad: http://somerandomunrelateddomainname.com/
Example
Type car into Google, out of the top 5 links all 4 have car in the domain: http://www.google.co.uk/search?q=car
Make it accessible
Make sure people can read your content. This includes a variety of different audiences:
People with disabilities: sight, motor, cognitive disabilities, etc.
Search bots
In particular, make sure search bots can read every single relevant page on your site. Quite often search bots get blocked by the use of JavaScript to link between pages, or by the use of frames / Flash / Silverlight. One easy way to ensure this is to have a site map page that gives access to the whole site, dividing it into categories / sub-categories, etc.
Down level browsers
Submit your site map automatically
Most search engines allow you to submit a list of pages on your site including when they were last updated.
Google: https://www.google.com/webmasters/tools/docs/en/about.html
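You can also point crawlers at your sitemap straight from robots.txt with a single line (the URL is a placeholder):

    Sitemap: http://www.example.com/sitemap.xml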
Inbound links
Generate as much buzz about your website as possible to increase the likelihood of people linking to you. Blog / podcast about your website if appropriate. List it in online directories (if appropriate).
References
Google Search Engine Ranking Factors, by an SEO company
Creating a Google-friendly site: Best practices
Wikipedia - Search engine optimization
Good content.
Update it often.
Read and digest everything at Creating a Google-friendly site: Best practices.
Be active on the web. Comment in blogs, correspond genuinely with people, in email, im, twitter.
I'm not too sure about the domain name. Wikipedia? What does that mean? Mozilla? What word is that? Google? Was a typo. Yahoo? Sounds like that chocolate drink Yoohoo.
Trying to keyword the domain name shoehorns you anyway. And it can be construed as an SEO technique in the future (if it isn't already!)
Answer all email. Answer blog comments. Be nice and helpful.
Go watch garyvee's Better Than Zero. That'll motivate you.
If it's appropriate, having a blog is a good way of keeping content fresh, especially if you post often. A CMS would be handy too, as it reduces the friction of updating. The best way would be user-generated content, as other people make your site bigger and updated, and they may well link to their content from their other sites.
Google doesn't want you to have to engineer your site specifically to get a good PageRank. Having popular content and a well designed website should naturally get you the results you want.
An easy trick is to use
Google Webmaster Tools: https://www.google.com/webmasters/tools
You can also generate a sitemap using http://www.xml-sitemaps.com/
Then, don't forget to use www.google.com/analytics/
And be careful: most SEO guides are not correct, and playing fair is not always the best approach. For example, everyone says that spamming .edu sites is bad and ineffective, but it is effective.