Could you explain how Googlebot's visit frequency is determined?

Dear fellow developers,
I operate a small website for my research work.
I recently noticed something odd about it and would like to ask a question.
Although my website is rarely updated (perhaps once or twice a month), Googlebot visits it roughly 10 to 20 times every day.
Even though there is almost nothing new to crawl and index, the bots keep visiting.
Can anyone tell me why?
How is the frequency of the bots' visits determined?
I am well aware that Google does not publish the technical details of its crawlers, but even wild guesses are welcome.
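For reference, here is roughly how I counted the visits: a minimal sketch in Python, assuming a standard Apache/nginx combined access-log format (the access.log path is a placeholder). Note that user-agent strings can be spoofed, so a reverse-DNS check on the client IP would be needed before fully trusting these numbers.

    # Count requests per day whose user agent claims to be Googlebot.
    # Assumes combined log format, with dates like [19/Mar/2013:10:00:00 +0000].
    import re
    from collections import Counter

    DATE = re.compile(r'\[(\d{2}/\w{3}/\d{4})')

    hits = Counter()
    with open('access.log') as log:  # placeholder path
        for line in log:
            if 'Googlebot' in line:
                match = DATE.search(line)
                if match:
                    hits[match.group(1)] += 1

    for day, count in sorted(hits.items()):
        print(day, count)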
Thank you in advance.

Related

Yesterday, Site on Page 1; Today, Site Gone

I understand how SEO works in general. My site had been holding steadily on page 2 for months. I recently had time to add contextual links, and the site jumped straight to page 1 yesterday. Now the site and its URL can't be found at all. I would have thought that, at worst, it would drop back to page 2.
Any thoughts would be much appreciated.
Thanks a lot,
Byron
You haven't mentioned enough detail to answer your question definitively.
Anyway, assuming that you are searching Google for your own website, I have one suggestion.
Check your site's health in Webmaster Tools, and read the basic SEO optimization guides. Enhance your pages with HTML5 semantics and schema.org markup, and have patience.
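For example, marking up a page with schema.org microdata might look like the sketch below; the Article type and property names are illustrative only, so pick whichever schema.org type actually fits your pages.

    <!-- Minimal schema.org microdata sketch; type and properties are illustrative. -->
    <article itemscope itemtype="http://schema.org/Article">
      <h1 itemprop="headline">My Page Title</h1>
      <time itemprop="datePublished" datetime="2013-03-19">19 March 2013</time>
      <div itemprop="articleBody">
        Page content goes here...
      </div>
    </article>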

Does changing the query string in a Destination URL via the AdWords API trigger an editorial review?

This is an ongoing debate at my company. We run massive AdWords accounts, and I need to automate adding or changing query-string parameters on large numbers of keywords. Our SEM team is nervous that this could cause the AdWords editorial staff to disable large numbers of keywords while they review the changes (effectively killing traffic).
I can understand why changing the domain or page in the URL might cause this. But changing the query string should not, I would think.
Can anybody confirm/deny this possibility?
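For context, the kind of bulk change I mean looks roughly like the sketch below, using the googleads Python client. This is a hedged illustration, not our production code: service and field names varied across AdWords API versions (destinationUrl was later replaced by finalUrls), and all IDs here are placeholders.

    # Hedged sketch: change only the query string of a keyword's Destination URL.
    from googleads import adwords

    client = adwords.AdWordsClient.LoadFromStorage('googleads.yaml')
    service = client.GetService('AdGroupCriterionService')

    operations = [{
        'operator': 'SET',
        'operand': {
            'xsi_type': 'BiddableAdGroupCriterion',
            'adGroupId': 123456789,          # placeholder ad group ID
            'criterion': {'id': 987654321},  # placeholder keyword ID
            # Same domain and path as before; only the query string changes.
            'destinationUrl': 'http://example.com/landing?src=api&kw=widgets',
        },
    }]
    service.mutate(operations)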
UPDATE 2013-03-19
I'm not sure this qualifies as an "answer". I posted this same question on Google Groups and, as usual, Google was noncommittal. However, this is what I have gathered so far (I'm sharing for posterity):
First of all, "editorial" doesn't seem to be a term that is widely accepted at Google. I'm saying this based on our discussions with our account reps. The term didn't mean much to them and required clarification. So, to clarify, when I say "editorial" I mean Google's process of reviewing your keyword term, ad copy, landing page, and potentially resetting statistics (quality score). This is mostly automated in their system from what I can tell. However, there are times when it appears that human beings actually get involved in the process.
And now the answer: It seems that modifying keyword destination URLs does not cause manual review nor reset of statistics. Possible exceptions are trademarked terms or pharmaceutical terms. I translate that more broadly as "any terms that have special rules".
NOTE: Folks in the Google Groups thread all seem to agree that modifying URLs at the Creative level DOES reset statistics. So tread carefully.
Here is the Google Group thread.
UPDATE 2013-07-11
I'm just talking to myself now. I feel so alone.
I just received the "tumbleweed" award for this post.
The AdWords team finally told us that modifying Destination URLs should not affect traffic.
So we modified several hundred thousand Destination URLs, and our SEM team reported that some of their high-volume keywords stopped getting traffic for about 10 days and then magically came back. No explanation. The AdWords team dug up some "expert" from the bowels of their staff, who subsequently told us that modifying Destination URLs does affect traffic.
However, we were also transitioning these accounts to Enhanced Campaigns at the same time, so the results are inconclusive in my mind.
I don't think even the AdWords team knows how AdWords works.
It's cold here. Need to make a fire...

Is there a more efficient way than a sitemap to add, force a recrawl of, or remove your website's index entries in Google?

That is pretty much the whole question: is there a more efficient way than the standard sitemap.xml to add, force a recrawl of, or remove entries, i.e. manage your website's index entries in Google?
I remember reading an article a few years ago by an unknown blogger who said that when he published news on his website, the URL appeared in Google's search results immediately. I think he mentioned something special, but I don't remember exactly what... some automatic re-crawling system offered by Google itself? I'm not sure about it. So I ask: am I deluding myself, and is there NO other way to manage index content besides sitemap.xml? I just need to be sure about this.
Thank you.
I don't think you'll find the magical "silver bullet" answer you're looking for, but here is some additional information and a few tips that may help:
Depth of crawl and rate of crawl are directly influenced by PageRank (one of the few things it does influence), so increasing the count and quality of back-links to your homepage and internal pages will assist you.
QDF - this Google algorithm factor, "Query Deserves Freshness", does have a real impact and is one of the core reasons behind the Google Caffeine infrastructure project to allow much faster finding of fresh content. This is one of the main reasons that blogs and sites like SE do well - because the content is "fresh" and matches the query.
XML sitemaps do help with indexation, but they won't result in better ranking. Use them to assist search bots to find content that is deep in your architecture.
Pinging services that monitor site changes, such as Ping-O-Matic, can really help push notification of your new content (blogs in particular benefit from this) and ensure the search engines become aware of it almost immediately. A sketch follows below.
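For instance, a minimal sketch of the weblogUpdates ping that such services accept (the endpoint shown is Ping-O-Matic's publicly documented XML-RPC endpoint; the site name and URL are placeholders):

    # Hedged sketch: notify a ping service of new content via weblogUpdates.ping.
    import xmlrpc.client

    server = xmlrpc.client.ServerProxy('http://rpc.pingomatic.com/')
    response = server.weblogUpdates.ping('My Site Name', 'http://example.com/')
    print(response)  # the service reports success or an error message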
Crawl budget - be mindful of wasting a search engine's time on parts of your site that don't change or don't deserve a place in the index. Using robots.txt and the robots meta tag, you can herd the search bots toward or away from different parts of your site (use with caution so as not to remove high-value content); see the sketch below.
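A sketch of what that herding might look like; all paths are illustrative placeholders to adapt to your own architecture:

    # robots.txt sketch: keep crawlers out of low-value, rarely-changing areas.
    User-agent: *
    Disallow: /search/       # internal search result pages
    Disallow: /print/        # printer-friendly duplicates
    Disallow: /archive/old/

For pages that should stay crawlable but out of the index, the robots meta tag (<meta name="robots" content="noindex, follow">) is the finer-grained tool.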
Many of these topics are covered online, but there are other intrinsic things like navigational structure, internal linking, site architecture etc that also contribute just as much as any "trick" or "device".
Getting many links to your website from good sites will bring the Google "spiders" to your site faster.
Links from social sites like Twitter can also help the crawlers visit your site (although Twitter links do not pass "link juice", the spiders still follow them).
One last thing: update your content regularly, and think of content as "Google spider food". If the spiders come to your site and find no new food, they will not come back soon; if there is new food every time they come, they will visit often. Article directories, for example, get indexed several times a day.

robots.txt: disallow all but a select few, why not?

I've been thinking for a while about disallowing every crawler except Ask, Google, Microsoft, and Yahoo! from my site.
The reasoning behind this is that I've never seen any traffic being generated by any of the other web-crawlers out there.
My questions are:
Is there any reason not to?
Has anybody done this?
Did you notice any negative effects?
Update:
Until now I have used the blacklist approach: if I don't like a crawler, I add it to the disallow list.
I'm no fan of blacklisting, however, as it is a never-ending story: there are always more crawlers out there.
I'm not so much worried about the really ugly, misbehaving crawlers; they are detected and blocked automatically (and they typically don't ask for robots.txt anyhow :)
However, many crawlers are not really misbehaving in any way; they just don't seem to generate any value for me or my customers.
There are, for example, a couple of crawlers powering websites that claim they will be The Next Google, Only Better. I've never seen any traffic coming from them, and I'm quite sceptical of them ever becoming better than any of the four search engines mentioned above.
Update 2:
I've been analysing the traffic to several sites for some time now. For a reasonably small site with about 100 unique human visitors a day (i.e. visitors I cannot identify as non-human), roughly 52% of the generated traffic comes from automated processes. Of those automated visitors, 60% do not read robots.txt, while 40% (21% of total traffic) do request it (this group includes Ask, Google, Microsoft, and Yahoo!).
So my thinking is: if I block all the well-behaved crawlers that don't seem to generate any value for me, I could reduce bandwidth use and server load by around 12-17%.
The internet is a publishing mechanism. If you want to whitelist your site, you're going against the grain, but that's fine.
Do you want to whitelist your site?
Bear in mind that badly behaved bots which ignore robots.txt aren't affected anyway (obviously), and well-behaved bots are probably there for a good reason; it's just that the reason is opaque to you.
While the other sites that crawl your site might not be sending any traffic your way, it's possible that they themselves are being indexed by Google et al., and so are adding to your PageRank; blocking them from your site might affect this.
Is there any reason not to?
Do you want to be left out of something that could be including your site without your knowledge and indirectly bringing traffic your way?
If some strange crawlers are hammering your site and eating your bandwidth, you may want to, but it is quite possible that such crawlers wouldn't honour your robots.txt either.
Examine your log files and see what crawlers you have and what proportion of your bandwidth they are eating. There may be more direct ways to block traffic which is bombarding your site.
This is currently a bit awkward, as the original robots.txt standard has no "Allow" field (though major engines have since adopted it as an extension). The easy way to whitelist paths is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file you want crawled in the level above this directory. Whitelisting crawlers, on the other hand, needs no Allow field at all, as shown below.
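For what it's worth, whitelisting crawlers by user agent is straightforward, because an empty Disallow means "allow everything". A sketch using the era's common bot tokens (Slurp is Yahoo!'s crawler, msnbot Microsoft's, Teoma Ask's; verify the tokens against each engine's documentation):

    User-agent: Googlebot
    Disallow:

    User-agent: Slurp
    Disallow:

    User-agent: msnbot
    Disallow:

    User-agent: Teoma
    Disallow:

    User-agent: *
    Disallow: /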
My only worry is that you may miss the next big thing.
There was a long period when AltaVista was the search engine - possibly even more dominant than Google is now (there was no Bing or Ask, and Yahoo! was a directory rather than a search engine as such). Sites that blocked everything but AltaVista back then would never have seen traffic from Google, and therefore would never have known how popular it was getting, unless they heard about it from another source - which might have put them at a considerable disadvantage for a while.
PageRank tends to be biased towards older sites. You don't want to appear newer than you are just because you were blocking access via robots.txt for no reason. These guys, http://www.dotnetdotcom.org/, may be completely useless now, but maybe in five years the fact that you weren't in their index will count against you in the next big search engine.

Getting Good Google PageRank

In SEO, people talk a lot about Google PageRank. It's something of a catch-22: until your site is actually big, and you therefore don't need search engines as much, it's unlikely that big sites will link to you and increase your PageRank!
I've been told that the easiest approach is simply to get a couple of high-quality links pointing to a site to raise its PageRank. I've also been told that there are certain open directories, such as dmoz.org, that Google pays special attention to (since their links are human-curated). Can anyone speak to the validity of this, or suggest other sites/techniques for increasing a site's PageRank?
Have great content
Nothing helps your Google rank more than having content or offering a service people are interested in. If your website is better than the competition and solves a real need, you will naturally generate more traffic and inbound links.
Keep your content fresh
Use friendly URLs that contain keywords
Good: http://cars.com/products/cars/ford/focus/
Bad: http://cars.com/p?id=1232
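If the underlying application only understands query strings, a URL-rewriting layer can bridge the gap. A hedged Apache mod_rewrite sketch (the path pattern and parameter names are made up for illustration):

    # .htaccess sketch: map friendly URLs onto the real query-string handler.
    RewriteEngine On
    RewriteRule ^products/cars/([a-z]+)/([a-z]+)/?$ /p?make=$1&model=$2 [L]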
Make sure the page title is relevant and well constructed
For example: Buy A House In France :. Property Purchasing in France
Use a domain name that describes your site
Good: http://cars.com/
Bad: http://somerandomunrelateddomainname.com/
Example
Type car into Google: 4 of the top 5 results have car in the domain: http://www.google.co.uk/search?q=car
Make it accessible
Make sure people can read your content. This includes a variety of different audiences:
People with disabilities: sight, motor, cognitive disabilities, etc.
Search bots
In particular, make sure search bots can read every relevant page on your site. Search bots are often blocked by JavaScript-only links between pages or by frames / Flash / Silverlight. One easy fix is a site map page that links to the whole site, divided into categories / subcategories, etc. (see the sketch after this list).
Down-level browsers
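Such a site map page can be plain HTML; a minimal sketch with placeholder URLs:

    <!-- Site map page sketch: plain links a bot can follow without JavaScript. -->
    <h1>Site Map</h1>
    <h2>Cars</h2>
    <ul>
      <li><a href="/products/cars/ford/">Ford</a>
        <ul>
          <li><a href="/products/cars/ford/focus/">Ford Focus</a></li>
        </ul>
      </li>
    </ul>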
Submit your site map automatically
Most search engines allow you to submit a list of pages on your site including when they were last updated.
Google: https://www.google.com/webmasters/tools/docs/en/about.html
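A minimal sitemap.xml sketch (the URL and date are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://cars.com/products/cars/ford/focus/</loc>
        <lastmod>2013-03-19</lastmod>
        <changefreq>monthly</changefreq>
      </url>
    </urlset>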
Inbound links
Generate as much buzz about your website as possible to increase the likelihood of people linking to you. Blog or podcast about your website if appropriate, and list it in online directories (if appropriate).
References
Google Search Engine Ranking Factors, by an SEO company
Creating a Google-friendly site: Best practices
Wikipedia - Search engine optimization
Good content.
Update it often.
Read and digest everything at Creating a Google-friendly site: Best practices.
Be active on the web. Comment on blogs and correspond genuinely with people by email, IM, and Twitter.
I'm not too sure about the keyword-in-domain-name advice. Wikipedia? What does that mean? Mozilla? What word is that? Google? It was a typo. Yahoo? Sounds like that chocolate drink Yoo-hoo.
Trying to fit keywords into the domain name shoehorns you in anyway, and it could be construed as an SEO technique in the future (if it isn't already!).
Answer all email. Answer blog comments. Be nice and helpful.
Go watch garyvee's Better Than Zero. That'll motivate you.
If it's appropriate, having a blog is a good way of keeping content fresh, especially if you post often. A CMS would be handy too, as it reduces the friction of updating. The best approach is user-generated content, as other people make your site bigger and keep it updated, and they may well link to their contributions from their other sites.
Google doesn't want you to have to engineer your site specifically to get a good PageRank. Having popular content and a well designed website should naturally get you the results you want.
An easy trick is to use Google Webmaster Tools: https://www.google.com/webmasters/tools
You can also generate a sitemap using http://www.xml-sitemaps.com/
Then don't forget to use www.google.com/analytics/
And be careful: most SEO guides are not accurate, and playing fair is not always the winning approach. For example, everyone says that spamming .edu sites is bad and ineffective, but it is in fact effective.