Is it possible to help search engines by giving them a list of URLs to crawl? It can be hard to make a site SEO-friendly when it relies on heavy AJAX logic. Let's say that the user chooses a category, then a sub-category, and then a product. It seems unnecessary to give categories and sub-categories their own URLs, but giving products a URL makes sense: when the application sees a product URL, I can make it navigate to that product. So, is it possible to use robots.txt or some other method to direct search engines to the URLs I designate?
I am open to other suggestions if this somehow does not make sense.
Yes. What you're describing is called a sitemap -- it's a list of pages on your site which search engines can use to help them crawl your web site.
There are a couple of ways of formatting a sitemap, but by far the easiest is to list all the URLs in a text file available on your web site, one per line, and reference it in robots.txt like so:
Sitemap: http://example.com/sitemap.txt
Here's Google's documentation on the topic: https://support.google.com/webmasters/answer/183668?hl=en
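A minimal sketch of the text-file approach, assuming product pages live at hypothetical URLs like `http://example.com/product/123` (your AJAX app would map those back to the product view). It only builds the two files described above: `sitemap.txt` with one absolute URL per line, and a `robots.txt` containing the `Sitemap:` line. `PRODUCT_URLS`, `build_sitemap_files` and `write_files` are illustrative names, not part of any library.

```python
from pathlib import Path

# Hypothetical product URLs; a real site would pull these from its product table.
PRODUCT_URLS = [
    "http://example.com/product/123",
    "http://example.com/product/456",
]


def build_sitemap_files(urls: list[str], site_root: str) -> dict[str, str]:
    """Return the contents of sitemap.txt and a robots.txt that advertises it."""
    # One absolute URL per line; dict.fromkeys drops duplicates but keeps order.
    unique = list(dict.fromkeys(urls))
    sitemap_txt = "\n".join(unique) + "\n"
    robots_txt = (
        "User-agent: *\n"
        "Disallow:\n"  # an empty Disallow blocks nothing
        "\n"
        f"Sitemap: {site_root}/sitemap.txt\n"
    )
    return {"sitemap.txt": sitemap_txt, "robots.txt": robots_txt}


def write_files(files: dict[str, str], document_root: Path) -> None:
    """Write each file into the web server's document root (path is site-specific)."""
    for name, content in files.items():
        (document_root / name).write_text(content, encoding="utf-8")
```

For example, `write_files(build_sitemap_files(PRODUCT_URLS, "http://example.com"), Path("/var/www/html"))` would write both files; the document-root path there is only an example. Categories and sub-categories are simply left out of the list, as the question suggests.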
I am working on a small PHP script and I have some links like this:
*-phones-*.html
where the * parts are variables. I want to disallow Google from indexing this kind of link using robots.txt. Is it possible?
You're not disallowing anything: robots.txt is just a set of guidelines for web crawlers, which can choose to follow them or not.
Rude crawlers should of course be IP-banned. But you can't prevent a web crawler from coming across that page. Anyway, you can add the pattern to your robots.txt and Google's web crawler might obey it.
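For reference, Google documents `*` (any sequence of characters) and a trailing `$` (end of URL) as wildcards in robots.txt paths, so the rule would be `Disallow: /*-phones-*.html`; not every crawler understands wildcards. The sketch below is a simplified, hypothetical re-implementation of just that matching, useful for checking which of your URLs a rule would catch. It ignores user-agent groups and `Allow` precedence, so treat it as an approximation, not Google's actual matcher.

```python
import re

# The rule for the question's links; wildcard support is crawler-specific.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*-phones-*.html
"""


def disallow_rules(robots_txt: str) -> list[str]:
    """Collect non-empty Disallow values (all user-agent groups lumped together)."""
    rules = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            rules.append(value.strip())
    return rules


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate '*' into '.*' and a trailing '$' into an end-of-string anchor."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, rules: list[str]) -> bool:
    # Rules match from the start of the path, hence re.match rather than re.search.
    return any(rule_to_regex(rule).match(path) for rule in rules)
```

With this, `is_disallowed("/nokia-phones-cheap.html", disallow_rules(ROBOTS_TXT))` is true, while a path without `-phones-` in it is not caught.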
I've googled a lot and read a lot of articles, but got mixed answers.
I'm a little confused about which is the better option if I want a certain section of my site to be blocked from being indexed by search engines. Basically, I make a lot of updates to my site and also design for clients, and I don't want all the "test data" that I upload for previews to be indexed, to avoid duplicate content issues.
Should I use a sub-domain and block the whole sub-domain
or
Create a sub-directory and block it using robots.txt.
I'm new to web design and a little insecure about using sub-domains: I read somewhere that it's a somewhat advanced procedure and that even a tiny mistake could have big consequences. Moreover, Matt Cutts has said something similar (source):
"I’d recommend using sub directories until you start to feel pretty confident with the architecture of your site. At that point, you’ll be better equipped to make the right decision for your own site."
On the other hand, I'm also hesitant to use robots.txt, since anyone can access the file.
What are the pros and cons of both?
For now I am under the impression that Google treats both similarly and it would be best to go for a sub-directory with robots.txt, but I'd like a second opinion before "taking the plunge".
Either you ask bots not to index your content (→ robots.txt) or you lock everyone out (→ password protection).
For this decision it's not relevant whether you use a separate subdomain or a folder. You can use robots.txt or password protection for both. Note that the robots.txt always has to be put in the document root.
Using robots.txt gives no guarantee; it's only a polite request. Polite bots will honor it, others won't. Human users will still be able to visit your "disallowed" pages. Even bots that honor your robots.txt (e.g. Google's) may still list links to your "disallowed" URLs in their search results (they won't index the content, though).
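To make the "document root" point concrete: a crawler looks for robots.txt only at the root of each scheme and host, so `preview.example.com` and `www.example.com` each need their own file. The sketch below uses Python's standard `urllib.robotparser` to show how a polite bot would read a hypothetical rule `Disallow: /preview/`; the host names and the `/preview/` folder are assumptions for illustration. Note that the rule itself publicly reveals that `/preview/` exists, which is exactly the concern raised in the question.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that keeps client previews under /preview/.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /preview/",
]


def robots_url_for(page_url: str) -> str:
    """Where a crawler looks for robots.txt: the root of the page's scheme + host."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


# What a well-behaved bot would conclude after fetching the file above.
parser = RobotFileParser()
parser.parse(ROBOTS_LINES)
```

Here `parser.can_fetch("Googlebot", "https://www.example.com/preview/client1/")` returns `False`, but nothing stops a browser or an impolite bot from requesting that URL anyway.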
Using a login mechanism protects your pages from all bots and visitors.
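One way to add such a login mechanism is HTTP Basic authentication in front of the preview area. The sketch below is a hypothetical WSGI wrapper (plain standard library, no framework), assuming previews live under `/preview/`; the username, password and helper names are placeholders. Basic credentials are only base64-encoded, so this should be served over HTTPS.

```python
import base64
import hmac


def require_basic_auth(app, username, password, prefix="/preview/"):
    """Wrap a WSGI app so that paths under `prefix` require HTTP Basic credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    expected = f"Basic {token}".encode("ascii")

    def wrapped(environ, start_response):
        if environ.get("PATH_INFO", "").startswith(prefix):
            # PEP 3333 servers pass headers as latin-1 "native" strings.
            supplied = environ.get("HTTP_AUTHORIZATION", "").encode("latin-1")
            # Constant-time comparison avoids leaking how much of the header matched.
            if not hmac.compare_digest(supplied, expected):
                start_response("401 Unauthorized", [
                    ("WWW-Authenticate", 'Basic realm="Client previews"'),
                    ("Content-Type", "text/plain; charset=utf-8"),
                ])
                return [b"Authentication required.\n"]
        return app(environ, start_response)

    return wrapped


def hello_app(environ, start_response):
    """Stand-in for the real site (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello\n"]


def status_for(app, path, authorization=None):
    """Call a WSGI app in-process and return its status line, for quick checks."""
    environ = {"PATH_INFO": path}
    if authorization is not None:
        environ["HTTP_AUTHORIZATION"] = authorization
    captured = []
    app(environ, lambda status, headers: captured.append(status))
    return captured[0]
```

Unlike robots.txt, this keeps every bot and visitor out of `/preview/` unless they have the password, whether the previews sit in a folder or on a sub-domain.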
I have a site with a huge number (well, thousands or tens of thousands) of dynamic URLs, plus a few static URLs.
In theory, due to some cunning SEO linkage on the homepage, it should be possible for any spider to crawl the site and discover all the dynamic urls via a spider-friendly search.
Given this, do I really need to worry about expending the effort to produce a dynamic sitemap index that includes all these URLs, or should I simply ensure that all the main static URLs are in there?
The actual way in which I would generate this isn't a concern; I'm just questioning whether I need to do it at all.
Indeed, the Google FAQ (and yes, I know they're not the only search engine!) about this recommends including URLs in the sitemap that might not be discovered by a crawl; based on that fact, then, if every URL in your site is reachable from another, surely the only URL you really need as a baseline in your sitemap for a well-designed site is your homepage?
If there is more than one way to get to a page, you should pick a main URL for each page that contains the actual content, and put those URLs in the site map. I.e. the site map should contain links to the actual content, not every possible URL to get to the same content.
Also consider putting canonical link tags (<link rel="canonical" href="...">) in the pages, pointing to this main URL, so that spiders can recognise a page even if it's reachable through different dynamic URLs.
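A sketch of "one main URL per piece of content": normalize each dynamic URL by dropping query parameters that don't change the content and sorting the rest, then use that single form both in the sitemap and in the page's canonical tag. Which parameters are "non-content" is entirely site-specific; the `NON_CONTENT_PARAMS` set below is a hypothetical example.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that affect navigation or tracking but not the content shown.
NON_CONTENT_PARAMS = {"sort", "ref", "sessionid", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str) -> str:
    """Lower-case scheme and host, drop non-content params, sort the rest, drop the fragment."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CONTENT_PARAMS
    )
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", urlencode(kept), ""))


def sitemap_urls(crawled_urls):
    """One sitemap entry per piece of content, in first-seen order."""
    return list(dict.fromkeys(canonical_url(u) for u in crawled_urls))


def canonical_link_tag(url: str) -> str:
    """The <link rel="canonical"> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'
```

So `http://Example.com/product?id=7&ref=home` and `http://example.com/product?ref=search&id=7` both collapse to `http://example.com/product?id=7`, and only that URL goes into the sitemap.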
Spiders only spend a limited time searching each site, so you should make it easy to find the actual content as soon as possible. A site map can be a great help as you can use it to point directly to the actual content so that the spider doesn't have to look for it.
We have had a pretty good results using these methods, and Google now indexes 80-90% of our dynamic content. :)
In an SO podcast they talked about limits on the number of links you could include/submit in a sitemap (around 500 per page, with a page limit based on PageRank?) and how you would need to split them over multiple pages.
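Whatever limit the podcast was referring to, the published sitemaps.org protocol caps each sitemap file at 50,000 URLs (and 50 MB uncompressed), and a sitemap index file is the standard way to tie several sitemaps together. A minimal sketch of splitting a large URL list that way; the `sitemap-N.xml` file names and `base_url` are assumptions.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # sitemaps.org limit; each file must also stay under 50 MB
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_sitemaps(urls, base_url, per_file=MAX_URLS_PER_SITEMAP):
    """Return {filename: xml} for numbered sitemaps plus an index that lists them."""
    files = {}
    for n, group in enumerate(chunk(urls, per_file), start=1):
        entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in group)
        files[f"sitemap-{n}.xml"] = (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<urlset xmlns="{NS}">\n{entries}</urlset>\n'
        )
    index_entries = "".join(
        f"  <sitemap><loc>{escape(base_url)}/{name}</loc></sitemap>\n" for name in files
    )
    files["sitemap-index.xml"] = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{NS}">\n{index_entries}</sitemapindex>\n'
    )
    return files
```

Only `sitemap-index.xml` then needs to be referenced from robots.txt or submitted to the search engine.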
"Given this, do I really need to worry about expending the effort to produce a dynamic sitemap index that includes all these URLs, or should I simply ensure that all the main static URLs are in there?"
I was under the impression that the sitemap wasn't necessarily about disconnected pages but rather about increasing the crawling of existing pages. In my experience, when a site includes a sitemap, minor pages, even when prominently linked to, are more likely to appear in Google results. Depending on the PageRank/inbound links etc. of your site, this may be less of an issue.
I have my main application site https://drchrono.com, and I have a blog sub-domain at http://blog.drchrono.com. I was told by some bloggers that a blog sub-domain helps the PageRank of your main site. Does traffic to your blog sub-domain help the Google PageRank of your site and count as traffic to your main site?
I don't think Google gives any special treatment to sub domains named "blog". If they did, that would be a wide open door for abuse, and they're smart enough to realize that.
At one time, I think there were advantages to putting your blog on a separate subdomain though. Links from your blog to your main site could help with your main site's page rank if your blog has a decent page rank.
However, it seems like that has changed. Here's an interesting post about setting up blog subdomains vs. folders. It seems like they are actually treated the same by Google now, although nobody but Google really knows for sure how they treat them.
With regard to traffic, your Google ranking is only incidentally related to the amount of traffic your site gets. Google rankings are based primarily on content and number & quality of incoming links, not on how much traffic you get. Which makes sense since Google really has no way of knowing how much traffic you get to your site other than perhaps the traffic they send there via Google searches.
Not directly, but...
I do not know if "blog" specifically helps the PageRank of your site in some special way; Google guards its PageRank secrets fairly well. If you really wanted to find out, you would create two sites with roughly the same content, one with "blog" in the domain name and one without, get them indexed, and see whether their PageRank differs. My gut instinct is: no.
It is known that Google indexes the name of the site, and a site name that matches the search terms improves your chances of being listed in the results. So it would be reasonable to assume (unless Google specifically stopped indexing the word "blog") that when someone searched for a main search term plus "blog", the chances of your site showing up would be slightly higher.
For example, it should help searches for: drchrono blog.
By the way, Google changes its algorithms all the time, so this is just speculation.
According to an article on hubspot.com:
The search engines are treating subdomains more and more as just portions of the main website, so the SEO value for your blog is going to add to your main website domain. If you want your blog to be seen as part of your company, you should set it up this way (or the next way).
However, they go on to say there isn't a big difference between blog.domain.com and domain.com/blog.
You can read the full article here: HubSpot article on blog domains
One thing a sub-domain will help is your site's Alexa rank.
Alexa gives the same rank to all pages under your main domain. If you use the Alexa Toolbar, you'll see that all subdomains have the same rank as your main page. So hits to your subdomains will count toward your site's Alexa rank.
I don't think the subdomain will add anything to the PageRank, but it might make content easier to find than in a folder.
Let's say I'm looking for something from your blog on Google; I could search for
site:blog.drchrono.com someTopic or articleImLookingFor
Since it is a subdomain, I would guess it counts as traffic to the main site.
Personally, if I were to set up a blog, I would go for the subdomain and would probably set up a redirect from
http://drchrono.com/blog to
http://blog.drchrono.com
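Such a redirect is usually a one-line web server rule, but the logic is easy to show as a hypothetical WSGI wrapper: anything under `/blog` gets a permanent (301) redirect to the same path on the blog subdomain, keeping the query string. The host comes from this answer; the function names are illustrative, and a production setup would more likely do this in the web server configuration.

```python
BLOG_HOST = "http://blog.drchrono.com"  # redirect target from the answer above


def blog_redirect(app):
    """301-redirect /blog and /blog/... to the same path on the blog subdomain."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path == "/blog" or path.startswith("/blog/"):
            rest = path[len("/blog"):] or "/"
            query = environ.get("QUERY_STRING", "")
            location = BLOG_HOST + rest + (f"?{query}" if query else "")
            start_response("301 Moved Permanently", [
                ("Location", location),
                ("Content-Type", "text/plain; charset=utf-8"),
            ])
            return [b""]
        return app(environ, start_response)
    return wrapped


def main_site(environ, start_response):
    """Stand-in for the main application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"main site\n"]


def probe(app, path, query=""):
    """Call a WSGI app in-process; return (status, Location header or None)."""
    seen = {}

    def start_response(status, headers):
        seen["status"] = status
        seen["location"] = dict(headers).get("Location")

    app({"PATH_INFO": path, "QUERY_STRING": query}, start_response)
    return seen["status"], seen["location"]
```

A 301 (rather than a 302) signals that the move is permanent, so search engines should credit the subdomain URL instead of the old `/blog` path.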
blog.domain.tld and www.domain.tld are not treated as unrelated sites, assuming they're served by the same authoritative name server. It has never been clear to me whether pages are ranked entirely independently, or whether the reputation of a domain, and hence of its subdomains, figures into it beyond simply being linked to.
But if I read your question differently, I'd say there's no difference in doing either:
I've tried setting up pages at both photos.domain.tld/stuffAboutPhotos and www.domain.tld/photos/stuffAboutPhotos for a month at a time. I found no noticeable difference between the search engine referral rates.
But then it's actually hard to do this independently of other factors.
Therefore I conclude that, despite human logic suggesting the domain is more important, there is no advantage to putting a keyword in the domain as opposed to the rest of the URL, except to be sure it's clearly delimited (use a slash, dash, or underscore in the rest of the URL).
If Google has a shortlist of keywords that do rank better in a domain name than in the rest of the url, they're definitely not sharing it with anyone not wearing a Google campus dampened exploding collar.
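Keeping keywords "clearly delimited" in the path mostly comes down to how URLs are generated. A minimal sketch, assuming dash-separated slugs (one of the delimiters suggested above); `slugify` and `article_path` are illustrative helpers, not a standard API.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a title into a lowercase, dash-delimited URL segment."""
    # Fold accented letters to ASCII where possible (é -> e); drop what can't be folded.
    folded = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Keep runs of letters/digits as words; everything else becomes a single dash.
    words = re.findall(r"[a-z0-9]+", folded.lower())
    return "-".join(words)


def article_path(section: str, title: str) -> str:
    """Slash between section and article, dashes between words."""
    return f"/{slugify(section)}/{slugify(title)}"
```

For the photos example above, `article_path("Photos", "Stuff About Photos")` gives `/photos/stuff-about-photos`, so each keyword is separated by a slash or a dash.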
Google treats a subdomain as a domain. If this weren't true, all those blogspot blogs would rank higher in the SERPs.
With subdomains it is a bit easier, as Google "knows" it is a "separate" site; with sub-directories it is trickier. In practice, though, sub-domains end up much the same. Over the past year Google has ranked these anywhere between PR0 and PR3; currently:
PR1: of-cour.se
Cheers!
Not really. Blogs do do some nice things to the SEO for your sites, but if they're inside the site it doesn't work the same.
A better option is to have a completely separate domain for the blog (something like drchronoblog.com), with lots of links from the blog site to the main site.
That way search engines see the links but do not make the connection between the blog and the main site, and so your main site's PageRank improves.
It won't give your site higher priority just because you have a "blog." subdomain.
But I'm sure more people will find your site if they search for blogs,
and therefore you get more traffic: more visits through the search engines and so on.
So I'd say yes :)
Since PageRank deals with ranking in search engines, let's do a little test:
https://www.google.com/search?q=blog
You may see that example.com/blog ranks higher than blog.example.com.
The pattern is roughly the same for other domains.
However, when possible, I would rather get blog.wordpress.com, since search engines treat it as my own property, than a folder such as wordpress.com/blog, which clearly still belongs to wordpress.com.
The only way a blog can help you as far as SEO depends on the content in your blog. Just having a blog isn't enough.