Robots.txt in my project root [closed]

I've seen tutorials/articles discussing using Robots.txt. Is this still a necessary practice? Do we still need to use this technique?

A robots.txt file is not required, but it is recommended if you want to block certain pages or folders on your website from being crawled by search engine crawlers.
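For illustration, a minimal robots.txt along those lines might look like this (the folder names are hypothetical):

# Hypothetical example: keep all crawlers out of two folders
User-agent: *
Disallow: /admin/
Disallow: /private/
# Anything not listed above stays crawlable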

I agree with the above answer. A robots.txt file is used to block pages and folders from being crawled by search engines. For example, you can block search engines from crawling and indexing URLs that carry session IDs, which in rare cases could become a security threat. Other than this, I don't see much importance in it.
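As a rough sketch of blocking such session-ID URLs (the sessionid parameter name is made up, and the * wildcard is an extension honored by the major crawlers such as Googlebot rather than part of the original robots.txt convention):

# Hypothetical example: keep crawlers away from URLs carrying a session ID
User-agent: *
Disallow: /*?sessionid=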

The way that many robots crawl through your site and rank your pages has changed recently as well.
I believe that for a short period of time the use of robots.txt may have helped quite a bit, but nowadays most other steps you take for SEO will have more of a positive impact than this little .txt file ever will.
The same goes for backlinks: they used to be far more important for getting ranked than they are now.

Robots.txt is not for indexing; it's used to block the things that you don't want search engines to index.

Robots.txt can help with indexation on large sites if you use it to point to an XML sitemap file.
Like this:
Sitemap: http://www.domain.com/sitemap.xml
Within the XML file, you can list up to 50,000 URLs for search engines to index. There are plugins for many content management systems that can generate and update these files automatically.
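For reference, a tiny sitemap.xml in the standard sitemaps.org format might look like this (the URLs and dates are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one <url> entry per page, up to 50,000 per file -->
  <url>
    <loc>http://www.domain.com/page-one.html</loc>
    <lastmod>2012-01-15</lastmod>
  </url>
  <url>
    <loc>http://www.domain.com/page-two.html</loc>
  </url>
</urlset>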

Related

best way to allow search engine to crawl site [closed]

Thanks for reading my question. I am building a site that will list products from each manufacturer. I'm planning to structure the URLs in the following variations:
www.mysite.com/manufacturer_name/product_name/product_id
www.mysite.com/product_name/product_id
www.mysite.com/manufacturer_name
There are millions of products and I want all the major search engines to crawl them. What is the best way to go about doing that?
Would simply submitting the site to all the search engines be enough? I would assume that if I submit the manufacturer page, which lists all the manufacturer names as links, the search engines will follow each link and then follow the products displayed under each manufacturer (I will have paging for products), so they can keep crawling the site for more products within each manufacturer until they run out of pages.
Would that be sufficient to get every product listed on every search engine? Or is there a newer and better way to do this? Maybe there are SEO tricks that I'm not aware of. I am hoping you can point me in the right direction.
I've previously used robots.txt to tell search engines which pages to crawl, and that seemed to work fine.
Thanks,
bad_at_coding
Submit an XML sitemap. The easiest way to do this is to link to it in your robots.txt file.
Sample robots.txt file:
Sitemap: http://example.com/sitemap_location.xml
See Google's Submitting Sitemaps documentation for more on this topic.
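With millions of products, note that a single sitemap file is limited to 50,000 URLs, so you would typically split the product URLs across several sitemap files and reference them from a sitemap index; a sketch, with all file names hypothetical:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one <sitemap> entry per 50,000-URL sitemap file -->
  <sitemap>
    <loc>http://example.com/sitemap-products-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://example.com/sitemap-products-2.xml</loc>
  </sitemap>
</sitemapindex>

The robots.txt line would then point at the index file, e.g. Sitemap: http://example.com/sitemap_index.xml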

Improve dictionary's internal linking structure [closed]

What I want to achieve:
I have an online dictionary which works quite well - but the crawling by search engines (especially Google) could be better.
So I would like to improve the internal linking structure on my website so that Google can easily find (almost) all pages of the dictionary.
What I know so far:
The number of internal links per page should not exceed 100. Search engines don't like pages containing masses of links - it looks spammy. And a website should be designed for its users, not for search engines, so usability should not suffer from this optimization; in the best case, usability would even increase.
My ideas for improving the internal linking structure so far:
on each dictionary entry page: link 25 similar words which could be mixed up
create an index: list of all dictionary entries (75 per page)
...
Can you help me to optimize the linking structure?
Thank you very much in advance!
You could link to synonyms and antonyms, which would be both user-friendly and crawler-friendly. But I think the biggest thing you could do to improve crawling, particularly by Google, would be to add a sitemap:
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Google has lots of information on Sitemaps and how to generate them on their webmaster help pages.
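As a rough sketch of the "similar words" linking suggested above (the words, URLs, and class name are invented), each entry page could carry a simple block of plain HTML links that both users and crawlers can follow:

<!-- Hypothetical markup for a dictionary entry page -->
<div class="related-words">
  <h2>Similar words</h2>
  <ul>
    <li><a href="/entry/fast">fast</a></li>
    <li><a href="/entry/quick">quick</a></li>
    <li><a href="/entry/rapid">rapid</a></li>
  </ul>
</div>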

SEO: things to consider/implement for your website's content [closed]

Let's say I have a website that I am developing...
The site may have wallpapers, questions & answers, and info pages (e.g. IMDb, Wikipedia, etc.).
What do I need to do so that when a search engine analyzes a particular page of my website for a particular term, let's say 'XYZ', it finds the 'XYZ' content if it is present on that page?
Please pardon my non-techy jargon, I am new to this...
The most important tips in SEO revolve around what not to do:
Keep Java and Flash as minimal as possible, since web crawlers can't parse them. JavaScript can accomplish the vast majority of Flash-like animations, but it's generally best to avoid them altogether.
Avoid using images to replace text or headings. Remember that any text in images won't be parsed. If necessary, there are SEO-friendly ways of replacing text with images, but any time you have text not visible to the user, you risk the crawler thinking you're trying to cheat the system.
Don't try to be too clever. The best way to optimize your search results is to have quality content which engages your audience. Be wary of anyone who claims they can improve your results artificially; Google is usually smarter than they are.
Search engines (like Google) usually use the content in <h1> tags to find out the content of your page and determine how relevant your page is to that content by the number of sites that link to your page.
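To make that concrete, here is a minimal sketch of a page about a placeholder topic 'XYZ' that exposes the term as ordinary crawlable text in the title, heading, and body rather than inside an image or Flash:

<!-- Hypothetical page: "XYZ" appears as real text the crawler can parse -->
<html>
  <head>
    <title>XYZ - overview and examples</title>
    <meta name="description" content="A short, human-readable summary mentioning XYZ.">
  </head>
  <body>
    <h1>XYZ</h1>
    <p>Plain-text content about XYZ that both users and crawlers can read.</p>
  </body>
</html>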

How should google crawl my blog? [closed]

I was wondering how (or if) I should guide Googlebot through my blog. Should I only allow it to visit pages with single entries, or should it also crawl the main page (which also has full entries)? My concern is that the main page changes when I add a new post and Google keeps the old version for some time. I also find directing people to the main page annoying - they have to look through all the posts before finding the one they're interested in. So what is the proper way to solve this issue?
Why not submit a sitemap with the appropriate <changefreq> tags? If you set that to "always" for the homepage, the crawler will know that your homepage is very volatile (and you can give an accurate change frequency for the other URLs too, of course). You can also give a lower priority to your homepage and a higher one to the pages you prefer to see higher in the index.
I do not recommend telling crawlers to avoid indexing your homepage completely, as that would throw away any link juice you might be getting from links to it from other sites -- tweaking change freq and priority seems preferable.
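A sketch of what such sitemap entries could look like (the URLs and values are only examples): a volatile, lower-priority homepage and a stable, higher-priority post page.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/</loc>
    <!-- homepage changes with every new post -->
    <changefreq>always</changefreq>
    <priority>0.3</priority>
  </url>
  <url>
    <loc>http://example.com/posts/some-entry.html</loc>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>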
Make a sitemap.xml and regenerate it periodically. Check out Google Webmaster Tools.

SEO blacklisting for cloaking [closed]

I am using postbacks to perform paging on a large amount of data. Since I do not have a sitemap for Google to read, there will be products that Google will never know about, because Google does not push any buttons.
I am using cloaking to spit out all the products with no paging if the user agent is that of a search engine. There may be some workarounds for situations like this, such as hidden buttons that link to paged URLs.
What about information you want indexed by Google but want to charge for? Imagine that I have articles that I want users to be able to find on Google, but when a user visits the page, only half the content is displayed and they have to pay for the rest.
I have heard that Google may blacklist you for cloaking. I am not being evil, just helpful. Does Google recognize the intention?
Here is a FAQ by Google on that topic. I suggest using CSS to hide some content. For example, just provide links to your products as an alternative to your buttons and use display:none; on them. The layout stays intact and the search engines will find your pages. However, while most search engines will not detect cloaking and similar techniques on their own, competitors may report you. Either way: don't risk it. Use sitemaps, RSS feeds, XML documents, or even PDF files with links to offer your whole range of products. Good luck!
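A minimal sketch of what that answer describes (the class name and URLs are made up), with the same caveat the answer gives about hidden content being risky:

<!-- Plain links to the paged URLs, hidden from the layout as suggested above -->
<div class="crawler-links" style="display:none;">
  <a href="/products?page=1">Products, page 1</a>
  <a href="/products?page=2">Products, page 2</a>
  <a href="/products?page=3">Products, page 3</a>
</div>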
This is why Google supports a sitemap protocol. The sitemap file needs to render as XML, but can certainly be a code-generated file, so you can produce on-demand from the database. And then point to it from your robots.txt file, as well as telling Google about it explicitly from your Google Webmaster Console area.
Highly doubtful. If you are serving different content based on IP address or User-Agent from the same URL, it's cloaking, regardless of the intentions. How would a spider parse two sets of content and figure out the "intent"?
There is intense disagreement over whether "good" cloakers are even helping the user anyway.
Why not just add a sitemap?
I don't think Google will recognize your intent, unfortunately. Have you considered creating a sitemap dynamically? http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40318