[Closed as off-topic 10 years ago; not accepting answers.]
Let's say we have Twitter, where every profile needs to be indexed by search engines. How does Twitter handle its sitemap? Is there something like a "regex" sitemap for the domain, or do they regenerate a sitemap for each user?
How does this work for pages you don't know about in advance, i.e. dynamic pages? Take Wikipedia, for example: how do they make sure everything is indexed by search engines?
Most likely, they don't bother with a sitemap.
For highly dynamic sites, a sitemap will not help that much. Google will only index a certain amount, and if everything changes before Google gets around to revisiting it, you don't gain much.
For slowly changing sites this is different. The sitemap tells Google, on the one hand, which pages exist that it may not have visited at all yet, and (more importantly) which pages have not changed and therefore do not need to be revisited.
But the sitemap.xml mechanism simply does not scale up to huge, highly dynamic sites such as Twitter.
Many systems use a dynamically generated sitemap.
You can upload any sitemap to Google via Webmaster Tools (the service is free of charge) under Optimization > Sitemaps. It does not have to be sitemap.xml; it can be a JSP or ASPX page too.
Webmaster Tools allows you to upload many different sitemaps for a single website. However, I am not sure what the maximum number of sitemaps is.
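As a rough sketch of what such a dynamically generated sitemap might look like, here is a minimal Python/Flask route standing in for the JSP or ASPX page mentioned above; the URL pattern and the get_profile_urls() helper are invented for illustration and would have to be replaced by whatever your own system provides.

# Minimal sketch of a dynamically generated sitemap (assumed Flask app).
# get_profile_urls() is a hypothetical helper; a real site would query its database.
from flask import Flask, Response

app = Flask(__name__)

def get_profile_urls():
    # Placeholder data; replace with a database query.
    return ["https://www.example.com/users/alice",
            "https://www.example.com/users/bob"]

@app.route("/profiles-sitemap.xml")
def profiles_sitemap():
    entries = "".join(
        "<url><loc>{}</loc></url>".format(url) for url in get_profile_urls()
    )
    xml = ('<?xml version="1.0" encoding="UTF-8"?>'
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
           + entries + '</urlset>')
    return Response(xml, mimetype="application/xml")

The URL of such a route can then be submitted in Webmaster Tools just like a static sitemap file.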
Some crawlers support a Sitemap directive, allowing multiple sitemaps in the same robots.txt in the following form:
Sitemap: http://www.yoursite.com/profiles-sitemap.xml
Sitemap: http://www.yoursite.com/sitemap_index.xml
EDIT
The Microsoft website is a very good example:
Its robots.txt file contains many Sitemap entries. Example:
Sitemap: http://www.microsoft.com/en-us/sqlazure/sitemap.xml
Sitemap: http://www.microsoft.com/en-us/cloud/sitemap.xml
Sitemap: http://www.microsoft.com/en-us/server-cloud/sitemap.xml
Sitemap: http://www.microsoft.com/france/sitemap_index.xml
Sitemap: http://www.microsoft.com/fr/ca/sitemap.xml
Sitemap: http://www.microsoft.com/germany/kleinunternehmen/gsitemap.aspx
Sitemap: http://www.microsoft.com/germany/newsroom/sitemap.xml
As you can see, some sitemaps are static (XML) and some are dynamic (ASPX).
[Closed as needing more focus 6 years ago; not accepting answers.]
I am currently programming a website that gives information about food products.
The website works like this: there is a search engine -> users search for the product they want to know about -> the website shows all the products they may want to see, and every product has its own page with all the information about it.
So my question is: how will search engines like Google be able to find all the product pages?
Search engines use many different ways to find new pages. Most commonly their web crawlers follow (external as well as internal) hyperlinks.
While a typical informational website links to all available pages in its site-wide navigation (so web crawlers can reach all pages by following internal links), other websites don’t necessarily link to all their pages (maybe because you can only reach them via forms, or because it doesn’t make sense for them to provide all links, etc.).
To allow discovery/crawling of new pages on these sites as well, they can provide a sitemap. This is essentially just a page linking to all existing pages, often with structured metadata that can help search engines.
So just make sure that all your pages are linked somehow. Either via "natural" internal links on your site, or by providing a sitemap (ideally following the sitemaps.org protocol), or both.
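For a product site like this one, such a sitemap could be generated from the product list itself. Here is a minimal sketch in Python using only the standard library; the product URLs and the output file name are invented for illustration.

# Minimal sketch: write a sitemaps.org-style sitemap.xml for a list of product pages.
# The URLs below are placeholders; a real site would build this list from its database.
import xml.etree.ElementTree as ET

product_urls = [
    "https://www.example.com/products/apple-juice",
    "https://www.example.com/products/oat-flakes",
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for url in product_urls:
    entry = ET.SubElement(urlset, "url")
    ET.SubElement(entry, "loc").text = url

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)

The resulting file can then be referenced from robots.txt with a Sitemap: line or submitted in the search engines' webmaster tools.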
For questions about SEO advice (which is off-topic here on SO), see our sister site https://webmasters.stackexchange.com/.
Please add a sitemap to your site so that Google can crawl all pages easily and index them properly. Also add an XML sitemap.
Your website needs an SEO process.
[Closed as not about programming 9 years ago; not accepting answers.]
I want to improve my site's ranking in search engines, especially Google. I have submitted a sitemap to Google; after about a week I see that 170 pages have been submitted, but just one page has been indexed. Is there something wrong?
It isn't certain that there is something wrong.
Google first reads your sitemap. It is reporting that it found 170 URLs in your sitemap and has queued them up to be considered.
A week later it has decided to add one page to its index. One of two things has happened: either Google has not yet gotten around to crawling (that is, reading) and considering all the pages in your sitemap, or Google has looked at your pages and decided not to add them to its index.
Look in Webmaster Tools under "Google Index" > "Index Status" > "Advanced", then select "Ever crawled". It should show you how many URLs Google has crawled from your site. If they haven't been crawled yet, you may just have to wait.
If they have been crawled but not added to the index, consider improving your content, or try the "Fetch as Googlebot" feature to make sure that what you are sending to Google is what you think you are sending. Sometimes things are configured so they look good to users but are not visible to Googlebot, e.g. all your content is loaded via Ajax or rendered in Flash.
Also make sure that you aren't disallowing Google from crawling your site in robots.txt, and that you are allowing the pages to be indexed (check that you do not have a "noindex" tag in your HTML).
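For the robots.txt part, the check can also be scripted. Here is a minimal sketch in Python using the standard library's robots.txt parser; the URLs are placeholders for your own site. Note that it only checks crawling rules, so a "noindex" meta tag in the HTML still has to be checked separately.

# Minimal sketch: check whether robots.txt allows Googlebot to fetch a given page.
# Replace the placeholder URLs with your own site and page.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

page = "https://www.example.com/some-page.html"
print(rp.can_fetch("Googlebot", page))  # False means robots.txt blocks crawling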
[Closed as off-topic 9 years ago; not accepting answers.]
I've seen tutorials/articles discussing the use of robots.txt. Is this still a necessary practice? Do we still need to use this technique?
A robots.txt file is not necessary, but it is recommended for those who want to block certain pages or folders on their website from being crawled by search engine crawlers.
I agree with the above answer. The robots.txt file is used for blocking pages and folders from being crawled by search engines. For example, you can block search engines from crawling and indexing generated session IDs, which in rare cases could become a security threat! Other than this, I don't see much importance.
The way that a lot of the robots crawl through your site and rank your pages has changed recently as well.
I believe that for a short period of time the use of robots.txt may have helped quite a bit, but nowadays most other steps you take with regard to SEO will have more of a positive impact than this little .txt file ever will.
The same goes for backlinks: they used to be far, far more important for ranking than they are now.
Robots.txt is not for indexing; it is used to block the things that you don't want search engines to index.
Robots.txt can help with indexation on large sites if you use it to reveal an XML sitemap file.
Like this:
Sitemap: http://www.domain.com/sitemap.xml
Within the XML file, you can list up to 50,000 URLs per sitemap file for search engines to index. There are plugins for many content management systems that can generate and update these files automatically.
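If a site grows beyond the 50,000-URL limit, the usual approach is to split the URLs across several sitemap files and list them in a sitemap index file. Here is a rough sketch of such a split in Python; the file names and base URL are placeholders for illustration.

# Rough sketch: split a large URL list into sitemap files of at most 50,000 URLs
# and write a sitemap index pointing to them. Names and URLs are placeholders.
MAX_URLS = 50000

def write_sitemaps(urls, base_url="https://www.example.com"):
    sitemap_names = []
    for i in range(0, len(urls), MAX_URLS):
        name = "sitemap-{}.xml".format(i // MAX_URLS + 1)
        sitemap_names.append(name)
        with open(name, "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for url in urls[i:i + MAX_URLS]:
                f.write("  <url><loc>{}</loc></url>\n".format(url))
            f.write("</urlset>\n")
    with open("sitemap_index.xml", "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for name in sitemap_names:
            f.write("  <sitemap><loc>{}/{}</loc></sitemap>\n".format(base_url, name))
        f.write("</sitemapindex>\n")

The sitemap index URL is then the one to list in robots.txt or submit to the search engines.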
[Closed as off-topic 9 years ago; not accepting answers.]
What I want to achieve:
I have an online dictionary which works quite well, but the crawling by search engines (especially Google) could be better.
So I would like to improve the internal linking structure on my website so that Google can easily find (almost) all pages of the dictionary.
What I know so far:
The number of internal links per page should not exceed 100. Search engines don't like pages containing masses of links; it looks spammy. And a website should be designed for users, not for search engines, so usability should not suffer from this optimization; in the best case, usability would even increase.
My ideas for improving the internal linking structure so far:
on each dictionary entry page: link 25 similar words which could be mixed up
create an index: list of all dictionary entries (75 per page)
...
Can you help me to optimize the linking structure?
Thank you very much in advance!
You could link to synonyms and antonyms, which would be both user-friendly and crawler-friendly. But I think the biggest thing you could do to improve crawling, particularly by Google, would be to add a sitemap:
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Google has lots of information on Sitemaps and how to generate them on their webmaster help pages.
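For the index idea (75 entries per page), those pages can be generated automatically so that every dictionary entry is reachable through a short chain of internal links. Here is a rough sketch in Python; the entry list, URL scheme, and file names are invented for illustration and would have to match your own site.

# Rough sketch: generate paginated index pages with 75 dictionary entries each,
# so crawlers can reach every entry page via internal links. All names/URLs are placeholders.
PER_PAGE = 75

def write_index_pages(entries):
    pages = [entries[i:i + PER_PAGE] for i in range(0, len(entries), PER_PAGE)]
    for page_no, chunk in enumerate(pages, start=1):
        links = "\n".join(
            '<li><a href="/entry/{0}">{0}</a></li>'.format(word) for word in chunk
        )
        nav = " ".join(
            '<a href="/index/{0}">{0}</a>'.format(n) for n in range(1, len(pages) + 1)
        )
        html = "<ul>\n{}\n</ul>\n<p>Pages: {}</p>\n".format(links, nav)
        with open("index-{}.html".format(page_no), "w", encoding="utf-8") as f:
            f.write(html)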
[Closed as off-topic 11 years ago; not accepting answers.]
I am using postbacks to perform paging on a large amount of data. Since I do not have a sitemap for Google to read, there will be products that Google never learns about, because Google does not push any buttons.
I am cloaking: I output all the products with no paging if the user agent is that of a search engine. There may be some workarounds for situations like this, such as hidden buttons linking to paged URLs.
What about information you want indexed by Google but want to charge for? Imagine that I have articles that I want users to be able to find on Google, but when a user visits the page, only half the content is displayed and they have to pay for the rest.
I have heard that Google may blacklist you for cloaking. I am not being evil, just helpful. Does Google recognize the intention?
Here is an FAQ by Google on that topic. I suggest using CSS to hide some content. For example, just provide links to your products as an alternative to your buttons and use display:none; on them. The layout stays intact and the search engines will find your pages. However, most search engines will not detect cloaking and other such techniques, but competitors might report you. In any case: don't risk it. Use sitemaps, RSS feeds, XML documents, or even PDF files with links to expose your whole range of products. Good luck!
This is why Google supports a sitemap protocol. The sitemap file needs to render as XML, but it can certainly be a code-generated file, so you can produce it on demand from the database. Then point to it from your robots.txt file, and also tell Google about it explicitly in your Google Webmaster Tools account.
Highly doubtful. If you are serving different content based on IP address or User-Agent from the same URL, it's cloaking, regardless of the intentions. How would a spider parse two sets of content and figure out the "intent"?
There is intense disagreement over whether "good" cloakers are even helping the user anyway.
Why not just add a sitemap?
I don't think Google will recognize your intent, unfortunately. Have you considered creating a sitemap dynamically? http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40318