I have a web site with content dating back to 2001 and I need to remake the sitemap. The question arises: if the old URLs have already been indexed, do I need to add them (the same URLs) to the sitemap again even if they haven't changed?
for example:
the sitemap has contained this URL, and has been regenerated regularly, since 2006:
http://www.semana.com/encuestas/encuesta/le-cree-encuestas-electorales/146255-3
Is it necessary to generate it again if it's already indexed?
If the URLs are active and important from a ranking perspective, it is better to keep them in the sitemap even if they are already indexed. The sitemap also lets you provide metadata such as when a page was last changed and how often it changes.
If the URLs are currently active, you should add them to the new sitemap, because it will help with rankings.
You should add them to the new sitemap even if they have already been indexed.
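To make that metadata point concrete, here is a small sketch (my own illustration, not part of the answers above) that writes a one-URL sitemap with <lastmod> and <changefreq> using Python's standard library; the URL is the one from the question, and the date and frequency are made-up example values.

```
# Minimal sketch: build a sitemap entry with <lastmod> and <changefreq>
# metadata using only the standard library. The URL comes from the question;
# the date and change frequency are made-up example values.
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

urlset = ET.Element("urlset", xmlns=NS)
url = ET.SubElement(urlset, "url")
ET.SubElement(url, "loc").text = (
    "http://www.semana.com/encuestas/encuesta/le-cree-encuestas-electorales/146255-3"
)
ET.SubElement(url, "lastmod").text = "2006-01-01"   # example value
ET.SubElement(url, "changefreq").text = "yearly"    # example value

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```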
We have a deprecated domain, www.deprecateddomain.com. One relevant detail: we have a reverse proxy in place that redirects all requests from this domain to the new one, www.newdomain.com.
The problem is that when you type "deprecateddomain.com" into Google search, a link to www.deprecateddomain.com still appears in the results alongside results for "newdomain.com". That means such entries are still in Google's index, and our customer doesn't want to see links to the old site.
It was suggested that we create a fake robots.txt with a Disallow: / directive for www.deprecateddomain.com, plus reverse proxy rules to serve that file from some directory. But after investigating the subject I have started to doubt that this will help. Will it remove the old domain's entries from the index?
Why not just create a removal request in Search Console to remove www.deprecateddomain.com from the index? In my opinion that might help.
Anyway, I'm a novice on this subject. Could you give me advice on what to do?
Google takes time to remove old/obsolete entries from its results, especially for rarely visited or low-value pages. You have little control over this: Google needs to revisit each page to see the redirect you have implemented.
So DO NOT implement a disallow on the old website, because it will make the problem worse. Bots won't be able to crawl those pages and see the redirect you have implemented, so the old URLs will stay in the results longer.
You must also make sure you implement a proper 301 redirect (i.e. a permanent one, not a temporary one) for all pages of the old website. Otherwise, some pages may stay in the results for quite some time.
If some pages are obsolete and should be deleted rather than redirected, return a 404 for them. Google will remove them quickly from its index.
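To illustrate the permanent-redirect point in practice, here is a minimal sketch using only Python's standard library; the domain name is the hypothetical one from the question, and in a real setup the reverse proxy itself would normally send the 301.

```
# Minimal sketch of a permanent (301) redirect for every path on the old host,
# using only the standard library. The domain name is the hypothetical one from
# the question; in practice the reverse proxy would usually send the 301 itself.
from http.server import BaseHTTPRequestHandler, HTTPServer

NEW_DOMAIN = "https://www.newdomain.com"  # assumed migration target

class PermanentRedirect(BaseHTTPRequestHandler):
    def do_GET(self):
        # 301 tells Google the move is permanent; a temporary redirect (302)
        # would leave the old URLs in the index for longer.
        self.send_response(301)
        self.send_header("Location", NEW_DOMAIN + self.path)
        self.end_headers()

    do_HEAD = do_GET  # bots often issue HEAD requests as well

if __name__ == "__main__":
    HTTPServer(("", 8080), PermanentRedirect).serve_forever()
```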
My site was recently hacked and has since been restored. Thousands of spam URLs were indexed by Google. I do have a Google Webmaster Tools account and I can update and submit my sitemap. But how do I tell Google to strictly index only the URLs inside my sitemap? I want to prevent any new spam URLs created by the hackers from being indexed.
Is there any parameter inside sitemap.xml that I can use to do this?
Your sitemap should include only the legitimate new URLs, and Google will focus its crawling and indexing on them.
If you have removed the old spam URLs and they now return a 404 (Not Found) status, Google will remove them from the index (albeit quite slowly; it could take even 1-2 months).
If you need those URLs removed from search results sooner, there's a section about it in the webmaster guide: https://support.google.com/webmasters/answer/1663419?hl=en
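If it helps, here is a small sketch (my own, with placeholder URLs) that checks whether the cleaned-up spam URLs now return the 404 status that lets Google drop them:

```
# Small sketch: confirm that the cleaned-up spam URLs now return 404, the
# status that lets Google drop them from its index. The URLs are placeholders.
import urllib.error
import urllib.request

SPAM_URLS = [
    "https://www.example.com/cheap-pills-123",    # hypothetical spam URL
    "https://www.example.com/replica-watches-9",  # hypothetical spam URL
]

for url in SPAM_URLS:
    try:
        status = urllib.request.urlopen(url).status
    except urllib.error.HTTPError as err:
        status = err.code          # HTTP error responses (404, 410, ...)
    except urllib.error.URLError as err:
        print(f"{url} -> could not connect ({err.reason})")
        continue
    verdict = "gone, Google can drop it" if status == 404 else "still reachable"
    print(f"{url} -> {status} ({verdict})")
```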
I hope my question is not too irrelevant for Stack Overflow.
This is my website: http://www.rader.my
It's a car information website. The content is dynamic. Therefore, Google's crawler could not find all the car specification pages on my website.
I created a sitemap with all my car URLs in it (for instance, http://www.rader.my/Details.php?ID=13 is the page for one car). I know I haven't made any mistake in my .xml file's format and structure. But after submission, Google indexed only one URL, which is my index.php.
I have also read about rel="canonical", but I don't think I should use it in my case, since all my pages ARE different, with different content; only the structure is the same.
Is there anything I missed? Why doesn't Google accept my URLs even though their contents are different? What can I do to fix this?
Thanks and regards,
Amin
I have a similar type of site. Google is good about figuring out dynamic sites. They'll crawl the pages and figure out the unique content as time goes on. Give it time.
You should do all the standard things:
Make sure each page has a unique H1 tag.
Make sure each page has substantial unique content.
Unique keywords and description tags aren't as useful as they used to be, but they can't hurt.
Cross-link internally. Create category pages that link to all of one manufacturer's cars, and have each of that manufacturer's pages link back to 'similar' pages.
Get links to your pages. Nothing helps getting indexed like external authority.
I have a site with a huge number (well, thousands or tens of thousands) of dynamic URLs, plus a few static URLs.
In theory, due to some cunning SEO linkage on the homepage, it should be possible for any spider to crawl the site and discover all the dynamic URLs via a spider-friendly search.
Given this, do I really need to worry about expending the effort to produce a dynamic sitemap index that includes all these URLs, or should I simply ensure that all the main static URLs are in there?
The actual way in which I would generate this isn't a concern - I'm just questioning the need to actually do it.
Indeed, the Google FAQ about this (and yes, I know Google isn't the only search engine!) recommends including URLs in the sitemap that might not be discovered by a crawl. Based on that, if every URL on your site is reachable from another, surely the only URL you really need as a baseline in your sitemap for a well-designed site is your homepage?
If there is more than one way to get to a page, you should pick a main URL for each page that contains the actual content, and put those URLs in the sitemap. That is, the sitemap should contain links to the actual content, not every possible URL that leads to the same content.
Also consider putting canonical meta tags in the pages, pointing at this main URL, so that spiders can recognise a page even if it's reachable through different dynamic URLs.
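As a rough illustration of picking one main URL per page (my own sketch, with hypothetical parameter names), you could collapse query-string variants before writing the sitemap:

```
# Sketch: pick one "main" URL per piece of content and drop duplicate variants
# (e.g. tracking or session parameters) before building the sitemap.
# The parameter names kept/removed here are hypothetical examples.
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

KEEP_PARAMS = {"ID"}  # assume only the content identifier matters

def main_url(url: str) -> str:
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k in KEEP_PARAMS]
    return urlunparse(parts._replace(query=urlencode(query), fragment=""))

urls = [
    "http://www.example.com/Details.php?ID=13",
    "http://www.example.com/Details.php?ID=13&session=abc",   # duplicate variant
    "http://www.example.com/Details.php?ID=14&ref=homepage",  # duplicate variant
]

sitemap_urls = sorted({main_url(u) for u in urls})
print(sitemap_urls)  # one entry per page: ...?ID=13 and ...?ID=14
```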
Spiders only spend a limited time searching each site, so you should make it easy to find the actual content as soon as possible. A site map can be a great help as you can use it to point directly to the actual content so that the spider doesn't have to look for it.
We have had a pretty good results using these methods, and Google now indexes 80-90% of our dynamic content. :)
In a Stack Overflow podcast they talked about limits on the number of links you can include/submit in a sitemap (around 500 per page, with a page limit based on PageRank?) and how you would need to break them across multiple sitemap files (see the sketch below).
"Given this, do I really need to worry about expending the effort to produce a dynamic sitemap index that includes all these URLs, or should I simply ensure that all the main static URLs are in there?"
I was under the impression that the sitemap wasn't necessarily about disconnected pages but rather about increasing the crawling of existing pages. In my experience, when a site includes a sitemap, minor pages (even when prominently linked to) are more likely to appear in Google results. Depending on your site's PageRank, inbound links, etc., this may be less of an issue.
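On the point about splitting a sitemap over multiple files: the sitemaps.org protocol allows up to 50,000 URLs per sitemap file, and a sitemap index file ties the pieces together. Here is a rough sketch of such a split (the file names and example URLs are made up):

```
# Sketch: split a large URL list into sitemap files of at most 50,000 URLs
# (the sitemaps.org per-file limit) and write a sitemap index that lists them.
# File names and the example URLs are made up.
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000

def write_sitemaps(urls, base="https://www.example.com"):
    index = ET.Element("sitemapindex", xmlns=NS)
    for i in range(0, len(urls), MAX_URLS):
        name = f"sitemap-{i // MAX_URLS + 1}.xml"
        urlset = ET.Element("urlset", xmlns=NS)
        for u in urls[i:i + MAX_URLS]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = u
        ET.ElementTree(urlset).write(name, encoding="utf-8", xml_declaration=True)
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = f"{base}/{name}"
    ET.ElementTree(index).write("sitemap-index.xml", encoding="utf-8", xml_declaration=True)

# Example: 120,000 URLs produce three sitemap files plus one index file.
write_sitemaps([f"https://www.example.com/page/{n}" for n in range(120_000)])
```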
I've submitted sitemap.xml files to Google Webmaster Tools and it shows all of the pages in the total, but under "indexed" it says "--". How long does it take for Google to start indexing? This was a couple of days ago.
A Sitemap is a way for webmasters to help search engines easily discover more pages from their websites. A Sitemap should be considered an aid, not a duty: even if you submit a Sitemap, there's no guarantee that the URLs listed in it will be read or included in search engine indexes.
Usually it takes from a few hours to a few days to be indexed.
A quote from a Google source:
"We don't guarantee that we'll crawl
or index all of your URLs. For
example, we won't crawl or index image
URLs contained in your Sitemap.
However, we use the data in your
Sitemap to learn about your site's
structure, which will allow us to
improve our crawler schedule and do a
better job crawling your site in the
future. In most cases, webmasters will
benefit from Sitemap submission, and
in no case will you be penalized for
it."
Mod Note: An attribution link was originally here, but the site linked to no longer exists
It usually takes up to two weeks to be indexed. Just give it some time :)
In short: it depends.
If your website is new, Google will have to crawl and index it first. This can take time and depends on many factors (see the Google FAQs on indexing).
If the website is not new, it's possible that you are submitting URLs in the Sitemap file which do not match the URLs that were crawled and indexed. In this case, the indexed URL count is usually not zero, but it could theoretically be zero if the URLs in the Sitemap file are drastically wrong (e.g. they include session IDs).
Finally, if you are submitting a non-web Sitemap file (e.g. for Google Video or Google News), it's normal for the indexed URL count to be zero: the count only applies to URLs within the normal web-search results.
Without knowing the URL of the Sitemap file it's impossible to say for sure which of the above applies.
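If the session-ID case above sounds plausible, a rough sketch like the following (my own, with assumed parameter names) can flag sitemap URLs that carry session-style query parameters:

```
# Sketch: parse a local sitemap.xml and flag URLs whose query strings carry
# session-style parameters, which would make them differ from the crawled URLs.
# The parameter names below are hypothetical examples.
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qsl

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}  # assumed names

tree = ET.parse("sitemap.xml")
for loc in tree.findall(".//sm:loc", NS):
    url = loc.text or ""
    params = {k.lower() for k, _ in parse_qsl(urlparse(url).query)}
    if params & SESSION_PARAMS:
        print("Looks session-specific:", url)
```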