How to tell search engines to only index pages inside my sitemap? - seo

Recently my site was hacked and has since been restored. Thousands of spam URLs were indexed by Google. I have a Google Webmaster account and can update and submit my sitemap, but how do I tell Google to index strictly the URLs inside my sitemap? I want to prevent any new spam URLs created by hackers from being indexed.
Is there a parameter inside sitemap.xml that I can use to do this?

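Your sitemap should include only the clean, current URLs. Keep in mind that a sitemap is a hint that helps Google discover pages; there is no sitemap.xml parameter that restricts indexing to the listed URLs.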
If you have removed the old spam URLs and they now return a 404 (Not Found) status, Google will drop them from the index, albeit quite slowly; it can take 1-2 months.
If you need to stop those URLs from being displayed in search results sooner, there is a section about it in the webmaster guide: https://support.google.com/webmasters/answer/1663419?hl=en
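Before waiting for Google to drop the spam URLs, it may be worth confirming that they really answer 404 (or 410). Here is a minimal Python sketch of such a check. The function names are made up for illustration, and the list of spam URLs is assumed to come from your own logs or from an export of the indexed pages.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code of a HEAD request to url.

    Connection failures (urllib.error.URLError) are not handled here.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx answers; the code is what we want.
        return error.code


def still_reachable(urls, get_status=fetch_status):
    """Return the URLs that do NOT answer 404 or 410, i.e. still need cleanup."""
    return [url for url in urls if get_status(url) not in (404, 410)]
```

Calling still_reachable(spam_urls) on the list of known spam URLs should return an empty list once the cleanup is complete. The get_status argument exists only so the check can be tried out without network access.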

Related

Is it necessary to generate sitemaps for old indexed urls?

I have a website with content dating back to 2001 and I need to rebuild the sitemap. The question is: if the old URLs have already been indexed, do I need to add the same URLs to the sitemap again even if they haven't changed?
For example:
The sitemap contains this URL, and it has been regenerated every time since 2006:
http://www.semana.com/encuestas/encuesta/le-cree-encuestas-electorales/146255-3
Is it necessary to generate it again if it's already indexed?
If the URLs are active and important from a ranking perspective, it is better to keep them in the sitemap even if they are already indexed. The sitemap also lets you provide metadata such as when a page was last changed and how often it changes.
If the URLs are currently active, you should add them to the new sitemap because it will help in gaining rankings.
You should add them to the new sitemap even if they have already been indexed.
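To illustrate the metadata point, here is a rough Python sketch that writes sitemap entries with an optional lastmod element. The build_sitemap helper and the example dates are invented for illustration; only the urlset/url/loc/lastmod element names and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap document from (loc, lastmod) pairs.

    lastmod is a W3C date string such as "2006-01-01", or None to omit it.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(urlset, encoding="unicode")


# The lastmod value below is a made-up example, not the real date of that page.
xml_text = build_sitemap([
    ("http://www.semana.com/encuestas/encuesta/le-cree-encuestas-electorales/146255-3", "2006-01-01"),
    ("http://www.semana.com/", None),
])
```

Old, unchanged pages can simply keep their old lastmod value, so regenerating the file does not claim they were modified.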

How to cleanup URLs indexed by google

So I made a mistake in my application that caused thousands of URLs to be indexed by Google with the session ID appended. What should I do to remove all those session IDs from the Google index? I'd like to have only the page indexed, without the session ID.
You can fix this with an edit to your robots.txt. Also, there's a Webmasters Stack Exchange site; consider checking there for your answer in the future. I know they have an SEO tag.
robots.txt stops crawling; it doesn't stop URLs from being indexed.
You could use a canonical tag to consolidate the URLs to their parent URL, and/or use parameter handling in Webmaster Tools.
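As a sketch of the canonical-tag approach, the Python snippet below strips a session parameter from a URL and renders the matching link element for the page head. The parameter names in SESSION_PARAMS are guesses, so adjust them to whatever your application actually appends. Session IDs embedded in the path (for example ;jsessionid=...) are not handled.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; replace with the ones your app uses.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def canonical_url(url):
    """Return url without session parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Return the <link rel="canonical"> element for the given request URL."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)
```

Emitting this tag on every page variant tells Google which URL is the preferred one, so the session-ID copies can be consolidated over time.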

How do I get Google to index changes made to my website's keywords?

I have made changes to my website's keywords, description, and title, but Google is not indexing the new keyword. Instead, I have found that Google is indexing the older one.
How can I get Google to index my site using the new keywords that I have added?
Periods between crawling a page vary a lot across pages. A post to SO will be crawled and indexed by Google in seconds. Your personal page that hasn't changed content in 20 years might not even be crawled as much as once a year.
Submitting a sitemap to Webmaster Tools will likely trigger a re-crawl of your website to validate the sitemap. You could use this to speed up the re-crawling.
However, as @Charles noted, the keywords meta tag is mostly ignored by Google. So it sounds like you're wasting your time.

sitemap generation strategy

I have a huge site with more than 5 million URLs.
We already have PageRank 7/10. The problem is that, with 5 million URLs and daily additions and removals (we add about 900 and remove about 300), Google is not fast enough to index all of them. We have a large, resource-intensive Perl module that generates the sitemap, which normally consists of 6 sitemap files. Google certainly cannot keep up with all the URLs, especially because we normally recreate all those sitemaps daily and submit them to Google. My question is: what would be a better approach? Should I really bother sending 5 million URLs to Google daily even if I know Google won't be able to process them? Or should I send just the permalinks that won't change and let the Google crawler find the rest, so that at least I have a concise index at Google? (Today I have fewer than 200 of 5,000,000 URLs indexed.)
What is the point of having a lot of indexed pages that are removed right away?
Temporary pages are worthless to search engines and their users once they are disposed of. So I would let the search engine crawlers decide whether a page is worth indexing. Just tell them the URLs that will persist, and implement some list pages (if there aren't any yet) that make your pages easier to crawl.
Side note: 6 sitemap files for 5 million URLs? AFAIK, a sitemap file may not contain more than 50k URLs.
When URLs change, make sure you handle it properly with a 301 status (permanent redirect).
Edit (refinement):
Still, you should try to keep your URL patterns stable. You can use 301 for redirects, but maintaining a lot of redirect rules is cumbersome.
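On the 50k limit: with 5 million URLs you would need at least 100 sitemap files plus a sitemap index that points to them. The question's generator is a Perl module; the sketch below uses Python only to show the idea, and the file URLs under example.com are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit from the sitemaps.org protocol


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Yield consecutive slices of urls, each holding at most size entries."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_index(sitemap_urls):
    """Build a sitemap index document listing the given sitemap file URLs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="unicode")


# Placeholder data: 120,001 URLs end up in three files.
urls = ["http://example.com/page/%d" % i for i in range(120_001)]
parts = list(chunk(urls))
index_xml = build_index(
    "http://example.com/sitemap-%d.xml" % n for n in range(1, len(parts) + 1)
)
```

Only the index file needs to be submitted; each chunk would be written out as its own urlset file.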
Why don't you just compare your sitemap to the previous one each time and send Google only the URLs that have changed?
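A rough Python sketch of that comparison, assuming each sitemap has already been loaded into a dict that maps URL to its lastmod string (the loading step and the diff_sitemaps name are not from the question):

```python
def diff_sitemaps(previous, current):
    """Compare two {url: lastmod} dicts.

    Returns three sorted lists: added URLs, removed URLs, and URLs whose
    lastmod value differs between the two runs.
    """
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        url
        for url in current.keys() & previous.keys()
        if current[url] != previous[url]
    )
    return added, removed, changed


# Made-up example data for two consecutive daily runs.
yesterday = {"http://example.com/a": "2010-01-01", "http://example.com/b": "2010-01-01"}
today = {"http://example.com/a": "2010-01-02", "http://example.com/c": "2010-01-02"}
added, removed, changed = diff_sitemaps(yesterday, today)
```

With roughly 900 additions and 300 removals per day, the resulting delta would be tiny compared with the full 5 million URLs.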

Google webmaster tools: Sitemaps not indexing?

I've submitted sitemap.xml files to Google Webmaster Tools and it shows the total number of pages, but under "indexed" it says "--". How long does it take for Google to start indexing? I submitted them a couple of days ago.
A Sitemap is a way for webmasters to help Search Engines to easily discover more pages from their websites. A Sitemap should be considered an aid, not a duty. Even if you submit a Sitemap there's no guarantee that the URLs listed in the Sitemap will be read or included in Search Engine indexes.
Usually it takes from a few hours to a few days to be indexed.
Quote from a Google source:
"We don't guarantee that we'll crawl or index all of your URLs. For example, we won't crawl or index image URLs contained in your Sitemap. However, we use the data in your Sitemap to learn about your site's structure, which will allow us to improve our crawler schedule and do a better job crawling your site in the future. In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it."
Mod Note: An attribution link was originally here, but the site linked to no longer exists
It usually takes up to two weeks to be indexed. Just give it some time :)
In short: it depends.
If your website is new, Google will have to crawl and index it first. This can take time and depends on many factors (see the Google FAQs on indexing).
If the website is not new, it's possible that you are submitting URLs in the Sitemap file that do not match the URLs that were crawled and indexed. In this case, the indexed URL count is usually not zero, but it could theoretically be zero if the URLs in the Sitemap file are drastically wrong (e.g. they contain session IDs).
Finally, if you are submitting a non-web Sitemap file (e.g. for Google Video or Google News), it's normal for the indexed URL count to be zero: the count only applies to URLs within the normal web-search results.
Without knowing the URL of the Sitemap file it's impossible to say for sure which of the above applies.