We have a sitemap for our site, http://www.appsamuck.com/
The sitemap is here: http://www.appsamuck.com/sitemap.xml
But Google seems to hate it. My question is: why? I'm just staring at it now, telling myself it looks right. Am I missing something?
3 Paths don't match
We've detected that you submitted your Sitemap using a URL path that doesn't include the www prefix (for instance, http://example.com/sitemap.xml). However, the URLs listed inside your Sitemap do use the www prefix (for instance, http://www.example.com/myfile.htm).
Problem detected on: http://www.appsamuck.com/
Oct 15, 2008
I just typed a huge response and Firefox crashed and I lost it. I hate it when that happens!
Basically, it's possible to have two sites with different content, one running under www. and one without the www, a bit like a subdomain. Because of this, when you submitted your sitemap Google saw it on the www site (http://www.appsamuck.com/sitemap.xml), but none of the URLs in your sitemap contain the www, so Google wonders whether the sitemap is actually for another site, the non-www one. Usually these two deliver the same content, but not always, so Google is saying: hang on, you put the sitemap at www, but all your pages are on a non-www domain. What's that about?
The best thing to do is stick to one or the other. Are you advertising the www or the non-www version? Whichever you are using (and I suggest the www version), submit your sitemap with www and make sure all the URLs in your sitemap have www in them. That way Google won't throw a fit. Sticking to one may also be slightly better for SEO.
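One common way to enforce that (a sketch assuming Apache with mod_rewrite; I don't know what appsamuck.com's real server setup is) is a site-wide 301 from the non-www host to the www one:
# .htaccess sketch: permanently redirect every non-www request to www
RewriteEngine On
RewriteCond %{HTTP_HOST} ^appsamuck\.com$ [NC]
RewriteRule ^(.*)$ http://www.appsamuck.com/$1 [R=301,L]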
As Nick suggested above, it's also a good idea to let Google know which one you prefer through the preferred domain option. I would set this option to:
Display URLs as www.appsamuck.com (for both www.appsamuck.com and appsamuck.com)
At least Google will know that you're talking about the same site then.
As for the sitemap, well there are some issues with that too.
Firstly, as I pointed out above, it's missing the www from each URL.
Secondly, you are missing an XML declaration at the top of the file. You need something like this:
print("code sample");<?xml version="1.0" encoding="UTF-8"?>
<urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
As Diodeus suggested above, you really should add the other fields in, such as priority etc.
Here is a quick go I have done for you (note it follows on from the above: I have opened the urlset tag above, and it closes at the bottom of this set of code):
print("code sample");
<url>
<loc>http://www.appsamuck.com/</loc>
<priority>1.00</priority>
<lastmod>2008-10-17T03:01:05+00:00</lastmod>
<changefreq>monthly</changefreq>
</url>
<url>
<loc>http://www.appsamuck.com/index.html</loc>
<priority>0.80</priority>
<lastmod>2008-10-17T03:01:05+00:00</lastmod>
<changefreq>monthly</changefreq>
</url>
<url>
<loc>http://www.appsamuck.com/blog/</loc>
<priority>0.80</priority>
<changefreq>monthly</changefreq>
</url>
<url>
<loc>http://www.appsamuck.com/about.html</loc>
<priority>0.80</priority>
<lastmod>2008-10-16T00:00:32+00:00</lastmod>
<changefreq>monthly</changefreq>
</url>
<url>
<loc>http://www.appsamuck.com/contact.html</loc>
<priority>0.80</priority>
<lastmod>2008-10-16T00:00:33+00:00</lastmod>
<changefreq>monthly</changefreq>
</url>
<url>
<loc>http://www.appsamuck.com/iphonesdkdev.html</loc>
<priority>0.80</priority>
<lastmod>2008-10-14T05:41:03+00:00</lastmod>
<changefreq>monthly</changefreq>
</url>
<url>
<loc>http://www.appsamuck.com/day16.html</loc>
<priority>0.80</priority>
<lastmod>2008-10-17T03:13:21+00:00</lastmod>
<changefreq>monthly</changefreq>
</url>
<url>
<loc>http://www.appsamuck.com/day15.html</loc>
<priority>0.80</priority>
<lastmod>2008-10-16T15:58:57+00:00</lastmod>
<changefreq>monthly</changefreq>
</url>
<url>
<loc>http://www.appsamuck.com/day14.html</loc>
<priority>0.80</priority>
<lastmod>2008-10-15T16:58:06+00:00</lastmod>
<changefreq>monthly</changefreq>
</url>
<url>
<loc>http://www.appsamuck.com/day13.html</loc>
<priority>0.80</priority>
<lastmod>2008-10-13T17:52:08+00:00</lastmod>
<changefreq>monthly</changefreq>
</url>
</urlset>
It's not a full list; I'm not going to do all the work for you :)
There are also some good online tools that will create sitemaps for you; they crawl the site and build it. Just google xml-sitemaps and you should find some, including some good free ones. Also, if their spider cannot find your content, it's a flag that Google probably cannot either, so it has a dual purpose.
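For the curious, here is a minimal sketch of what those tools do, in Python using only the standard library. It is not production-ready (no politeness delay, no robots.txt handling), and START_URL is a placeholder for your own site:
# Minimal sitemap-generator sketch: crawl same-host pages, emit sitemap XML.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen
from xml.sax.saxutils import escape

START_URL = "http://www.example.com/"  # placeholder: your own site

class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, limit=50):
    """Breadth-first crawl of same-host pages; returns discovered URLs."""
    host = urlparse(start_url).netloc
    seen, queue = {start_url}, [start_url]
    while queue and len(seen) < limit:
        url = queue.pop(0)
        try:
            html = urlopen(url).read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip pages that fail to load
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href).split("#")[0]  # drop fragments
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(seen)

def to_sitemap(urls):
    """Renders the URL list as sitemap-protocol XML."""
    entries = "\n".join(
        "  <url>\n    <loc>%s</loc>\n  </url>" % escape(u) for u in urls
    )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + entries + "\n</urlset>")

if __name__ == "__main__":
    print(to_sitemap(crawl(START_URL)))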
Hope that helps :)
Paul
This could have to do with your preferred domain setting. If your sitemap has www's in it, but you submitted the site without the www, then it could cause the confusion. What I did for my sites was to submit them with the www in the sitemap, and make sure I submitted to Google in Webmaster Tools the same way.
Then you can go in and set the "Preferred Domain" in the Tools area for your site. From there, you can have Google link only to the non-www version if you want.
I've encountered similar problems. Just resubmit the same map. Often the warnings go away.
Try adding the other fields: <lastmod></lastmod>, <changefreq></changefreq>, <priority></priority>. Otherwise, your sitemap looks correct.
Also, make sure the status of your resubmitted map is not "pending". Google sometimes takes hours to get around to processing your files.
I ran into a similar problem today. What I did was recreate the site without the www. Google usually suggests creating your site as http://www.yoursitename.com, but you can also enter http://yoursitename.com and then verify that you are the administrator. It worked well for me. Hope this helps.
Related
I have a multi-site setup in WordPress like this:
domain.com (main version for Australia)
domain.com/us
domain.com/eu
...
I want Australian users searching on Google to see only the main URL, domain.com; if someone in the USA or Canada searches, only domain.com/us should appear in the results; similarly, all Europeans should see only the domain.com/eu version.
What would be the optimal approach?
Thanks.
Currently, I'm using a redirection plugin, which I think blocks all the URLs from being crawled by Google. So I figured that maybe the robots.txt file could fix the issue.
I am creating multiple sitemap files for my website. The issue is that my sitemap files are located on a different file server from my website.
For example, I have a website at the domain www.example.com; however, my sitemap index file and the other sitemap files reside on www.filestack.com.
My sitemap index file will look like:
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
<sitemap>
<loc>
https://www.filestack.com/sitemap1.xml
</loc>
</sitemap>
</sitemapindex>
Though my sitemap1.xml will be:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>
https://www.example.com/test
</loc>
<lastmod>2017-09-04</lastmod>
<changefreq>weekly</changefreq>
</url>
</urlset>
Is it possible to link things up this way, and if so, how?
See Sitemaps & Cross Submits.
You have to provide a robots.txt at https://www.example.com/robots.txt which links to the external sitemap:
Sitemap: https://www.filestack.com/sitemap1.xml
(This sitemap may only contain URLs from https://www.example.com/.)
You can use either an XML sitemap or an HTML sitemap, as Matt Cutts says; it's not mandatory to use both. Though you can't submit an HTML sitemap to search engines, spiders can crawl an HTML sitemap and reach pages deeper in your site. But you cannot use an XML sitemap that is on a different server.
On my website I use jQuery to dynamically change the primary contents of a div (my primary content), so that pages are not reloaded when someone clicks a link; instead, content is added to the div.
Google searches for links on my site, finds only #smth, and does not index the pages. What should I do so that Google will index my other pages?
Any thoughts?
You can add a sitemap.xml file using the Sitemaps protocol to the root of your website (or another location specified in robots.txt). The Sitemaps protocol allows you to inform search engines about URLs on a website that are available for crawling (wiki).
An example sitemap (from referenced wiki above) looks like this:
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>http://example.com/</loc>
<lastmod>2006-11-18</lastmod>
<changefreq>daily</changefreq>
<priority>0.8</priority>
</url>
</urlset>
The crawler of a search engine will visit your sitemap.xml and index the locations specified.
I found out that the answer is to add ! in front of your hash URLs (making them #! "hashbang" URLs) and configure the server to send a snapshot of the page to Google; more info here.
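For context: under that AJAX crawling scheme (since deprecated by Google), the crawler rewrites a URL like http://example.com/#!page into http://example.com/?_escaped_fragment_=page, so the server can detect the parameter and return a pre-rendered snapshot. A minimal sketch in Python using only the standard library, where render_snapshot is a hypothetical helper you would implement yourself:
# Sketch: serve snapshots for _escaped_fragment_ requests, JS shell otherwise.
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

def render_snapshot(fragment):
    # Hypothetical helper: return full, static HTML for this page state.
    return "<html><body>Snapshot for %s</body></html>" % fragment

def app(environ, start_response):
    query = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
    if "_escaped_fragment_" in query:
        # Crawler request: serve the pre-rendered snapshot.
        body = render_snapshot(query["_escaped_fragment_"][0] or "home")
    else:
        # Normal request: serve the shell that jQuery fills in client-side.
        body = '<html><body><div id="content"></div></body></html>'
    start_response("200 OK", [("Content-Type", "text/html")])
    return [body.encode("utf-8")]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()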
I have a PHP dynamic wallpaper site, http://www.fondolandia.com; it's been online for a year. Recently I submitted the page to a sitemap-builder site, and within the sitemap file I found that some links point to the same image in different resolutions, for example:
<url>
<loc>http://www.fondolandia.com/portal/display/51_palacio-europa1356656199/1600x1200</loc>
<changefreq>daily</changefreq>
<priority>0.64</priority>
</url>
<url>
<loc>http://www.fondolandia.com/portal/display/51_palacio-europa1356656199/1280x960</loc>
<changefreq>daily</changefreq>
<priority>0.64</priority>
</url>
<url>
<loc>http://www.fondolandia.com/portal/display/51_palacio-europa1356656199/1024x768</loc>
<changefreq>daily</changefreq>
<priority>0.64</priority>
</url>
<url>
<loc>http://www.fondolandia.com/portal/display/51_palacio-europa1356656199/800x600</loc>
<changefreq>daily</changefreq>
<priority>0.64</priority>
</url>
So there are actually four links to the same image, just in different resolutions. Should I delete those links from the sitemap, given that another link points to an overview of the same image?
<url>
<loc>http://www.fondolandia.com/portal/fondo/51_palacio-europa1356656199</loc>
<changefreq>daily</changefreq>
<priority>0.80</priority>
</url>
While doing a site:url search I see many links to old images that have been deleted from the site but still appear in the search results. Should Google sort this out once it crawls my site, or should I do something?
Thanks in advance.
To answer your first question: the duplicate links in the sitemap are OK, but each page should have a canonical link to tell Google that they are really the same thing.
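For example (a sketch, using the overview URL from your question as the canonical target), each of the resolution pages would include this in its <head>:
<link rel="canonical" href="http://www.fondolandia.com/portal/fondo/51_palacio-europa1356656199" />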
As for the deleted pages in search results, Google should solve this as well when it re-crawls the pages, as long as your site returns an appropriate status code (e.g. a 404, or a 301 redirect to some other page).
A few days ago we replaced our web site with an updated version. The original site's content was migrated to http://backup.example.com. Search engines do not know about the old site, and I do not want them to know.
While we were in the process of updating our site, Google crawled the old version.
Now when using Google to search for our web site, we get results for both the new and old sites (e.g., http://www.example.com and http://backup.example.com).
Here are my questions:
Can I update the backup site content with the new content? Then we can get rid of all the old content. My concern is that Google will lower our page ranking due to duplicate content.
If I prevent the old site from being accessed, how long will it take for the information to clear out of Google's search results?
Can I use a Google disallow to block Google from the old web site?
You should probably put a robots.txt file on your backup site and tell robots not to crawl it at all. Google will obey the restrictions, though not all crawlers will. You might want to check out the options available to you at Google's Webmaster Central. Ask Google and see if they will remove the errant links from their data for you.
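A blanket block, served at http://backup.example.com/robots.txt, would be just:
User-agent: *
Disallow: /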
You can always use robots.txt on the backup.* site to disallow Google from indexing it.
Are the URL formats consistent enough between the backup and current site that you could redirect a given page on the backup site to its equivalent on the current one? If so, you could have the backup site send 301 Permanent Redirects to each of the equivalent pages on the site you actually want indexed. The redirecting pages should drop out of the index (after how much time, I do not know).
If not, definitely look into robots.txt as Zepplock mentioned. After setting the robots.txt you can expedite removal from Google's index with their Webmaster Tools
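For the redirect route, a minimal sketch of that blanket 301 (assuming Apache with mod_rewrite on the backup host, and that paths match one-to-one) could be:
# .htaccess on the backup host: 301 every request to the live site
RewriteEngine On
RewriteCond %{HTTP_HOST} ^backup\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]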
Also, you can make a rule in your scripts to redirect each page to its new one with a 301 header.
Robots.txt is a good suggestion, but... Google doesn't always listen. Yeah, that's right, they don't always listen.
So, disallow all spiders, but also put this in your header:
<meta name="robots" content="noindex, nofollow, noarchive" />
It's better to be safe than sorry. Meta commands are like yelling at Google "I DON'T WANT YOU TO DO THIS TO THIS PAGE". :)
Do both, save yourself some pain. :)
I suggest you either add a noindex meta tag to all the old pages or just disallow them via robots.txt; blocking via robots.txt is the best way. One more thing: add a sitemap to the new site and submit it in Webmaster Tools; that improves the new website's indexing.
Password-protect the webpages or directories that you don't want web spiders to crawl or index by putting password-protection code in the .htaccess file (if one is present in your website's root directory on the server; otherwise create a new one and upload it).
The web spiders will never know that password and hence won't be able to index the protected directories or web pages.
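A minimal sketch of that protection (assuming Apache; the AuthUserFile path is a placeholder for wherever your .htpasswd lives):
# .htaccess: require a password for everything in this directory
AuthType Basic
AuthName "Private archive"
AuthUserFile /path/to/.htpasswd
Require valid-user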
You can block any particular URLs in Webmaster Tools; check that out. You can also block them using robots.txt. Remove the sitemap for your old backup site and put a noindex, nofollow tag on all of your old backup pages. I handled this situation for one of my clients too.