can't submit sitemap to Google webmaster tools - seo

I’m having trouble submitting a sitemap to Google webmaster tools. This is something I have done many times and have never had any problems before. I get this error:
Description:
We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit.
Example:
URL restricted by robots.txt
The robots.txt is definitely correct, and so is the sitemap.
Here is the content of the robots.txt:
User-agent: *
Disallow:

As the error message suggests, one of these may be your problem:
Your sitemap may not be formatted the way Google expects. Try another generator tool.
Make sure you've uploaded your sitemap to a publicly accessible directory.
Check your sitemap by opening yoursite.com/sitemap.xml in a browser.
Check your sitemap file's permissions (755 works, if I'm not mistaken).
You can also explicitly allow indexing of everything in robots.txt. Try this code (another version of yours):
User-agent: *
Allow: /

Try submitting your sitemap XML to Google from the command line, using curl:
curl -i "http://google.com/ping?sitemap=https://yourdomain.com/sitemap.xml"
(replace the sitemap parameter with the complete URL to your sitemap.xml file)
Official Google docs: https://support.google.com/webmasters/answer/183669
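Since the error specifically complains about robots.txt, here is a small pre-check you can run yourself: a Python 3 sketch (standard library only) that asks Python's robots.txt parser whether Googlebot may fetch the sitemap URL and then confirms the sitemap itself answers with HTTP 200. The yourdomain.example URLs are placeholders, and Python's parser is only an approximation of Googlebot's matching, so treat this as a rough check rather than a guarantee.

# Rough pre-check: is the sitemap allowed by robots.txt and reachable?
# The domain below is a placeholder; replace it with your own site.
from urllib import request, robotparser

SITE = "https://yourdomain.example"
SITEMAP_URL = SITE + "/sitemap.xml"

# 1. Would a Googlebot-like client be allowed to fetch the sitemap?
rp = robotparser.RobotFileParser(SITE + "/robots.txt")
rp.read()
print("Allowed by robots.txt:", rp.can_fetch("Googlebot", SITEMAP_URL))

# 2. Does the sitemap URL itself answer with HTTP 200?
with request.urlopen(SITEMAP_URL) as resp:
    print("Sitemap HTTP status:", resp.status)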

Related

Trac - get programmatically wiki page content behind authentication

I'd like to get the content of a wiki page from my Trac (1.0.9) installation using a script.
My Trac is served through Apache httpd and uses Basic authentication (AuthType Basic).
So I tried to use wget as follows:
wget http://my/trac/wiki/MyWikiPage?format=txt --user=<THISISME> --ask-password --auth-no-challenge -q -O -
but I get a 403 error.
HTTP request sent, awaiting response... 403 Forbidden
Is there something wrong? In other words, is there a way to simply fetch a wiki page remotely from Trac (taking authentication into account)? Thanks.
You could install XmlRpcPlugin and use one of the supported libraries, such as xmlrpclib in Python, to fetch the page.
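For example, once XmlRpcPlugin is installed, a rough sketch along these lines should work with Python's standard xmlrpc.client (xmlrpclib on Python 2). I'm assuming the plugin exposes its authenticated endpoint at /login/rpc (older setups use /login/xmlrpc) and that the Basic auth credentials can be embedded in the URL; the host, credentials, and page name below are placeholders taken from the question.

# Sketch: fetch the raw text of a Trac wiki page over XML-RPC (requires XmlRpcPlugin).
# URL, user, password and page name are placeholders - adjust to your installation.
import xmlrpc.client  # "xmlrpclib" on Python 2

server = xmlrpc.client.ServerProxy(
    "http://THISISME:PASSWORD@my/trac/login/rpc"  # Basic auth credentials in the URL
)
text = server.wiki.getPage("MyWikiPage")  # wiki markup of the latest page version
print(text)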

robots.txt for telling Google not to follow specific URLs

I have placed a robots.txt in the root directory of my application in order to tell Google's bots not to follow this URL, http://www.test.com/example.aspx?id=x&date=10/12/2014, or any URL for the same page with different query string values. For that I have used the following piece of code:
User-agent: *
Disallow:
Disallow: /example.aspx/
But I found in Webmaster Tools that Google is still following this page and has cached a number of URLs with that path. Is it that the query strings are creating a problem? As far as I know, Google does not treat query strings specially, but just in case: am I using this correctly, or does something else also need to be done in order to achieve the task?
Your instruction is wrong:
Disallow: /example.aspx/
This blocks all URLs in the directory /example.aspx/.
If you want to block all URLs of the page /example.aspx (with or without a query string), use this instruction:
Disallow: /example.aspx
You can test it with Google Webmaster Tools.
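If it helps, here is a small Python 3 sketch (standard library only) that illustrates the difference between the two rules against the URL from the question. Python's robots.txt parser is not identical to Googlebot's matcher, but the prefix matching behaves the same way here.

# Compare the two Disallow rules against the URL from the question.
from urllib import robotparser

URL = "http://www.test.com/example.aspx?id=x&date=10/12/2014"

def allowed(rules):
    rp = robotparser.RobotFileParser()
    rp.parse(rules.splitlines())
    return rp.can_fetch("*", URL)

print(allowed("User-agent: *\nDisallow: /example.aspx/"))  # True  -> URL is NOT blocked
print(allowed("User-agent: *\nDisallow: /example.aspx"))   # False -> URL IS blocked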

Google Webmaster Tools won't index my site

I discovered that the robots.txt file on my site is causing Google's Webmaster Tools to not index my site properly. I tried removing just about everything from the file (I'm using WordPress, so it still gets generated), but I keep getting the same error in their panel:
"Severe status problems were found on the site. - Check site status". When I click on the site status it tells me that robots.txt is blocking my main page, which it is not.
http://saturate.co/robots.txt - ideas?
Edit: Marking this as solved, as Webmaster Tools has now accepted the site and is showing no errors.
You should try adding an empty Disallow: line to the end of your file, so it looks like this:
User-agent: *
Disallow:

How to verify that the generated sitemap index URLs return a 200 code?

I have generated the sitemap indexes for Google. The only issue I have is how to verify whether all of the generated index URLs actually work. The guide says something like this:
you write a script to test each URL in the sitemap against your application
server and confirm that each link returns an HTTP 200 (OK) code. Broken links may indicate a mismatch
between the URL formatting configuration of the Sitemap Generator
I just wanted to see if somebody has experience with writing such a script.
Google Webmaster Tools will report, under "Site configuration -> Sitemaps", any HTTP errors and redirects (pretty much everything that is not an HTTP 200). Additionally, "Diagnostics -> Crawl Errors -> In Sitemaps" is another view of errors that occurred while crawling URLs that were listed in the sitemaps.
If that is not what you want, I would just do some logfile grep-ing (grep for "googlebot" and an identifier of the URLs that you listed in your sitemaps).
You could probably write your own crawler to pre-check that your pages return an HTTP 200 (a sketch is below), but even if a URL returns an HTTP 200 for you now, that does not mean it will return an HTTP 200 for Googlebot next week / month / year. So I recommend sticking with Google Webmaster Tools and logfile analysis (visualized with e.g. Munin, Cacti, ...).
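As a starting point for such a pre-check, here is a minimal Python 3 sketch using only the standard library. It assumes a plain sitemap (or sitemap index) whose entries sit in <loc> elements under the standard sitemap namespace; the sitemap URL is a placeholder.

# Fetch a sitemap and report the HTTP status of every <loc> URL listed in it.
# The sitemap URL is a placeholder; point it at your own file.
import xml.etree.ElementTree as ET
from urllib import request
from urllib.error import HTTPError

SITEMAP_URL = "https://yourdomain.example/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with request.urlopen(SITEMAP_URL) as resp:
    tree = ET.parse(resp)

for loc in tree.findall(".//sm:loc", NS):
    url = loc.text.strip()
    try:
        with request.urlopen(url) as r:
            status = r.status
    except HTTPError as err:
        status = err.code  # e.g. 404, 500
    print(status, url)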
How did you create the sitemap? I would think most sitemap tools only include URLs that responded with "200 OK".
Do note that some websites misbehave and always respond with 200 instead of e.g. 404 for invalid URLs. Such websites have trouble ahead :)

500 Internal Server error when using curl on an aspx page with SSL

I'm trying to access an .aspx web page using curl, but it returns a 500 Internal Server Error. It doesn't require any credentials or POST variables that I know of, but I think I'm missing something, because when I try to access it from my browser, it does work. The page is just a form with two fields to be filled in and POSTed.
curl -L https://my.website.com
Do I need to make any changes to my curl script?
P.S. I don't have access to the server or the server's logs.
Some things to try and ideas:
Trace your manual (browser) access with e.g. Fiddler, HttpFox, or Firebug. You might see something more elaborate than you have seen already (like a 301/302 response; I assume you added -L to handle that possibility?).
Since it works when you view the page in a browser, the page might be doing a referrer check and failing miserably because there is no referrer (hence the 500, a server-side error). The trace you created in step 1 will show you what to insert with curl's -e option; a rough sketch of the same idea follows below.
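If the trace does point at a missing referrer, a rough sketch of replaying the request with an explicit Referer header (the same idea as curl's -e option, just written in Python so you can poke at the response) might look like this. The URL and header values are placeholders; copy the real ones from your trace.

# Repeat the request with a Referer header set, mimicking curl -e.
# URL, referrer and user agent are placeholders taken from the question/trace.
from urllib import request

req = request.Request(
    "https://my.website.com/",
    headers={
        "Referer": "https://my.website.com/",            # what curl -e would send
        "User-Agent": "Mozilla/5.0 (compatible; test)",  # some servers check this too
    },
)
with request.urlopen(req) as resp:
    print(resp.status)
    print(resp.read()[:200])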