We were unable to access your site's robots.txt file - SEO

I verified my site using Google Webmaster Tools. I built my website in WordPress and also added a robots.txt file. Google now shows a green tick on DNS and Server Connectivity, but a yellow warning on the robots.txt fetch.
My robots.txt file looks like this:
[screenshot of the robots.txt file]
Also, when I run the robots.txt test in Webmaster Tools, it gives an "allowed" result. Yet my site is not even showing up in Google searches.
When I first submitted my site in Webmaster Tools it showed no error, but now it does.
Please help me solve this problem.

If you made your website with WordPress, it automatically generates a robots.txt file for you. Why didn't you use that?
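For reference, the virtual robots.txt that a stock WordPress install serves looks roughly like this (a sketch only; the exact rules vary by WordPress version and settings):
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php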

Related

Has Google changed its crawlers in a way that could lead to the 404 growth?

Since yesterday I have been seeing a growing number of 404 errors on our website. It is very strange, because we do not have the pages that are being reported as missing, and we did not release any code changes that day.
Google Webmaster Tools is reporting these errors, but when I look at the pages that supposedly link to the missing URLs, there are no such links. Could this be a Google crawler issue?
404 URL:
http://www.justanswer.co.uk/boat/home-improvement/homework/writing
Linked from:
http://www.justanswer.co.uk/boat/home-improvement/homework
http://www.justanswer.co.uk/boat/home-improvement/hvac
It seems that you have CORS issues with cross-domain JavaScript.
https://www.facebook.com/connect/ping?client_id=172525162793917&domain=www.justanswer.co.uk&origin=1&redirect_uri=http%3A%2F%2Fstaticxx.facebook.com%2Fconnect%2Fxd_arbiter.php%3Fversion%3D42%23cb%3Df316e5bca883b5%26domain%3Dwww.justanswer.co.uk%26origin%3Dhttp%253A%252F%252Fwww.justanswer.co.uk%252Ff50e0366c05c14%26relation%3Dparent&response_type=token%2Csigned_request%2Ccode&sdk=joey
is responding with the error:
Given URL is not allowed by the Application configuration: One or more of the given URLs is not allowed by the App's settings. It must match the Website URL or Canvas URL, or the domain must be a subdomain of one of the App's domains.

PHP Web page (code) access security

How do I prevent my pages, say 'index.php' and all other web pages in different folders on the server, from being accessed by anyone who types the path into the browser's address bar, like www.kkweb.com/web/index.php? Kindly help.
If you have PHP correctly installed and running, visitors only get the parsed output. That is, they can open the website, but they cannot read your source code. If you want to avoid even that, you can implement access protection. Google .htpasswd and .htaccess for that.
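As a minimal sketch of such protection, assuming an Apache server and a password file at /home/user/.htpasswd (both names are placeholders), a .htaccess file in the directory to protect could look like this:
AuthType Basic
AuthName "Restricted area"
AuthUserFile /home/user/.htpasswd
Require valid-user
Apache will then demand a username and password from the .htpasswd file before serving anything from that directory; note this locks out human visitors as well as crawlers.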

How to remove an unwanted URL from Google's cache

We bought a new domain from HugeDomains.com a month ago and made it live last week.
Before we went live, the advertisement page published by HugeDomains.com got cached by search engines.
Now we need to remove that cached URL from all search engines.
The following is the pattern of the cached URL; it is just a query string being passed:
http://www.example.com/?fp=ah1QKL6n%2FlECnlCZX2M7prGsvtbv8ddXendjKdEvTBtzHaEkYE%2BEk37MD1iDIPnimmKBVn7jZKj%2BPGqRUxNQzA%3D%3D&prvtof=ytNnOdijWVo6UL0CLJYkUNs043cNT%2BNtJQ5d5VD69Ac%3D&poru=RLg1S8TlJRc59ObVEdjqkbBOZjhk%2FIf%2BH8W1DtjVOk5VRbieT62uHl%2FGfuWk4d%2FnOfDQwYDvqLza3nG76SMxZA%3D%3D&
I have used Disallow in robots.txt to remove it, but it's not working. The rules are as follows:
Disallow: /*?fp=
Disallow: /?fp=ah1QKL6n%2FlECnlCZX2M7prGsvtbv8ddXendjKdEvTBtzHaEkYE%2BEk37MD1iDIPnimmKBVn7jZKj%2BPGqRUxNQzA%3D%3D&prvtof=ytNnOdijWVo6UL0CLJYkUNs043cNT%2BNtJQ5d5VD69Ac%3D&poru=RLg1S8TlJRc59ObVEdjqkbBOZjhk%2FIf%2BH8W1DtjVOk5VRbieT62uHl%2FGfuWk4d%2FnOfDQwYDvqLza3nG76SMxZA%3D%3D&
I even enabled a 302 redirect from this query string (fp=) to my home page.
Please let me know a way to resolve this.
Thanks in advance.
I wouldn't do this with robots.txt.
Just wait. Most search engines will recognize that your website is new and will crawl it again in the near future.
Otherwise, you can create a Google Webmaster account and submit your URL to Google for recrawling.
EDIT: You are also able to disallow URL parameters in Webmaster Tools.
A robots.txt Disallow should do it, but another good way is to return a 410 Gone response; Google will then stop indexing the URL once it sees that the page has disappeared.
Edit
Looks like I was wrong about robots.txt, but right about the 410 Gone response:
Reference
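For illustration, a 410 could be sent from PHP roughly like this (a sketch assuming requests carrying the stale fp= parameter reach your index.php):
<?php
// Answer requests for the old advertisement URL with 410 Gone
if (isset($_GET['fp'])) {
    http_response_code(410); // tells crawlers the page is permanently gone
    exit;
}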
You have to do a 301 permanent redirect for Google to drop the old indexed page. If you do a 302, Google will keep trying to crawl that URL once in a while, because the redirect is only temporary. Ignoring query parameters does not help in clearing the cache; it just sends a signal saying the URL with the query parameter is the same as the one without it. I guess that is not what you want. My suggestion would be to do a 301 permanent redirect whenever you encounter the query parameter fp.
Right now I doubt Google handles 404 and 410 very differently, so you can do a 410 as well.
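A 301 along those lines could look like this in PHP (again a sketch; http://www.example.com/ stands in for your real home page):
<?php
// Permanently redirect the stale fp= advertisement URL to the home page
if (isset($_GET['fp'])) {
    header('Location: http://www.example.com/', true, 301); // third argument sets the 301 status
    exit;
}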
Google Webmaster Tools can help you remove outdated/cached content from Google search results:
1. Copy your domain's cached URL.
2. Browse to https://www.google.com/webmasters/tools/removals
3. Follow the request instructions.
The cache can be removed within a few hours, and the Google search engine will then crawl the new/current URL contents.

Google Webmaster Tools won't index my site

I discovered that the robots.txt file on my site is causing Google Webmaster Tools not to index my site properly. I tried removing just about everything from the file (I'm using WordPress, so it still gets generated), but I keep getting the same error in their panel:
"Severe status problems were found on the site. - Check site status". When I click on the site status, it tells me that robots.txt is blocking my main page, which it is not.
http://saturate.co/robots.txt - ideas?
Edit: Marking this as solved, as Webmaster Tools has now accepted the site and is showing no errors.
You should try adding Disallow: (with an empty value, which means nothing is disallowed) to the end of your file, so it looks like this:
User-agent: *
Disallow:

Google found my backup web site. What can I do about it?

A few days ago we replaced our web site with an updated version. The original site's content was migrated to http://backup.example.com. Search engines do not know about the old site, and I do not want them to know.
While we were in the process of updating our site, Google crawled the old version.
Now when using Google to search for our web site, we get results for both the new and the old site (e.g., http://www.example.com and http://backup.example.com).
Here are my questions:
1. Can I update the backup site's content with the new content? Then we could get rid of all the old content. My concern is that Google will lower our page ranking due to duplicate content.
2. If I prevent the old site from being accessed, how long will it take for the information to clear out of Google's search results?
3. Can I use a Google disallow to block Google from the old web site?
You should probably put a robots.txt file on your backup site and tell robots not to crawl it at all. Google will obey the restrictions, though not all crawlers will. You might want to check out the options available to you at Google's Webmaster Central. Ask Google and see if they will remove the errant links from their data.
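For example, a robots.txt at the root of the backup host that blocks all compliant crawlers needs only two lines:
User-agent: *
Disallow: /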
You can always use a robots.txt on the backup.* site to disallow Google from indexing it.
Are the URL formats consistent enough between the backup and current sites that you could redirect a given page on the backup site to its equivalent on the current one? If so, have the backup site send a 301 Permanent Redirect from each page to its equivalent on the site you actually want indexed (see the sketch after this answer). The redirecting pages should drop out of the index (after how much time, I do not know).
If not, definitely look into robots.txt as Zepplock mentioned. After setting up robots.txt, you can expedite removal from Google's index with their Webmaster Tools.
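A minimal sketch of such a blanket redirect, assuming an Apache .htaccess on the backup host with mod_rewrite enabled and matching URL structures on both sites:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^backup\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]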
You can also add a rule to your scripts to send a 301 header redirecting each old page to its new one.
Robots.txt is a good suggestion, but... Google doesn't always listen. Yes, that's right, they don't always listen.
So, disallow all spiders, but also put this in your header:
<meta name="robots" content="noindex, nofollow, noarchive" />
It's better to be safe than sorry. Meta commands are like yelling at Google "I DON'T WANT YOU TO DO THIS TO THIS PAGE". :)
Do both, save yourself some pain. :)
I suggest you either add a noindex meta tag to all the old pages or just disallow them via robots.txt; the best way is to block them with robots.txt. One thing more: add a sitemap to the new site and submit it in Webmaster Tools; that will improve your new website's indexing.
Password-protect the web pages or directories that you don't want web spiders to crawl/index by putting password-protection code in the .htaccess file (if one is present in your website's root directory on the server; otherwise create a new one and upload it).
The web spiders will never know the password and hence won't be able to index the protected directories or web pages.
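The password file itself can be created with Apache's htpasswd utility; a sketch, with a placeholder path and username:
htpasswd -c /home/user/.htpasswd admin
The -c flag creates the file (omit it when adding further users); an AuthUserFile directive in your .htaccess then points at that file.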
You can block particular URLs in Webmaster Tools; check that out. You can also block them using robots.txt. Remove the sitemap for your old backup site and put a noindex, nofollow tag on all of your old backup pages. I handled this same situation for one of my clients.