Indexed but blocked by robots.txt on my English cart - PrestaShop

I have a small issue showing in Google Search Console. It displays only one error:
Indexed, though blocked by robots.txt
but only on the English version of my website (the Italian version is fine). How can I fix it?
This is the link that is blocked https://www.cebsas.it/en/cart
Many thanks
Nicola

It's because of the "Disallow: /*controller=cart" line, and the report is accurate: the cart is blocked from crawling.
You have to get the cart page removed from Google's indexed pages.
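One hedged way to clear that report (a general sketch, not PrestaShop-specific): while a URL is blocked by robots.txt, Google cannot fetch the page and so never sees a noindex directive, which is exactly why the "Indexed, though blocked by robots.txt" warning appears. You can temporarily allow the cart URL to be crawled while serving a noindex directive, then restore the block once the URL drops out of the index. The directive itself looks like this in the page's head:

<meta name="robots" content="noindex">

or, sent as an HTTP response header:

X-Robots-Tag: noindex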

Related

Google Search Console fails to fetch sitemaps | "Sitemap could not be read"

I generated a sitemap with an online generator; it seems to be working, and I even tested it in the old Google Search Console sitemap tester, where it passes. But when I submit it in both versions of Search Console, it just displays an error message.
This is a known bug. See this Google support answer.
In my case, the sitemap had a syntax error.
Open the sitemap in Firefox; it will tell you if there is a syntax error.
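For comparison, a minimal well-formed sitemap (an illustrative sketch with a placeholder URL) looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
  </url>
</urlset>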
Your sitemap's domain address might have changed. If the site runs WordPress, use the Yoast plugin; Search Console will then pick up sitemap.xml automatically.
I had the same problem and the solution was very simple, just put the full path to your sitemap.
Where the console asks 'add new sitemap', instead of writing /sitemap.xml, write the full path, such as https://example.com/sitemap.xml.
That should fix the problem.
Using the Yoast SEO plugin, which built out 10 sitemaps, the sitemap index got read the first time but only one of the sub-sitemaps did. I manually visited the other sitemaps (I suspected they were taking too long to respond), then deleted the sitemap in Google Search Console and resubmitted it. All of them were read that time.
I had this issue, and it was because I hadn't set the Content-Type header to application/xml.
This sitemap validator notified me of the issue: https://www.xml-sitemaps.com/validate-xml-sitemap.html
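If your server is sending the wrong Content-Type for the sitemap, one way to fix it (a sketch assuming Apache with mod_mime; other servers have their own equivalents) is:

AddType application/xml .xml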
Enter the full URL of your sitemap, e.g., https://example.com/sitemap.xml. Also, make sure the sitemap's file name does not include numbers or symbols.

Main site URL removed from Google despite re-submitting it

I have a site, www.megalim.co.il.
Recently, after a version upgrade, I discovered that it had a robots.txt file that disallowed all search engines. My Google ranking dropped, and I couldn't find the site's main page anymore.
I changed the robots.txt file to one that allows everything, and Webmaster Tools no longer tells me that the site is blocked from Google.
I did this about 5 days ago. I have also used Fetch as Google and submitted www.megalim.co.il to the index along with all linked pages.
But still, when I search "site:www.megalim.co.il", I get a bunch of results from my site, but not the main page!
What else should I look for?
Thanks!
Igal
You don't see your main page because of your old robots.txt. Five days is nothing for Google's bots to re-index your whole website.
Just wait a little and you will see your website fully indexed in Google's results.
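For reference, an allow-all robots.txt (the kind described in the question) is simply:

User-agent: *
Disallow:

An empty Disallow value means nothing is blocked.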
Issue sorted out.
Embarrassing... apparently we (inexplicably) had a noindex, nofollow meta tag.
After a day we started reappearing in Google.
Thanks :)
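For anyone hitting the same problem, the tag in question typically sits in the page's <head> and looks like this:

<meta name="robots" content="noindex, nofollow">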

Site is not indexed

I have submitted the sitemap to Google Webmaster Tools, but the site is not getting indexed.
It has been almost a month since it was submitted. Webmaster Tools says "No data available." in almost every section.
As far as I can tell, nothing is blocking Google from indexing: robots.txt, as you can see, is not blocking anything, and there are no meta tags blocking crawling.
Here is a screenshot of Webmaster Tools for the sitemap:
http://www.2shared.com/photo/4HLbsOte/webmaster.html
I am not sure why it says "Processed May 3, 2012" when I submitted it earlier last month, but nothing has been indexed and there appear to be no issues with it either.
Any ideas?
Thanks for the help.
SOLVED Edit:
It turns out I had X-Robots-Tag: noindex, nofollow in my HTTP headers.
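A quick way to check for that header (assuming a Unix shell with curl installed; substitute your own URL):

curl -I https://example.com/ | grep -i x-robots-tag

curl -I requests only the response headers, and grep filters for the X-Robots-Tag line, case-insensitively.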
In the sitemap section of Webmaster Tools, does it report any errors with the sitemap you submitted?
Also, how many pages are in that sitemap? If there are very few pages, you are likely to see very low indexing, because Google usually doesn't index all of your pages.
There may be an issue with how the sitemap was submitted to Google Webmaster Tools. Try adding the sitemap again: just delete the previous sitemap and re-add it. I hope it works then.

How to tell search engines NOT to look at this specific link?

Suppose I have a link on the page, "My Messages", which, when clicked, displays the alert "You must login to access my messages".
Maybe it's better to simply not display this link when the user is not logged in, but I want "My Messages" to be visible even when the user is not logged in.
I think this link is user-friendly, but search engines will get redirected to the login page, which I think is... bad for SEO? Or is it fine?
I thought of keeping "My Messages" displayed as normal text (not as a link) and then wrapping it in a link tag with JavaScript/jQuery, something like the sketch below. Is this solution good or bad? Any other ideas? Thank you.
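(An illustrative sketch of that idea; the element id, the /mymessages URL, and the presence of jQuery are all assumptions.)

<span id="my-messages-label">My Messages</span>
<script>
  // Turn the plain-text label into a real link once the page loads;
  // crawlers that don't execute this script only see plain text.
  $(function () {
    $('#my-messages-label').wrap('<a href="/mymessages"></a>');
  });
</script>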
Try to create a robots.txt file and write:
User-agent: *
Disallow: /mymessages
This will keep search engine bots out of that path.
Use a robots.txt file to tell search engines which pages they should not index.
Using nofollow to block access to a page is erroneous - this is not what nofollow is for. The attribute was designed to let you place a link in a page without conferring any weight or endorsement on the link. In other words, it's not a link that search engines should regard as significant for page-ranking algorithms. It does not mean "do not index this page" - just "don't follow this particular link to that page".
Here's what Google have to say about nofollow
...However, the target pages may still appear in our index if other sites link to them without using nofollow or if the URLs are submitted to Google in a Sitemap. Also, it's important to note that other search engines may handle nofollow in slightly different ways.
One way of keeping the URL from affecting your rank is setting the rel attribute of your link:
<a href="/mymessages" rel="nofollow">My Messages</a>
Another option is robots.txt; that way you can disallow the bots from the URL entirely.
You might want to use robots.txt to exclude /mymessages. This will also prevent engines which have already visited /mymessages from visiting it again.
Alternatively, add the following to the top of the /mymessages script:
<meta name="robots" content="noindex" />
If you want to tell search engines not to follow a particular link, use rel="nofollow".
It is a way to tell search engines and bots not to follow that link.
Google will then not crawl that link and will not transfer PageRank or anchor text across it.

Remove deleted page from Google search results

So I have a website that I recently made changes to, and one of the changes was removing a page from the site. I deleted the page; it doesn't exist anymore.
However, when you search for my site, one of the results is the page that I deleted. People are clicking on the page and getting an error.
How do I remove that page from the search results?
Here is the solution:
First, add your site to Google Webmaster Tools. Then go to Site configuration --> Crawler access --> Remove URL. Click on "New removal request" and add the page you want to remove, and make sure you have also added that page to your site's robots.txt. Google will deindex the page within 24 hours.
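(For the robots.txt step, an illustrative entry; the path is a placeholder for your deleted page.)

User-agent: *
Disallow: /deleted-page.html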
You can simply wait for Google's robots to find out that the page doesn't exist anymore.
A trick that used to work is to upload a sitemap to Google in which you include the URL of the deleted page, set it to top priority, and mark it as changing every day. That way Google's robots will prioritize that page and find out more quickly that it's not there anymore (see the sketch below).
There might be other ways, but none that are known to me.
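(A sketch of such a sitemap entry; the URL is a placeholder.)

<url>
  <loc>https://example.com/deleted-page.html</loc>
  <changefreq>daily</changefreq>
  <priority>1.0</priority>
</url>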
You can remove specific pages using Webmaster Tools, I believe.
Yahoo's webmaster tools offer a similar service, as I understand it.
This information was correct the last time I tried to do this, a little while ago.
Go to https://www.google.com/webmasters/tools/removals and remove the pages you want.