I have disallowed everything for 10 days [closed] - seo

Due to an update error, I put into production a robots.txt file that was intended for a test server. As a result, production ended up with this robots.txt:
User-Agent: *
Disallow: /
That was 10 days ago, and I now have more than 7000 URLs flagged with either the error "Submitted URL blocked by robots.txt" or the warning "Indexed, though blocked by robots.txt".
Yesterday, of course, I corrected the robots.txt file.
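For reference, the corrected file now in production is back to the permissive default, something like this (assuming nothing on the site actually needs to be blocked):
User-Agent: *
# an empty Disallow value allows crawling of the entire site
Disallow: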
What can I do to speed up the correction by Google or any other search engine?

You could use the robots.txt test feature. https://www.google.com/webmasters/tools/robots-testing-tool
Once the robots.txt test has passed, click the "Submit" button; a popup window should appear. Then click the "Submit" button for option #3 --
Ask Google to update
Submit a request to let Google know your robots.txt file has been updated.
Other than that, I think you'll have to wait for Googlebot to crawl the site again.
Best of luck :).

Related

Prevent Google from indexing dynamic error pages (non-404) [closed]

There are some non-404 error pages on my website. What is the best way to stop Google from indexing them?
option 1
header("HTTP/1.0 410 Gone");
What if the content is not gone? For example, the article does not exist, or a wrong parameter has been caught.
option 2
<meta name="robots" content="noindex" />
does it only affect one page or the whole domain?
option 3
Using a 404, which will cause some other problems that I would like to avoid.
robots.txt
This option will not work, since the errors depend on the database and are not static.
Best practice is to do a 301 redirect to similar content on your site if content is removed.
To stop Google from indexing certain areas of your site, use robots.txt.
UPDATE: If you send a 200 OK and add the robots meta tag (Option 2 in your question) - this should do what you want.
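A minimal PHP sketch of that approach; the database check is simulated here, since the real lookup depends on your application:
<?php
// Hypothetical dynamic page: $article would normally come from the database.
$article = null; // simulate "the article does not exist"
if ($article === null) {
    // Send 200 OK but mark the page noindex so search engines drop it (option 2).
    header('HTTP/1.1 200 OK');
    echo '<!DOCTYPE html><html><head>';
    echo '<meta name="robots" content="noindex" />';
    echo '<title>Article not found</title></head>';
    echo '<body><p>Sorry, that article does not exist.</p></body></html>';
    exit;
}
// ... otherwise render the normal article page ...
Note that the robots meta tag only affects the single page that serves it, not the whole domain.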
One way to prevent Google's bots from indexing something is to use a robots.txt file:
User-agent: googlebot
Disallow: /mypage.html
Disallow: /mp3/
This way you can manually disable single pages or entire directories.

Canonical Link Element for Dynamic Pages (rel="canonical") [closed]

I have a stack system that passes page tokens in the URL. My pages also serve dynamically created content, so I have one PHP page that accesses the content with parameters.
index.php?grade=7&page=astronomy&pageno=2&token=foo1
I understand the search indexing goal to be: have only one link per unique set of data on your website.
Bing has a way to specify specific parameters to ignore.
Google, it seems, uses rel="canonical", but is it possible to use this to tell Google to ignore the token parameter? My URLs (without tokens) can be anything like:
index.php?grade=5&page=astronomy&pageno=2
index.php?grade=6&page=math&pageno=1
index.php?grade=7&page=chemistry&page2=combustion&pageno=4
If there is no solution for Google... other possible solutions:
If I provide a sitemap for each base page, I can supply base URLs, but any crawling of that page's links will create tokens on the resulting pages. Plus, I would have to constantly recreate the sitemap to cover new pages (e.g. with 25 posts per page, post 26 ends up on page 2).
One idea I've had is to identify bots on page load (I do this already) and disable all tokens for bots. Since (I'm presuming) bots don't use session data between pages anyway, the back buttons and editing features are useless. Is it feasible (or is it crazy) to write custom code for bots?
Thanks for your thoughts.
You can use the Google Webmaster Tools to tell Google to ignore certain URL parameters.
This is covered on the Google Webmaster Help page.
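If you do also want to emit rel="canonical", here is a minimal PHP sketch that strips the token parameter when building the canonical URL (example.com is a placeholder domain; the parameter names come from the question):
<?php
// Build the canonical URL from the current request, dropping the session token.
parse_str($_SERVER['QUERY_STRING'] ?? '', $params);
unset($params['token']); // tell Google the token does not identify unique content
$canonical = 'https://example.com/index.php';
if (!empty($params)) {
    $canonical .= '?' . http_build_query($params);
}
echo '<link rel="canonical" href="' . htmlspecialchars($canonical) . '" />';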

Proper way to remove Page from website [closed]

I have a product page on my website which I added 3 years ago.
Now production of the product has stopped and the product page was removed from the website.
What I did is start displaying a message on the product page telling visitors that production of the product has stopped.
When someone searches Google for that product, the product page which was removed from the site shows up first in the search results.
The page rank for the product page is also high.
I don't want the removed product page to be shown at the top of search result.
What is the proper method to remove a page from a website so that it also gets removed from whatever Google has indexed?
Thanks for the reply
Delete It
The proper way to remove a page from a site is to delete the actual file that is being returned to the user/bot when the page is requested. If the file is not on the web server, any well-configured web server will return a 404, and the bot/spider will remove the page from the index on the next refresh.
Redirect It
If you want to keep the good "Google juice" or SERP ranking the page has, probably due to inbound links from external sites, you'd be best to set your web server to do a 301 (permanent) redirect to a similar (updated) product.
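A minimal PHP sketch of such a redirect (the file name and target URL are placeholders, not from the question):
<?php
// discontinued-product.php (hypothetical): permanently redirect to the updated product
header('HTTP/1.1 301 Moved Permanently');
header('Location: https://example.com/products/updated-model'); // placeholder URL
exit;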
Keep and convert
However, if the page is doing so well that it ranks #1 for searches to the entire site, you need to use this to your advantage. Leave the bulk of the copy on the page the same, but highlight to the viewer that the product no longer exists and provide some helpful options to the user instead: tell them about a newer, better product, tell them why it's no longer available, tell them where they can go to get support if they already have the discontinued product.
I completely agree with the above suggestion and want to add just one point.
If you want to remove that page from Google search results, just log in to Google Webmaster Tools (you must have verified the website there) and submit that particular page as an index removal request.
Google will de-index that page and it will be removed from Google search rankings.

Refresh Google Search Results for My Site [closed]

I set up a site with a template and the title was something they supplied as a default. When I searched for my site's title, it showed up in results, but it was with their default title. After changing it a couple days ago, my site still shows up with the default title instead of what I changed it to.
Is there any way I can force Google to update their information so the title I have now shows up in results instead of the default title?
This will refresh your website immediately:
From the Webmaster Tools menu -> Crawl -> Fetch as Google
Leave the URL blank to fetch the homepage, then click Fetch
A "Submit to Index" button will appear beside the fetched result; click it, then choose "URL and all linked pages" > OK
Just wait; Google should normally revisit your site and update its information. But if you are in a hurry, you can try the following steps:
Increase the crawl speed of your site in Google Webmaster Tools : http://support.google.com/webmasters/bin/answer.py?hl=en&answer=48620
Ping your website on service like http://pingomatic.com/
Submit (if you have not yet) or resubmit an updated sitemap of your website; a sketch of generating one follows below.
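A minimal PHP sketch of generating such a sitemap (sitemap.php and the hard-coded URLs are placeholders; a real version would pull the URLs from your database):
<?php
// sitemap.php (hypothetical): output a bare-bones XML sitemap
header('Content-Type: application/xml; charset=utf-8');
$urls = array('https://example.com/', 'https://example.com/about');
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($urls as $url) {
    echo '  <url><loc>' . htmlspecialchars($url) . '</loc></url>' . "\n";
}
echo '</urlset>';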
Fetching as Google works, as already suggested. However, stage 2 should be to submit your site to several large social bookmarking sites like Digg, Reddit, StumbleUpon, etc. There are huge lists of these sites out there.
Google notices everything on these sites, and it will speed up the re-crawling process. You can keep track of when Google last cached your site (there is a big difference between crawling and caching) by searching for cache:sitename.com.

Would 401 Error be a good choice? [closed]

On one of my sites I have a lot of restricted pages which are only available to logged-in users; for everyone else they output a default "you have to be logged in ..." view.
The problem is that a lot of these pages are listed on Google with the not-logged-in view, and it looks pretty bad when 80% of the pages in the list have the same title and description/preview.
Would it be a good choice to send a 401 Unauthorized header along with my default not-logged-in view? And would this stop Google (and other engines) from indexing these pages?
Thanks!
(and if you have another (better?) solution I would love to hear about it!)
Use a robots.txt file to tell search engines not to index the not-logged-in pages.
http://www.robotstxt.org/
Ex.
User-agent: *
Disallow: /error/notloggedin.html
401 Unauthorized is the response code for requests that require user authentication, so this is exactly the response code you want and have to send (see the HTTP Status Code Definitions).
EDIT: Your previous suggestion, response code 403, is for requests where authentication makes no difference, e.g. disabled directory browsing.
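A minimal PHP sketch of that approach (the session check is an assumed stand-in for the site's real login logic):
<?php
session_start();
$loggedIn = isset($_SESSION['user_id']); // hypothetical login check
if (!$loggedIn) {
    // Tell browsers and crawlers alike that this content requires authentication.
    header('HTTP/1.1 401 Unauthorized');
    echo '<p>You have to be logged in to view this page.</p>';
    exit;
}
// ... otherwise render the restricted page ...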
Here are the status codes Googlebot understands, along with Google's recommendations:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40132
In your case, an HTTP 403 would be the right one.