Block Google from indexing some pages of a site

I have a problem with lots of 404 errors on one site. I figured out that these errors happen because Google is trying to find pages that no longer exist.
Now I need to tell Google not to index those pages again.
I found some solutions on the internet about using a robots.txt file, but this is not a site that I built; I just need to fix those errors. The thing is, those pages are generated. They do not physically exist in that form, so I cannot add anything to the PHP code.
And I am not quite sure how to add those URLs to robots.txt.
When I just write:
User-agent: *
noindex: /objekten/anzeigen/haus_antea/5-0000001575*
and hit the test button in Webmaster Tools,
I get this from Googlebot:
Allowed
Detected as a directory; specific files may have different restrictions
And I do not know what that means.
I am new to this kind of stuff, so please keep your answer as simple as possible.
Sorry for my bad English.

I think Google will automatically remove pages that return a 404 error from its index. Google will not display these pages in the results, so you don't need to worry about that.
Just make sure that these pages are not linked from other pages. If they are, Google may try to index them from time to time. In that case you should return a 301 status (moved permanently) and redirect to the correct URL. Google will follow the 301 redirects and use the target URL instead.
Robots.txt is only necessary if you want to remove pages that are already in the search results, but I think pages returning a 404 will not be displayed there anyway.
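If the pages really are gone for good and you still want to keep crawlers away from them, the usual way to write that in robots.txt is a Disallow rule; noindex is not a standard robots.txt directive, which is likely why the tester only evaluates Allow/Disallow rules and reports the URL as "Allowed":
User-agent: *
Disallow: /objekten/anzeigen/haus_antea/5-0000001575
Keep in mind that Disallow only blocks crawling; it does not remove URLs that are already indexed. If there is a sensible replacement page, a server-level 301 is usually the better fix. A rough sketch, assuming the site runs on Apache with .htaccess enabled (the /objekten/ target is only an example; point it at whatever page actually replaced the old listings):
RedirectMatch 301 ^/objekten/anzeigen/haus_antea/5-0000001575 /objekten/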

Related

Using Shopify 404 page for content

Shopify is quite restrictive about the ways you can structure directories. For example, all pages must have a URL that looks like "my-store.com/pages/my-page".
While there is no way around this in Shopify, I have considered a workaround that would work like this:
Use JavaScript to check the requested URL when the 404 page is displayed.
If the requested URL matches "my-url", connect to the WordPress REST or GraphQL API, run a query, and then render the desired content on the page.
For example, my-site.com/blog would return a 404 error, but JavaScript would run a function to fetch content when the URL ends in "/blog".
Although this would work from a technical point of view, I understand the server would still be returning a 404 error, and this probably has wider implications? To what extent is this the case, and is this an unviable solution?
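In rough terms, the script on the 404 template would do something like this (a sketch only; the WordPress domain, the slug handling, and the target element ID are assumptions):
// On the Shopify 404 template: if the dead URL looks like a blog post path,
// fetch the matching post from a WordPress install via its REST API
// and render it in place of the 404 content.
var path = window.location.pathname;
if (path.indexOf('/blog/') === 0) {
  var slug = path.split('/').filter(Boolean).pop();
  fetch('https://my-wordpress-site.com/wp-json/wp/v2/posts?slug=' + encodeURIComponent(slug))
    .then(function (res) { return res.json(); })
    .then(function (posts) {
      if (posts.length) {
        // 'main-content' is a placeholder for whatever container the 404 template uses.
        document.getElementById('main-content').innerHTML =
          '<h1>' + posts[0].title.rendered + '</h1>' + posts[0].content.rendered;
      }
    });
}
// The /blog index page itself would need a separate query (e.g. listing recent posts).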
A really interesting idea.
The biggest issue I see is SEO: since the URLs will still point to the 404 page and you won't be able to render the proper content with Liquid, all of those pages will pull the 404 content and show up as 404 pages in Google search.
That said, I don't see any other major issues that would prevent you from doing this with JS. It really depends on how many types of pages require this logic and how the JS is written, but as an idea I really like the possibility of it.
I probably wouldn't recommend it to a client who wants an SEO-optimized site, but for a personal one it seems like an interesting idea.

Deprecated domain in Google index

We have a deprecated domain, www.deprecateddomain.com. The specific detail is that we have a reverse proxy in place that redirects all requests from this domain to the new one, www.newdomain.com.
The problem is that when you type "deprecateddomain.com" into Google search, there is a link to www.deprecateddomain.com in the search results besides the results for "newdomain.com". That means there are still such entries in the Google index. Our customer doesn't want to see links to the old site.
It was suggested to us to create a fake robots.txt with a Disallow: / directive for www.deprecateddomain.com, plus reverse proxy rules to serve this file from some directory. But after investigating the subject, I started to doubt that this will help. Will it remove the old domain's entries from the index?
Why not just submit a removal request in Search Console for www.deprecateddomain.com? In my opinion that might help.
Anyway, I'm a novice on this question. Could you give me advice on what to do?
Google takes time to remove old/obsolete entries from its rankings, especially for low-traffic or low-value pages. You have no control over this: Google needs to revisit each page to see the redirection you have implemented.
So DO NOT implement a disallow on the old website, because it will make the problem worse. Bots won't be able to crawl those pages and see the redirection you have implemented, so they will stay in the rankings longer.
You must also make sure you implement a proper 301 redirection (i.e. a permanent one, not a temporary one) for all pages of the old website. Otherwise, some pages may stay in the rankings for quite some time.
If some pages are obsolete and should be deleted rather than redirected, return a 404 for them. Google will remove them quickly from its index.
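For reference, a blanket permanent redirect at the proxy level is short in most servers. A sketch for Apache (directive names will differ if the reverse proxy is nginx or something else):
<VirtualHost *:80>
    ServerName www.deprecateddomain.com
    # Send every request to the same path on the new domain with a permanent (301) redirect.
    Redirect permanent / https://www.newdomain.com/
</VirtualHost>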

SEO and 404 error redirection

Recently I removed some products and categories from my Magento store, and that generated many 404 errors, since the pages were in the search index and now they no longer exist.
I was thinking about developing/using a module that intercepts requests that would return a 404 and uses keywords from the request URL to build a search query on the website, so that customers don't get stopped by a dead link.
But the question is:
Will that kill my SEO?
How does Google, for instance, cope with this kind of 404 suppression?
Has anyone else encountered this problem and tried something like this?
Since the operation will take quite some time, I would like some feedback before going down this road.
As of now, I only know that redirecting 404 errors to the homepage or another page is bad, as it keeps dead links alive. But will redirecting based on some criteria have the same "zombifying" effect?
In its SEO guide, Google recommends building a good custom 404 page for cases like yours (do not forget to return a 404 status code).
To quote: "A good custom 404 page will help people find the information they're looking for, as well as providing other helpful content and encouraging them to explore your site further."
Google's recommendations about 404 pages are available here.
Also, do not forget to check the SEO starter guide's "Best Practices" chapter.
I would just try to follow the recommendations as close as possible.
Good luck!
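As a small illustration of the "return a 404 status code" point: on an Apache front end, a custom error page can be wired up roughly like this (Magento also lets you customize its own 404 "no route" page from the admin, so treat this purely as a sketch; the /errors/404.html path is an example):
# .htaccess
# A local path keeps the 404 status code; pointing ErrorDocument at a full
# URL would turn the response into a redirect instead.
ErrorDocument 404 /errors/404.html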
As a solution has already been given here, I would just like to add a few words.
You don't need to worry about the number of 404 pages shown in Webmaster Tools, because it is natural to create and delete web pages. All you need to do is set up a proper custom 404 page, so that when users land on a page that is no longer available they can easily navigate to the similar content they are looking for on your website.
This improves the user experience, and Google loves sites that cater to their visitors well.

Google, do not index YET

While building a site on its actual live hosting platform, is there a way to tell Google not to index the website YET? I found the following:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710
But would that tell them never to come back, or would they simply see the noindex tag and not list the results? Then, when Googlebot comes back to crawl again later and my site is good to go, I would remove the noindex tag and the site would start getting indexed?
Sounds like you want to use a robots.txt file instead:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449&topic=2370588&ctx=topic
Update your robots.txt file when you want your content to be indexed.
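A minimal robots.txt that keeps well-behaved crawlers away from the whole site while you build it looks like this (loosen or remove it when you are ready to be indexed):
User-agent: *
Disallow: /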
You can use the robots.txt method.
You can specify which subpages may be spidered, and Google checks the file again before indexing each time it comes back, so you can delete the file later in order to get fully indexed.
More Information
About /robots.txt
Robots.txt File Generator
You can always change it later. The way Google and other robots find your page is through links from other pages; as long as it isn't linked from another page, it won't be found. Also, once your site is up, chances are it will be far back in the list of results anyway.

How Can I Deal With Those Dead Links After Revamping My Web Site?

A couple of months ago, we revamped our web site. We adopted a totally new site structure and, specifically, merged several pages into one. Everything looks charming.
However, there are lots of dead links that produce a large number of 404 errors.
So what can I do about it? If I leave it alone, could it bite back someday, say by eating up my PageRank?
One basic option is using 301 redirects; however, that is almost impossible considering the number of dead links.
So is there any workaround? Thanks for your consideration!
301 is an excellent idea.
Consider that you can take advantage of global configuration to map a whole group of pages; you don't necessarily need to write one redirect for every 404.
For example, if you removed the http://example.org/foo folder, using Apache you can write the following configuration
RedirectMatch 301 ^/foo/(.*)$ http://example.org/
to catch all 404s generated by the removed folder.
Also, consider redirecting selectively: you can use Google Webmaster Tools to check which 404 URIs receive the highest number of inbound links and create a redirect configuration only for those.
Chances are the number of redirection rules you need to create will decrease drastically.
301 is definitely the correct route to go down to preserve your page rank.
Alternatively, you could catch 404 errors and redirect either to a "This content has moved" type page or to your home page. If you do this, I would still recommend cherry-picking busy pages and important content and setting up 301s for those; then you can preserve PageRank on your most important content and deal gracefully with the rest of the dead links.
I agree with the other posts: using mod_rewrite you can remap URLs and return 301s. Note that it is possible to call an external program or database with mod_rewrite, so there's a lot you can do there.
If your new and old sites don't follow any remappable pattern, then I suggest you make your 404 page as useful as possible. Google has a widget that will suggest the page the user is probably looking for; this works well once Google has spidered your new site.
Along with the other 301 suggestions, you could also split the requested URL string into a search string, route it to your default search page (if you have one), and pass those parameters automatically to the search.
For example, if someone tries to visit http://example.com/2009/01/new-years-was-a-blast, this would route to your search page and automatically search for "new years was a blast", returning the best results for those keywords and hopefully your most relevant article.
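A quick way to prototype that idea is client-side, on the 404 page itself; a sketch follows (the /catalogsearch/result/?q= path is Magento's default search URL, so adjust it if your store differs, and a server-side module would still be the cleaner long-term solution since it controls the status code):
// On the 404 page: turn the last segment of the dead URL into keywords
// and send the visitor to the site search with those keywords pre-filled.
var segment = window.location.pathname.split('/').filter(Boolean).pop() || '';
var keywords = segment.replace(/\.[a-z0-9]+$/i, '').replace(/[-_]+/g, ' ').trim();
if (keywords) {
  window.location.replace('/catalogsearch/result/?q=' + encodeURIComponent(keywords));
}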