Recently I removed some products and categories from my Magento store, and that generated many 404 errors: the pages were in the search index and now they no longer exist.
I was thinking about developing or using a module that intercepts requests that would otherwise return a 404 and uses keywords from the requested URL to build a search query on the site, so customers aren't stopped by a dead link.
But the question is:
will that kill my SEO?
How does Google, for instance, cope with this kind of 404 suppression?
Has anyone else encountered this problem and tried something like this?
Since the implementation will take quite some time, I would like some feedback before going down this road.
As of now, I only know that redirecting 404 errors to the homepage or another page is bad because it keeps dead links alive, but would redirecting based on some criteria have the same "zombifying" effect?
In their SEO guide, Google recommends building a nice custom 404 page for your case (do not forget to return a 404 status code).
Quoting from it: "A good custom 404 page will help people find the information they're looking for, as well as providing other helpful content and encouraging them to explore your site further."
Google's recommendations about 404 pages are available here.
Also do not forget to check their SEO Starter Guide, particularly the "Best Practices" chapter.
I would just try to follow the recommendations as closely as possible.
Good luck!
Since a solution has already been mentioned here, I would just like to add a few words to it.
You don't need to worry about the number of 404 pages shown in Webmaster Tools, because it is natural to create and delete web pages over time. All you need to do is set up a proper custom 404 page, so that when users land on a page that is no longer available, they can easily navigate to the similar content they were looking for on your website.
This makes for a good user experience, and Google favors sites that serve their visitors well.
Shopify is quite restrictive about the ways you can structure directories. For example, all pages must have a URL that looks like "my-store.com/pages/my-page".
While there is no way around this in Shopify, I considered a workaround that would work like this:
Use JavaScript to check the requested URL when displaying the 404 page.
If the requested URL matches "my-url", connect to the WordPress REST or GraphQL API, run a query, and then render the desired content on the page.
For example, my-site.com/blog would return a 404 error, but JavaScript would run a function to fetch content whenever the URL ends in "/blog".
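Roughly what I have in mind, sketched in TypeScript (the WordPress origin https://blog.example.com, the "/blog" prefix check, and the container id "not-found-content" are all placeholders, and I'm assuming the stock /wp-json/wp/v2/posts endpoint is reachable from the storefront):

// Runs on the 404 template: if the missing URL looks like a blog path,
// try to pull the matching post from a WordPress install over its REST API.
async function renderFallbackContent(): Promise<void> {
  const path = window.location.pathname;   // e.g. "/blog/my-post"
  if (!path.startsWith('/blog')) return;   // leave the normal 404 page alone

  const slug = path.split('/').filter(Boolean).pop() ?? '';
  const res = await fetch(
    `https://blog.example.com/wp-json/wp/v2/posts?slug=${encodeURIComponent(slug)}`
  );
  if (!res.ok) return;

  const posts: Array<{ title: { rendered: string }; content: { rendered: string } }> =
    await res.json();
  if (posts.length === 0) return;

  // Swap the 404 message for the fetched article (the response code is still 404).
  const container = document.getElementById('not-found-content');
  if (container) {
    container.innerHTML = `<h1>${posts[0].title.rendered}</h1>${posts[0].content.rendered}`;
  }
}

void renderFallbackContent();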
Although this would work from a technical point of view, I understand the server would still be returning a 404 status, and that probably has wider implications. To what extent is this the case, and does it make this an unviable solution?
A really interesting idea.
The biggest issue I see is SEO: the URLs will still point to the 404 page, and since you won't be able to render the proper content with Liquid, all of these pages will pull the 404 content and show up as 404 pages in Google Search.
That said, I don't see any other major issues preventing you from doing this with JS. It really depends on how many types of pages require this logic and how the JS is written, but as an idea I really like the possibility.
I probably wouldn't recommend it to a client who wants an SEO-optimized site, but for a personal one it seems like an interesting idea.
I'm currently in the process of writing a REST API, and this question always seems to pop up.
I've always just added a description, quick links to the docs, server time, etc., but I see now (after looking around a bit) that a simple redirect to the API docs would be even better.
My question is: what would be the accepted norm for the root - '/' - the "homepage" of your API?
I've been looking at a few implementations:
Facebook: just returns an error, "Unsupported get request.";
Twitter: shows an actual 404 page;
StackOverflow: redirects to a quick "usage" page.
After looking at those it's clear everyone is doing it differently.
In the bigger picture this is of little significance, but it would be interesting to see what the "RESTful" way of doing it (if there is one) might be.
Others have had the same question, and as you discovered yourself, everyone is doing it their own way. There is a move to standardize this, so see if you find this draft useful:
Home Documents for HTTP APIs aka JSON Home.
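To make that a bit more concrete, here is a rough sketch of a root endpoint serving such a home document, written with TypeScript and Express. The field names have shifted between revisions of the draft, and every URL below is a placeholder, so treat it as illustrative rather than canonical:

import express from 'express';

const app = express();

// Serve a small discovery document at the API root so clients (and curious
// humans) can find out where to go next.
app.get('/', (_req, res) => {
  res.json({
    api: {
      title: 'Example API',
      links: { describedBy: 'https://example.com/docs' },
    },
    resources: {
      'https://example.com/rel/widgets': { href: '/widgets/' },
      'https://example.com/rel/widget': {
        hrefTemplate: '/widgets/{id}',
        hrefVars: { id: 'https://example.com/param/id' },
      },
    },
  });
});

app.listen(3000);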
I've given this a lot of thought, and right now I either return a 404 page, a health-status page, a dummy page, or a redirect to another page, most likely one within the organization.
An API homepage isn't something everyone should be looking at, and believe me, it can be found. There are plenty of people like me who love to open the browser inspector and see how a website is performing.
I have a problem with lots of 404 errors on one site. I figured out that these errors are happening because Google is trying to find pages that no longer exist.
Now I need to tell Google not to index those pages again.
I found some solutions on the internet about using a robots.txt file. But this is not a site that I built; I just need to fix those errors. The thing is, those pages are generated - they do not physically exist in that form, so I cannot add anything to the PHP code.
And I am not quite sure how to add those pages to robots.txt.
When I just write:
User-agent: *
noindex: /objekten/anzeigen/haus_antea/5-0000001575*
and hit the test button in Webmaster Tools,
I get this from Googlebot:
Allowed
Detected as a directory; specific files may have different restrictions
And I do not know what that means.
I am new to this kind of stuff, so please keep your answer as simple as possible.
Sorry for the bad English.
I think Google will automatically remove pages that return a 404 error from its index, and it will not display those pages in the results, so you don't need to worry about that.
Just make sure that these pages are not linked from other pages. If they are, Google may try to index them from time to time. In that case you should return a 301 status (moved permanently) and redirect to the correct URL; Google will follow the 301 and use the redirected URL instead.
Robots.txt is only necessary if you want to remove pages that are already in the search results, but I think pages with a 404 status code will not be displayed there anyway.
I am working on a public-facing web site which serves up articles for people to read. After some time, articles become stale and we remove them from the site. My question is this: what is the best way to handle the situation when a search engine visits a URL corresponding to a removed article? Should the app respond with a permanent redirect (301 Moved Permanently) to an "article not found" page, or is there a better way to handle this?
Edit
These articles are actually not removed, but they are "unpublished" - and they may return to the "published" state eventually.
If the article is removed, you should respond with 410 Gone. Your error page can still have some useful info on it as long as the response code is correct. This indicates that the page has been intentionally removed and is not just "not found" (as would happen with a bad URL).
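As a rough illustration of that split between "intentionally removed" and "never existed", assuming an Express-style app and a made-up findArticle helper standing in for the real data layer:

import express from 'express';

const app = express();

// Stand-in for the real data layer; the states and the helper are invented
// for this sketch.
type Lookup =
  | { state: 'published'; title: string; body: string }
  | { state: 'unpublished' }
  | { state: 'missing' };

function findArticle(slug: string): Lookup {
  if (slug === 'fresh-article') return { state: 'published', title: 'Fresh', body: '<p>Example body</p>' };
  if (slug === 'stale-article') return { state: 'unpublished' };
  return { state: 'missing' };
}

app.get('/articles/:slug', (req, res) => {
  const result = findArticle(req.params.slug);

  if (result.state === 'published') {
    res.send(`<h1>${result.title}</h1>${result.body}`);
  } else if (result.state === 'unpublished') {
    // Intentionally taken down: 410 Gone, but still render something helpful.
    res.status(410).send('<h1>This article has been removed</h1><p>Try our latest articles instead.</p>');
  } else {
    // Never existed: a plain 404.
    res.status(404).send('<h1>Not found</h1>');
  }
});

app.listen(3000);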
You might consider keeping the content up, with some sort of indicator to the person reading it that the content is stale. Then you could also include more relevant content on the page or links to more relevant content.
This might not be appropriate for your situation, or could be more work than it's worth, but it may be a good way not to waste potential traffic.
I feel like the 410 Gone response would be the appropriate one; however, you'd basically be telling the search engine "we don't have this content anymore, so stop linking here" - which isn't advantageous to your SEO strategy.
Well, if you want to be proper about it, it should redirect not to an "article not found" page but to an "article removed" page, because "article not found" suggests it should be a 404.
My gut tells me that you should probably have an article removed page, but in practice many sites will simply do a 301 redirect to the home page.
I think the idea there is that any "link juice" from the old article will then be transferred to the home page rather than to a generic "article removed" page. I get the feeling, though, that search engines might not look too kindly on that practice.
A couple of months ago we revamped our web site. We adopted a totally new site structure, specifically merging several pages into one. Everything looks charming.
However, there are lots of dead links that produce a large number of 404 errors.
So what can I do about it? If I leave it alone, could it bite back someday, say by eating up my PageRank?
One basic option is a 301 redirect, but that is almost impossible considering the number of dead links.
So is there any workaround? Thanks for your consideration!
301 is an excellent idea.
Consider that you can take advantage of global configurations to map a group of pages; you don't necessarily need to write one redirect for every 404.
For example, if you removed the http://example.org/foo folder, using Apache you can write the following configuration
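# Permanently (301) redirect everything under the removed /foo/ folder to the homepage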
RedirectMatch 301 ^/foo/(.*)$ http://example.org/
to catch all 404s generated from the removed folder.
Also, consider redirecting selectively. You can use Google Webmaster Tools to check which 404 URIs are receiving the highest number of inbound links and create redirect rules only for those.
Chances are the number of redirection rules you need to create will decrease drastically.
301 is definitely the correct route to go down to preserve your page rank.
Alternatively, you could catch 404 errors and redirect either to a "This content has moved" type of page or to your home page. If you do this, I would still recommend cherry-picking busy pages and important content and setting up 301s for those - that way you preserve PR on your most important content and deal gracefully with the rest of the dead links...
I agree with the other posts - using mod_rewrite you can remap URLs and return 301s. Note that it's possible to call an external program or database with mod_rewrite, so there's a lot you can do there.
If your new and old sites don't follow any remappable pattern, then I suggest you make your 404 page as useful as possible. Google has a widget that will suggest the page the user is probably looking for. This works well once Google has spidered your new site.
Along with the other 301 suggestions, you could also split the requested URL into a search string and route it to your default search page (if you have one), passing those keywords to the search automatically.
For example, if someone tries to visit http://example.com/2009/01/new-years-was-a-blast, this would route to your search page and automatically search for "new years was a blast", returning the best results for those keywords and, hopefully, your most relevant article.
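A rough sketch of that catch-all, assuming an Express-style app and a /search?q= endpoint as a stand-in for whatever your search page actually expects:

import express from 'express';

const app = express();

// ...your real routes are registered before this catch-all...

// Anything that would otherwise 404 gets turned into a search instead.
app.use((req, res) => {
  // Take the last path segment (e.g. "new-years-was-a-blast"), strip any file
  // extension, and turn hyphens/underscores into spaces.
  const words =
    req.path
      .split('/')
      .pop()
      ?.replace(/\.[a-z0-9]+$/i, '')
      .split(/[-_+]/)
      .filter(Boolean)
      .join(' ') ?? '';

  if (words) {
    // A 302 avoids telling search engines that the old URL is a permanent
    // alias of the search page; weigh that against the SEO concerns raised above.
    res.redirect(302, `/search?q=${encodeURIComponent(words)}`);
  } else {
    res.status(404).send('<h1>Not found</h1>');
  }
});

app.listen(3000);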