Difference between 404 and 410 error codes - seo

I have read about many error codes, but I am a little confused about error codes 404 and 410. I still can't tell what exactly these two errors are meant to indicate.

The 404 indicates that the resource is not present at the given location, and has probably never been there (or the server has no idea whether it has been).
The 410, on the other hand, indicates that the resource is no longer present, but used to be there in the past. It's a useful hint for clients such as search engines and spiders, because they can remove this resource from their indexes.
From the original HTTP/1.1 specification, RFC 2616 section 10.4.11:
The 410 response is primarily intended to assist the task of web
maintenance by notifying the recipient that the resource is
intentionally unavailable and that the server owners desire that
remote links to that resource be removed. Such an event is common for
limited-time, promotional services and for resources belonging to
individuals no longer working at the server's site. It is not
necessary to mark all permanently unavailable resources as "gone" or
to keep the mark for any length of time -- that is left to the
discretion of the server owner.
Also, about the difference between the two:
This condition [the 410] is expected to be
considered permanent. Clients with link editing capabilities SHOULD
delete references to the Request-URI after user approval. If the
server does not know, or has no facility to determine, whether or not
the condition is permanent, the status code 404 (Not Found) SHOULD be
used instead.
It was later rephrased in RFC 7231 section 6.5.4, but the meaning remains the same:
A 404 status code does not
indicate whether this lack of representation is temporary or
permanent; the 410 (Gone) status code is preferred over 404 if the
origin server knows, presumably through some configurable means, that
the condition is likely to be permanent.
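As a rough illustration of the distinction quoted above, here is a minimal Python sketch. The two path sets and the function name are made up for the example; the only assumption is that the server keeps some record of resources it removed on purpose.

```python
from http import HTTPStatus

# Hypothetical bookkeeping: paths the server currently serves, and paths it
# knows were removed on purpose.
LIVE_PATHS = {"/products/widget"}
RETIRED_PATHS = {"/promo/summer-sale"}


def status_for(path: str) -> HTTPStatus:
    """Pick a status code for a GET request to `path`."""
    if path in LIVE_PATHS:
        return HTTPStatus.OK
    if path in RETIRED_PATHS:
        # The server knows the resource existed and is gone for good.
        return HTTPStatus.GONE
    # The server cannot tell whether the resource ever existed.
    return HTTPStatus.NOT_FOUND
```

Without a record like RETIRED_PATHS the server has no way to know the condition is permanent, so 404 is the fallback.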

Related

REST API: What HTTP return code for no data found? [duplicate]

If someone could please help settle this argument we might actually get this system finished LOL :^)
So, if you have a REST API.. for.. say.. returning patient details...
And you send in a request with a patient id...
But no patient with that patient id actually exists in the database..
What response should your API return?
1. a 404 ?
2. a 204 ?
3. a 200 with something in the body to indicate no patient found..
Thanks
Use a 404:
404 Not Found
The server can not find the requested resource. In the browser, this means the URL is not recognized. In an API, this can also mean
that the endpoint is valid but the resource itself does not exist.
Servers may also send this response instead of 403 to hide the
existence of a resource from an unauthorized client. This response
code is probably the most famous one due to its frequent occurrence on
the web.
From MDN Web docs https://developer.mozilla.org/en-US/docs/Web/HTTP/Status
What response should your API return?
It depends.
Status codes are metadata in the transfer of documents over a network domain. The status code communicates the semantics of the HTTP response to general purpose components. For instance, it's the status code that announces to a cache whether the message body of the response is a "representation of the resource" or instead a representation of an error situation.
Rows in your database are an implementation detail; as far as REST is concerned, there doesn't have to be a database.
What REST cares about is resources, and in this case whether or not the resource has a current representation. REST doesn't tell you what the resource model should be, or how it is implemented. What REST does tell you (via its standardized messages constraint, which in this case means the HTTP standard) is how to describe what's happening in the resource model.
For example, if my resource is "things to do", and everything is done, then I would normally expect a GET request for "things to do" to return a 2xx status code with a representation announcing there is nothing to do (which could be a completely empty document, or it could be a web page with an empty list of items, or a JSON document.... you get the idea).
If instead the empty result set from the database indicates that there was a spelling error in the URI, then a 404 is appropriate.
It might help to consider a boring web server, and how retrieving an empty file differs from retrieving a file that doesn't exist.
But, as before, in some resource models it might make sense to return a "default" representation in the case where there is no file.
if you have a REST API.. for.. say.. returning patient details...
Is it reasonable in the resource model to have a document that says "we have no records for this patient"?
I'm not a specialist in the domain of medical documents, but it sounds pretty reasonable to me that we might get back a document with no information. "Here's a list of everything we've been told about this patient" and a blank list.
What response should your API return?
If you are returning a representation of an error, i.e. a document that explains that the document someone asked for is missing, then you should use a 404 Not Found status code (along with other metadata indicating how long that response can be cached, etc).
If you are returning a document, you should use a 200 OK with a Content-Length header.
204 is specialized, and should not be used here. The key distinction between 204 and 200 with Content-Length 0 is the implications for navigation.
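To make the two cases concrete, here is a small Python sketch. The in-memory PATIENTS dict, the ids, and the get_patient function are hypothetical stand-ins, not part of any real API. A known patient with no records gets a 200 and a document containing an empty list; an unknown id gets a 404 and a representation of the error.

```python
import json
from http import HTTPStatus

# Hypothetical in-memory store standing in for the database.
PATIENTS = {"p-1": {"name": "Alice", "records": []}}


def get_patient(patient_id):
    """Return (status, headers, body) for GET /patients/<patient_id>."""
    patient = PATIENTS.get(patient_id)
    if patient is None:
        # No such resource: the body is a representation of the error.
        status = HTTPStatus.NOT_FOUND
        body = json.dumps({"error": "no such patient"}).encode()
    else:
        # The resource exists, even if its list of records is empty.
        status = HTTPStatus.OK
        body = json.dumps(patient).encode()
    headers = {
        "Content-Type": "application/json",
        "Content-Length": str(len(body)),
    }
    return status, headers, body
```

Whether an unknown id should really be a 404 or a "default" empty document is the resource-model decision discussed above; the sketch only shows one of the two choices.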

Blocking Requests using HTTP_ORIGIN to Prevent Spamming

Over the last couple of days I've been getting millions of requests from rotating IPs. They're attempting to make POST requests and seem to be using an incorrect HTTP_ORIGIN. By incorrect, I mean that it's not the same as what my server sends:
My server sends: "https://www.example.com"
The spam request sends: www.example.com
I placed some logging for each scenario:
User logged in and has incorrect HTTP_ORIGIN
User NOT logged in and has incorrect HTTP_ORIGIN
What I've noticed is that there are users who are logged in but have the wrong HTTP_ORIGIN (the origin is missing "https://"). I have checked those user accounts, and while they appear to be real and not created by the original spam requests, they may currently be driven by scripts.
Blocking on the origin would stop those users from making POST requests to the site, but if they are real users, that would cause a problem.
Now if I were to put filtering in place to block requests that didn't match the origin, my questions are:
What would be the side effect of that?
Are there downsides or negative aspects?
Would I see drops in traffic?
If so, then as you said, some people are using your website through scripts. If your website is an ordinary one (I mean, not one built for uploading data or something like that), it would be better to add a CAPTCHA than to filter requests, because I think it would be simple for whoever sends the incorrect HTTP_ORIGIN to forge one that matches the original, for example if they use an SslStream, especially if their goals are malicious.
As for the consequences of filtering the HTTP requests: I think the number of requests will drop noticeably (since you will refuse the incorrect ones), and real users who use scripts will either switch to a browser (which is rare, especially if they scrape data from the website automatically) or stop using your website.
You need to investigate further and make sure that those incorrect requests are not malicious (perhaps they come from a simple TCP client). Either way, for the time being it is best to inspect the data sent in the incorrect POST requests and see whether any of it is suspicious. If it is, you should add some protective measure to your website.
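For the logging step, a comparison along these lines could be used. It is only a sketch in Python: the function name is made up, and the expected value is the one given in the question. As noted above, a client can send any Origin it likes, so a match proves nothing; the check only separates requests that did not even bother to send a well-formed value.

```python
from urllib.parse import urlsplit

EXPECTED_ORIGIN = "https://www.example.com"  # value taken from the question


def origin_matches(origin_header):
    """True only if the Origin header has the expected scheme and host."""
    if not origin_header:
        return False
    expected = urlsplit(EXPECTED_ORIGIN)
    got = urlsplit(origin_header)
    # A bare "www.example.com" parses with an empty scheme and host,
    # so it does not match.
    return (got.scheme, got.netloc) == (expected.scheme, expected.netloc)
```

Logging the result per request (rather than rejecting right away) gives an idea of how many logged-in users would be affected before any filter is switched on.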

Moderate content using HTTP DELETE

For a platform using a mostly-RESTful HTTP API to moderate many types of content, I am wondering if having clients call DELETE on the same endpoint they used to create the content makes sense.
The API would identify the client as either the content's creator, a platform moderator, or a regular user.
In the case of the first two, the content would be immediately deleted, but in the case of the regular user, the content would be flagged for review and essentially be deleted only for that user.
This is as opposed to POSTing to /flag and /remove endpoints for each type of content as this requires additional routes and other overhead.
Update: The real question here is:
Does it make sense to use HTTP DELETE to moderate content in the way described? Will that lead to future complications?
I'm assuming clients created the content by a PUT request to an endpoint of their choice.
From the client viewpoint, I don't see any obvious problems with the approach. In fact, this is exactly how DELETE is intended to be used in remote authoring applications, but there are some minor issues that depend on how much information you want the clients to have.
Do you want the regular user to know his resource is flagged for deletion, or do you want that to be completely transparent? If the first, the DELETE request should return 202 Accepted and some description of the status, and a further GET request might inform the client of the pending deletion in some way. If you don't care about that, you can simply return 404 Not Found or 410 Gone, but then you might have to deal with the possibility of the client creating new content for the same endpoint while the deletion is still pending. That might be a problem or not, depending on your implementation of the PUT semantics.
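A minimal Python sketch of that role-dependent DELETE follows. The Role enum, the two dicts, and the function name are hypothetical, and returning 204 No Content for an immediate deletion is my assumption rather than something stated above; the 202 Accepted for the flag-only case follows the suggestion in this answer.

```python
from enum import Enum
from http import HTTPStatus


class Role(Enum):
    CREATOR = "creator"
    MODERATOR = "moderator"
    REGULAR = "regular"


def handle_delete(role, user_id, content_id, content, flags):
    """Apply a DELETE and return the status code to send.

    content: dict mapping content_id -> body (hypothetical store)
    flags:   dict mapping content_id -> set of user ids that flagged it
    """
    if content_id not in content:
        return HTTPStatus.NOT_FOUND
    if role in (Role.CREATOR, Role.MODERATOR):
        # Immediate deletion (204 is an assumption, see the lead-in).
        del content[content_id]
        flags.pop(content_id, None)
        return HTTPStatus.NO_CONTENT
    # Regular user: only flag for review; the deletion is pending.
    flags.setdefault(content_id, set()).add(user_id)
    return HTTPStatus.ACCEPTED
```

A later GET by the flagging user would consult flags to hide the content for that user only.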

How to propagate data from mod-auth-external authenticator to served page

Background
In our Apache configuration we use mod-auth-external (previously on Google Code) to invoke PAM authentication.
Now there is a request for proper handling of shadow-based password expiration:
1. If the password is before its warning period, Apache should respond with HTTP status code 200. Nothing new here.
2. If the password is in its warning period (the end of its validity is near), Apache should respond with HTTP status code 200, but somehow include information about the warning period.
3. If the password is in its expiration period (it is no longer valid, but the user can still change it on their own), Apache should respond with HTTP status code 401 and somehow include information about the expiration period.
4. If the password is beyond its expiration period (it is no longer valid and the account was locked; an administrator must unlock it), Apache should respond with HTTP status code 401 and somehow include information about the locked state.
(There are also corner cases, such as a missing page or other errors. It is not clear what to do then, but it seems that solving the points above would allow those corner cases to be solved as well.)
Our PAM authenticator (used through mod-auth-external) is able to differentiate those cases by adjusting return values. That we already have.
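For readers unfamiliar with shadow ageing, the four cases can be sketched in Python as below. This is not our authenticator's code. The function and state names are made up, the arguments are day counts in the style of the shadow fields (last change, maximum age, warning period, inactivity period), and the exact handling of boundary days is an assumption.

```python
def password_state(today, last_change, max_age, warn, inactive):
    """Classify a password; all arguments are day counts (shadow-style)."""
    expires = last_change + max_age
    if today < expires - warn:
        return "ok"
    if today <= expires:
        return "warning"
    if today <= expires + inactive:
        return "expired"  # user may still change the password
    return "locked"  # administrator must unlock the account


# HTTP status wanted for each state, as listed in the requirements.
HTTP_STATUS = {"ok": 200, "warning": 200, "expired": 401, "locked": 401}
```

The state is easy to compute; the open question is how to carry it from the authenticator to whatever serves the response.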
The problem, however, is how to get information from the authenticator to the associated action serving the page (either the actual page with a 200 status code or the 401 error document).
Current investigations
It should be noted that there is a significant difference between requirement 2 and requirements 3 and 4.
Requirements 3 and 4 alone are somewhat easier because they both involve our mod-auth-external authenticator returning an error (access denied). So we only need to know how to get that error code in the 401 error page. I even raised an issue about that on the mod-auth-external page.
Requirement 2 is much more difficult. In that case our authenticator must return 0 (access granted) and still somehow propagate information about the warning to whatever gets served in the end.
Logs parsing
An obvious (and ugly) idea is to parse the logs. The mod-auth-external description on the Google Code wiki mentions that the authenticator's return value gets written to Apache syslog. Whatever the authenticator prints to its standard error stream gets logged as well.
This could be used to pass information from authenticator to some other entities.
The difficulty here is that it is not clear how to do it safely. What should be printed so that "the other entity" can reliably match the current request with its log entry? The URL alone doesn't seem to be enough, since there can be multiple requests for the same URL at the same time, and I don't see anything more useful in what the authenticator receives.
Another issue is that parsing the logs requires some non-trivial code running as "the other entity", which complicates things further: how should we do that?
Another idea
If we could make the authenticator somehow modify the "request session" (or whatever it is, maybe just the environment? I don't know, I'm new to Apache) to add arbitrary data to it, we would be (almost) there.
Our authenticator would somehow store the "password status" and possibly also the days remaining until the end of the warning/expiration period (if applicable). Then, when serving the 401 error page, we would retrieve that data and use it to dynamically generate the content of the page.
Or, even better, we would have it stored in the session so that the other end could read the data directly. (For cases where it is not simply a browser showing a page.)
But so far I fail to see how to do that.
Do you have any idea how to meet those requirements?
For over a month I got no answer here, nor on the GitHub issue that I opened for mod-auth-external.
So I ended up making a custom modification to our mod-auth-external. I don't like modifying third-party software, but this project seems dead anyway. It also turned out that we were using a pretty old version (2.2.9, which I upgraded to 2.2.11, the last in the 2.2.x line), and it already had some customizations anyway.
I explained details of the solution in a comment to my GitHub issue so I will not repeat them here.
I will however comment on shadow details as they were not mentioned there.
I had two choices: either use the getspnam function to retrieve the shadow data, or parse the messages generated by PAM. My first attempts were based on getspnam, but in the end I used the PAM messages. I didn't have strong reasons for either. However, I decided to propagate in the HTTP response not only the shadow status but any PAM message that was generated, so it seemed easier to go that way.

How long does Google continue polling a linked CSE specification file after it's requested?

When you create a Google Custom Search Engine (CSE) with a linked specification file on your server, Google's "FeedFetcher-Google-CoOp" bot requests that file in order to build the CSE. It appears that even after results have been returned to the user and the specification file is no longer used, Google continues polling it regularly for at least several days.
My question is how long Google will continue polling the file after it has stopped being requested by your CSE code, and if there is any way to force it to stop immediately.
We created a dynamic linked CSE that was unique to each query, which meant many, many specification files (the same script with different GET arguments each time) were requested. Now that we are no longer using them, FeedFetcher-Google-CoOp continues to request this script with various past arguments.
FeedFetcher-Google-CoOp ignores robots.txt. We are now returning 410 Gone for all requests, but it is difficult to tell whether this is having an effect, since there are so many different versions being requested (i.e. /script.php?query=). Ideally there would be some way to tell Google that script.php does not exist, regardless of arguments, but without robots.txt, I can't find a way to do so.
TL;DR:
1) Will Google stop requesting this script on its own eventually? If so, when?
2) Is there a way to stop it requesting immediately?
If left alone, it appears Google will continue requesting these files indefinitely (at least for months). It ignores 410 (gone) responses, but it appears that it respects 301 redirects! So to stop Google trying to request outdated CSE specifications, you can 301 redirect them to a null file. Google will likely still try to access the file again for every set of arguments it has cached, but should stop trying after that.
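The redirect has to ignore the query string so that every cached set of arguments is covered. A minimal Python sketch of that routing decision is below; the target /empty.xml and the function name are hypothetical, and only /script.php comes from the question.

```python
from http import HTTPStatus
from urllib.parse import urlsplit

RETIRED_SCRIPT = "/script.php"  # path from the question
NULL_FILE = "/empty.xml"        # hypothetical empty file to redirect to


def respond(request_target):
    """Return (status, headers) for the retired script, else None.

    request_target is the path plus query, e.g. "/script.php?query=abc".
    None means the request falls through to normal handling.
    """
    path = urlsplit(request_target).path
    if path == RETIRED_SCRIPT:
        # Same answer whatever the GET arguments are.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": NULL_FILE}
    return None
```

In practice the same rule would more likely live in the web server's rewrite configuration than in application code.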