I'm building a library to interact with a public API. Should I be doing validation of the data passed or leave that to the API?

I'm building a library to interact with a public API. Should I be doing validation of the data passed or leave that to the API? - api

I am building an open source library to interact with a public API. The API validates all data passed to it and on failure properly returns a non-200 status code and usually a failure message causing my library to throw an Exception.
Some of my methods do check for required parameters, but mostly I have decided not to validate the data before passing it to the API.
Should a library for an API also be validating the data? If so, to what extent? Should it attempt to fully validate all data or just verify that required parameters are present/not empty?

The rule has two parts: on input, be an accepting loving grandmother: take what the caller gives you and don't be a stickler about the interface convention in places where it makes little difference; or where you can supply reasonable defaults. On output, follow the convention to the letter: be a strict pedant, a martinet.
So, in my opinion, the answer is yes, you should check your caller's input and give the API the best data you can. But this does not necessarily mean you should duplicate validations that the API is going to do anyway.
You have to use your head and think about what makes sense.

Related

Patterns when designing REST POST endpoint when resource has a computed property

I have a resource, as an example a 'book'.
I want to create a REST POST endpoint to allow consumers to create a new book.
However, some of the properties are required and computed by API, and others were actually taken as they are
Book
{
name,
color,
author # computed
}
Let's say the author is somehow calculated in API based on the book name.
I can think of these solutions each has its drawbacks:
enforce consumer to provide the author and just filter it (do not take into account as an input) # bad because it is very unpredictable why the author was changed
allow the user to provide author # same problem
do not allow the user to provide an author and show an exception if the user provides it
The last solution seems to be the most obvious one. The main problem I can see is that it is inconsistent and can be bizarre for consumers to see the author later on GET request.
I want my POST endpoint to be as expressive as possible. So the POST and GET data transfer objects will look almost the same.
Are there any simple, expressive, and predictable patterns to consider?

Personally I'm a big fan of using the same format for a GET request as well as a PUT.
This makes it possible for a client to do a GET request, add a property to the object they received and immediately PUT again. If your API and clients follow this pattern, it also means it can easily add new properties to GET requests and not break clients.
However, while this is a nice pattern I don't really think that same expectation exists at much for 'creation'. There's usually many things that make less less to require as a property when creating new items (think 'id' for example), so I usually:
Define a schema for PUT and GET.
Define a separate schema for POST that only contains the relevant properties for creation.
If users supply properties not in the schema, always error with a 422.

some of the properties are required and computed by API
Computed properties are neither required nor optional, by definition. No reason to ask consumers to pass such properties.
do not allow the user to provide an author and show an exception if the user provides it
Indeed, DTO should not contain author-property. Consumers can send over network whatever they want, however it is the responsibility of the API-provider to publish contract (DTO) for consumers to use properly. API-provider controls over what properties to consider, and no exception should be thrown, as the number of "bad" properties that can be sent by consumers is endless.
So the POST and GET data transfer objects will look almost the same
Making DTOs of the same resource look the same is not a goal. In many cases, get-operation exposes a lot more properties than post-operation for the same resource, especially when designing domain-driven APIs.
Are there any simple, expressive, and predictable patterns to consider?
If you want your API to express the fact that author is computed, you can have the following endpoints:
POST http://.../author-computed-books
GET http://.../books/1
Personally, I wouldn't implement that way since it does not look natural, however you can get the idea.

I want my POST endpoint to be as expressive as possible. So the POST
and GET data transfer objects will look almost the same.
Maybe just document it instead of relying explicit stuff like it must be almost the same as the GET endpoint.
E.g. my POST endpoint is POST /number "1011" and my GET endpoint is GET /number -> 11. If I don't document that I expect binary and I serve decimal, then nobody will know and they would guess for example decimal for both. Beyond documentation another way of doing this and to be more explicit is changing the response for GET to include the base {"base":10, value:"11"} or changing the GET endpoint GET /number/decimal -> 11.
As of the computed author I don't understand how you would compute it. I mean either a book is registered and the consumer shouldn't register it again or you don't know much about the author of it. If the latter, then you can guess e.g. based on google results for the title, but it will be a guess, not necessarily true. The same with consumer data, but at least that is what the consumers provided. There is no certainty. So for me it would be a complex property not just a primitive one if the source of the information matters. Something like "author": {name: "John Wayne", "source": "consumer/service"} normally it is complex too, because authors tend to have ids, names, other books, etc.
Another thought that if it is weird for the consumers instead of expected, then I have no idea why it is a feature at all. If author guessing is a service, then a possible solution is making the property mandatory and adding a guessing service GET /author?by-book-name={book-name}, so they can use the service if they want to. Or the same with a completely optional property. This way you give back the control to the consumers on whether they want to use this service or not.

Can we pass parameters to HTTP DELETE api

I have an API that will delete a resource (DELETE /resources/{resourceId})
THE above API can only tell us to delete the resource. Now I want to extend the API for other use cases like taking a backup of that resource before deleting or delete other dependant resources of this resource etc.
I want to extend the delete API to this (DELETE /resources/{resourceId}?backupBeforeDelete=true...)
Is the above-mentioned extension API good/recommended?

According to the HTTP Specification, any HTTP message can bear an optional body and/or header part, which means, that you can control in your back-end - what to do (e.g. see what your server receives and conventionally perform your operation), in case of any HTTP Method; however, if you're talking about RESTful API design, DELETE, or any other operation should refer to REST API endpoint resource, which is mapped to controller's DELETE method, and server should then perform the operation, based on the logic in your method.
DELETE /resources/{resourceId} HTTP/1.1
should be OK.

Is the above-mentioned extension API good/recommended?
Probably not.
HTTP is (among other things) an agreement about message semantics: a uniform agreement about what the messages mean.
The basic goal is that, since everybody has the same understanding about what messages mean, we can use a lot of general purpose components (browsers, reverse proxies, etc).
When we start trying to finesse the messages in non standard ways, we lose the benefits of the common interface.
As far as DELETE is concerned, your use case runs into a problem, which is that HTTP does not define a parameterized DELETE.
The usual place to put parameters in an HTTP message is within the message body. Unfortunately...
A payload within a DELETE request message has no defined semantics; sending a payload body on a DELETE request might cause some existing implementations to reject the request
In other words, you can't count on general purpose components doing the right thing here, because the request body is out of bounds.
On the other hand
DELETE /resources/{resourceId}?backupBeforeDelete=true
This has the problem that general purpose components will not recognize that /resources/{resourceId}?backupBeforeDelete=true is the same resource as /resources/{resourceId}. The identifiers for the two are different, and messages sent to one are not understood to affect the other.
The right answer, for your use case, is to change your method token; the correct standard method for what you are trying to do here is POST
POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.” -- Fielding, 2009
You should use the "real" URI for the resource (the same one that is used in a GET request), and stick any parameters that you need into the payload.
POST /resources/{resourceId}
backupBeforeDelete=true
Assuming you are using POST for other "not worth standardizing" actions, there will need to be enough context in the request that the server can distinguish the different use cases. On the web, we would normally collect the parameters via an HTML form, the usual answer is to include a request token in the body
POST /resources/{resourceId}
action=delete&backupBeforeDelete=true
On the other hand, if you think you are working on an action that is worth standardizing, then the right thing to do is set to defining a new method token with the semantics that you want, and pushing for adoption
MAGIC_NEW_DELETE /resources/{resourceId}
backupBeforeDelete=true
This is, after all, where PATCH comes from; Dusseault et al recognized that patch semantics could be useful for all resources, created a document that described the semantics that they wanted, and shepherded that document through the standardization process.

Unable to access the service instance from within an implementation of IDataContractSurrogate

this is my first post, and I really have tried hard to find an answer, but am drawing a blank thus far.
My implementation of IDataContractSurrogate creates surrogates for certain 'cached' objects which I maintain (this works fine). What doesn't work is that in order for this system to operate effectively, it needs to access the service instance for some properties of the instance which it is maintaining from the interaction with its client. Also, when my implementation of IDataContractSurrogate works in its 'client mode' it needs access to the properties of the client instance in a similar way. Access to the information from the client and service instance affects how I create my surrogate types (or rather SHOULD do if I can answer this question!)
My service instancing is PerSession and concurrent.
On the server side, calls to GetDataContractType and GetDeserializedObject contain a valid OperationContext.Current from which I can of course retreive the service instance. However on the client side, none of the calls yield an OperationContext.Current. We are still in an operation as I am translating the surrogate types to the data contract types after they have been sent from the server as part of its response to the client request so I would have expected one? Maybe the entire idea of using OperationContext.Current from outside of an Operation invocation is wrong?
So, moving on, and trying to fix this problem I have examined the clientRuntime/dispatchRuntime object which is available when applying my customer behaviour, however that doesn't appear to give me any form of access to the client instance, unless I have a message reference perhaps... and then calling InstanceProvider. However I don't have the message.
Another idea I had was to use IInstanceProvider myself and then maybe build up a dictionary of all the ones which are dished out... but that's no good because I don't appear to have access to any session related piece of information from within my implementation of IDataContractSurrogate to use as a dictionary key.
I had originally implemented my own serializer but thats not what I want. I'm happy with the built in serializer, and changing the objects to special surrogates is exactly what I need to do, with the added bonus that every child property comes in for inspection.
I have also looked at applying a service behavior, but that also does not appear to yield a service instance, and also does not let me set a Surrogate implementation property.
I simply do not know how to gain access to the current session/instance from within my implementation IDataContractSurrogate. Any help would be greatly appreciated.
Many thanks,
Sean

I have solved my problem. The short answer is that I implemented IClientMessageFormatter and IDispatchMessageFormatter to accomplish what I needed. Inside SerializeReply I could always access the ServiceInstance as OperationContext.Current is valid. It was more work as I had to implement my own serialization and deserialization, but works flawlessly. The only issue remaining would be that there is no way to get the client proxy which is processing the response, but so far that is not a show stopper for me.

Where should validation occur: endpoint or object?

I know this has been asked more generally before, but here is my specific situation:
I have an endpoint (API exposed to clients/users) that ends up calling public member functions of some objects. Should I validate at the endpoint or at the member function?
It seems that validating at the endpoint is a little easier in this case because then all of my validation is done around my API functions.
But somehow it feels like the objects should maintain themselves and prevent invalid data from being used on their own functions.
Thanks!

Validation can be, and usually is, quite complex process, that involves lots of heavy, bussiness-related logic and which has plenty of dependencies to the outer resources.
I suppose it's better to let the client create invalid object and validate it at the very end - just before its use in the bussines service.

Use Java exceptions internally for REST API user errors?

We have a REST API that works great. We're refactoring and deciding how to internally handle errors by the users of our API.
For example the user needs to specify the "movie" url parameter which should take the value of "1984", "Crash", or "Avatar". First we check to see if it has a valid value.
What would be the best approach if the movie parameter is invalid?
return null from one of the internal methods and check for the null in the main API call method
throw an exception from the internal method and catch exceptions in the main API method
I think it would make our code more readable and elegant to use exceptions. However, we're reluctant because we'd be potentially throwing many exceptions because of user API input errors, our code could be perfect. This doesn't seem to be the proper use of exceptions. If there are heavy performance penalties with exceptions, which would make sense with stack traces needing to be collected, etc., then we're unnecessarily spending resources when all we need to do is tell the user the parameter is wrong.
These are REST API methods, so we're not propogating the exceptions to the users of the API, nor would we want to even if possible.
So what's the best practice here? Use ugly nulls or use java's exception mechanism?

Neither.
The key is that passing a bad parameter isn't that exceptional a condition. Exceptions are thrown for exceptional circumstances. (That's the reason to not use them here, not the performance.)
You should be using something like Spring's DataValidation API for binding parameters that are passed in.
A client of a REST API should not be receiving null or exceptions. They should get an error message that gives them an idea of what's going on without exposing those details. "Sorry, we couldn't find that movie" or null? Go with the first, hands down.

If a invalid request came in (e.g. validation error) you should show 400 status code (bad request).
Internally I would also create an exception hierachy which maps to the HTTP Rest domain (see status codes for error cases).
Examples (simplified and being unchecked exceptions):
class RESTBaseException extends RuntimeException{
int statusCode;
public RESTBaseException(int statusCode){ this.statusCode=statusCode; }
//if no statusCode passed we fallback to very broad 500 server error.
public RESTBaseException(){ this.statusCode=500; }
}
class RESTValidationException extends RESTBaseException{
RESTValidationException(){
super(404);
}
}
you can extend above examples by also passing error messages to constructor to make client even more happy.
Later on you should catch these exceptions with a dedicated exception handler in your servlet handler chain (mapping status code to servlet response). For instance in spring mvc there are nice exception-handling solutions for that.
Usually I don't like to create a deep custom exception hierachies but I think for REST api layers they are OK (because the status codes are propagated later).

I will assume that you are doing input validation here and in this case, your database will do a query for a safe string and it won't find the record since it don't exist in your database, ok?
If you are using any MVC framework the Model should throw already a RecordNotFound exception no?
If you are always expecting to find a value then throw the exception if it is missing. The exception would mean that there was a problem.
If the value can be missing or present and both are valid for the application logic then return a null.
More important: What do you do in other places of the code? Consistency is important.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas