Best practice for initial return in a REST-like image upload API endpoint?

When sending a file, e.g. an image, over HTTP to an API, how should the server respond?
Examples:
respond as soon as file is written to disk
respond only when file is written, processed, checksummed, thumbnailed, watermarked etc.
respond as fast as possible with a link to the resource (even if it's a 404 for a few moments afterwards)
add a 'task' endpoint and respond instantly with a task ID to track the progress before data transfer & processing (eventually including path to resource)
Edit: Added one idea from an answer to a similar question: rest api design and workflow to upload images.

The client doesn't know about disks, processing, checksumming, thumbnailing, etc.
The options then are pretty simple. You either want to respond to the HTTP request as quickly as possible, or you want the client to wait until you know the operation was successful.
If you want the client to wait, return 201 Created. If you want to respond as quickly as possible, return 202 Accepted.
Both are acceptable designs. Let your own requirements dictate which is better for your case. I would say that it's a good idea to default to waiting until the HTTP request is guaranteed to have succeeded, and to only use 202 Accepted if there is a specific requirement for it.
You could also let the client decide with a Prefer header:
Prefer: respond-async, wait=100
See https://www.rfc-editor.org/rfc/rfc7240#section-4.3
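As a sketch of how a server might act on that header, here is a hypothetical, framework-free helper that picks 201 vs 202 from an RFC 7240 Prefer header (the function name and the processing-time parameter are assumptions for illustration):

```python
def choose_status(prefer_header, processing_seconds):
    """Pick 201 vs 202 based on an RFC 7240 Prefer header.

    Per RFC 7240 section 4.3, respond-async asks for asynchronous
    processing, and wait=N bounds how long the client will block.
    """
    prefs = [p.strip() for p in prefer_header.split(",")] if prefer_header else []
    respond_async = False
    wait = None
    for p in prefs:
        if p == "respond-async":
            respond_async = True
        elif p.startswith("wait="):
            wait = int(p.split("=", 1)[1])
    # Respond asynchronously only if the client asked for it and
    # processing would exceed the client's wait budget (if any).
    if respond_async and (wait is None or processing_seconds > wait):
        return 202  # Accepted: finish processing in the background
    return 201      # Created: client waited for the full operation
```

So `Prefer: respond-async, wait=100` yields 201 if processing finishes within 100 seconds, and 202 otherwise.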

Related

Restful api - Is it a good practice, for the same upsert api, return different HttpStatusCode based on action

So I have one single HTTP POST API called UpsertPerson, which does two things:
check if the Person exists in the DB; if it does, update the person, then return HTTP 200
if it doesn't exist in the DB, create the Person, then return HTTP 201.
So is it good practice for the same API to return different status codes (200, 201) based on different actions (update, create)?
This is what my company does currently; I just feel like it's weird. I think we should have two individual APIs to handle the update and the create.
Ninja edit: my answer doesn't make as much sense because I misread the question; I thought the OP used PUT, not POST.
Original answer
Yes, this is an excellent practice. The best method for creating new resources is PUT, because it's idempotent and has a very specific meaning (create/replace the resource at the target URI).
The reason many people use POST for creation is one specific reason: in many cases the client cannot determine the target URI of the new resource. The standard example is an auto-incrementing database ID in the URL. PUT just doesn't really work for that.
So PUT should probably be your default for creation, and POST if the client doesn't control the namespace. In practice most APIs fall in the second category.
And returning 201/200/204 depending on if a resource was created or updated is also an excellent idea.
Revision
Having a way to 'upsert' an item without knowing its URI can be a useful optimization. The general design I use for building APIs is that the standard plumbing should be in place (CRUD, one resource per item).
But if the situation demands optimizations, I typically layer those on top of these standards. I wouldn't avoid optimizations, but I adopt them on an as-needed basis. It's still nice to know that every resource has a URI, and that if I have a URI, I can just call PUT on it.
But a POST request that either creates or updates something that already exists based on its own body should:
Return 201 Created and a Location header if something new was created.
I would probably return 200 OK + The full resource body of what was updated + a Content-Location header of the existing resource if something was updated.
Alternatively this post endpoint could also return 303 See Other and a Location header pointing to the updated resource.
Alternatively I also like at the very least sending a Link: </updated-resource>; rel="invalidates" header to give a hint to the client that if they had a cache of the resource, that cache is now invalid.
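A minimal sketch of that create-or-update behavior over an in-memory store (the function name, the URI scheme, and the dict-backed store are assumptions; a real handler would sit behind a framework):

```python
def upsert_person(store, person_id, person):
    """POST-style upsert: create or update, and report which one
    happened via the status code and headers.

    Returns (status, headers): 201 Created plus a Location header for
    new resources, or 200 OK plus a Content-Location header for updates.
    """
    uri = f"/people/{person_id}"
    created = person_id not in store
    store[person_id] = person
    if created:
        return 201, {"Location": uri}
    return 200, {"Content-Location": uri}
```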
So is it good practice for the same API to return different status codes (200, 201) based on different actions (update, create)?
Yes, if... The key thing to keep in mind is that HTTP status codes are metadata of the transfer-of-documents-over-a-network domain. So it is appropriate to return a 201 when the result of processing a POST request includes the creation of new resources on the web server, because that's what the current HTTP standard says you should do (see RFC 9110).
I think we should have two individual APIs to handle the update and the create.
"It depends". HTTP really wants you to send request that change documents to the documents that are changed (see RFC 9111). A way to think about it is that your HTTP request handlers are really just a facade that is supposed to make your service look like a general purpose document store (aka a web site).
Using the same resource identifier whether saving a new document or saving a revised document is a pretty normal thing to do.
It's absolutely what you would want to be doing with PUT semantics and an anemic document store.
POST can be a little bit weird, because the target URI for the request is not necessarily the same as the URI for the document that will be created (ie, in resource models where the server, rather than the client, is responsible for choosing the resource identifier). A common example would be to store new documents by sending a request to a collection resource, that updates itself and selects an identifier for the new item resource that you are creating.
(Note: sending requests that update an item to the collection is a weird choice.)

How do I design a REST call that is just a data transformation?

I am designing my first REST API.
Suppose I have a (SOAP) web service that takes MyData1 and returns MyData2.
It is a pure function with no side effects, for example:
MyData2 myData2 = transform(myData1);
transform() does not change the state of the server. My question is: what REST call do I use? MyData1 can be large, so I will need to put it in the body of the request, which seems to require POST. However, POST seems to be used only to change server state, which transform() does not do. So POST might not be correct? Is there a specific REST technique for pure functions that take and return something, or should I just use POST, read the response body, and not worry about it?
I think POST is the way to go here, because of the sheer fact that you need to pass data in the body. The GET method is used when you need to retrieve information (in the form of an entity), identified by the Request-URI. In short, that means that when processing a GET request, a server is only required to examine the Request-URI and Host header field, and nothing else.
See the pertinent section of the HTTP specification for details.
It is okay to use POST
POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.”
It's not a great answer, but it's the right answer. The real issue here is that HTTP, which is a protocol for the transfer of documents over a network, isn't a great fit for document transformation.
If you imagine this idea on the web, how would it work? Well, you'd click through a bunch of links to get to some web form, that web form would allow you to specify the source data (perhaps including attaching a file), and then submitting the form would send everything to the server, and you'd get the transformed representation back as the response.
But - because of the payload, you would end up using POST, which means that general purpose components wouldn't have the data available to tell them that the request was safe.
You could look into the WebDav specifications to see if SEARCH or REPORT is a satisfactory fit -- every time I've looked into them for myself I've decided against using them (no, I don't want an HTTP file server).
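To make the POST-for-transformation shape concrete, a minimal sketch (the handler name is an assumption, and the upper-casing transform is a stand-in for whatever MyData1-to-MyData2 mapping the real service performs):

```python
import json

def handle_transform(request_body):
    """POST /transform: a pure function of the request body.

    No server state is read or written; the response is derived
    entirely from the request. Returns (status, response_body).
    """
    data = json.loads(request_body)
    # Placeholder transformation: upper-case every string value.
    result = {k: v.upper() for k, v in data.items()}
    return 200, json.dumps(result)
```

The endpoint being POST does not make it unsafe in practice; it just means general-purpose components cannot know it is safe.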

REST API for sending files between services

I'm building a microservice, one of whose APIs expects a file and some parameters that the API will process and return a response for.
I've searched and found some references, mostly pointing towards form-data (multipart); however, they mostly refer to client-to-service and not service-to-service, like in my case.
I'd be happy to know the best practice for this case, for both the client (actually a service) and me.
I would also suggest performing a POST request (multipart) to a service endpoint that can process/accept a byte stream wrapped in the request body. A PUT request may also work in some cases.
Your main concern will be binding enough metadata to the request so that the remote service can correctly handle it. This includes in particular the following headers:
Content-Type: to provide the MIME type of the data being transferred and enable its proper processing.
Content-Disposition: to provide additional information about the body part such as the file name.
I personally believe that a single request is enough (in contrast to @Evert's suggestion), as it will result in less overhead overall and will keep things simple (and RESTful) by avoiding any linking (or state) between successive requests.
I would not wrap data in form-data, because it just adds to the total body size. You can just put the entire raw file in the body of a PUT or POST request.
If you also need to send meta-data, I would suggest 2 requests. If you absolutely can't do 2 requests, form-data might still be the best option and it does work server-to-server.
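A sketch of the raw-body approach, with the metadata riding in headers instead of multipart framing (hypothetical helper; a real upload would hand these to an HTTP client library):

```python
def raw_upload_request(filename, content_type, payload):
    """Build headers + body for a raw-body PUT/POST upload.

    The file bytes go straight into the body, so there is no
    multipart boundary overhead; metadata travels in the
    Content-Type and Content-Disposition headers.
    """
    headers = {
        "Content-Type": content_type,
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Content-Length": str(len(payload)),
    }
    return headers, payload
```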

RESTful way of getting a resource, but creating it if it doesn't exist yet

For a RESTful API that I'm creating, I need some functionality that gets a resource but, if it doesn't exist, creates it and then returns it. I don't think this should be the default behaviour of a GET request. I could enable this functionality with a certain parameter on the GET request, but that seems a little dirty.
The main point is that I want to make only one request for this, as these requests are going to be made from mobile devices that potentially have slow internet connections, so I want to limit the number of requests as much as possible.
I'm not sure if this fits in the RESTful world, and if it doesn't, that will disappoint me, because it will mean I have to make a little hack on the REST idea.
Does anyone know of a RESTful way of doing this, or otherwise a beautiful way that doesn't conflict with the REST idea?
Does the client need to provide any information as part of the creation? If so, then you really need to separate out GET and POST, as otherwise you'd need to send that information with each GET, and that would be very ugly.
If instead you are sending a GET without any additional information, then there's no reason why the backend can't create the resource if it doesn't already exist before returning it. Depending on how long it takes to create the resource, you might want to think about going asynchronous and using 202 as per other answers, but that means your client has to handle (yet) another response code, so it might be better to just wait for the resource to be finalised and returned.
Very simple:
Send a HEAD request and examine the response code: either 404 or 200. If you need the body, use GET instead.
If not available, perform a PUT or POST; the server should respond with 201 and a Location header containing the URL of the newly created resource.
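That probe-then-create flow can be sketched against any HTTP client; here the client is an injected object so the calls stay abstract (the `get`/`put` interface is an assumption for illustration):

```python
def get_or_create(client, uri, default_body):
    """GET the resource; on 404, PUT a default and return it.

    `client` is any object with get(uri) -> (status, body) and
    put(uri, body) -> status. Worst case two round trips, best case one.
    """
    status, body = client.get(uri)
    if status == 200:
        return body
    if status == 404:
        client.put(uri, default_body)
        return default_body
    raise RuntimeError(f"unexpected status {status}")

class InMemoryClient:
    """Stand-in client backed by a dict, for demonstration."""
    def __init__(self):
        self.store = {}
    def get(self, uri):
        if uri in self.store:
            return 200, self.store[uri]
        return 404, None
    def put(self, uri, body):
        self.store[uri] = body
        return 201
```

Note this only saves a round trip versus the server-side create-on-GET approach when the resource already exists.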

In the new ASP.NET Web API, how do I design for "Batch" requests?

I'm creating a web API based on the new ASP.NET Web API. I'm trying to understand the best way to handle people submitting multiple data-sets at the same time. If they have 100,000 requests it would be nice to let them submit 1,000 at a time.
Let's say I have a create new Contact method in my Contacts Controller:
public string Put(Contact _contact)
{
    // add the new contact to the repository
    repository.Add(_contact);

    // return success
    return "success";
}
What's the proper way to allow users to "Batch" submit new contacts? I'm thinking:
public string BatchPut(IEnumerable<Contact> _contacts)
{
    foreach (var contact in _contacts)
    {
        repository.Add(contact);
    }

    return "success";
}
Is this a good practice? Will this parse a PUT request with a JSON array of Contacts (assuming they are correctly formatted)?
Lastly, any tips on how best to respond to Batch requests? What if 4 out of 300 fail?
Thanks a million!
When you PUT a collection, you are either inserting the whole collection or replacing an existing collection as if it were a single resource. It is very similar to GET, DELETE or POST on a collection. It is an atomic operation. Using it as a substitute for individual calls to PUT a contact may not be very RESTful (but that is really open for debate).
You may want to look at HTTP pipelining and send multiple PutContact requests over the same socket. With each request you can return a standard HTTP status for that single request.
I implemented batch updates in the past with SOAP and we encountered a number of unforeseen issues when the system was under load. I suspect you will run into the same issues if you don't pay attention.
For example, the database would time out in the middle of a batch update, and all hell broke loose in terms of failures, reliability, transactions, etc. The poor client had to figure out what was actually updated and try again.
When there were too many records to update, the HTTP request would time out because we took too long. That opened another can of worms.
Another concern was how much data to accept per update. Is 10MB of contacts enough? Perhaps 1MB? Larger buffers have numerous implications in terms of memory usage and security.
Hence my suggestion to look at HTTP pipelining.
Update
My suggestion would be to handle batch creation of contacts as an async process. Just assume that a "job" is the same as a "batch create" process. So the service may look as follows:
public class JobService
{
    // POST
    public void Create(CreateJobRequest job)
    {
        // 1. Create the job in the database with status "pending"
        // 2. Save job details to disk (or S3)
        // 3. Submit the job to MSMQ (or SQS)
        // 4. For 20 seconds, poll the database to see if the job completed
        // 5. If the job completed, return 201 with a URI to the "Get" method below
        // 6. If not, return 202 (the request was accepted for processing, but has not completed)
    }

    // GET
    public Job Get(string id)
    {
        // 1. Fetch the job from the database
        // 2. Return the job if it exists, or 404
    }
}
The background process that consumes stuff from the queue can update the database or alternatively perform a PUT to the service to update the status of Job to running and completed.
You'll need another service to navigate through the data that was just processed, address errors and so forth.
Your background process may need to be tolerant of validation errors. If not, or if your service does the validation (assuming you are not doing database calls etc. for which response times cannot be guaranteed), you can return a structure like CreateJobResponse that contains enough information for your client to fix the issue and resubmit the request. If you have to do validation that is time-consuming, do it in the background process, mark the job as failed, and update the job with information that will allow a client to fix the errors and resubmit the request. This assumes that the client can do something with the fact that the job failed.
If the Create method breaks the job request into many smaller "jobs" you'll have to deal with the fact that it may not be atomic and pose numerous challenges to monitor whether jobs completed successfully.
A PUT operation is supposed to replace a resource. Normally you do this against a single resource, but when doing it against a collection, that would mean replacing the original collection with the set of data passed. I'm not sure if you mean to do that; I'm assuming you are just updating a subset of the collection, in which case a PATCH method would be more appropriate.
Lastly, any tips on how best to respond to Batch requests? What if 4 out of 300 fail?
That is really up to you. There is only a single response, so you can send a 200 OK or a 400 Bad Request and put the details in the body.
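One common shape for those details is a per-item result list in the body (a hypothetical structure, not a standard; the name-required rule is a stand-in for real validation):

```python
def batch_create(store, contacts):
    """Apply each batch item independently and report per-item outcomes,
    so 4 failures out of 300 don't hide the 296 successes.

    Returns (status, results): a single 200 response whose body lists
    a status per item (201 created, 422 validation failure).
    """
    results = []
    for i, contact in enumerate(contacts):
        try:
            if not contact.get("name"):
                raise ValueError("name is required")
            store.append(contact)
            results.append({"index": i, "status": 201})
        except ValueError as exc:
            results.append({"index": i, "status": 422, "error": str(exc)})
    return 200, results
```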