REST API for sending files between services - api

I'm building a microservice which one of it's API's expects a file and some parameters which the API will process and return a response for.
I've searched and found some references, mostly pointing towards form-data (multipart), however they mostly refer to client to service and not service to service like in my case.
I'll be happy to know what is the best practice for this case for both the client (a service actually) and me.

I would also suggest to perform a POST request (multipart) to a service endpoint that can process/accept a byte stream wrapped into the provided HTML body(s). A PUT request may also work in some cases.
Your main concerns will consist in binding enough metadata to the request so that the remote service can correctly handle it. This include in particular the following headers:
Content-Type: to provide the MIME type of the data being transferred and enable its proper processing.
Content-Disposition: to provide additional information about the body part such as the file name.
I personally believe that a single request is enough (in contrast to #Evert suggestion) as it will result in less overhead overall and will keep things simple (and RESTful) by avoiding any linking (or state) between successive requests.

I would not wrap data in form-data, because it just adds to the total body size. You can just put the entire raw file in the body of a PUT or POST request.
If you also need to send meta-data, I would suggest 2 requests. If you absolutely can't do 2 requests, form-data might still be the best option and it does work server-to-server.

Related

GET vs. POST when the request is some arbitrary size

I understand the semantics of GETting vs. POSTing, one endpoint should get data, the other should post it. The latter being a request that you may not wish the user to be able to easily replay.
That said, on the project I'm working on at the moment - the approach has been to POST to endpoints that are clearly responsible for responding with data, and these endpoints do not transform data in any way.
The reasoning behind this has been that the payloads are (potentially) of considerable size and seem more appropriate for a body as opposed to a query string.
Can anybody please shed light on which request would be right for a GET request that takes a large request payload? I'm not asking for opinion, I'm asking for what would be compliant to RESTful deisgn.
Further Context
The request is potentially large due to the fact it's a search DTO from the UI, where users may choose to pass any number of filters or search terms.
Can anybody please shed light on which request would be right for a GET request that takes a large request payload? I'm not asking for opinion, I'm asking for what would be compliant to RESTful deisgn.
Today's answer: It's OK to use POST.
For requests that are fundamentally read-only, we'd like to use standardized HTTP semantics to communicate that to general purpose components, so that they can themselves do intelligent things.
BUT: GET, while being both safe and ubiquitous, isn't an appropriate choice when you need to include a message-body in the request:
content received in a GET request has no generally defined semantics
So if you can't, for whatever reason, copy the information you need into a resource identifier, then GET is not an option.
Now, if your payloads are consistent with WebDAV, then you might be able to use one of the safe methods described in those specifications. But, as far as I can tell, they aren't really appropriate for general use.
Tomorrow's answer: the HTTP-WG accepted a proposal for a safe-method-with-body. So we should eventually expect to see a registered HTTP method that is safe and has defined semantic for the request content.
Then, depending on what those semantics are, we may be able to use it for requests like yours.

How do I design a REST call that is just a data transformation?

I am designing my first REST API.
Suppose I have a (SOAP) web service that takes MyData1 and returns MyData2.
It is a pure function with no side effects, for example:
MyData2 myData2 = transform(MyData myData);
transform() does not change the state of the server. My question is, what REST call do I use? MyData can be large, so I will need to put it in the body of the request, so POST seems required. However, POST seems to be used only to change the server state and not return anything, which transform() is not doing. So POST might not be correct? Is there a specific REST technique to use for pure functions that take and return something, or should I just use POST, unload the response body, and not worry about it?
I think POST is the way to go here, because of the sheer fact that you need to pass data in the body. The GET method is used when you need to retrieve information (in the form of an entity), identified by the Request-URI. In short, that means that when processing a GET request, a server is only required to examine the Request-URI and Host header field, and nothing else.
See the pertinent section of the HTTP specification for details.
It is okay to use POST
POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.”
It's not a great answer, but it's the right answer. The real issue here is that HTTP, which is a protocol for the transfer of documents over a network, isn't a great fit for document transformation.
If you imagine this idea on the web, how would it work? well, you'd click of a bunch of links to get to some web form, and that web form would allow you to specify the source data (including perhaps attaching a file), and then submitting the form would send everything to the server, and you'd get the transformed representation back as the response.
But - because of the payload, you would end up using POST, which means that general purpose components wouldn't have the data available to tell them that the request was safe.
You could look into the WebDav specifications to see if SEARCH or REPORT is a satisfactory fit -- every time I've looked into them for myself I've decided against using them (no, I don't want an HTTP file server).

What is a good way to handle multiple response codes for an API?

Consider a call to a REST API:
POST /employees
[{ "name":"joe", "job":"dev",
{ "name":"bob" }]
For each array element an employee would be created. However, job is a required field, so the second element is invalid and the employee cannot be created.
What is a good response for this? 201 or 422?
I saw 207, but it seems to require an XML response, and the API does not use XML. It seems strange to return XML only for this case.
For this particular use case, I am thinking that all valid elements would be used to create resources. But I'm not sure what a good response would be.
We use 400 for any field validation errors (including missing required fields), be it on a single resource, or an entire collection.
I'm not entirely sure what are you trying to do.
Process only valid parts from the payload.
Don't process the payload at all.
In order for something to happen, the entire payload should be valid, not just parts of it. So I wouldn't process only the valid parts of the payload.
I would not use any 2xx status, because that would say to the user that everything worked OK, which isn't true in this situation.
I would not return a 400 status, because the syntax of the payload is syntactically correct JSON, but semantically erroneous.
That would leave us with a 422 status, which is more appropriate in this situation, because like I previously said, you have semantic not syntactic problems.
I'd stick with #visc advice, i.e., avoid batch processing. I have to say that #Alexandru Guzinschi advice on all or nothing approach despite being reasonable is not always feasable as sometimes you will become aware of one operation failure only after having tried it.
Both respose codes 207 and 422 you mentioned are defined as HTTP extensions for WebDAV, not in the protocol itself. And it's RFC (4918) is rather obsolete (that's why the suggestion of XML for the 207 response body).
Anyway, If you want to go down this road, I want to clarify that the use of XML is optional and you could create your own JSON definition. That would be harmless if you're not trying to implement a WebDAV server,
RFC 4918
"A Multi-Status response conveys information about multiple resources
in situations where multiple status codes might be appropriate. The
default Multi-Status response body is a text/xml or application/xml
HTTP entity with a 'multistatus' root element
One approach is to have a batch handler. So, you would send a batch of operations in a single HTTP request, but they are processed as individual operations against the api on the server. So, the batch response would include the appropriate response code for each individual operation.
A batch request can also include changesets to define transactional behavior.
Your batch could also include calls to different operations as necessary.
You don't state how you are building your api. If using ASP.NET Web API, there are batch handling capabilities available out of the box.

What is http multipart request?

I have been writing iPhone applications for some time now, sending data to server, receiving data (via HTTP protocol), without thinking too much about it. Mostly I am theoretically familiar with process, but the part I am not so familiar is HTTP multipart request. I know its basic structure, but the core of it eludes me.
It seems that whenever I am sending something different than plain text (like photos, music), I have to use a multipart request. Can someone briefly explain to me why it is used and what are its advantages?
If I use it, why is it better way to send photos that way?
An HTTP multipart request is an HTTP request that HTTP clients construct to send files and data over to an HTTP Server. It is commonly used by browsers and HTTP clients to upload files to the server.
What it looks like
See Multipart Content-Type
See multipart/form-data
As the official specification says, "one or more different sets of data are combined in a single body". So when photos and music are handled as multipart messages as mentioned in the question, probably there is some plain text metadata associated as well, thus making the request containing different types of data (binary, text), which implies the usage of multipart.
I have found an excellent and relatively short explanation here.
A multipart request is a REST request containing several packed REST requests inside its entity.

REST API for multiple actions on a file

This is continuation of my question on how to design a REST API for a media analysis server. As per Derrel's answer, in my current design I start the analysis of a media file using a POST /facerecognition/analysisrequests?profileId=33 which specifies that profile ID 33 (previously created on the server by another POST) should be used.
I have two short questions:
How can I extend this approach to have multiple analysis requests on the same file, e.g. perform both face recognition, text detection, and ad detection on the given file? Is using a binary coding (e.g. each bit signifies an analysis) and e.g. doing POST http:[server URL]/00000011/analysisrequests?profileId=33 a good idea?
Is using a server side DB (e.g. mySQL) the best way to keep track of all the profile and process IDs?
Thanks,
C
I'd put the types of analysis requested as parameters, rather than as part of the path. They could be POST parameters in the body of the request, or specified in the URL list profileId. Example: POST http://server/analysisrequest?profileId=33&analysisType=faceRecognition&analysisType=textDetection. It's perfectly ok to submit multiple values for a parameter.
You could submit the binary encoding of the analysis type, but spelling it out is a lot more clear and self-documenting. The binary encoding is a bit fragile when adding a new analysis type as well; adding a new digit would affect the urls all requests, even those that don't use the new type.
A server side database is typical for this kind of web application and it's probably a good solution. You might also want to consider an in-process SQL database solution like sqlite or derby to avoid the complexity of a separate database process.
I would recommend making more complete use of HTTP POST. Make all POST requests against the same URI: /analysisrequest. Use application/x-www-form-urlencoded to send the parameters.
So:
Host: yourserver.com
Accept: */*
Content-Length: 73
Content-Type: application/x-www-form-urlencoded
face_recognition=true&text_detection=true&ad_detection=true&profile_id=33
multipart/form-data would also allow you to send the file being analyzed in the same request as the operations to perform on the file, assuming that's a desired scenario. With the added advantage that you ought to be able to use the exact same API end-point for both HTML forms and your REST API.