Bulk POST request without enumerating objects - api

I'm trying to let my API clients make a POST request that bulk modifies objects that the client doesn't have their IDs.
I'm thinking of implementing this design but I don't feel good about it, are there any better solutions than this?
POST url/objects/modify?name=foo
This request will modify all objects with the name foo

This can be a tricky thing to do with an API because it doesn't age very well.
By that I mean that over time, you might introduce more criteria for the data stored on resources (e.g., you can only set this field to "archived" if the create_time field is older than 6 months). When that happens, your bulk updates will start to only work on some resources and now you have to communicate that back to the person calling the API.
For example, for any failures you need to explain that the update worked for some resources (and list them out) but failed on others (and list them out) and the reason why for each failure (and remember you might have different failure conditions for different resources).
If you're set on going down this path, the closest thing I can think of is the "criteria-based delete" method shown here: https://google.aip.dev/165.

Related

GET vs POST API calls and cache issues

I know that GET is used to retrieve data from the server without modifying anything. Whereas POST is used to add data. I won't get into PUT/PATCH, and assume that POST is always used to update and replace data.
The theory is nice, but in practice I have encountered many situations where my GET calls need to be replaced with POST calls. This is because the response often gets incorrectly cached. Where I work there are proxy servers for security, caching, load balancing, etc., and often times the response for GET calls is directly cached to speed up the call, whereas POST calls never get fully cached.
So for my question, if I have an API call /api/get_orders/month. Theoretically, this should be a GET call, however, the number of orders might update any second. So if I call this API at any moment it may return for example 1000, and calling it just two seconds later should return 1001. However, because of the cache, and although adding a version flag such as ?v=<date_as_int> should ensure that the updated value is returned, there seems to be some caches in the proxy servers that might ignore this.
Basically, I don't feel safe enough using GET unless I know for certain that the data will not be modified or if I know for a fact that the response is always the updated data.
So, would you recommend using POST/GET in the case of retrieving daily/monthly number of orders. And if GET, with all the different and complex layers and server set-ups, how can one be certain that the data is always updated?
If you're doing multiple GET request and something is caching the data in between, you have no idea what it is or how to change it's behavior then POST is a valid workaround.
In any normal situation you would take the time what sits in between your browser and your server, and if there's something that's behaving in a way that doesn't make sense, I would try to investigate and fix that.
So you work at a place where some of that infrastructure exists. Maybe talk to the people that maintain it? But if that's not an option and you just want to find the 'ignore every convention and make my request work'-workaround, then you can use POST.

RESTful implementation for "archiving" an entry

Say I have a table for machines, and one of the columns is called status. This can be either active, inactive, or discarded. A discarded machine is no longer useful and it only used for transaction purposes.
Assume now I want to discard a machine. Since this is an entry update, RESTfully, this would be a PUT request to /machines/:id. However, since this is a very special kind of an update, there could be other operations that would occur as well (for instance, remove any assigned users and what not).
So should this be more like a POST to /machines/:id/discard?
From a strict REST perspective, have you considered implementing a PATCH? In this manner, you can have it update just that status field and then also tie it in to updating everything else that is necessary?
References:
https://www.mnot.net/blog/2012/09/05/patch
http://jasonsirota.com/rest-partial-updates-use-post-put-or-patch
I think the most generic way would be to POST a Machine object with { status: 'discarded' } to /machines/:id/.
Personally, I'd prefer the /machines/:id/discard approach, though. It might not be exactly in accordance with the spec, but it's more clear and easier to filter. For example, some users might be allowed to update a machine but not "archive" it.
Personally I think post should be used when the resource Id is either unknown or not relevant to the update being made.
This would make put the method I would use especially since you have other status types that will also need to be updated
path
/machines/id
Message body
{"status":"discarded"}

RESTful way to preallocate an ID

For some reasons my application needs to have an API that flows like:
Client calls server to get ID for a new resource.
Then user spends a while filling out the forms for the resource.
Then user clicks save (or not...), and when he does the client saves by writing to /myresource/{id}
What is the RESTful way to design this?
How should the first call look like? On server side, it's a matter of generating an ID and returning it. It has side effects (increments sequence and thus "reserves space"), but it doesn't explicitly create a resource.
If I understand correctly, the 3rd call should be a PUT because it creates something with a known URI.
One way you could do it is:
client POSTs empty body to /myresource/
server answers with status code 302 (Found) with a Location response header set to /myresource/newresourceid (to indicate the ID / URI that should be used to create the resource)
client PUTs the new resource to /myresource/newresourceid once the user is done filling the form.
Seems RESTful enough. ;)
I'm interested to see the other answers to this question as I imagine there's a lot of ways to do this.
If possible I would let your auto-incrementing ID in the database serve as your surrogate key and assign another field to be your business identifier. It could be something like a product code or a GUID.
With this in mind the client can now create the ID themselves which removes the need for step 1 at all. They would do a PUT to a url such as /myresource/MLN5001 or /myresource/3F2504E0-4F89-11D3-9A0C-0305E82C3301 to create the resource. If the ID is already in use return a 409 Conflict with the conflict in the response body. Otherwise return a 201 Created and include the URI to the resource in the response body and location header.
I would go with
GET /myresource/new-id
POST /myresource/{id}
Your walkthrough is pretty clear on the verb:
"to GET [an] ID for a new resource"
you could rename new-id to whatever you think makes it most clear. If you have multiple resources you need to do this for, it would probably be better to split out the generator into its own resource, such as
GET /id-generator/myresource
GET /id-generator/my-other-resource
If there are multiple cases, the user will quickly learn they need to hit id-generator to get their ID. If it's only one case, it's annoying for them to only have to use it infrequently.
I guess you could also do
GET /myresource-id-generator/next
which looks a little clearer, but then if you ever need another ID to be generated you have to make another resource to do it.
ID allocation is non-idempotent — two invokes of the allocation operation will get different IDs — so that should always be a POST. From that point on, the resource should conceptually exist. However, what I'd do at that point is fill it out with reasonable default values (whether that involves doing POSTs or PUTs is rather immaterial to the RESTfulness of the overall design), so the user can then take their time to alter the things that they want to look like they want them to.
The question then becomes one of whether there should be some kind of “release this; I'm done with altering it” operation at the end. Strict RESTfulness says there shouldn't, as if you know the resource identifier (the URL) then you can talk about it. On the other hand, that doesn't mean the hosting server has to tell anyone else about the resource until the creating user is happy with it; general HATEOAS principles say nothing about when others can discover that a resource exists or whether knowing the name lets you read the thing, but it's entirely reasonable to deny to third parties that a resource exists until the owner of the resource turns on the “make this public” flag.

GET or PUT to reboot a remote resource?

I am struggling (in some sense) to determine which HTTP method is more appropriate for rebooting a remote resource: GET or PUT?
On one hand, it seems more semantic to call http://tools.serviceprovider.net/canopies/d34db33fc4f3?reboot=true because one might want to GET a representation of a freshly rebooted canopy.
On the other hand, a reboot is not 'safe' (nor is it necessarily idempotent, but then a canopy or modem is not just a row in a database) so it might seem more semantic to PUT the canopy into a state of rebooting, then have the server return a 202 to indicate that the reboot was initiated and is processing.
I have been reading up on HTTP/1.1, REST, HATEOAS, and other related concepts over the last week, so I am still putting the pieces together. Could a more seasoned developer please weigh in and confirm or dispel my hunch?
A GET doesn't seem appropriate because a GET is expected, like you said, to be "safe". i.e. no action other than retrieval.
A PUT doesn't seem appropriate because a PUT is expected to be idempotent. i.e. multiple identical operations cause same side-effects as as a single operation. Moreover, a PUT is usually used to replace the content at the request URI with the request body.
A POST appears most appropriate here. Because:
A POST need not be safe
A POST need not be idempotent
It also appears meaningful in that you are POSTing a request for a reboot (much like submitting a form, which also happens via POST), which can then be processed, possibly leading to a new URI containing reboot logs/results returned along with a 303 See Other status code.
Interestingly, Tim Bray wrote a blog post on this exact topic (which method to use to tell a resource representing a virtual machine to reboot itself), in which he also argued for POST. At the bottom of that post there are links to follow-ups on that topic, including one from none other than Roy Fielding himself, who concurs.
Rest is definitely not HTTP. But HTTP definitely does not have only four (or eight) methods. Any method is technically valid (even if as an extension method) and any method is RESTful when it is self describing — such as ‘LOCK’, ‘REBOOT’, ‘DELETE’, etc. Something like ‘MUSHROOM’, while valid as an HTTP extension, has no clear meaning or easily anticipated behavior, thus it would not be RESTful.
Fielding has stated that “The REST style doesn’t suggest that limiting the set of methods is a desirable goal. [..] In particular, REST encourages the creation of new methods for obscure operations” and that “it is more efficient in a true REST-based architecture for there to be a hundred different methods with distinct (non-duplicating), universal semantics.”
Sources:
http://xent.com/pipermail/fork/2001-August/003191.html
http://tech.groups.yahoo.com/group/rest-discuss/message/4732
With this all in mind I am going to be 'self descriptive' and use the REBOOT method.
Yes, you could effectively create a new command, REBOOT, using POST. But there is a perfectly idempotent way to do reboots using PUT.
Have a last_reboot field that contains the time at which the server was last rebooted. Make a PUT to that field with the current time cause a reboot if the incoming time is newer than the current time. If an intermediate server resends the PUT, no problem -- it has the same value as the first command, so it's a no-op.
You might want to get the current time from the server you're rebooting, unless you know that everyone is reasonably time-synced.
Or you could just use a times_rebooted count, eliminating the need for a clock. A PUT times_rebooted: 4 request will cause a reboot if times_rebooted is currently 3, but not if it's 4 or 5. If the current value is 2 and you PUT a 4, that's an error.
The only advantage to using time, if you have a clock, is that sometimes you care about when it happened. You could of course have BOTH a times_rebooted and a last_reboot_time, letting times_rebooted be the trigger.

Confused about Http verbs

I get confused when and why should you use specific verbs in REST?
I know basic things like:
Get -> for retrieval
Post -> adding new entity
PUT -> updating
Delete -> for deleting
These attributes are to be used as per the operation I wrote above but I don't understand why?
What will happen if inside Get method in REST I add a new entity or inside POST I update an entity? or may be inside DELETE I add an entity. I know this may be a noob question but I need to understand it. It sounds very confusing to me.
#archil has an excellent explanation of the pitfalls of misusing the verbs, but I would point out that the rules are not quite as rigid as what you've described (at least as far as the protocol is concerned).
GET MUST be safe. That means that a GET request must not change the server state in any substantial way. (The server could do some extra work like logging the request, but will not update any data.)
PUT and DELETE MUST be idempotent. That means that multiple calls to the same URI will have the same effect as one call. So for example, if you want to change a person's name from "Jon" to "Jack" and you do it with a PUT request, that's OK because you could do it one time or 100 times and the person's name would still have been updated to "Jack".
POST makes no guarantees about safety or idempotency. That means you can technically do whatever you want with a POST request. However, you will lose any advantage that clients can take of those assumptions. For example, you could use POST to do a search, which is semantically more of a GET request. There won't be any problems, but browsers (or proxies or other agents) would never cache the results of that search because it can't assume that nothing changed as a result of the request. Further, web crawlers would never perform a POST request because it could not assume the operation was safe.
The entire HTML version of the world wide web gets along pretty well without PUT or DELETE and it's perfectly fine to do deletes or updates with POST, but if you can support PUT and DELETE for updates and deletes (and other idempotent operations) it's just a little better because agents can assume that the operation is idempotent.
See the official W3C documentation for the real nitty gritty on safety and idempotency.
Protocol is protocol. It is meant to define every rule related to it. Http is protocol too. All of above rules (including http verb rules) are defined by http protocol, and the usage is defined by http protocol. If you do not follow these rules, only you will understand what happens inside your service. It will not follow rules of the protocol and will be confusing for other users. There was an example, one time, about famous photo site (does not matter which) that did delete pictures with GET request. Once the user of that site installed the google desktop search program, that archieves the pages locally. As that program knew that GET operations are only used to get data, and should not affect anything, it made GET requests to every available url (including those GET-delete urls). As the user was logged in and the cookie was in browser, there were no authorization problems. And the result - all of the user photos were deleted on server, because of incorrect usage of http protocol and GET verb. That's why you should always follow the rules of protocol you are using. Although technically possible, it is not right to override defined rules.
Using GET to delete a resource would be like having a function named and documented to add something to an array that deletes something from the array under the hood. REST has only a few well defined methods (the HTTP verbs). Users of your service will expect that your service stick to these definition otherwise it's not a RESTful web service.
If you do so, you cannot claim that your interface is RESTful. The REST principle mandates that the specified verbs perform the actions that you have mentioned. If they don't, then it can't be called a RESTful interface.