PUT or PATCH, which is faster and why?

According to information available on the web, PATCH is faster than PUT in a REST API. But if we're not checking anything before updating, then PUT should be faster.
Definitions:
The PATCH method is the correct choice if you're updating an existing resource.
PUT should only be used if you're replacing a resource in its entirety.
Specifically, RFC 5789 describes the relationship between PUT and PATCH as follows:
Several applications extending the Hypertext Transfer Protocol (HTTP)
require a feature to do partial resource modification. The existing
HTTP PUT method only allows a complete replacement of a document. This
proposal adds a new HTTP method, PATCH, to modify an existing HTTP
resource.

Which is faster and why?
I don't think either of them is necessarily faster - PUT (defined by RFC 7231) and PATCH (defined by RFC 5789) are constraints on message semantics, not on implementation performance.
In cases where the representation of the resource is much larger than the HTTP headers, and the patch document is much smaller than the full representation, the network time saved by sending the smaller body may outweigh the additional server time required to apply the patch.
Which is to say, I think you have to measure the latency profiles in your context, rather than expecting some universal ordering to apply.
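To make that concrete, here is a hedged sketch using Python's requests library against a hypothetical /users/42 endpoint: PUT carries the full representation, PATCH only the delta.

import requests

BASE = "https://api.example.com"  # hypothetical endpoint

# PUT replaces the entire resource, so the full representation goes
# over the wire even if only one field changed.
full = {"name": "Ada", "email": "ada@example.com", "role": "admin"}
requests.put(f"{BASE}/users/42", json=full)

# PATCH sends only the delta; the request body is smaller, but the
# server does extra work to merge it into the stored resource.
requests.patch(f"{BASE}/users/42", json={"email": "ada@newdomain.com"})

Which side of that trade wins depends on payload sizes and server implementation, which is exactly why measuring beats assuming.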
Do you think that, say, 12 fields without foreign keys (I'm not sure if foreign keys even matter) constitute a small enough object that it'd make more sense to use PUT, or would PATCH be fine (and what if 5 of those fields are foreign keys, in case it matters)?
Foreign keys really shouldn't enter into the discussion at all; they are an implementation detail of the server; the client should be completely unaware of them.

Related

Length extension attack doubts

So I've been studying the concept of length extension attacks, and there are a few things I noticed during my study that are not very clear to me.
1. Research papers explain how you can append some data to the end and form new data. For example:
Desired New Data: count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo&waffle=liege
(notice the two waffles). My question is: if a parser function on the server side can track duplicate attributes, would that make the entire length extension attack pointless? The server would notice the duplicate attributes. Is a proper parser that checks for duplicates a good defense against length extension attacks? I'm aware of the HMAC approach and other protections, but I'm asking specifically about parsers here.
2. Research says that only H(key|message) is vulnerable. The claim is that H(message|key) won't work for the attacker because we would have to append a new key (which we obviously don't know). My question is: why would we have to append a new key? We don't do that when we attack H(key|message). Why can't we rely on the fact that we will pass the verification test (we would create the correct hash), and that if the parser tries to extract the key, it would take the only key in the block we send and resume from there? Why would we have to send two keys? Why doesn't the attack against H(message|key) work?
My question is: if a parser function on the server side can track duplicate attributes, would that make the entire length extension attack pointless?
You are talking about a well-written parser. Writing software is hard and writing correct software is very hard.
In that example, you have seen an overwritten attribute. Can you say whether a good parser must take the last one or the first one? What is the rule? There are situations where the last one must be taken! So whether the attack applies depends on the situation. If you consider that knowledge of the length extension attack goes back to the 1990s, it should amaze you that it was still applicable in the wild against the Flickr API in 2009, almost 20 years later:
Flickr's API Signature Forgery by Thai Duong and Juliano Rizzo Published on Sep. 28, 2009.
My question is: why would we have to append a new key? We don't do that when we attack H(key|message). Why can't we rely on the fact that we will pass the verification test (we would create the correct hash), and that if the parser tries to extract the key, it would take the only key in the block we send and resume from there? Why would we have to send two keys? Why doesn't the attack against H(message|key) work?
The attack is a signature forgery. The key is not known to the attacker, but they can still forge new signatures. The new message and signature (the extended hash) are sent to the server; the server then prepends the key to the message and recomputes the hash as its canonical verification. If the hashes match, the signature is considered valid.
The parser doesn't extract the key; it already knows the key. The point is whether you can detect that the data has really been extended. The padding rule is simple: append a 1 bit, then fill with zeroes so that the last 64 (or 128) bits encode the message length (very simplified; for SHA-256 the padded length must be a multiple of 512 bits). To see that there is another padding inside, you must check every block, and only then can you claim that there is an extension attack. Yes, you can do this; however, one of the aims of cryptography is to reduce dependencies, too. If we can create a better signature scheme that eliminates the need for such checking, we prefer it over the others. This enables software developers to write secure implementations more easily.
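To illustrate that reduced dependency, here is a minimal Python sketch (the key and message are made up) contrasting the vulnerable H(key|message) construction with HMAC, which is immune to length extension by design:

import hashlib
import hmac

key = b"secret-key"  # hypothetical shared secret
message = b"count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo"

# Vulnerable construction: H(key | message). An attacker who knows
# len(key) can extend the message and compute a valid tag for the
# extended message without ever learning the key.
naive_tag = hashlib.sha256(key + message).hexdigest()

# HMAC runs the hash twice with derived keys, so extending the inner
# hash never yields a valid outer tag. No block-by-block padding
# checks are needed on the server side.
safe_tag = hmac.new(key, message, hashlib.sha256).hexdigest()

# Compare tags in constant time to avoid timing side channels.
def verify(expected: str, received: str) -> bool:
    return hmac.compare_digest(expected, received)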
Why doesn't attack against H(message|key) work?
Simple: you take the extended message message|extended and send the extended hash
H(message|key|extended) to the server. The server then takes the message message|extended, appends the key to get message|extended|key, and hashes it as H(message|extended|key), which is clearly not equal to the extended hash H(message|key|extended).
Note that the trimmed versions of the SHA-2 series, like SHA-512/256, are resistant to length extension attacks. SHA-3 is immune by design, which enables the simple KMAC signature scheme. BLAKE2 is also immune, since it is designed with the HAIFA construction.

Is it okay to have more than one repository for an aggregate in DDD?

I've read this question about something similar but it didn't quite solve my problem.
I have an application where I'm required to use data from an API. Problem is there are performance and technical limitations to doing this. The performance limitations are obvious. The technical limitations lie in the fact that the API does not support some of the more granular queries I need to make.
I decided to use MySQL as a queryable cache.
Since the data I needed to retrieve from the API did not change very often, I settled on refreshing the cache once a day, so I didn't need any complicated mapper that checked if we had the data in the cache and if not fell back to the API. That was my first design, but I realized that wasn't very practical when the API couldn't support most of the queries I needed to make anyway.
Now I have a set of two mappers for every aggregate. One for MySQL and one for the API.
My problem is now how I hide the complexities of persistence from the domain, and the fact that it seems that I need multiple repositories.
Ideally I would have an interface that both mappers adhered to, but as previously disclosed that's not possible.
Is it okay to have multiple repositories, one for each mapper?
Is it okay to have more than one repository for an aggregate in DDD?
Short answer: yes.
Longer answer: you won't find any suggestion of multiple repositories in the original book by Evans. As he described things, the domain model would have one representation of the aggregate, and the repository abstraction provided consumers with the illusion that the aggregate was stored in an in-memory collection.
Largely, this makes sense -- you are trying to ensure that writes to data within the aggregate boundary are consistent, so you need a single authority for change.
But... there's no particular reason that reads need to travel through the same code path as writes. Welcome to the world of CQRS. What that gives you immediately is the idea that the in-memory representation for reads might need to be optimized differently from the in-memory representation used for writes.
In its more general form, you get the idea that the concept that you are modeling might have different representations for each use case.
For your case, where it is sometimes appropriate to read from the RDBMS, sometimes from the API, sometimes both, this isn't quite an exact match -- the repository interface hides the implementation details from the consumer, but you still have to bother with the implementation.
One thing you might look at is your requirements; how fresh does the data need to be in each use case? A constraint that is often relaxed in the CQRS pattern is the idea that the effects of writes are immediately available for reading. The important question to ask would be, if the data hasn't been cached yet, can you simply report "data not available" without hitting the API?
If so, then use cases that access the cached data need only a single repository implementation.
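As a minimal sketch of that single repository (Python, with hypothetical names; the aggregate and store are stand-ins): the domain sees only the interface, and the implementation reports absence instead of falling back to the API.

from abc import ABC, abstractmethod
from typing import Optional

class Order:
    """Hypothetical aggregate root."""
    def __init__(self, order_id: str, status: str):
        self.order_id = order_id
        self.status = status

class OrderRepository(ABC):
    """The only abstraction the domain layer depends on."""
    @abstractmethod
    def get(self, order_id: str) -> Optional[Order]: ...

class CachedOrderRepository(OrderRepository):
    """Reads from the local cache (e.g. MySQL) refreshed once a day."""
    def __init__(self, cache):
        self._cache = cache  # hypothetical store exposing find()

    def get(self, order_id: str) -> Optional[Order]:
        row = self._cache.find(order_id)
        if row is None:
            return None  # "data not available" -- no API fallback
        return Order(row["id"], row["status"])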
If you are using an external API to read and modify data, you can cache it locally to speed up reads, but I would avoid having a domain repository for it.
From the domain perspective it seems that you need a service to query for some data (or just a Query in a CQRS implementation), which you can implement with a service that internally calls some remote API or reads from a local cache (MySQL, whatever).
When you read your local cache you can develop a repository to decouple your logic from the db implementation, but this is a different concept from a domain repository; it is just a detail of your technical implementation, and has nothing to do with your domain.
If the remote service starts offering the query you need, you will change how your query is executed, calling the remote API instead of the db, but your domain model should not change.
A domain repository is used to load and persist your aggregates, whereas if you are working with external aggregates (in a different context or subdomain) you need to interact with them using services.

API Design - Ordering return data in an array

I currently have an API endpoint which returns an array of objects each containing 4 variables in JSON format.
The size of the data ranges from 500kb to 5mb - depending on the number of records. In order to reduce the size of the return, we're considering removing the labels from the objects, and returning an array of arrays.
E.g.
[{propertyone:123,propertytwo:456,propertythree:789,propertyfour:012},etc,etc,etc]
would become
[[123,456,789,012],etc,etc,etc]
We would then document that array position 0 is to be considered propertyone, etc. There may be a point in the future when this API becomes public. Is it considered better practice to leave the names in, or would serving a documented API with an enforced order suffice?
Well documented APIs are often the best APIs. Having external documentation for your API is important, but the data you return needs to be documented as well, and nothing beats self-documenting data.
I think that by removing the property names you're going to lose this self-documentation entirely, and you could end up limiting yourself in the future as well. For example, consider only wanting to return partial data: how would you represent it without those property names?
In addition to the documentation, if you're returning JSON then depending on the consumer's environment, providing those properties could give them a natural model to work with. Consider JavaScript as an example, if they simply parse your response into JavaScript objects they essentially have a client-side model implemented for free and that will make it a joy to work with your API.
Nonetheless, speed is important and I would suggest looking into different measures for reducing the size of the return. Often a common solution to this is providing paged results, requiring a client to perform additional requests if they desire more data. Also, depending on the shape of your data, you could benefit greatly by using something as simple as GZIP compression on your responses.
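A quick way to sanity-check the label-stripping idea is to measure both shapes after compression. A rough sketch (Python standard library, using made-up values in the spirit of the example above):

import gzip
import json

# 1000 copies of a record shaped like the one in the question.
records = [{"propertyone": 123, "propertytwo": 456,
            "propertythree": 789, "propertyfour": 12}] * 1000

with_keys = json.dumps(records).encode()
without_keys = json.dumps(
    [[r["propertyone"], r["propertytwo"],
      r["propertythree"], r["propertyfour"]] for r in records]
).encode()

# Repeated property names compress extremely well, so gzip often
# erases most of the size advantage of stripping the labels.
print(len(with_keys), len(gzip.compress(with_keys)))
print(len(without_keys), len(gzip.compress(without_keys)))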
While designing an API, as long as you document it and it works as described, normally anything will suffice. However, the API that empowers the consumer is the one that succeeds.

How to handle complex availability of information in OOP from a RESTful API

My issue is that I'm dealing with a RESTful API that returns information about objects, and when writing classes to represent them, I'm not sure how best to handle all the possibilities of the status of each variable's availability. From what I can tell, there are 5 possibilities: The information
is available
has not been requested
is currently being requested (asynchronously)
is unavailable
is not applicable
So with these, having an object represent its data with a value or null doesn't cut it. To give a more concrete example, I'm working with an API about the United States Congress, so the problem goes as thus:
I request information about a bill, and it contains a stub about the sponsoring legislator.
I eventually need to request all the information about that legislator. Not all the legislators will have all the information. Those in the House of Representatives won't have a senate class (Senators' six-year terms are staggered so a third expire every two years, the House is entirely re-elected every two years). Some won't have a twitter id, just because they don't have one. And, of course, if I have already requested information, I shouldn't try to request it again.
There's a couple options I see:
I can create a Legislator object and fill it with what information I have, but then I have to have some mechanism of tracking information availability with the getters and setters. This is kind of what I'm doing right now, but it requires a lot of repeated code.
I could create a separate class for abbreviated objects and replace them when I get more with immutable "complete" objects, but then I have to be really careful about replacing all references to them and also go through a bunch of hoops for unavailable, and especially, not applicable information.
So, I'm just wondering what other people's take on this issue is. Are there other (better?) ways of handling this complexity? What are the advantages and drawbacks of different approaches? What should I consider about what I'm trying to do in choosing an approach?
[Note: I'm working in Objective-C, but this isn't necessarily specific to that language.]
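For reference, one way to implement the tracking mechanism from the first option is a small generic wrapper that carries the availability status alongside the value, so the per-field boilerplate collapses into one type. A minimal sketch (Python for brevity, since the problem isn't language-specific; all names are made up):

from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

class Availability(Enum):
    AVAILABLE = auto()
    NOT_REQUESTED = auto()
    REQUESTING = auto()      # an async request is in flight
    UNAVAILABLE = auto()
    NOT_APPLICABLE = auto()

@dataclass
class Tracked(Generic[T]):
    """A value plus the status of its availability."""
    status: Availability = Availability.NOT_REQUESTED
    value: Optional[T] = None

@dataclass
class Legislator:
    name: Tracked[str] = field(default_factory=Tracked)
    twitter_id: Tracked[str] = field(default_factory=Tracked)
    senate_class: Tracked[int] = field(default_factory=Tracked)

# e.g. a Representative has no senate class:
rep = Legislator()
rep.senate_class.status = Availability.NOT_APPLICABLE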
If you want to treat those remote resources as objects on the client side, then do yourself a huge favour and forget about the REST buzzword. You will drive yourself crazy. Just accept that you are doing HTTP RPC and move on as you would with any other RPC project.
However, if you really want to do REST, you need to understand what is meant by the "State Transfer" part of the REST acronym and you need to read about HATEOAS. It is a huge mental shift for building clients, but it does have a bunch of benefits. But maybe you don't need those particular benefits.
What I do know is that if you are trying to use a "REST API" to retrieve objects over the wire, you are going to come to the conclusion that REST is a load of crap.
It's an interesting question, but I think you're probably overthinking this a bit.
Firstly, I think you're considering the possible states of the information a bit too much; consider the more basic point that you either have the information or you don't. WHY you don't have the information doesn't really matter, except in one case. Let me explain: if the information about a certain bill or legislator or anything else is not applicable, you shouldn't be requesting it or needing it. That "state" is irrelevant. Similarly, if the information is in the process of being requested, then it is simply not yet available; the only state you really care about is whether you have the information or you do not yet have it.
If you start worrying about further depths of the request process, you risk getting into a deep, endless cycle of managing state: has the information changed between when I got it and now? All you can know about the information is what you've been told it is. This is fundamental to the REST process; you're getting a REPRESENTATION of the underlying data, but make no mistake: the representation is NOT the underlying data, any more than a congressman's name is the congressman himself.
Second, don't worry about information availability. If an object has a subobject, when you query the object, query for the subobject. If you get back data, great. If you get back that the data isn't available, that too is a representation of the subobject's data; it's just a different representation than you were hoping for, but it's equally valid. I'd represent that as an object with a null value; the object exists (was instantiated because it belonged to the parent), but you have no valid data about it (the representation returned was empty due to some reason; lack of availability, server down, data changed; whatever).
Finally, the real key here is to remember that a RESTful structure is driven by hypermedia: a request for an object that does not return the full object's data should return a URI for requesting the subobject's data, and so forth. The key is that those structures aren't static, as your object structure seems to hope to treat them; they're dynamic, and it's up to the server to determine the representation (i.e., the interrelationships). Attempting to define that in stone with a concrete object representation ahead of time means that you're dealing with the system in a way that REST was never meant to be dealt with.

REST API Best practices: Where to put parameters? [closed]

A REST API can have parameters in at least two ways:
As part of the URL-path (i.e. /api/resource/parametervalue )
As a query argument (i.e. /api/resource?parameter=value )
What is the best practice here? Are there any general guidelines when to use 1 and when to use 2?
Real world example: Twitter uses query parameters for specifying intervals. (http://api.twitter.com/1/statuses/home_timeline.json?since_id=12345&max_id=54321)
Would it be considered better design to put these parameters in the URL path?
If there are documented best practices, I have not found them yet. However, here are a few guidelines I use when determining where to put parameters in a URL:
Optional parameters tend to be easier to put in the query string.
If you want to return a 404 error when the parameter value does not correspond to an existing resource then I would tend towards a path segment parameter. e.g. /customer/232 where 232 is not a valid customer id.
If, however, you want to return an empty list when the parameter is not found, then I suggest using query string parameters. e.g. /contacts?name=dave
If a parameter affects an entire subtree of your URI space then use a path segment. e.g. a language parameter /en/document/foo.txt versus /document/foo.txt?language=en
I prefer unique identifiers to be in a path segment rather than a query parameter.
The official rules for URIs are found in the URI RFC spec. There is also another very useful RFC spec that defines rules for parameterizing URIs.
Late answer but I'll add some additional insight to what has been shared, namely that there are several types of "parameters" to a request, and you should take this into account.
Locators - E.g. resource identifiers such as IDs or action/view
Filters - E.g. parameters that provide a search, sorting, or otherwise narrow down the set of results.
State - E.g. session identification, api keys, whatevs.
Content - E.g. data to be stored.
Now let's look at the different places where these parameters could go.
Request headers & cookies
URL query string ("GET" vars)
URL paths
Body query string/multipart ("POST" vars)
Generally you want State to be set in headers or cookies, depending on what type of state information it is. I think we can all agree on this. Use custom http headers (X-My-Header) if you need to.
Similarly, Content only has one place to belong, which is in the request body, either as query strings or as http multipart and/or JSON content. This is consistent with what you receive from the server when it sends you content. So you shouldn't be rude and do it differently.
Locators such as "id=5" or "action=refresh" or "page=2" would make sense as a URL path, such as mysite.com/article/5/page=2, where you know what each part is supposed to mean (the basics, such as article and 5, obviously mean "get me the data of type article with id 5") and additional parameters are specified as part of the URI. They can be in the form of page=2, or page/2 if you know that after a certain point in the URI the "folders" are paired key-values.
Filters always go in the query string, because while they are a part of finding the right data, they are only there to return a subset or modification of what the Locators return alone. The search in mysite.com/article/?query=Obama (subset) is a filter, and so is /article/5?order=backwards (modification). Think about what it does, not just what it's called!
If "view" determines output format, then it is a filter (mysite.com/article/5?view=pdf) because it returns a modification of the found resource rather than homing in on which resource we want. If it instead decides which specific part of the article we get to see (mysite.com/article/5/view=summary) then it is a locator.
Remember, narrowing down a set of resources is filtering. Locating something specific within a resource is locating... duh. Subset filtering may return any number of results (even 0). Locating will always find that specific instance of something (if it exists). Modification filtering will return the same data as the locator, except modified (if such a modification is allowed).
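To make the four kinds concrete, here is a hedged Flask sketch (routes and names are made up) showing where each kind of parameter lives:

from flask import Flask, request

app = Flask(__name__)

# Locators live in the URL path: the type "article" and the id.
@app.route("/article/<int:article_id>", methods=["GET"])
def get_article(article_id):
    # Filters live in the query string: subset or modification.
    query = request.args.get("query")  # subset, e.g. ?query=Obama
    order = request.args.get("order")  # modification, e.g. ?order=backwards
    # State lives in headers or cookies.
    api_key = request.headers.get("X-My-Header")
    return {"id": article_id, "query": query, "order": order,
            "authenticated": api_key is not None}

# Content lives in the request body.
@app.route("/article", methods=["POST"])
def create_article():
    data = request.get_json()  # the representation to store
    return data, 201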
Hope this helped give people some eureka moments if they've been lost about where to put stuff!
It depends on the design. There are no rules for URIs in REST over HTTP (the main thing is that they are unique). Often it comes down to a matter of taste and intuition...
I take following approach:
url path-element: the resource and its path-elements form a directory-like traversal to a subresource (e.g. /items/{id}, /users/items). When unsure, ask your colleagues: if they think of the traversal as stepping into "another directory", a path-element is most likely the right choice
url parameter: when there is no real traversal (search resources with multiple query parameters are a very good example of this)
IMO parameters are better as query arguments. The URL is used to identify the resource, while the added query parameters specify which part of the resource you want, any state the resource should have, etc.
As per the REST implementation:
1) Path variables are used for direct actions on resources, like a contact or a song, e.g.
GET /api/resource/{songid} or
GET /api/resource/{contactid} will return the respective data.
2) Query params/arguments are used for indirect resources, like the metadata of a song, e.g.
GET /api/resource/{songid}?metadata=genres will return the genres data for that particular song.
"Pack" and POST your data against the "context" that universe-resource-locator provides, which means #1 for the sake of the locator.
Mind the limitations with #2. I prefer POSTs to #1.
note: limitations are discussed for
POST in Is there a max size for POST parameter content?
GET in Is there a limit to the length of a GET request? and Max size of URL parameters in _GET
p.s. these limits are based on the client capabilities (browser) and server(configuration).
According to the URI standard, the path is for hierarchical parameters and the query is for non-hierarchical parameters. Of course, what counts as hierarchical can be very subjective.
In situations where multiple URIs are assigned to the same resource, I like to put the parameters necessary for identification into the path and the parameters necessary to build the representation into the query. (For me, routing is easier this way.)
For example:
/users/123 and /users/123?fields="name, age"
/users and /users?name="John"&age=30
For map reduce I like to use the following approaches:
/users?name="John"&age=30
/users/name:John/age:30
So it is really up to you (and your server side router) how you construct your URIs.
note: Just to mention, these parameters are query parameters. So what you are really doing is defining a simple query language. For complex queries (which contain operators like and, or, greater than, etc.) I suggest using an already existing query language. The capabilities of URI templates are very limited...
As a programmer often on the client-end, I prefer the query argument. Also, for me, it separates the URL path from the parameters, adds to clarity, and offers more extensibility. It also allows me to have separate logic between the URL/URI building and the parameter builder.
I do like what manuel aldana said about the other option if there's some sort of tree involved. I can see user-specific parts being treed off like that.
There are no hard and fast rules, but the rule of thumb from a purely conceptual standpoint that I like to use can briefly be summed up like this: a URI path (by definition) represents a resource, and query parameters are essentially modifiers on that resource. So far that likely doesn't help... With a REST API you have the major methods of acting upon a single resource using GET, PUT, and DELETE. Therefore, whether something should be represented in the path or as a parameter can be reduced to whether those methods make sense for the representation in question. Would you reasonably PUT something at that path, and would it be semantically sound to do so? You could of course PUT something just about anywhere and bend the back-end to handle it, but you should be PUTting what amounts to a representation of the actual resource and not some needlessly contextualized version of it. For collections the same can be done with POST. If you wanted to add to a particular collection, what would be a URL that makes sense to POST to?
This still leaves some gray areas as some paths could point to what amount to children of parent resources which is somewhat discretionary and dependent on their use. The one hard line that this draws is that any type of transitive representation should be done using a query parameter, since it would not have an underlying resource.
In response to the real world example given in the original question (Twitter's API), the parameters represent a transitive query that filters on the state of the resources (rather than a hierarchy). In that particular example it would be entirely unreasonable to add to the collection represented by those constraints, and further that query would not be able to be represented as a path that would make any sense in the terms of an object graph.
The adoption of this type of resource oriented perspective can easily map directly to the object graph of your domain model and drive the logic of your API to the point where everything works very cleanly and in a fairly self-documenting way once it snaps into clarity. The concept can also be made clearer by stepping away from systems that use traditional URL routing mapped on to a normally ill-fitting data model (i.e. an RDBMS). Apache Sling would certainly be a good place to start. The concept of object traversal dispatch in a system like Zope also provides a clearer analog.
Here is my opinion.
Query params are used as metadata for a request. They act as a filter or modifier on an existing resource call.
Example:
/calendar/2014-08-08/events
should give calendar events for that day.
If you want events for a specific category
/calendar/2014-08-08/events?category=appointments
or if you need events of longer than 30 mins
/calendar/2014-08-08/events?duration=30
A litmus test would be to check whether the request can still be served without any query params.
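As a sketch of that litmus test (hypothetical route and in-memory data, Python/Flask): the handler still produces a sensible response when every query param is absent.

from flask import Flask, request

app = Flask(__name__)

# Hypothetical in-memory data in place of a real event store.
EVENTS = {
    "2014-08-08": [
        {"title": "Standup", "category": "meetings", "duration": 15},
        {"title": "Dentist", "category": "appointments", "duration": 45},
    ],
}

@app.route("/calendar/<day>/events")
def events(day):
    found = EVENTS.get(day, [])
    category = request.args.get("category")           # optional filter
    min_duration = request.args.get("duration", type=int)
    if category:
        found = [e for e in found if e["category"] == category]
    if min_duration:
        found = [e for e in found if e["duration"] >= min_duration]
    # With no query params at all, the full day's events are returned,
    # so the request passes the litmus test.
    return {"day": day, "events": found}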
I generally tend towards #2, As a query argument (i.e. /api/resource?parameter=value ).
A third option is to actually post the parameter=value in the body.
This is because it works better for multi-parameter resources and is more extensible for future use.
No matter which one you pick, make sure you only pick one, don't mix and match. That leads towards a confusing API.
One "dimension" of this topic has been left out yet it's very important: there are times when the "best practices" have to come into terms with the plaform we are implementing or augmenting with REST capabilities.
Practical example:
Many web applications nowadays implement the MVC (Model, View, Controller) architecture. They assume a certain standard path is provided, even more so when those web applications come with an "Enable SEO URLs" option.
Just to mention a fairly famous web application: an OpenCart e-commerce shop.
When the admin enables the "SEO URLs" it expects said URLs to come in a quite standard MVC format like:
http://www.domain.tld/special-offers/list-all?limit=25
Where
special-offers is the MVC controller that shall process the URL (showing the special-offers page)
list-all is the controller's action or function name to call. (*)
limit=25 is an option, stating that 25 items will be shown per page.
(*) list-all is a fictitious function name I used for clarity. In reality, OpenCart and most MVC frameworks have a default, implied (and usually omitted in the URL) index function that gets called when the user wants a default action to be performed. So the real-world URL would be:
http://www.domain.tld/special-offers?limit=25
With a fairly standard application or framework structure similar to the above, you'll often get a web server that is optimized for it and that rewrites URLs for it (the true "non-SEO URL" would be: http://www.domain.tld/index.php?route=special-offers/list-all&limit=25).
Therefore you, as a developer, are faced with dealing with the existing infrastructure and must adapt your "best practices", unless you are the system admin, know exactly how to tweak an Apache / NGinx rewrite configuration (the latter can be nasty!), and so on.
So, your REST API would often do much better to follow the referring web application's standards, both for consistency with it and for ease / speed (and thus budget savings).
To get back to the practical example above, a consistent REST API would be something with URLs like:
http://www.domain.tld/api/special-offers-list?from=15&limit=25
or (non SEO URLs)
http://www.domain.tld/index.php?route=api/special-offers-list&from=15&limit=25
with a mix of "path-formed" arguments and "query-formed" arguments.
I see a lot of REST APIs that don't handle parameters well. One example that comes up often is when the URI includes personally identifiable information.
http://software.danielwatrous.com/design-principles-for-rest-apis/
I think a corollary question is when a parameter shouldn't be a parameter at all, but should instead be moved to the HEADER or BODY of the request.
It's a very interesting question.
You can use both of them; there's no strict rule on this subject, but using URI path variables has some advantages:
Cache:
Most of the web cache services on the internet don't cache GET requests that contain query parameters.
They do that because there are a lot of RPC systems using GET requests to change data on the server (a failure!! GET must be a safe method).
But if you use path variables, all of these services can cache your GET requests.
Hierarchy:
The path variables can represent hierarchy:
/City/Street/Place
It gives the user more information about the structure of the data.
But if your data doesn't have any hierarchical relation, you can still use path variables, using commas or semicolons:
/City/longitude,latitude
As a rule, use a comma when the ordering of the parameters matters, and a semicolon when the ordering doesn't matter:
/IconGenerator/red;blue;green
Apart from those reasons, there are some cases when it's very common to use query string variables:
When you need the browser to automatically put HTML form variables into the URI
When you are dealing with algorithms. For example, the Google search engine uses query strings:
http://www.google.com/search?q=rest
To sum up, there's no strong reason to prefer one of these methods, but whenever you can, use URI path variables.