Aggregate values in REST APIs

I am working on an application which requires some double-entry bookkeeping. Currently there are two endpoints:
/account
/transaction
While /account handles general data of the accounts, /transaction handles transactions for deposits and withdrawals. The account balance is calculated from the related transactions. I kept them separate to maintain consistency in the bookkeeping when transferring value from one account to another.
My question is how to represent the balance of an account at the /account endpoint, given that it will always be calculated at request time. Should a response just contain the balance as a read-only field? This smells like bad API design, since all fields but this one would be writeable/updateable.
The alternative that comes to mind would be to extend the endpoint to
/account/{id}/balance
returning only the balance of the account in question. However, this would always require a second call to get the balance in addition to the rest of the account data. Maybe the answer could be generalized to how to represent aggregated values.
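To make the first option concrete, here is roughly the representation I have in mind (a TypeScript sketch; the field names are only illustrative):
// Option 1: the account resource carries the calculated balance as a
// read-only field next to the writable ones, much like the server-assigned id.
interface Account {
  readonly id: string;      // read-only, assigned by the server
  owner: string;            // writable
  currency: string;         // writable
  readonly balance: number; // derived from the related transactions at request time
}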

Very good question. I run into situations like this often. I would say two things:
You probably have other "read-only" fields, like "id".
You may not want to incur the time it takes to calculate the current balance every time you fetch an account's details.
I think I would opt for /account/{id}/balance ... but maybe name it /account/{id}/calculatebalance to indicate that it takes some time to run this method. Then it is also obvious that the value is a calculated one. If you had several calculated values, I would rethink my opinion.
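As a rough sketch of what that could look like (Express-style; sumTransactions is an assumed helper for aggregating the stored transactions, not an existing API):
import express from 'express';
// Assumed helper that sums the stored transaction rows for an account;
// invented for this sketch.
declare function sumTransactions(accountId: string): Promise<number>;
const app = express();
// A dedicated sub-resource makes it obvious that the value is calculated
// (and possibly slow), without cluttering the main account representation.
app.get('/account/:id/balance', async (req, res) => {
  const balance = await sumTransactions(req.params.id); // may take some time
  res.json({ accountId: req.params.id, balance });
});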
2 cents.

You usually don't write aggregate properties, so it is natural for the property to be read-only. It is a sign that you are starting to have a service instead of a database with an HTTP interface. Recalculating on each request is not necessary if you can cache the value somewhere, though that depends on your needs. I see this is a very old question; I don't know how I clicked on it.


Is deduping the same as idempotency?

It's clear to me what idempotence means, based on this: Defining Idempotence.
But I have also heard a lot of people describe such behavior as deduping. Is that equivalent terminology?
For example, if an API is idempotent, processing the same request N times leaves the state the same as processing it once. Can I say that such an API is deduping requests?
The two terms are not equivalent, though it may be helpful for people unfamiliar with idempotency to think of it initially based on its similarity to deduplication.
For a contrasting example, consider an API for a bank account which accepts a positive or negative number by which to adjust the account balance (deposit or withdrawal). Clearly this API is not idempotent, because consecutive transactions have a cumulative effect.
On the other hand, we would certainly want to deduplicate these transactions. If transaction #123 is (erroneously) submitted twice, it should apply to the account balance only once. In this case, transactions should be deduplicated because the API is not idempotent.
Deduplication is an activity: an action to perform. Idempotency is an attribute: a property to describe. A similarity exists between the two when the result of deduplication is the same as the effect of idempotency; that is, no change in state. But an equivalent outcome does not make the two terms equivalent.
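To make that concrete, here is a small sketch (all names invented for illustration). The adjustment operation is not idempotent, so the server deduplicates by transaction id to make retries safe:
// The operation is not idempotent: repeated calls would accumulate.
// Deduplicating by transaction id gives retries the outcome of idempotency
// without making the operation itself idempotent.
const processed = new Set<string>();
let balance = 0;
function applyAdjustment(transactionId: string, amount: number): number {
  if (processed.has(transactionId)) {
    return balance; // duplicate submission: ignored, state unchanged
  }
  processed.add(transactionId);
  balance += amount; // the cumulative (non-idempotent) effect
  return balance;
}
applyAdjustment('123', -50); // withdraws 50
applyAdjustment('123', -50); // deduplicated: no further effect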

Optimizing GraphQL resolvers for SQL databases and in service-oriented architectures

My company has a service-oriented architecture. My app's GraphQL server therefore has to call out to other services to fulfill the data requests from the frontend.
Let's imagine my GraphQL schema defines the type User. The data for this type comes from two sources:
A user account service that exposes a REST endpoint for fetching a user's username, age, and friends.
A SQL database used just by my app to store User-related data that is only relevant to my app: favoriteFood, favoriteSport.
Let's assume that the user account service's endpoint automatically returns the username and age, but you have to pass the query parameter friends=true in order to retrieve the friends data because that is an expensive operation.
Given that background, the following query presents a couple optimization challenges in the getUser resolver:
query GetUser {
  getUser {
    username
    favoriteFood
  }
}
Challenge #1
When the getUser resolver makes the request to the user account service, how does it know whether or not it needs to ask for the friends data as well?
Challenge #2
When the resolver queries my app's database for additional user data, how does it know which fields to retrieve from the database?
The only solution I can find to both challenges is to inspect the query in the resolver via the fourth info argument that the resolver receives. This will allow it to find out whether friends should be requested in the REST call to the user account service, and it will be able to build the correct SELECT query to retrieve the needed data from my app's database.
Is this the correct approach? It seems like a use-case that GraphQL implementations must be running into all the time and therefore I'd expect to encounter a widely accepted solution. However, I haven't found many articles that address this, nor does a widely used NPM module appear to exist (graphql-parse-resolve-info is part of PostGraphile but only has ~12k weekly downloads, while graphql-fields has ~18.5k weekly downloads).
I'm therefore concerned that I'm missing something fundamental about how this should be done. Am I? Or is inspecting the info argument the correct way to solve these optimization challenges? In case it matters, I am using Apollo Server.
If you want to modify your resolver based on the requested selection set, there's really only one way to do that and that's to parse the AST of the requested query. In my experience, graphql-parse-resolve-info is the most complete solution for making that parsing less painful.
I imagine this isn't as common an issue as you'd think, because most folks fall into one of two groups:
Users of frameworks or libraries like PostGraphile, Hasura, Prisma, Join Monster, etc., which take care of optimizations like these for you (at least on the database side).
Users who are not concerned about overfetching on the server-side and just request all columns regardless of the selection set.
In the latter case, fields that represent associations are given their own resolvers, so those subsequent calls to the database won't be fired unless they are actually requested. Data Loader is then used to help batch all these extra calls to the database. Ditto for fields that end up calling some other data source, like a REST API.
In this particular case, Data Loader would not be much help to you. The best approach is to have a single resolver for getUser that fetches the user details from the database and the REST endpoint. You can then, as you're already planning, adjust those calls (or skip them altogether) based on the requested fields. This can be cumbersome, but will work as expected.
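As a rough sketch of that approach, reading the requested field names straight from the AST in the info argument (plain fields only; fragments and aliases are ignored here, which is exactly the bookkeeping graphql-parse-resolve-info would do for you; the accountService and db calls are invented for illustration):
import { GraphQLResolveInfo, Kind } from 'graphql';
// Collect the top-level field names requested under getUser.
function requestedFields(info: GraphQLResolveInfo): Set<string> {
  const fields = new Set<string>();
  for (const sel of info.fieldNodes[0].selectionSet?.selections ?? []) {
    if (sel.kind === Kind.FIELD) fields.add(sel.name.value);
  }
  return fields;
}
// Hypothetical getUser resolver: both data-source calls are adjusted
// (or skipped entirely) based on what the client actually asked for.
async function getUser(_: unknown, args: { id: string }, ctx: any, info: GraphQLResolveInfo) {
  const fields = requestedFields(info);
  const account = await ctx.accountService.fetchUser(args.id, {
    friends: fields.has('friends'), // only pay for friends=true when requested
  });
  const columns = ['favoriteFood', 'favoriteSport'].filter((c) => fields.has(c));
  const appData = columns.length
    ? await ctx.db.selectUserColumns(args.id, columns) // builds SELECT <columns>
    : {};
  return { ...account, ...appData };
}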
The alternative to this approach is to simply fetch everything, but use caching to reduce the number of calls to your database and REST API. This way, you'll fetch the complete user each time, but you'll do so from memory unless the cache is invalidated or expires. This is more memory-intensive, and cache invalidation is always tricky, but it does simplify your resolver logic significantly.

Best HTTP method for a stateless REST API service call

I want to make a REST API that does spellchecking on text that is passed in, without storing any of the text on the server.
The call would probably look something like 'example.com/api/v1/spelling/mistakes', with an optional query param for locale and a list of the mistakes as the return value.
What would be the best HTTP method to use, given that the text passed in would be too large for a GET? None of POST, PUT, or PATCH seems to map reasonably to the intended purpose, and there don't seem to be any other suitable matches among the less commonly used methods either.
What is the best HTTP method to use for a "translation"-like REST API service, taking and returning large amounts of data?
I would say this is a POST, but it could have been a GET if the data had been posted previously. The reason it is not a GET is that you are passing all the data in this API call, as you mentioned. If the data had been 'posted' somewhere else previously, then a GET could be used, with the address (URI) or ID of that 'posted' data passed to the API as a param. But because we are both 'posting' the data and retrieving information about it in the same call, I would say this is a POST. Granted, the data being posted has a short life span, but it is still being posted.
If the data being posted were instead a customer order, it would still be a POST, but the data would be persisted somewhere. The difference here is the short period of time that the data will exist for. And in future iterations of your API you might actually want to keep that data and refer back to it with some ID, so by using POST you also allow for future enhancements.
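For what it's worth, a minimal sketch of that POST shape (Express-style; checkSpelling stands in for whatever spellchecking engine you end up using):
import express from 'express';
// Placeholder for the actual spellchecking engine; assumed for this sketch.
declare function checkSpelling(text: string, locale: string): Array<{ word: string; offset: number }>;
const app = express();
app.use(express.json());
// POST because the (potentially large) text travels in the request body;
// nothing is stored, so the call is effectively a stateless computation.
app.post('/api/v1/spelling/mistakes', (req, res) => {
  const locale = (req.query.locale as string) ?? 'en-US';
  res.json({ mistakes: checkSpelling(req.body.text, locale) });
});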
By the way, as a precaution, be careful with the memory footprint of these calls. I can see this becoming very memory-intensive if the data being passed grows large and the API becomes popular. Not a show stopper, but something to consider when designing it.
Hope that helps alleviate what I call REST anxiety when designing an API.

REST API design aggregate

We are creating an API for an employee management app. The interface includes a schedule, where we have to show all shifts in a table, one row per user. Furthermore, there is a summary for every user (per row) and day (per column). Should we create one big aggregate call like:
GET /api/locations/{id}/schedule
which will return all employees, shifts, summaries, etc.? Or should we split that into several collections like:
GET /api/locations/{id}/shifts
GET /api/locations/{id}/events
GET /api/locations/{id}/summary
GET /api/employee/{id}/summary?date_from={date_from}&date_to={date_to}
To me, the second option is more flexible, and there is no reason to create a new abstract resource such as schedule. In my opinion it is clearly part of the interface layer and should not affect the API design.
On the other hand, the big aggregate is more efficient, because there will be fewer database calls and it is easy to cache.
What do you think? Is there any source, such as an article, that I can rely on?
There's nothing RESTful or unRESTful about either approach. The semantics of URIs are irrelevant to REST. What really matters is how the clients obtain the URIs. If they are looking up URI patterns in documentation and filling in values instead of following links, the API is not RESTful.
With that in mind, I'd say the approach that is more consistent with your business and your ecosystem of applications is the best one. Don't be afraid to create aggregate resources if you feel the need to.
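If you do build the aggregate, it can still link to the finer-grained collections, so clients follow links rather than fill in URI templates. The shape below is purely illustrative:
// Purely illustrative shape: the aggregate embeds what the schedule screen
// needs and links to the finer-grained collections for everything else.
interface ScheduleResource {
  shifts: unknown[];
  summaries: unknown[];
  _links: Record<string, { href: string }>;
}
const schedule: ScheduleResource = {
  shifts: [],
  summaries: [],
  _links: {
    shifts: { href: '/api/locations/42/shifts' },
    events: { href: '/api/locations/42/events' },
    summary: { href: '/api/locations/42/summary' },
  },
};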

eCommerce Third Party API Data Best Practice

What would be best practice for the following situation? I have an ecommerce store that pulls down inventory levels from a distributor. Should the site call the third-party API for the most up-to-date data every time a user loads a product detail page? Or should the site call the third-party API, store that data in its own system for a certain amount of time, and update it periodically?
To me it seems obvious that it should be updated every time the product detail page is loaded, but what about high-traffic ecommerce stores? Are completely different solutions used in that case?
In this case I would definitely cache the results from the distributor's site for some period of time, rather than hitting them every time you get a request. However, I would not simply use a blanket 5-minute or 30-minute timeout for all cache entries. Instead, I would use some heuristics: for instance, if your application is written in a language like Python, you could attach a simple script to every product which implements the timeout.
This way, if it is an item that is requested infrequently, or one that has a large amount in stock, you could cache for a longer time.
if product.popularityrating > 8 or product.lastqtyinstock < 20:
    cache.expire(productnum)
    distributor.checkstock(productnum)
This gives you flexibility that you can call on if you need it. Initially, you can set all the rules to something like:
cache.expireover("3m", productnum)
distributor.checkstock(productnum)
In actual fact, the script would probably not include the checkstock function call, because that would be in the main app, but it is included here for context. If Python seems too heavyweight to include just for this small amount of flexibility, then have a look at Tcl, which was specifically designed for this type of job. Both can be embedded easily in C, C++, C# and Java applications.
Actually, there is another solution. Your distributor keeps the product catalog on their servers and gives you access to it via an Open Catalog Interface. When a user wants to place an order, they get redirected in-place to the distributor's catalog, choose items, and then transfer the selection back to your shop.
It is widely used in the SRM (Supplier Relationship Management) space.
It depends on many factors: the traffic to your site, how often the inventory levels change, the business impact of displaying outdated data, how often the suppliers allow you to call their API, their API's SLA in terms of availability and performance, and so on.
Once you have these answers, there are of course many possibilities here. For example, for a low-traffic site where getting the inventory right is important, you may want to call the 3rd-party API on every call, but revert to some alternative behavior (such as using cached data) if the API does not respond within a certain timeout.
Sometimes, well-designed APIs will include hints as to the validity period of the data. For example, some REST-over-HTTP APIs support various HTTP cache-control headers that can be used to specify a validity period, or to only retrieve data if it has changed since the last request.
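As a sketch of the timeout-and-fall-back behavior mentioned above (the distributor URL and the 500 ms budget are placeholders):
// Try the distributor's live API first, but don't let a slow upstream block
// the product page: fall back to the last cached level on timeout or error.
async function getInventory(sku: string, cache: Map<string, number>): Promise<number> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 500); // 500 ms budget
  try {
    const res = await fetch(`https://distributor.example/stock/${sku}`, {
      signal: controller.signal,
    });
    const { quantity } = await res.json();
    cache.set(sku, quantity); // refresh the cache on success
    return quantity;
  } catch {
    return cache.get(sku) ?? 0; // stale but fast
  } finally {
    clearTimeout(timer);
  }
}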