Proper way to cache data from API call with nodejs - api

I am using node.js to write a web service, it calls an API for some data but I am limited by the API to a number of calls per month, so I wish to cache the data I retrieve from the API so I can serve it up with the cached data, and re-fetch the data from the API at a timed interval.
Is this a good approach for this problem? And what caching framework should I use? I looked at node-redis but I don't think a key value store is appropriate for the data.
Thanks!

I would disagree with you regarding Redis. Redis is a very powerful key-value store that can easily be used for what you want. It is designed to have stuff dumped in it and taken out again. In your situation, you can easy cache the API response by saving it into Redis with the query as the key (if this is a REST API you're calling, you could just use the URL or serialized data as the key) and simply cache the response as a stringified JSON object (or XML string if you happen to be getting that).
You can also set an expiry on the cached data, and it will be cleared when the time is expired.
You could then wrap your API call in a helper function which checks the cache, and returns the value if it's present. If it's not it makes the API request, adds it to the cache, then returns it.
This is probably the most straightforward solution and seems to cover your use case pretty well.

Related

GET vs POST API calls and cache issues

I know that GET is used to retrieve data from the server without modifying anything. Whereas POST is used to add data. I won't get into PUT/PATCH, and assume that POST is always used to update and replace data.
The theory is nice, but in practice I have encountered many situations where my GET calls need to be replaced with POST calls. This is because the response often gets incorrectly cached. Where I work there are proxy servers for security, caching, load balancing, etc., and often times the response for GET calls is directly cached to speed up the call, whereas POST calls never get fully cached.
So for my question, if I have an API call /api/get_orders/month. Theoretically, this should be a GET call, however, the number of orders might update any second. So if I call this API at any moment it may return for example 1000, and calling it just two seconds later should return 1001. However, because of the cache, and although adding a version flag such as ?v=<date_as_int> should ensure that the updated value is returned, there seems to be some caches in the proxy servers that might ignore this.
Basically, I don't feel safe enough using GET unless I know for certain that the data will not be modified or if I know for a fact that the response is always the updated data.
So, would you recommend using POST/GET in the case of retrieving daily/monthly number of orders. And if GET, with all the different and complex layers and server set-ups, how can one be certain that the data is always updated?
If you're doing multiple GET request and something is caching the data in between, you have no idea what it is or how to change it's behavior then POST is a valid workaround.
In any normal situation you would take the time what sits in between your browser and your server, and if there's something that's behaving in a way that doesn't make sense, I would try to investigate and fix that.
So you work at a place where some of that infrastructure exists. Maybe talk to the people that maintain it? But if that's not an option and you just want to find the 'ignore every convention and make my request work'-workaround, then you can use POST.

Fetch data after an update

VueJS + Quasar + Pinia + Axios
Single page application
I have an entity called user with 4 endpoints associated:
GET /users
POST /user
PUT /user/{id}
DELETE /user/{id}
When I load my page I call the GET and I save the response slice of users inside a store (userStore)
Post and Put returns the created/updated user in the body of the response
Is it a good practice to manually update the slice of users in the store after calling one of these endpoints, or is better to call the GET immediatly after ?
If you own the API or can be sure about the behavior of what PUT/POST methods return, you can use local state manipulation. Those endpoints should return the same value as what the GET endpoint returns inside. Otherwise, you might end up with incomplete or wrong data on the local state.
By mutating the state locally without making an extra GET request, the user can immediately see the change in the browser. It will also be kinder to your server and the user's data usage.
However, if creating the resource(user, in this case) was a really common operation accessible by lots of users, then calling the GET endpoint to return a slice would be better since it would have more chance to include the new ones that are created by other users. But, in that case, listening to real-time events(i.e. using WebSockets) would be even better to ensure everyone gets accurate and new data in real-time.

RESTDataSource - How to know if response comes from get request or cache

I need to get some data from a REST API in my GraphQL API. For that I'm extending RESTDataSource from apollo-datasource-rest.
From what I understood, RESTDataSource caches automatically requests but I'd like to verify if it is indeed cached. Is there a way to know if my request is getting its data from the cache or if it's hitting the REST API?
I noticed that the first request takes some time, but the following ones are way faster and also, the didReceiveResponse method is not called everytime I make a query. Is it because the data is loaded from the cache?
I'm using apollo-server-express.
Thanks for your help!
You can time the requests like following:
console.time('restdatasource get req')
this.get(url)
console.timeEnd('restdatasource get req')
Now, if the time is under 100-150 milliseconds, that should be a request coming from the cache.
You can monitor the console, under the network tab. You will be able to see what endpoints the application is calling. If it uses cached data, there will be no new request to your endpoint logged
If you are trying to verify this locally, one good option is to setup a local proxy so that you can see all the network calls being made. (no network call meaning the call was read from cache) Then you can simply configure your app using this apollo documentation to forward all outgoing calls through a proxy like mitmproxy.

REST bulk endpoint for fetching data

We(producer for the API) have an endpoint
/users/:{id}/name
which is used to retrieve name for the user 'id'
Now as a consumer I want to display the list of names for users like:
user1: id1, name1
uder2: id2, name2
where I have the ids in input.
In such a case should I make 2(here the list is dependent on UI pagination example 50) separate calls to the API to fetch information or else create/ask the producer to create a bulk endpoint like:
POST /users/name
body: { ids: []}
If later, then am I not loosing the REST principle here to fetch information using POST but not GET? If former, then I am not putting too much network overhead in the system?
Also since this seems to be a very common usecase, if we choose the POST method is there really a need of the GET endpoint since the POST endpoint can handle a single user as well?
Which approach should be chosen?
A GET API call should be used for fetching data. Since browser knows GET calls are idempotent, it can handle some situations on its own, like make another call if previous fails.
Since REST calls are lightweight, we tend to overuse API call repeatedly. In your case, since you want all name v/s id mapping at once, one call is sufficient. Or have a Aggregator endpoint in backend API gateway to reduce network traffic and make repeated calls nearer to actual service.
Keeping GET /users/:{id}/name , is also not a bad idea alongside this. It helps to abstract business functionality. A particular scenario can only allow single fetch.
Also using GET /users/name with pagination and returning list of names is complex for single use so keeping both are fine.

How do multiple versions of a REST API share the same data model?

There is a ton of documentation on academic theory and best practices on how to manage versioning for RESTful Web Services, however I have not seen much discussion on how multiple REST APIs interact with data.
I'd like to see various architectural strategies or documentation on how to handle hosting multiple versions of your app that rely on the same data pool.
For instance, suppose you make a database level destructive change to a database table that causes you to have to increment your major API version to v2.
Now at any given time, users could be interacting with the v1 web service and the v2 web service at the same time and creating data that is visible and editable by both services. How should this be handled?
Most of changes introduced to API affect the content of the response, till changes introduced are incremental this is not a very big problem (note: you should never expose the exact DB model directly to the clients).
When you make a destructive/significant change to DB model and new API version of API is introduced, there are two options:
Turn the previous version off, filter out all queries to reply with 301 and new location.
If 1. is impossible to need to maintain both previous and current version of the API. Since this might time and money consuming it should be done only for some time and finally previous version should be turned off.
What with DB model? When two versions of API are active at the same time I'd try to keep the DB model as consistent as possible - having in mind that running two versions at the same time is just temporary. But as I wrote earlier, DB model should never be exposed directly to the clients - this may help you to avoid a lot of problems.
I have given this a little thought...
One solution may be this:
Just because the v1 API should not change, it doesn't mean the underlying implementation cannot change. You can modify the v1 implementation code to set a default value, omit the saving of a field, return an unchecked exception, or do some kind of computational logic that helps the v1 API to be compatible with the shared datasource. Then, implement a better, cleaner, more idealistic implementation in v2.
when you are going to change any thing in your API structure that can change the response, you most increase you'r API Version.
for example you have this request and response:
request post: a, b, c, d
res: {a,b,c+d}
and your are going to add 'e' in your response fetched from database.
if you don't have any change based on 'e' in current client versions, you can add it on your current API version.
but if you'r new changes are going to change last responses, for example:
res: {a+e, b, c+d}
you most increase API number to prevent crashing.
changing in the request input's are the same.