Cache data from external API in an orchestrated manner - api

I am building an application which uses the Amazon MWS API.
The API has limits for how frequently you can hit it.
I am looking for a tool that can act as a reverse-proxy, save the MWS API responses, and eventually masquerade as the MWS API without ever hitting it, returning only responses from the cache.
Some tools do this, but what I need is a bit more complicated.
Say I request a report from Amazon MWS:
1. I'll call RequestReport
2. I'll get a ReportRequestId back
3. I'll start polling GetReportRequestList to find out the current status of the report request. The report request will likely go through the statuses SUBMITTED then DONE, but it could also be set to ERROR or CANCELLED
4. When the report request status returned by GetReportRequestList is DONE, I can finally call GetReport and get the data.
The behavior from step 3 is what I'm trying to replicate.
This external API cache should be able to produce different results for the same request: the first response should yield SUBMITTED and then the second response should yield DONE.
I should be able to easily configure these flows as I wish, setting the responses I want for the 1st, 2nd, nth request.
I would like this tool to need minimal configuration. I do not want to configure routes or anything; I want it to automatically cache everything and then return everything from the cache, never flushing it.
Also, I need this level of control over what's returned in a response, depending on the count of requests done up to that point.
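To make the requested behaviour concrete, here is a minimal sketch of a replay cache that returns a different recorded response depending on how many times the same request has been seen. It is not an existing tool: the class name SequencedReplayCache, its methods, and the request key format are all hypothetical, and the choice to keep repeating the last recorded response once the sequence runs out is an assumption.

```python
from collections import defaultdict


class SequencedReplayCache:
    """Replays recorded responses per request key, in recorded order (hypothetical)."""

    def __init__(self):
        self._responses = defaultdict(list)  # request key -> ordered responses
        self._hits = defaultdict(int)        # request key -> requests seen so far

    def record(self, key, response):
        """Append a response to the sequence for this request key."""
        self._responses[key].append(response)

    def replay(self, key):
        """Return the response for the nth call; repeat the last one afterwards."""
        sequence = self._responses.get(key)
        if not sequence:
            raise KeyError(f"no recorded response for {key!r}")
        index = min(self._hits[key], len(sequence) - 1)
        self._hits[key] += 1
        return sequence[index]


cache = SequencedReplayCache()
key = ("GetReportRequestList", "ReportRequestId=123")  # hypothetical request key
cache.record(key, "SUBMITTED")
cache.record(key, "DONE")

statuses = [cache.replay(key) for _ in range(3)]
print(statuses)  # ['SUBMITTED', 'DONE', 'DONE']
```

A real reverse proxy would build the key from the HTTP method, path and body, and would call record() while passing traffic through to the real API.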

Related

Test async data processing flows with Karate Labs

I'm looking for best practices or the recommended approach to test async code execution with Karate.
Our use cases are all pretty similar but a basic one is:
1. Client makes HTTP request to API
2. API accepts request and creates a message which is added to a queue
3. API replies with ACCEPTED / 202
4. Worker picks up message from queue, processes it and updates database
5. Eventually, after the work is finished, another endpoint delivers updated data
How can I check with Karate that after processing has finished other endpoints return the correct result?
Concrete real life example:
1. Client requests a processing-intensive data export from the API, e.g. via HTTP POST /api/export
2. API creates a message with the information for creating the export and puts it on an AWS SQS queue
3. API replies with 202
4. Worker receives the message and creates the export, uploads the result as a ZIP to S3 and finally creates an entry in the database representing this export
5. Client can now query the list exports endpoint, e.g. via HTTP GET /api/exports
6. API returns 200 with the list of exports incl. the newly created entry
Generally I have two ideas on how to approach this:
1. Use Karate retry until on the endpoint that returns the list of exports
2. In the API response (step #3) return the message ID and use the HTTP API of SQS to poll that endpoint until the message has been processed, and then query the list endpoint to check the result
Is either of those approaches recommended, or should I choose an entirely different solution?
The moment queuing comes into the picture, I would not recommend retry until. It would work if you are in a hurry, but if you are ok to write a little bit of Java code, please read on. Note that this Java "glue code" needs to be written only once, and then the team responsible for writing the functional flows will be up and running.
I personally would prefer option (2) just because when a test fails, you will have a lot more diagnostic information and traces to look at.
Pretty sure you won't have a problem using AWS Java libs to do things such as polling SQS.
I think this example will answer all your questions: https://twitter.com/getkarate/status/1417023536082812935
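Whichever option is used, the core of the check is "poll until a condition holds or a timeout expires". The following is a language-neutral illustration in Python, not Karate or AWS SDK code: poll_until and its parameters are hypothetical names, and the fake list of responses stands in for GET /api/exports. Returning the attempt count is one way to keep some diagnostic information when a test fails.

```python
import time


def poll_until(fetch, is_done, timeout=30.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until is_done(result) is true or the timeout expires."""
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        result = fetch()
        if is_done(result):
            return result, attempts
        if clock() >= deadline:
            raise TimeoutError(f"condition not met after {attempts} attempts")
        sleep(interval)


# Stand-in for the list endpoint: empty twice, then the new export appears.
responses = iter([[], [], [{"id": "export-1"}]])

result, attempts = poll_until(
    fetch=lambda: next(responses),
    is_done=lambda exports: len(exports) > 0,
    timeout=5.0,
    interval=0.0,
)
print(result, attempts)
```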

Can I send an API response before successful persistence of data?

I am currently developing a Microservice that is interacting with other microservices.
The problem now is that those interactions are really time-consuming. I have already implemented concurrent calls via Uni and use caching where useful. I still have some calls that need several seconds to respond, so I thought of another thing I could do to improve performance:
Is it possible to send a response before the successful persistence of data? I send requests to the other microservices, where they have to persist the results of my methods. Can I already send the user the result in a first response and make a second response if the persistence process was successful?
With that, the front-end could already begin working even though my API is not 100% finished.
I saw that there is a possible status-code 207 but it's rather used with streams where someone wants to split large files. Is there another possibility? Thanks in advance.
"Is it possible to send a response before the sucessfull persistence of data? Can I already send the user the result in a first response and make a second response if the persistence process was sucessfull? With that, the front-end could already begin working even though my API is not 100% finished."
You can and should, but it is a philosophy change in your API and possibly you have to consider some edge cases and techniques to deal with them.
For a long-running API call, you can issue an "ack" response: an ordinary 200, except that the body only says the operation is asynchronous and will complete in the future, something like { id:49584958, apicall:"create", status:"queued", result:true }
Then you can
poll your API with the returned ID to see whether the operation is still ongoing, has succeeded or has failed.
have an SSE channel (real-time server-sent events) where your server can issue status messages as pending operations finish
maybe, using persistent connections and keepalives, or flushing the response in the middle, you can achieve what you describe, i.e. something like a segmented response. I am not familiar with that approach, as I normally go for the suggestions above.
But in any case, the same edge cases apply. For example, what happens if a user then issues calls through your API that depend on the success of a previous command that is still ongoing or has not even started, such as getting information about something that is still being persisted?
You will have to deal with these situations with mechanisms like:
Reject related operations until the pending call is resolved server side: the API could return e.g. a BUSY error saying that operations are still ongoing when you try to, for example, delete something that is still being created.
Queue all operations so the server executes them sequentially.
Allow some simultaneous operations if you find they will not collide (e.g. creating 2 unrelated items)
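As a rough sketch of the ack-then-poll idea combined with the BUSY rejection above: the JobRegistry class, its method names and the "resource" notion are all hypothetical, and it only tracks state in memory. A real service would persist jobs and have a worker call complete().

```python
import itertools


class JobRegistry:
    """In-memory tracker for asynchronous operations (hypothetical sketch)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}  # job id -> {"apicall", "resource", "status"}

    def submit(self, apicall, resource):
        """Ack a new operation, or report BUSY if the resource has a pending one."""
        pending = any(
            job["resource"] == resource and job["status"] == "queued"
            for job in self._jobs.values()
        )
        if pending:
            return {"status": "busy", "result": False}
        job_id = next(self._ids)
        self._jobs[job_id] = {"apicall": apicall, "resource": resource,
                              "status": "queued"}
        return {"id": job_id, "apicall": apicall, "status": "queued",
                "result": True}

    def complete(self, job_id, ok=True):
        """Called by the worker once persistence has finished."""
        self._jobs[job_id]["status"] = "succeeded" if ok else "failed"

    def status(self, job_id):
        """What the client sees when it polls with the returned ID."""
        return self._jobs[job_id]["status"]


registry = JobRegistry()
ack = registry.submit("create", "item-42")      # immediate ack, work still queued
blocked = registry.submit("delete", "item-42")  # rejected: creation still pending
registry.complete(ack["id"])                    # worker finishes persisting
retried = registry.submit("delete", "item-42")  # now accepted
print(ack, blocked, registry.status(ack["id"]), retried)
```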

How REST API handle continuous data update

I have a REST backend API, and the front end calls the API to get data.
I was wondering how a REST API handles continuous data updates. For example,
in Jenkins, if we execute a build job, we see continuous log output on the page until the job finishes. How does REST accomplish that?
Jenkins will just continue to send data. That's it. It simply carries on sending (at least that's what I'd presume it does). Normally the response contains a header field indicating how much data the response contains (Content-Length). But this field is not necessary. The server can omit it. In such a case the response body ends when the server closes the connection. See RFC 7230:
Otherwise, this is a response message without a declared message body length, so the message body length is determined by the number of octets received prior to the server closing the connection.
Another possibility would be to use the chunked transfer encoding. The server then sends the body as a series of chunks, each prefixed with its own size (in hexadecimal), instead of a Content-Length header for the whole body. The server terminates the body by sending a zero-length last chunk.
WebSockets would be a third possibility.
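To illustrate the chunked transfer encoding option: on the wire, each chunk is its size in hexadecimal, CRLF, the data, CRLF, and the body ends with a zero-size chunk. The sketch below only shows that framing. The function names are made up for the example, and the decoder ignores chunk extensions and trailer fields, which a real HTTP library has to handle.

```python
def encode_chunked(parts):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    body = b""
    for part in parts:
        if not part:
            continue  # a zero-length chunk would end the body early
        body += f"{len(part):X}".encode("ascii") + b"\r\n" + part + b"\r\n"
    return body + b"0\r\n\r\n"  # last chunk, no trailer fields


def decode_chunked(body):
    """Split a chunked body back into its chunks (no extensions or trailers)."""
    parts = []
    position = 0
    while True:
        line_end = body.index(b"\r\n", position)
        size = int(body[position:line_end], 16)
        if size == 0:
            return parts
        start = line_end + 2
        parts.append(body[start:start + size])
        position = start + size + 2  # skip the CRLF that follows the data


log_lines = [b"Started build\n", b"Finished\n"]
wire = encode_chunked(log_lines)
print(wire)
print(decode_chunked(wire) == log_lines)
```

Because the client learns each chunk's size as it arrives, it can display log lines as soon as they are sent, without knowing the total length in advance.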
I was searching for an answer myself and then the obvious solution struck me. In order to see what type of communication a service is using, you can simply view it from browser side using Developer Tools.
In Google Chrome it will be F12 -> Network.
In case of Jenkins, front-end is sending AJAX requests to backend for data:
every 5 seconds on Dashboards page
every second during Pipeline run (Console Output page), that you have mentioned.
I have also checked the approach in AWS. When checking the status of instances (example: Initializing... , Booting...), it queries the backend every second. It seems to be a standard interval for its services.
Additional note:
When running an AWS Remote Console though, it first sends requests for remote console instance status (backend answers with { status: "BOOTING" }, etc.). After backend returns status as "RUNNING", it starts a WebSocket session between your browser and AWS backend (you can notice it by applying WS filter in developer tools).
At that point it is no longer a REST API but WebSockets, which is a different, stateful protocol.

How to write a middle-tier http API endpoint that can stream results as they arrive to the client?

The scenario is this - I have a frontend web-server that I'm writing in node.js. I have an as-yet-unwritten middle-tier internal-API layer written in, well, anything. The internal-API is the only thing allowed to talk to the data-store (which happens to be a relational database).
Disclaimer: I'm a node.js beginner.
node.js wants to do data-access asynchronously - that makes calls like Database.query.all inefficient, since the response callback wouldn't start until the whole list has been assembled. Documentation I've read suggests that instead, it'd be better to stream results one at a time to the client.
I would like to know how to write the frontend and middle-tier http internal-API such that I can take advantage of node.js' asynchronicity, here.
I guess the question is "how do I stream structured data over http"? I guess that's the feature of the internal API that I'm asking for support for.
Should I:
Get the frontend to ask for a list of IDs, then issue one request each to the backend? Sounds crude and chatty, plus I don't see a guarantee that the requests will return in the order that I want, so I'd have to wait 'til I had everything back at the frontend anyway..?
Get the frontend to make a series of requests against the internal API for pages of data, and treat each chunk as a stream-segment...?
Fetch only enough data for the first screen's worth, then request for subsequent chunks, writing each one to the end of the list as it arrives?
something cleverer!?
(Note: please don't say "get rid of the middle-tier so you can talk to the database directly" - that's not an option)
I am not sure what exactly you mean by "streaming"; from the ideas you give, it could be either interpreted as some HTTP server push or long polling technique, or simply making subsequent XHR requests.
Since you're using node, I recommend Socket.io, which allows you to really push data to the browser whenever you want.
If you choose to go with XHRs, simply tell the browser what to request next.
If that doesn't fit you, and you want to use server push or long polling, response.write() seems the way to go. But you will probably run into problems with request timeouts and such.

Is a status method necessary for an API?

I am building an API and I was wondering: is it worth having a method in the API that returns the status of the API, i.e. whether it's alive or not?
Or is this pointless, and it's the API user's job to just call the method they need and, if it doesn't return anything due to network issues, handle that as needed?
I think it's quite useful to have a status returned. On the one hand, you can provide more statuses than 'alive' or not and make your API more powerful, and on the other hand, it's more useful for the user, since you can tell him exactly what's going on (e.g. 'maintenance').
But if your WebService isn't available at all due to network issues, then, of course, it's up to the user to catch that exception. But that's not the point, I guess, and it's not something you could control with your API.
It's useless.
The information it returns is completely out of date the moment it is returned to you because the service may fail right after the status return call is dispatched.
Also, if you are load balancing the incoming requests and your status request gets routed to a failing node, the reply (or lack thereof) would look to the client like a problem with the whole API service. In the meantime, all the other nodes could be happily servicing requests. Now your client will think that the whole API service is down but subsequent requests would work just fine (assuming your load balancer would remove the failed node or restart it).
HTTP status codes returned from your application's requests are the correct way of indicating availability. Your clients of course have to be coded to tolerate and handle them.
What is wrong with standard HTTP response status codes? 503 Service Unavailable comes to mind. HTTP clients should already be able to handle that without writing any code special to your API.
Now, if the service is likely to be unavailable frequently and it is expensive for the client to discover that but cheap for the server, then it might be appropriate to have a separate 'health check' URL that can quickly let the client know that the service is available (at the time of the GET on the health check URL).
It is not necessary most of the time, at least when it returns a simple true or false. It just makes client code more complicated because it has to call one more method. Even if your client received active=true from the service, the next useful call may still fail. Let your clients make the calls they need during normal execution and have them handle network, timeout and HTTP errors correctly. A very useful pattern for such cases is called Circuit Breaker.
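For readers who have not met the Circuit Breaker pattern, a minimal sketch follows. It is one simple variant, not a reference implementation: the class name, the max_failures and reset_after parameters and their defaults are assumptions chosen for the example. After a number of consecutive failures the breaker "opens" and rejects calls immediately; once a cool-down has passed it lets one trial call through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def flaky_service():
    # Hypothetical stand-in for a remote API call that is currently failing.
    raise ConnectionError("service unavailable")


breaker = CircuitBreaker(max_failures=2, reset_after=30.0)
outcomes = []
for _ in range(3):
    try:
        breaker.call(flaky_service)
    except CircuitOpenError:
        outcomes.append("skipped")
    except ConnectionError:
        outcomes.append("failed")
print(outcomes)  # ['failed', 'failed', 'skipped']
```

The third call never reaches the service, which is the point: the client stops hammering a failing API without needing a separate status method.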
Cases where a status check may be useful:
If all the normal calls are considered to be expensive there may be an advantage in first calling lightweight status-check method (just to avoid expensive call).
Service can have different statuses and client can change its behavior depending on these statuses.
It might also be worth looking into stateful protocols like XMPP.