Upload to S3 which triggers Lambda, wait for Lambda to respond

So I have an architecture in which the user uploads their file via the front end, which sends it to an S3 bucket; the upload in turn triggers a Lambda for validation and processing, which should send a response of successful upload or validation error back to the front end.
I don't understand whether there is a way to implement this in JavaScript (or any other similar language).
In the normal scenario, the front end uploads to server 1 and waits for its response. Server 1 then tells the front end whether it was a success or a failure, and that is what the front end tells the user.
But in this case, the upload is done to S3 (which is incapable of taking a response from the Lambda and sending it back to the user), and the response is expected from the other component (the Lambda).
How can this be implemented? If the architecture is flawed, please do suggest improvements.

How does server 2 respond to the front end? Surely server 2 responds to server 1, which in turn responds to the front end. If these operations are synchronous, then it is no different from calling a function or making an API call. If you want to make it asynchronous, then you have a different pattern to manage.
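One common way to square this circle (a sketch, not the only option): have your own backend hand out a pre-signed S3 upload URL, let the Lambda write its verdict somewhere the backend can read (DynamoDB, for instance), and have the front end poll a status endpoint. The /api/upload-url and /api/status endpoints and the response shapes below are invented for illustration:

```javascript
// Hypothetical front-end flow: upload straight to S3 via a pre-signed URL,
// then poll a status endpoint until the Lambda has recorded its verdict.
async function uploadAndWait(file) {
  // 1. Ask our own backend for a pre-signed S3 upload URL.
  const { uploadUrl, fileId } = await fetch('/api/upload-url', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ filename: file.name }),
  }).then((r) => r.json());

  // 2. Upload the file directly to S3. The bucket's event notification
  //    triggers the Lambda; the browser never talks to the Lambda itself.
  await fetch(uploadUrl, { method: 'PUT', body: file });

  // 3. Poll for the result. The Lambda writes its verdict (e.g. to DynamoDB)
  //    under fileId, and the backend exposes it through this endpoint.
  for (let attempt = 0; attempt < 30; attempt++) {
    const { status, error } = await fetch(`/api/status/${fileId}`)
      .then((r) => r.json());
    if (status === 'valid') return { ok: true };
    if (status === 'invalid') return { ok: false, error };
    await new Promise((resolve) => setTimeout(resolve, 2000)); // retry in 2s
  }
  throw new Error('timed out waiting for validation');
}
```

If polling feels clunky, the same shape works with WebSockets or server-sent events; or you can skip the S3 trigger entirely and put the Lambda behind API Gateway, so the front end gets the validation result as the response to a single request.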

Related

Can I send an API response before successful persistence of data?

I am currently developing a microservice that interacts with other microservices.
The problem is that those interactions are really time-consuming. I have already implemented concurrent calls via Uni and use caching where useful. Some calls still need a few seconds to respond, so I thought of another thing I could do to improve performance:
Is it possible to send a response before the successful persistence of data? I send requests to the other microservices, where they have to persist the results of my methods. Can I already send the user the result in a first response and send a second response once the persistence has succeeded?
With that, the front end could already begin working even though my API call is not 100% finished.
I saw that there is a possible status code 207, but it's rather used with streams where someone wants to split large files. Is there another possibility? Thanks in advance.
"Is it possible to send a response before the sucessfull persistence of data? Can I already send the user the result in a first response and make a second response if the persistence process was sucessfull? With that, the front-end could already begin working even though my API is not 100% finished."
You can and should, but it is a philosophy change in your API, and you may have to consider some edge cases and techniques to deal with them.
In the case of a long-running API call, you can issue an "ack" response, a traditional 200 one, only the answer would just mean the operation is asynchronous and will complete in the future, something like { id: 49584958, apicall: "create", status: "queued", result: true }
Then you can
poll your API with the returned ID to see whether the still-ongoing operation has succeeded or failed.
have an SSE channel (real-time server-sent events) where your server can push status messages as pending operations finish (see the sketch after this list).
maybe, using persistent connections and keep-alives, or flushing the response mid-way, you can achieve what you point out, i.e. something like a segmented response. I am not familiar with that approach, as I normally go for the suggestions above.
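For the SSE option, a minimal Node.js sketch, assuming the client already has the operation ID from the ack response; the /events route and the notifyDone hook are invented for illustration:

```javascript
// Minimal SSE endpoint using Node's built-in http module. The client
// subscribes with its operation id and is notified when persistence finishes.
const http = require('http');

const listeners = new Map(); // operation id -> array of open responses

http.createServer((req, res) => {
  const match = req.url.match(/^\/events\/(\w+)$/);
  if (!match) {
    res.writeHead(404);
    res.end();
    return;
  }
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  const id = match[1];
  listeners.set(id, (listeners.get(id) || []).concat(res));
}).listen(3000);

// Your persistence code calls this once operation `id` has completed.
function notifyDone(id, result) {
  for (const res of listeners.get(id) || []) {
    res.write(`data: ${JSON.stringify({ id, status: 'done', result })}\n\n`);
    res.end();
  }
  listeners.delete(id);
}
```

On the browser side this is just `new EventSource('/events/49584958')` plus an onmessage handler.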
But in any case, the edge cases apply exactly the same. For example, what happens if, through your API, a user then issues calls that depend on the success of an ongoing (or not even started) previous command? Like, for example, getting information about something that is still being persisted?
You will have to deal with these situations with mechanisms like:
Reject related operations until the pending call is resolved server-side: the API could return, e.g., a BUSY error informing that operations are still ongoing when you want to, for example, delete something that is still being created (sketched below).
Queue all operations so the server executes them all sequentially.
Allow some simultaneous operations if you find they will not collide (e.g. creating 2 unrelated items).
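For the first mechanism, a minimal sketch in Node.js; the choice of 409 Conflict as the status code is mine, and the pending set would be maintained by whatever starts and confirms the persistence:

```javascript
// Sketch of the "reject while busy" mechanism. Whatever starts a creation
// does pending.add(id); the persistence confirmation does pending.delete(id).
const http = require('http');

const pending = new Set(); // ids of resources whose creation hasn't persisted yet

http.createServer((req, res) => {
  const match = req.url.match(/^\/items\/(\w+)$/);
  if (req.method === 'DELETE' && match) {
    const id = match[1];
    if (pending.has(id)) {
      // Still being created: refuse the dependent operation for now.
      res.writeHead(409, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'BUSY', detail: `item ${id} is still being persisted` }));
      return;
    }
    // ...safe to delete the item here...
    res.writeHead(204);
    res.end();
    return;
  }
  res.writeHead(404);
  res.end();
}).listen(3000);
```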

Cache data from external API in an orchestrated manner

I am building an application which uses the Amazon MWS API.
The API has limits for how frequently you can hit it.
I am looking for a tool that can act as a reverse-proxy, save the MWS API responses, and eventually masquerade as the MWS API without ever hitting it, returning only responses from the cache.
Some tools do this, but what I need is a bit more complicated.
Say I request a report from Amazon MWS:
1. I'll call RequestReport.
2. I'll get a ReportRequestId back.
3. I'll start polling GetReportRequestList to find out the current status of the report request. The request will likely go through the statuses SUBMITTED then DONE, but it could also be set to ERROR or CANCELLED.
4. When the report request status returned by GetReportRequestList is DONE, I can finally call GetReport and get the data.
The behavior from step 3 is what I'm trying to replicate.
This external API cache should be able to produce different results for the same request: the first response should yield SUBMITTED and then the second response should yield DONE.
I should be able to easily configure these flows as I wish, setting the responses I want for the 1st, 2nd, nth request.
I would like this tool to require minimal configuration; I do not want to configure routes or anything. I want it to automatically cache everything and then return everything from the cache, never flushing it.
Also, I need this level of control over what's returned in a response, depending on the count of requests done up to that point.
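To make the counting behavior I'm after concrete, a hypothetical Node.js stub (invented sequence config, not an existing tool; the real MWS API speaks XML, but JSON keeps the sketch short):

```javascript
// Hypothetical stub: for each URL, serve a configured sequence of canned
// responses (1st request -> SUBMITTED, 2nd -> DONE, ...), then keep
// returning the last entry on every further request.
const http = require('http');

const sequences = {
  '/GetReportRequestList': [
    { ReportProcessingStatus: 'SUBMITTED' },
    { ReportProcessingStatus: 'DONE' },
  ],
};
const counts = {}; // per-URL request counter

http.createServer((req, res) => {
  const seq = sequences[req.url];
  if (!seq) {
    res.writeHead(404);
    res.end();
    return;
  }
  const n = counts[req.url] = (counts[req.url] || 0) + 1;
  const body = seq[Math.min(n - 1, seq.length - 1)]; // stick on the last entry
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(body));
}).listen(3000);
```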

How does a REST API handle continuous data updates

I have a REST backend API, and the front end will call the API to get data.
I was wondering how a REST API handles continuous data updates. For example,
in Jenkins, if we execute a build job, we can see the continuous log output on the page until the job finishes. How does REST accomplish that?
Jenkins will just continue to send data. That's it. It simply carries on sending (at least that's what I'd presume it does). Normally the response contains a header field indicating how much data the response contains (Content-Length), but this field is not mandatory; the server can omit it. In such a case the response body ends when the server closes the connection. See RFC 7230:
Otherwise, this is a response message without a declared message body length, so the message body length is determined by the number of octets received prior to the server closing the connection.
Another possibility would be to use chunked transfer encoding. The server then sends the data in chunks, each prefixed with its own size, and terminates the stream by sending a zero-length last chunk.
WebSockets would be a third possibility.
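To make the first two options concrete, a minimal Node.js sketch: writing to the response without setting a Content-Length makes Node fall back to chunked transfer encoding automatically, and res.end() emits the terminating zero-length chunk. The one-second log interval is invented:

```javascript
// Streaming log output over plain HTTP with Node's http module.
const http = require('http');

http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' }); // no Content-Length
  let line = 0;
  const timer = setInterval(() => {
    res.write(`log line ${++line}\n`); // each write is sent as a chunk
    if (line === 10) {
      clearInterval(timer);
      res.end('build finished\n'); // final chunk, then the stream closes
    }
  }, 1000);
  res.on('close', () => clearInterval(timer)); // client disconnected early
}).listen(3000);
```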
I was searching for an answer myself when the obvious solution struck me: in order to see what type of communication a service is using, you can simply view it from the browser side using Developer Tools.
In Google Chrome it will be F12 -> Network.
In the case of Jenkins, the front end sends AJAX requests to the backend for data:
every 5 seconds on the Dashboard page
every second during a Pipeline run (the Console Output page that you mentioned).
I have also checked this approach in AWS. When checking the status of instances (example: Initializing..., Booting...), it queries the backend every second. That seems to be a standard interval for its services.
Additional note:
When running an AWS Remote Console, though, it first sends requests for the remote console instance status (the backend answers with { status: "BOOTING" }, etc.). After the backend returns the status as "RUNNING", it starts a WebSocket session between your browser and the AWS backend (you can notice it by applying the WS filter in Developer Tools).
Then it is no longer a REST API but WebSockets, which is a different (stateful) protocol.
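A browser-side sketch of that handoff; both the /console/status endpoint and the wss:// URL are invented for illustration:

```javascript
// Poll a status endpoint until the backend reports RUNNING, then switch
// to a WebSocket for the actual console traffic.
async function connectToConsole() {
  for (;;) {
    const { status } = await fetch('/console/status').then((r) => r.json());
    if (status === 'RUNNING') break; // e.g. after a few BOOTING responses
    await new Promise((resolve) => setTimeout(resolve, 1000)); // poll at 1s
  }
  const ws = new WebSocket('wss://example.com/console/socket');
  ws.onmessage = (event) => console.log('console output:', event.data);
  return ws;
}
```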

Rails 3: Return large amount of data to user via API

My app has an API through which users can request data. Sometimes that data takes time to process, and it is breaking my code.
I need a solution for this, and I was thinking of using delayed_job, but I'm not sure how this works. If the user makes a request, I need to give them an answer. Even if I process the data in the background, the call still needs to wait until the job returns.
What is the solution for this? I am not sure how to do it.
Thanks
Heroku has a 30-second timeout, which is why your requests are failing (probably H12 or H13 in your Heroku logs).
There are three methods to work around this.
Keep the connection open by sending blank data.
You'll need to send an initial response within the first 30 seconds and then send at least a byte every 55 seconds after that. Use the time in between to process the data. Sending spaces should not affect the browser's ability to read the response.
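Roughly, in a Node.js handler (a Rails controller would do the analogous thing); the 25-second padding interval is an arbitrary choice safely under the 55-second window, and doSlowWork is a stand-in:

```javascript
// Keep the connection alive by flushing whitespace while the real work runs.
const http = require('http');

http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  const keepAlive = setInterval(() => res.write(' '), 25000); // pad with spaces

  doSlowWork().then((result) => {
    clearInterval(keepAlive);
    // Leading spaces are ignored by JSON.parse, so the client still
    // receives a valid JSON document.
    res.end(JSON.stringify(result));
  });
}).listen(3000);

function doSlowWork() {
  return new Promise((resolve) =>
    setTimeout(() => resolve({ status: 'done' }), 90000)); // pretend 90s job
}
```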
Callback
Have the user provide a callback URL in the initial request. When you finish processing the data, hit the callback URL with your response.
Polling
As suggested by Codeglot, you can provide the user with a key. To check on their request, they can ping your server with that key.
Tell the user that their data is being processed and will be available shortly. YouTube, Vimeo, Facebook, and Twitter all do this.
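A sketch of the callback option in Node.js rather than Rails (the /jobs route and processJob are invented; global fetch assumes Node 18+):

```javascript
// Accept the job, answer 202 immediately, and POST the result to the
// caller's URL when processing finishes.
const http = require('http');

http.createServer((req, res) => {
  if (req.method === 'POST' && req.url === '/jobs') {
    let body = '';
    req.on('data', (chunk) => (body += chunk));
    req.on('end', () => {
      const { callbackUrl } = JSON.parse(body);
      res.writeHead(202, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ status: 'processing' }));

      // Do the slow work in the background, then hit the callback URL.
      processJob().then((result) =>
        fetch(callbackUrl, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify(result),
        }));
    });
    return;
  }
  res.writeHead(404);
  res.end();
}).listen(3000);

function processJob() { // stand-in for the real processing
  return new Promise((resolve) => setTimeout(() => resolve({ ok: true }), 60000));
}
```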

Streaming API vs Rest API?

The canonical example here is Twitter's API. I understand conceptually how the REST API works: essentially it's just a query to their server for your particular request, to which you then receive a response (JSON, XML, etc). Great.
However, I'm not exactly sure how a streaming API works behind the scenes. I understand how to consume it. For example, with Twitter you listen for a response, and from the response you listen for data, with the tweets coming in chunks. You build up the chunks in a string buffer and wait for a line feed, which signifies the end of a tweet. But what are they doing to make this work?
Let's say I had a bunch of data and I wanted to set up a streaming API locally for other people on the net to consume (just like Twitter). How is this done, and with what technologies? Is this something Node.js could handle? I'm just trying to wrap my head around what they are doing to make this thing work.
Twitter's streaming API is essentially a long-running request that's left open; data is pushed into it as and when it becomes available.
The repercussion of that is that the server has to be able to deal with lots of concurrent open HTTP connections (one per client). A lot of existing servers don't manage that well. For example, Java servlet engines assign one thread per request, which can (a) get quite expensive and (b) quickly hit the normal max-threads setting and prevent subsequent connections.
As you guessed, the Node.js model fits the idea of a streaming connection much better than, say, a servlet model does. Both requests and responses are exposed as streams in Node.js, but they don't occupy an entire thread or process, which means you can continue pushing data into the stream for as long as it remains open without tying up excessive resources (although this is subjective). In theory you could have a lot of concurrent open responses connected to a single process and only write to each one when necessary.
If you haven't looked at it already, the HTTP docs for Node.js might be useful.
I'd also take a look at technoweenie's Twitter client to see what the consumer end of that API looks like with Node.js, the stream() function in particular.
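A toy version of such a streaming endpoint in Node.js: keep every response open and broadcast each new item as a newline-delimited JSON record, which is roughly the consumption pattern described in the question. The one-record-per-second data source is simulated:

```javascript
// Toy streaming API: every connected client receives each new item as a
// newline-delimited JSON chunk, Twitter-stream style.
const http = require('http');

const clients = new Set(); // every currently open response

http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  clients.add(res);
  res.on('close', () => clients.delete(res)); // drop disconnected clients
}).listen(3000);

// Simulated data source: one tweet-like record per second.
let n = 0;
setInterval(() => {
  const record = JSON.stringify({ id: ++n, text: `item ${n}` }) + '\n';
  for (const res of clients) res.write(record); // the \n delimits records
}, 1000);
```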