I'm building a project that sends data to a Zapier webhook. The records can range from hundreds to thousands, even hundreds of thousands, and they are processed one by one in a loop.
Basically, what I did was use Laravel's chunked results, with a pool of HTTP requests inside each chunk. It looks like this...
** Note that I chunk the results by 25 to avoid resource exhaustion when sending the data to Zapier.
If you have a better suggestion besides chunking, that would be great. **
use Illuminate\Support\Facades\Http;
use Illuminate\Http\Client\Pool;

Model::where(...someCondition)->chunk(25, function ($models) {
    Http::pool(function (Pool $pool) use ($models) {
        foreach ($models as $key => $model) {
            $pool->as($key)
                ->withBody(json_encode($model), 'json')
                ->post('https://the-zapier/update/hook/endpoint/...');
        }

        return $pool;
    });
});
Now, Zapier receives the request body and processes it, and of course it takes time to finish.
Here's what I understand:
Inside the loop, the HTTP request has been sent, but Zapier hasn't finished the specific task for it yet.
Now, I'm curious: does the loop proceed to the next item AFTER it sends the data to the endpoint, even though Zapier hasn't finished the task for this particular request?
OR
Does it wait for Zapier to finish the particular task before it proceeds to the next iteration?
I'm building an API that handles HTTP calls which kick off a significant amount of work right away.
For example, in one controller there is an action (a POST request) that takes data from the body and performs some processing that should start immediately but will last for about 1 to 2 minutes.
I'm using CQRS with MediatR, and inside this POST action I call a command to handle the processing.
Taking this into consideration, I want the POST request to launch the command and then return an Ok to the user even though the command is still running in the background. The user will be notified via email once everything is done.
Does someone know any best practice to accomplish this?
Even though I'm using MediatR to send the request to the handler, the call is not executed in parallel.
[HttpPost]
public async Task<IActionResult> RequestReports([FromBody] ReportsIntent reportsIntent)
{
    await _mediator.Send(new GetEndOfWeekReports.Command(companyId, clientId, reportsIntent));
    return Ok();
}
Does someone know any best practice to accomplish this?
Yes (as described on my blog).
You need what I call the "basic distributed architecture". Specifically, you need at least:
A durable queue. "Durable" here means "on disk; not in-memory".
A backend processor.
So the web API will serialize all the necessary data from the request into a queue message, place that on the queue, and then return to the client.
Then a backend processor retrieves work from that queue and does the actual work. In your case, this work concludes with sending an email.
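A minimal sketch of that shape, purely for illustration: IDurableQueue and ReportJob below are hypothetical placeholders for whatever durable queue you pick (Azure Storage Queues, SQS, MSMQ, a database table), the controller is assumed to have an injected _queue, and companyId/clientId come from the question's context.

// Requires Microsoft.Extensions.Hosting for BackgroundService.
// Hypothetical message type and queue abstraction; swap in your real queue client.
public record ReportJob(Guid CompanyId, Guid ClientId, ReportsIntent Intent);

public interface IDurableQueue
{
    Task EnqueueAsync(ReportJob job, CancellationToken ct);
    Task<ReportJob?> DequeueAsync(CancellationToken ct);
}

[HttpPost]
public async Task<IActionResult> RequestReports([FromBody] ReportsIntent reportsIntent)
{
    // Serialize the work into a durable queue message and return immediately.
    await _queue.EnqueueAsync(new ReportJob(companyId, clientId, reportsIntent), HttpContext.RequestAborted);
    return Accepted();
}

// Backend processor: ideally a separate worker process that drains the queue.
public class ReportWorker : BackgroundService
{
    private readonly IDurableQueue _queue;
    public ReportWorker(IDurableQueue queue) => _queue = queue;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var job = await _queue.DequeueAsync(stoppingToken);
            if (job is null) { await Task.Delay(1000, stoppingToken); continue; }

            // ... run the 1-2 minute report generation here ...
            // ... then notify the user by email that it is done ...
        }
    }
}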
In a way, this is kind of like MediatR but explicitly going out of process with the on-disk queue and a separate process being your handler.
I would handle API calls that take some time separately from calls that can be completed directly.
When I do API calls that take time, I queue them up and process them on the backend.
A typical API call can look something like this:
POST http://api.example.com/orders HTTP/1.1
Host: api.example.com

HTTP/1.1 201 Created
Date: Fri, 5 Oct 2012 17:17:11 GMT
Content-Length: 123
Content-Type: application/json
Location: http://poll.example.com/orders/59cc233e-4068-4d4a-931d-cd5eb93f8c52.xml
ETag: "c180de84f951g8"

{ uri: 'http://poll.example.com/orders/59cc233e-4068-4d4a-931d-cd5eb93f8c52.xml'}
The returned URL is a unique URL that the client can then poll/query to get an idea of the status of the job.
When the client queries this URL, the response can look something like this (the field names below are only illustrative; the exact payload is up to you):
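HTTP/1.1 200 OK
Content-Type: application/json

{ status: 'in-progress', progress: '40%' }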
And when it is later done, the result would be something like:
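HTTP/1.1 200 OK
Content-Type: application/json

{ status: 'completed', dataurl: 'http://example.com/orders/59cc233e-4068-4d4a-931d-cd5eb93f8c52/result' }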
Where dataurl is a link to the result/report that the client then can download.
I have a webapp created with Node.js/Express.js/Pug that runs a bash script (mostly an Nmap scan) and displays the results. I'd like to implement some sort of page in between the start and the results to signify the system is working on the task.
I tried to just add another res.render(...) at the beginning of the route that starts the scan, but I ran into the problem that HTTP cannot send headers twice. Effectively, I can't send two HTTP responses for one request; please let me know if I'm wrong here.
I'm still not very familiar with this stuff; I'm working with a group and this job fell to me, any help is appreciated.
Typically the route handler would:
trigger the long running script asynchronously
return an "in progress" page
Then the "in progress" page would ask the server if it was done yet via:
Websocket
Ajax polling
Meta refresh polling
You'd need to have the callback from the original asynchronous process keep track of where the response should go (possibly using a GUID that is passed to it and also returned as data in the "in progress" page).
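A rough sketch of the Ajax-polling variant: the route paths, the in-memory jobs map, the scan.sh command and the in-progress view name are all placeholders, not a prescribed design.

const express = require('express');
const { exec } = require('child_process');
const crypto = require('crypto');

const app = express();
app.set('view engine', 'pug');

const jobs = {}; // jobId -> { done, output }

app.post('/scan', (req, res) => {
  const jobId = crypto.randomUUID();
  jobs[jobId] = { done: false, output: null };

  // Kick off the long-running script asynchronously; do NOT wait for it here.
  exec('bash scan.sh', (err, stdout) => {
    jobs[jobId] = { done: true, output: err ? String(err) : stdout };
  });

  // Immediately render the "in progress" page, passing the job id so it can poll.
  res.render('in-progress', { jobId });
});

// The "in progress" page polls this endpoint (Ajax + setTimeout) until done is true,
// then fetches/renders the results.
app.get('/scan/:jobId/status', (req, res) => {
  const job = jobs[req.params.jobId];
  if (!job) return res.sendStatus(404);
  res.json(job);
});

app.listen(3000);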
I am uploading multiple files using javascript.
After I upload the files, I need to run several processing functions.
Because of the processing time required, I need a UI on the front end telling the user the estimated time left for the entire process.
Basically I have 3 functions:
/upload - this is an endpoint for uploading the files
/generate/metadata - this is the next endpoint that should be triggered after /upload
/process - this is the last endpoint. Should be triggered after /generate/metadata
This is basically how I expect the screen to look.
Information such as percentage remaining and time left should be displayed.
However, I am unsure whether to have the server supply this information or to do a hackish estimate solely in JavaScript.
I would also need to update the screen with messages such as:
"currently uploading" if I am at function 1.
"Generating metadata" if I am at function 2.
"Processing ..." if I am at function 3.
Function 2 only occurs after the successful completion of 1.
Function 3 only occurs after the successful completion of 2.
I am already using q.js promises to handle some parts of this, but the code has gotten scarily messy.
I recently came across Backbone, and it allows structured ways to handle single-page app behavior, which is what I wanted.
I have no problems with the server-side returning back json responses for success or failure of the endpoints.
I was wondering what would be a good way to implement this using Backbone.js.
You can use a "progress" file or DB entry which stores the state of the backend process. Have your backend process periodically update this file. For example, write this to the file:
{"status": "Generating metadata", "time": "3 mins left"}
After the user submits the files, have the frontend start pinging a backend progress endpoint using a simple Ajax call and setTimeout. The progress function will simply open this file, grab the JSON-formatted status info, and send it back so the frontend can update its progress bar.
You'll probably want the ajax call to be attached to your model(s). Have your frontend view watch for changes to the status and update accordingly (e.g. a progress bar).
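A minimal sketch of that idea: the /progress URL, the attribute names and the "Done" status are assumptions, and it presumes Backbone (with jQuery and Underscore) is already loaded.

var ProgressModel = Backbone.Model.extend({
  url: '/progress', // the backend reads the progress file and returns its JSON
  defaults: { status: 'currently uploading', time: '' },

  poll: function () {
    var self = this;
    this.fetch().always(function () {
      // Keep polling every 2 seconds until the backend reports completion.
      if (self.get('status') !== 'Done') {
        setTimeout(function () { self.poll(); }, 2000);
      }
    });
  }
});

var ProgressView = Backbone.View.extend({
  el: '#progress',
  initialize: function () {
    // Re-render whenever the polled status changes.
    this.listenTo(this.model, 'change', this.render);
  },
  render: function () {
    this.$el.text(this.model.get('status') + ' (' + this.model.get('time') + ')');
    return this;
  }
});

var progress = new ProgressModel();
new ProgressView({ model: progress });
progress.poll();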
Long Polling request:
Polling request for updating Backbone Models/Views
Basically, when you upload a file you assign a "FileModel" to every given file. The FileModel will start a long-polling request every N seconds, until it gets the status "complete".
I'm creating a web API based on the new ASP.NET Web API. I'm trying to understand the best way to handle people submitting multiple data-sets at the same time. If they have 100,000 requests it would be nice to let them submit 1,000 at a time.
Let's say I have a "create new Contact" method in my Contacts Controller:
public string Put(Contact _contact)
{
    // add new _contact to repository
    repository.Add(_contact);

    // return success
    return "Success";
}
What's the proper way to allow users to "Batch" submit new contacts? I'm thinking:
public string BatchPut(IEnumerable<Contact> _contacts)
{
    foreach (var contact in _contacts)
    {
        repository.Add(contact);
    }

    // return success
    return "Success";
}
Is this a good practice? Will this parse a GET request with a JSON array of Contacts (assuming they are correctly formatted)?
Lastly, any tips on how best to respond to Batch requests? What if 4 out of 300 fail?
Thanks a million!
When you PUT a collection, you are either inserting the whole collection or replacing an existing collection as if it were a single resource. It is very similar to GET, DELETE or POST of a collection. It is an atomic operation. Using it as a substitute for individual calls to PUT a contact may not be very RESTful (but that is really open for debate).
You may want to look at HTTP pipelining and send multiple PutContact requests over the same socket. For each request you can return a standard HTTP status for that single request.
I implemented batch updates in the past with SOAP and we encountered a number of unforeseen issues when the system was under load. I suspect you will run into the same issues if you don't pay attention.
For example, the database would time out in the middle of a batch update, and then all hell broke loose in terms of failures, reliability, transactions, etc. And the poor client had to figure out what was actually updated and try again.
When there were too many records to update, the HTTP request would time out because we took too long. That opened another can of worms.
Another concern was how much data we would accept during the update. Was 10MB of contacts enough? Perhaps 1MB? Larger buffers have numerous implications in terms of memory usage and security.
Hence my suggestion to look at HTTP pipelining.
Update
My suggestion would be to handle batch creation of contacts as an async process. Just assume that a "job" is the same as a "batch create" process. So the service may look as follows:
public class JobService
{
    // Post
    public void Create(CreateJobRequest job)
    {
        // 1. Create job in the database with status "pending"
        // 2. Save job details to disk (or S3)
        // 3. Submit the job to MSMQ (or SQS)
        // 4. For 20 seconds, poll the database to see if the job completed
        // 5. If the job completed, return 201 with a URI to the "Get" method below
        // 6. If not, return 202 (aka the request was accepted for processing, but has not completed)
    }

    // Get
    public Job Get(string id)
    {
        // 1. Fetch the job from the database
        // 2. Return the job if it exists or 404
    }
}
The background process that consumes work from the queue can update the database directly, or alternatively perform a PUT to the service, to update the status of the job to running and then completed.
You'll need another service to navigate through the data that was just processed, address errors and so forth.
Your background process may need to be tolerant of validation errors. If it is not, or if your service does the validation (assuming you are not doing database calls etc. for which response times cannot be guaranteed), you can return a structure like CreateJobResponse that contains enough information for your client to fix the issue and resubmit the request. If you have to do validation that is time consuming, do it in the background process, mark the job as failed, and update the job with the information that will allow the client to fix the errors and resubmit the request. This assumes that the client can do something with the fact that the job failed.
If the Create method breaks the job request into many smaller "jobs", you'll have to deal with the fact that the operation may not be atomic, which poses numerous challenges for monitoring whether the jobs completed successfully.
A PUT operation is supposed to replace a resource. Normally you do this against a single resource, but when doing it against a collection, that would mean you replace the original collection with the set of data passed. Not sure if you mean to do that, but I am assuming you are just updating a subset of the collection, in which case a PATCH method would be more appropriate.
Lastly, any tips on how best to respond to Batch requests? What if 4 out of 300 fail?
That is really up to you. There is only a single response so you can send a 200 OK or a 400 Bad Request and put the details in the body.
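For example, the body could report per-item results so the client knows exactly which of the 300 failed; the shape below is only an illustration, not a standard.

{
  "received": 300,
  "succeeded": 296,
  "failed": [
    { "index": 12,  "error": "Email address is invalid" },
    { "index": 45,  "error": "Missing last name" },
    { "index": 118, "error": "Duplicate contact" },
    { "index": 240, "error": "Missing last name" }
  ]
}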
My application in Django can create some very big SQL queries. I currently use an HttpRequest object for the data I need, then an HttpResponse to return what I want to show the user.
Obviously, I can let the user wait for a minute whilst these many sets of queries are being executed and extracted from the database, and then return this monolithic HTML page.
Ideally, I'd like to update the page when I want, something like:
for i, e in enumerate(example):
    Table.objects.filter(someObjectForFilter[i])
    # Return the object to the page.
    # Then loop again, 'updating' the response after each iteration.
Is this possible?
I discovered recently that an HttpResponse can be a generator:
from django.http import HttpResponse

def myview(request, params):
    return HttpResponse(mygenerator(params))

def mygenerator(params):
    for i, e in enumerate(params):
        yield '<li>%s</li>' % Table.objects.filter(someObjectForFilter[i])
This will progressively return the results of mygenerator to the page, wrapped in an HTML <li> for display.
Your approach is a bit flawed. You have a few different options.
The first is probably the easiest: use AJAX and HTTPRequest. Have a series of these, each of which results in a single Table.objects.filter(someObjectForFilter[i]). As each one finishes, the script completes and returns the results to the client. The client updates the UI and initiates the next query via another AJAX call.
Another method is to use a batch system. This is a bit heftier, but probably a better design if you're going for real "heavy lifting" in the database. You'll need to have a batch daemon running (a cron job works just fine for this) scanning for incoming tasks. The user wants to perform something, so their request submits this task (it could simply be a row in a database with their parameters). The daemon grabs it, processes it completely offline, perhaps even on a different machine, and updates the task row with the results when it's complete. The client can then refresh periodically to check the status of that row, via traditional or AJAX methods.
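A rough sketch of that batch approach in Django; the Task model, the view names and process_task() are illustrative placeholders, not a prescribed design.

from django.db import models
from django.http import JsonResponse

# Hypothetical task table: one row per submitted request.
class Task(models.Model):
    params = models.TextField()                                   # serialized query parameters
    status = models.CharField(max_length=20, default='pending')
    result = models.TextField(blank=True)

# View: record the task and return immediately instead of running the queries inline.
def submit(request):
    task = Task.objects.create(params=request.body.decode())
    return JsonResponse({'task_id': task.id, 'status': task.status})

# View: the client refreshes / polls this to check the status of its row.
def status(request, task_id):
    task = Task.objects.get(pk=task_id)
    return JsonResponse({'status': task.status, 'result': task.result})

# Daemon: run this from cron or a management command to process pending tasks offline.
def run_pending_tasks():
    for task in Task.objects.filter(status='pending'):
        task.status = 'running'
        task.save()
        task.result = process_task(task.params)   # process_task() does the expensive queries
        task.status = 'complete'
        task.save()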