How to split a long-timeout API call into smaller ones

I have an API JS file which gets called by a cron job via a curl GET.
This JS file makes a query to an external API via await fetch and saves some data from the response to MongoDB via await .. updateOne. The problem is that this happens in a loop for about 500 different values and takes more than 10 seconds to finish, whereas my server's timeout limit for serverless functions is 10 seconds.
So how can I split it into multiple GET requests?
Isn't doing a for loop inside the API JS file the same, since it would still count as a single operation?
Every time I google this with different keywords I find unrelated results. Am I missing something? Maybe such a case is rare? I'm new to the whole cron job/serverless functions thing, so if this is not the correct place to ask, please point me to where within Stack Exchange I should post it.

Two potential solutions:
The brute-force method would be to increase the timeout setting. You can do this via serverless.yml, either in the provider section or directly in the function definition. (The maximum timeout for AWS Lambdas is 900 seconds, i.e. 15 minutes.) (Not relevant here since you are on Vercel, where the timeout is 900 seconds on Enterprise and 60 seconds on Pro, but only 10 seconds on the free plan.)
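For reference, a minimal serverless.yml sketch of both options (the function name and handler are made up for illustration):

    provider:
      name: aws
      runtime: nodejs18.x
      timeout: 30          # provider-level default, in seconds

    functions:
      syncExternalApi:     # hypothetical function name
        handler: handler.sync
        timeout: 120       # per-function override, up to 900 on AWS Lambda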
Doing the for loop inside the Lambda function wouldn't change much. What would help is breaking the work down into multiple cron jobs which you can parameterise. E.g. imagine a cron job which goes through a staff list to do some processing on a daily basis. You could change your cron job to accept a range of letters which filters the staff list by last name, so instead of one cron job you would run four: A-F, G-M, N-S and T-Z. (In your case, try to find a parameter which splits the 500 values into equally sized buckets; see the sketch below.)
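A minimal sketch of that parameterised approach, assuming a Vercel-style API route, assuming the ~500 values live in a MongoDB collection, and with all route, database, and field names made up for illustration. The cron jobs would then call e.g. /api/sync?start=0&end=100, /api/sync?start=100&end=200, and so on, keeping each invocation under the 10-second limit:

    // pages/api/sync.js (hypothetical route name)
    import { MongoClient } from 'mongodb';

    const client = new MongoClient(process.env.MONGODB_URI);

    export default async function handler(req, res) {
      // Each cron entry requests a different slice of the ~500 values
      const start = parseInt(req.query.start, 10) || 0;
      const end = parseInt(req.query.end, 10) || start + 100;

      await client.connect();
      const coll = client.db('mydb').collection('values'); // hypothetical names

      // Load only this invocation's slice
      const values = await coll.find().skip(start).limit(end - start).toArray();

      for (const value of values) {
        const resp = await fetch(`https://external.example/api?id=${value.id}`);
        const data = await resp.json();
        await coll.updateOne({ _id: value._id }, { $set: { data } });
      }

      res.status(200).json({ processed: values.length });
    }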
As you get billed by duration and memory consumption with serverless (at least with AWS), it probably doesn't make a lot of sense to split it up, so increasing the timeout setting might be the easier solution; but I don't know your full context, so this is just a guess.

Related

Mule takes a long time for a simple select on the first execution

I am just using an HTTP listener and a Select in a Mule flow. It is a GET method, passing an ID as input, and the same ID is passed to the Select. The first execution via Mule takes 3 to 4 minutes, while in the DB itself the query takes only milliseconds.
This delay only happens after adding the parameter to the Select.
Can someone help me understand why there is a delay the first time, and how to resolve it?
A possible cause could be how you create the metadata. For example, if you use a huge CSV file as the sample for your data structure, Mule reads the whole file to obtain the headers, which takes time.
Solution: if you create metadata by example, use small samples with just a couple of rows of data.
Usually the main points that cause performance issues on first execution are:
JVM and Mule Runtime warmup
Time to establish connections
The first one cannot be avoided. The second is usually mitigated somewhat with a connection pool. Having said that, 4 minutes is a very excessive time for either of those. You need to do some performance analysis: add logs before and after the operation in the flow, enable debug logs for the database connector, and even attach a Java profiler to the Mule JVM to understand what could be happening.
You also have to consider whether a high number of records needs to be processed: even if the database answers quickly, it might take some time to format the results.

How to set up a JMeter test to have a certain throughput?

I am trying to perform a load test, and according to our stats (which I can't disclose) we expect peaks of 300 users per minute, uploading files of different sizes to our system.
Now, I created a JMeter test, which works fine, but what I don't know how to fine-tune is aiming for a certain throughput.
I created a test with 150 users and 100 loops, expecting it to simulate 150 users coming and going and uploading 15,000 files in total, but that never happened, because at a certain point the tests started failing.
Looking at our New Relic monitoring, it seems I somehow reached 1,600 requests in a single minute. I am testing a microservice running 12 instances, so that might play a role in the higher number of requests, but even so I expected the tests to pass. My uploaded file was 600 KB. In the end, I had a 98% failure rate.
I reduced the file size to 13 KB; at that point, I got a 17% failure rate.
So there's obviously something related to the time needed to upload the bigger file, but I don't understand what causes 150 threads/users in X loops to become 1,600 at the same time. I'd expect JMeter never to start a new loop with the same thread unless the original user is finished. That being said, I'd expect at most 150 users in a given minute.
Any clarification on how to get an exact number of users/threads running at the same time would be much appreciated.
I tried playing with the KeepAlive checkbox, and I tried setting the request lifetime to 10 seconds (all the uploads get a response earlier), but then JMeter finished the thread and I had only 150 runs, with no loops.
Thanks!
By default JMeter executes samplers as fast as it can, so there are two main factors which define the actual throughput (the number of requests per unit of time):
JMeter configuration
Response time of the application under test
So if you're following JMeter Best Practices and JMeter has enough headroom to operate in terms of CPU, RAM, etc., you are only limited by your application's response time, as JMeter waits for the previous request to finish before starting a new one.
If you need to "slow down" your test execution, consider adding e.g. a Constant Throughput Timer to your Test Plan, where you can define the desired number of requests per minute, as sketched below.
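For illustration, the timer settings for the expected peak of 300 requests per minute might look like this (a minimal sketch of the relevant fields, not a full test plan):

    Constant Throughput Timer
      Target throughput (in samples per minute): 300.0
      Calculate Throughput based on: all active threads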

How to increase Google Sheets v4 API quota limitations

The new Google Sheets API v4 currently has an unlimited read/write quota per day (which is fantastic), but is restricted to 500 reads/writes per account per 100 seconds, and 100 reads/writes per key per 100 seconds (or, I have found, for multiple keys coming from the same IP). This is probably plenty for most use cases, but I have an edge case that requires bringing a frequently updated Google Sheet with 70 tabs down to a node.js server that distributes the data to users' clients every ~30-60 seconds (the users are data annotators who are student research assistants). This wasn't so bad early in the project when there were only 20-30 tabs, but now that the data is large, the server blows through the 100-request quota and returns errors every 10-15 minutes.
The problem is such that:
Frequent data updates: Only data on 1-5 of the 70 tabs is likely to be updated on any given minute, but which tabs have new data is random (so I am pulling down the whole sheet of 70 = 70 reads).
Update interval: The need for updates happens randomly at about 30 second to 5-minute intervals (so some within the quota, some about 3-5x the quota).
Throttling: I have tried throttling the update to be within the 100 calls/100 seconds (my previous solution), but this introduces large usability issues, significantly decreasing usability/productivity/work quality.
Quota increase: The sheets API does not currently appear to include a way to pay to increase the quota. It does allow filling out a form to request an increase in the quota, but I'm not sure what the mean response time is on this (my request is only a few days old).
Multiple service accounts: I have tried using multiple service accounts to get the full 500 requests/100 seconds quota (rather than the per-user quota), since this is a server, but Google Sheets appears to rate-limit to 100 requests/100 seconds from a given IP.
Alternatives: I have considered that this project may have just grown beyond the size that Sheets is easily able to handle, but there do not appear to be any good, usable, self-hosted, collaborative spreadsheets with easy-to-interface-to APIs out there.
Are there settings/methods suggested to achieve the full 500 calls/100 seconds for a server?
Regarding your #4 (the quota increase): you can request a quota update in the Google Cloud Platform console, and it can be increased to 2,500 requests per 100 seconds per account and 500 per user.
You can use spreadsheets.get to read the entire spreadsheet in a single call, rather than one call per tab. Alternatively, you can use spreadsheets.values.batchGet to read multiple different ranges in a single call, if all you need are the values.
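For example, with the googleapis Node.js client (the spreadsheet ID and ranges are placeholders, and auth is assumed to be set up already):

    // Read values from many tabs in one API call via batchGet
    const { google } = require('googleapis');

    async function readAllTabs(auth) {
      const sheets = google.sheets({ version: 'v4', auth });

      const res = await sheets.spreadsheets.values.batchGet({
        spreadsheetId: 'YOUR_SPREADSHEET_ID',  // placeholder
        ranges: ['Tab1!A:Z', 'Tab2!A:Z'],      // list all 70 tab ranges here
      });

      return res.data.valueRanges; // one entry per requested range
    }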
The Drive API offers "push notifications", so you can get notified when changes occur and react to those, instead of polling for them. The latency of the notifications is a little on the slow side, but it gets the job done.
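A rough sketch of registering such a notification channel with the Drive API (the channel ID and webhook URL are placeholders):

    // Ask the Drive API to POST change notifications to our webhook
    const { google } = require('googleapis');

    async function watchSpreadsheet(auth) {
      const drive = google.drive({ version: 'v3', auth });

      const res = await drive.files.watch({
        fileId: 'YOUR_SPREADSHEET_ID',  // placeholder: the sheet's file ID
        requestBody: {
          id: 'my-channel-id',          // placeholder: unique channel ID
          type: 'web_hook',
          address: 'https://example.com/notifications', // placeholder URL
        },
      });

      return res.data; // channel details, including expiration time
    }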

BigQuery Retrieval Times Slow

BigQuery is fast at processing large sets of data, however retrieving large results from BigQuery is not fast at all.
For example, I ran a query that returned 211,136 rows over three HTTP requests, taking just over 12 seconds in total.
The query itself was returned from cache, so no time was spent executing the query. The host server is an Amazon m4.xlarge running in US-East (Virginia).
In production I've seen this process take ~90 seconds when returning ~1M rows. Obviously some of this could be down to network traffic... but it seems too slow for that to be the only cause (those 211,136 rows were only ~1.7 MB).
Has anyone else encountered such slow speed when having results returned, and have found a resolution?
Update: I reran the test on a VM inside Google Cloud with very similar results, ruling out network issues between Google and AWS.
Our SLO on this API is 32 seconds, and a call taking 12 seconds is normal. 90 seconds sounds too long; it must be hitting some of our system's tail latency.
I understand that it is embarrassingly slow. There are multiple reasons for it, and we are working on improving the latency of this API. By the end of Q1 next year, we should be able to roll out a change that will cut tabledata.list time in half (by upgrading the API frontend to our new One Platform technology). If we have more resources, we will also make jobs.getQueryResults faster.
Concurrent requests using tabledata.list
It's not great, but there is a resolution.
Make a query, and set the max rows to 1,000. If there is no page token, simply return the results.
If there is a page token, then disregard the results*, and use the tabledata.list API. However, rather than simply sending one request at a time, send a request for every 10,000 records* in the result. To do this, one can use the 'maxResults' and 'startIndex' fields; a sketch follows below. (Note that even these smaller pages may be broken into multiple requests*, so paging logic is still needed.)
This concurrency (and the smaller pages) leads to significant reductions in retrieval times. Not as good as BigQuery simply streaming all results, but enough to start realizing the gains of using BigQuery.
Potential pitfalls: keep an eye on the request count, as with larger result sets there could be 100 req/s throttling. It's also worth noting that there's no guarantee of ordering, so using the startIndex field as pseudo-paging may not always return correct results*.
* Anything with a single asterisk is still an educated guess, not confirmed as true/best practice.
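To make the scheme above concrete, here is a rough Node.js sketch using the googleapis BigQuery v2 client. The project, dataset, and table IDs are placeholders, and the simple Promise.all fan-out stands in for a proper worker pool with retries and per-chunk paging logic:

    // Fetch a large result table in concurrent 10,000-row chunks
    const { google } = require('googleapis');

    async function fetchAllRows(auth, totalRows) {
      const bigquery = google.bigquery({ version: 'v2', auth });
      const pageSize = 10000;

      const requests = [];
      for (let startIndex = 0; startIndex < totalRows; startIndex += pageSize) {
        requests.push(
          bigquery.tabledata.list({
            projectId: 'my-project',   // placeholder
            datasetId: 'my_dataset',   // placeholder
            tableId: 'my_table',       // placeholder
            maxResults: pageSize,
            startIndex: String(startIndex),
          })
        );
      }

      // Issue the page requests concurrently instead of one at a time
      const responses = await Promise.all(requests);
      return responses.flatMap((res) => res.data.rows || []);
    }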

BigQuery streaming inserts taking time

During load testing of our module we found that BigQuery insert calls are taking time (3-4 s). I am not sure if this is OK. We are using the Java BigQuery client library, and on average we push 500 records per API call. We are expecting around a million records per second of traffic to our module, so the BigQuery inserts are a bottleneck for handling this traffic. Currently it is taking hours to push the data.
Let me know if we need more info regarding code or scenario or anything.
Thanks
Pankaj
Since streaming has a limited payload size (see the quota policy), it's easier to talk about times, as the payload is limited in the same way for both of us; but I will mention other side effects too.
We measure between 1,200-2,500 ms for each streaming request, and this was consistent over the last month, as you can see in the chart.
We have seen several side effects, though:
the request randomly fails with type 'Backend error'
the request randomly fails with type 'Connection error'
the request randomly fails with type 'timeout' (watch out here, as only some rows may fail, not the whole payload)
some other error messages are non-descriptive, and so vague that they don't help you; just retry
we see hundreds of such failures each day, so they are pretty much constant and not related to Cloud health
For all of these we opened cases with paid Google Enterprise Support, but unfortunately they didn't resolve them. It seems the recommended approach is exponential backoff with retry; even the support told us to do so. Which personally doesn't make me happy.
If the approach you've chosen takes hours, it does not scale, and won't scale. You need to rethink the approach with asynchronous processing. To finish sooner, you need to run multiple workers in parallel: the streaming performance per request will be the same, but having 10 workers in parallel means the total time will be roughly 10 times less (see the sketch below).
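The question uses the Java client; the same fan-out pattern in Node.js with the official @google-cloud/bigquery client might look like this (dataset and table names are placeholders, and retries/error handling are omitted):

    // Stream rows with N parallel workers instead of one sequential loop
    const { BigQuery } = require('@google-cloud/bigquery');

    const bigquery = new BigQuery();
    const table = bigquery.dataset('my_dataset').table('my_table'); // placeholders

    async function streamAll(rows, workers = 10, batchSize = 500) {
      // Split rows into batches of 500, the size the question already uses
      const batches = [];
      for (let i = 0; i < rows.length; i += batchSize) {
        batches.push(rows.slice(i, i + batchSize));
      }

      // Each worker pulls the next unclaimed batch until none remain
      let next = 0;
      async function worker() {
        while (next < batches.length) {
          const batch = batches[next++];
          await table.insert(batch); // streaming insert, ~1-2 s per call
        }
      }

      await Promise.all(Array.from({ length: workers }, worker));
    }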
Processing IO-bound or CPU-bound tasks in the background is now a common practice in most web applications. There's plenty of software to help build background jobs, some of it based on a messaging system like Beanstalkd.
Basically, you need to distribute insert jobs across a closed network, prioritize them, and consume (run) them. Well, that's exactly what Beanstalkd provides.
Beanstalkd makes it possible to organize jobs in tubes, with each tube corresponding to a job type.
You need an API/producer which can put jobs on a tube, say a JSON representation of the row. This was a killer feature for our use case: we have an API which receives the rows and places them on a tube, which takes just a few milliseconds, so you can achieve fast response times.
On the other side, you now have a number of jobs on some tubes. You need an agent: an agent/consumer can reserve a job.
It also helps you with job management and retries: when a job is successfully processed, a consumer can delete the job from the tube. In case of failure, the consumer can bury the job; a buried job will not be pushed back onto the tube, but remains available for further inspection.
A consumer can also release a job; Beanstalkd then pushes the job back into the tube and makes it available to another client.
Beanstalkd clients can be found in most common languages, and a web interface can be useful for debugging. A rough sketch of the flow follows below.
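To give a feel for the flow described above, here is a sketch in Node.js. The beanstalkd-client module and its connect/use/watch/put/reserve/delete/bury methods are hypothetical, mirroring Beanstalkd's protocol verbs; swap in the actual API of whichever client library you use:

    // Hypothetical Beanstalkd client wrapper; method names follow the
    // protocol verbs, not any specific library's API.
    const { connect } = require('./beanstalkd-client'); // hypothetical module

    // Producer: the API endpoint just enqueues rows and returns immediately
    async function enqueueRow(row) {
      const conn = await connect('127.0.0.1:11300');
      await conn.use('bigquery-inserts');      // tube = job type
      await conn.put(JSON.stringify(row));     // takes a few milliseconds
    }

    // Consumer: one of many parallel workers draining the tube
    async function worker(insertIntoBigQuery) {
      const conn = await connect('127.0.0.1:11300');
      await conn.watch('bigquery-inserts');
      while (true) {
        const job = await conn.reserve();      // blocks until a job is available
        try {
          await insertIntoBigQuery(JSON.parse(job.body));
          await conn.delete(job.id);           // success: remove from the tube
        } catch (err) {
          await conn.bury(job.id);             // failure: keep for inspection
        }
      }
    }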