AWS S3 ObjectCreated triggers Lambda with delay (Lambda cold start)

I've configured a simple trigger for a Lambda function that processes an image upon its arrival in S3.
In general, the Lambda is triggered with minimal delay, often within the same second that S3 receives the image.
But occasionally, in around 7% of cases, there is a delay between the image being received and the ObjectCreated event; this delay can be up to 19 seconds (9-10 seconds on average).
Any idea how to avoid this delay?
This delay makes it impossible for me to use S3->Lambda triggers for high-performance real-time apps.
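For context, the handler wired to the S3 ObjectCreated trigger looks roughly like the sketch below. The event parsing follows the standard S3 notification structure; process_image is a hypothetical placeholder for the actual image processing.

# Minimal sketch of an S3 ObjectCreated handler; process_image is hypothetical.
import urllib.parse

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        process_image(body)

def process_image(image_bytes):
    # Hypothetical image-processing step.
    pass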

After a while spent investigating and googling, and asking AWS support about the case in parallel, I finally got an answer from AWS:
--
.. Lambda invoked the function pretty much immediately after we received
the event, but the specific request id you shared was for an invoke
that had to coldstart, which added nearly 10 seconds of extra latency.
The function is in the VPC, where cold starts tend to take a few
seconds longer. Coldstarts cannot be eliminated but for high volume
functions the incidence of cold start should be lower once you scale
up and more containers are available for reuse.
As you can see from the answer, if you are trying to build a high-performance / high-traffic real-time app, S3->Lambda will not fit your requirements.
My next question would be: if I trigger the Lambda directly from the script that uploads the image, will that help?
Or should I avoid using Lambda at all for this kind of application and leave it only for background data processing?
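For what it's worth, invoking the Lambda directly from the upload script would look roughly like the sketch below (hedged: "process-image" is a hypothetical function name, and a direct invoke still hits a cold start if no warm container is available; it only removes the S3 notification hop).

# Sketch: upload the image, then invoke the processing Lambda directly
# instead of waiting for the S3 ObjectCreated notification.
import json

import boto3

s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")

def upload_and_invoke(local_path, bucket, key):
    s3.upload_file(local_path, bucket, key)
    lambda_client.invoke(
        FunctionName="process-image",  # hypothetical function name
        InvocationType="Event",        # async; use "RequestResponse" to wait for the result
        Payload=json.dumps({"bucket": bucket, "key": key}),
    )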
Hope this answer helps someone else.

Since the 28th of November 2019, cold starts for Lambdas inside a VPC no longer cause such long delays, thanks to AWS Lambda service improvements. You can find out more about it here:
https://aws.amazon.com/blogs/compute/announcing-improved-vpc-networking-for-aws-lambda-functions/
There are many other ways to reduce cold starts in Lambda, but it mostly depends on the use case. The most common are reducing the Lambda's code size or using a different runtime, such as Node.js or Python.
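If you want to measure how often you actually hit a cold start (e.g. to verify the ~7% figure above), a small sketch like the following works: a module-level flag survives warm invocations because the execution environment is reused.

# Sketch: detect and log cold starts via a module-level flag.
import time

_cold_start = True
_init_time = time.time()

def handler(event, context):
    global _cold_start
    if _cold_start:
        print(f"cold start (module initialized at {_init_time})")
        _cold_start = False
    else:
        print("warm invocation")
    # ... actual work ...
    return {"ok": True}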

Related

Tracing only the error traces in Jaeger

I am trying to sample only the error traces in my application, but I already have a probabilistic sampler parameter set, which samples the span at the very beginning; the rest of the spans then follow that same decision. I tried using the force-sampling option in Jaeger, but it doesn't seem to override the original decision, made by the initial span, about whether to sample or not. Kindly help me out here.
Jaeger clients implement so-called head-based sampling, where a sampling decision is made at the root of the call tree and propagated down the tree along with the trace context. This is done to guarantee consistent sampling of all spans of a given trace (or none of them), because we don't want to flip a coin at every node and end up with partial/broken traces.
Implementing on-error sampling in a head-based sampling system is not really possible. Imagine that your service calls service A, which returns successfully, and then service B, which returns an error. Let's assume the root of the trace was not sampled (because otherwise you'd catch the error normally). That means that by the time you learn of the error from B, the whole sub-tree at A has already been executed and all of its spans discarded because of the earlier decision not to sample. The sub-tree at B has also finished executing. The only thing you can sample at this point is the spans in the current service.
You could also implement a reverse propagation of the sampling decision via the response to your caller. So in the best case you could end up with a sub-branch of the whole trace sampled, and possibly future branches if the trace continues from above (e.g. via retries). But you can never capture the full trace, and sometimes the reason B failed was because A (successfully) returned some data that caused the error later.
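For context, this is roughly what the head-based sampler configuration looks like with the Python Jaeger client (jaeger_client); the service name and sampling rate below are placeholders.

# Head-based sampling: the decision is made here, at the root of the trace,
# and propagated with the trace context. Service name and rate are placeholders.
from jaeger_client import Config

config = Config(
    config={
        "sampler": {"type": "probabilistic", "param": 0.01},  # sample ~1% of traces
    },
    service_name="my-service",
    validate=True,
)
tracer = config.initialize_tracer()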
Note that reverse propagation is not supported by OpenTracing or OpenTelemetry today, but it has been discussed in recent meetings of the W3C Trace Context working group.
The alternative way to implement sampling is tail-based sampling, a technique employed by some of the commercial vendors today, such as Lightstep and DataDog. It is also on the roadmap for Jaeger (we're working on it right now at Uber). With tail-based sampling, 100% of spans are captured from the application, but they are only stored in memory in a collection tier until the full trace is gathered and a sampling decision is made. The decision-making code has a lot more information at that point, including errors, unusual latencies, etc. If we decide to sample the trace, only then does it go to disk storage; otherwise we evict it from memory, so we only need to keep spans in memory for a few seconds on average. Tail-based sampling imposes a heavier performance penalty on the traced applications, because 100% of the traffic needs to be profiled by the tracing instrumentation.
You can read more about head-based and tail-based sampling either in Chapter 3 of my book (https://www.shkuro.com/books/2019-mastering-distributed-tracing/) or in the awesome paper "So, you want to trace your distributed system? Key design insights from years of practical experience" by Raja R. Sambasivan, Rodrigo Fonseca, Ilari Shafer, Gregory R. Ganger (http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf).

Sampling rule depending on duration

Is it possible in AWS X-Ray to create a sampling rule that samples all calls to a given service whose duration is greater than some value?
The only way right now to find the lagging sub-service is to sample 100% and then filter by service name and duration. This is pretty expensive when you have 10K+ segments per second.
AWS X-Ray dev here. Biased sampling of longer durations can skew your service graph statistics and cause false negatives. If you are OK with this data skew, then depending on which X-Ray SDK you are using, you might be able to achieve conditional sampling by making explicit sampling decisions on segment close. This would require you to override certain parts of the SDK's behavior.
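A rough sketch of that "decide at segment close" idea with the Python X-Ray SDK (aws_xray_sdk) might look like the following; the start_time and sampled attributes are SDK internals and should be verified against your SDK version, and the latency threshold is arbitrary.

# Rough sketch: flip the sampling decision when the segment closes, based on
# its duration. start_time/sampled are assumptions about SDK internals; verify
# against your aws_xray_sdk version. As noted above, this skews service-graph stats.
import time

from aws_xray_sdk.core import xray_recorder

LATENCY_THRESHOLD_SECONDS = 1.0  # hypothetical threshold

def traced_call(do_work):
    xray_recorder.begin_segment("my-service")  # hypothetical segment name
    segment = xray_recorder.current_segment()
    try:
        return do_work()
    finally:
        duration = time.time() - segment.start_time
        if duration < LATENCY_THRESHOLD_SECONDS:
            segment.sampled = False  # drop fast calls, keep only slow ones
        xray_recorder.end_segment()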
We know this is a popular feature request and we are working on improving it. Please use our AWS X-Ray public forum https://forums.aws.amazon.com/forum.jspa?forumID=241&start=0 to follow the latest feature launches or to provide any additional comments.

How to set optimum value for storm.config.setMessageTimeoutSecs

I am working on an Apache Storm topology. I have different bolts carrying out functionality such as Cosmos DB insertion, REST API calls, etc.
I want to set a value for storm.config.setMessageTimeoutSecs for my Storm topology.
I have currently set it to 5 minutes, and I still see failures on the spout side due to message timeouts. Is there a maximum value for the message timeout of a topology?
And how do I set the optimum value for storm.config.setMessageTimeoutSecs?
I don't believe there is a maximum timeout, no. You probably don't want to set it too high though, since it will mean that if a tuple is actually lost during network transfer, it will take much longer to figure out that it's been lost and to replay it.
Are you setting topology.max.spout.pending to a reasonable value? That can help to avoid too many tuples in flight at a time, which helps keep the complete latency down.
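For reference, the two knobs discussed here should correspond to the plain Storm config keys below; how you pass them depends on your setup (Java Config setters, -c overrides at submit time, or your Python tooling's config), and the values are only illustrative.

# Illustrative values only. topology.message.timeout.secs must comfortably exceed
# the worst-case time a tuple needs to traverse the whole topology, and
# topology.max.spout.pending caps in-flight tuples per spout task.
storm_conf = {
    "topology.message.timeout.secs": 300,
    "topology.max.spout.pending": 500,
}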
I'd also look at whether 5+ minutes is a reasonable processing time for your tuples. I don't know your use case, so maybe that's reasonable, but it seems like a long time to me.
If you actually need to process some tuples that will take such a long time, you might consider splitting your topology into smaller topologies, so you can set a high timeout for the slow part, and a lower timeout for the rest.

Pika/RabbitMQ: Correct usage of add_backpressure_callback

I am new to using RabbitMQ and Pika, so please excuse me if the answer is obvious...
We are feeding in some data and passing the results into our RabbitMQ message queue. The queue is being consumed by a process that writes the data into Elasticsearch.
The data is being produced faster than it can be fed into Elasticsearch, and consequently the queue grows and almost never shrinks.
We are using pika and getting the warning:
UserWarning: Pika: Write buffer exceeded warning threshold at X bytes and an estimated X frames behind.
This continues for some time until Pika simply crashes with a strange error message:
NameError: global name 'log' is not defined
We are using the Pika BlockingConnection object (http://pika.github.com/connecting.html#blockingconnection).
My plan to fix this is to use the add_backpressure_callback function to have a function that will call time.sleep(0.5) every time that we need to apply back-pressure. However, this seems like it is too simple of a solution and that there must be a more appropriate way of dealing with something like this.
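For reference, that plan would look roughly like the sketch below (hedged: add_backpressure_callback exists on the Pika connections of that era, but the callback signature here is an assumption, and newer Pika releases may not expose it at all).

# Sketch of the plan above: briefly pause publishing whenever Pika signals
# backpressure, so the write buffer can drain. The callback signature is an
# assumption, hence the defensive *args/**kwargs.
import time

import pika

def on_backpressure(*args, **kwargs):
    time.sleep(0.5)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
connection.add_backpressure_callback(on_backpressure)
channel = connection.channel()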
I would guess that it is a common situation that the queue is being populated faster than it is being consumed. I am looking for an example or even some advice as to what is the best way to slow down the queue.
Thanks!
Interesting problem, and as you rightly point out this is probably quite common. I saw another related question on Stack Overflow with some pointers
Pika: Write buffer exceeded warning
Additionally, you may want to consider scaling up your Elasticsearch cluster; this is perhaps the fundamental bottleneck you want to fix. A quick look at the elasticsearch.org website came up with:
"Distributed
One of the main features of Elastic Search is its distributed nature. Indices are broken down into shards, each shard with 0 or more replicas. Each data node within the cluster hosts one or more shards, and acts as a coordinator to delegate operations to the correct shard(s). Rebalancing and routing are done automatically and behind the scenes.
"
(...although I'm not sure if insertion is also distributed and scalable)
After all, RabbitMQ is not supposed to grow queues indefinitely. You may also want to look at scaling up RabbitMQ itself, for example by using things like per-queue processes in the RabbitMQ configuration.
Cheers!

How to speed up Amazon EMR bootstrap?

I'm using Amazon EMR for some intensive computation, but it takes around 7 minutes to start computing. Is there some clever way to have my computation start immediately? The computation is a Python streaming job started from a user-facing website, so I can't really afford a long startup.
I might have simply missed an option in the ocean that is Amazon AWS. I just want simplicity in launching jobs (that's why I used EMR), scalability, and to pay only for what I use (and startup time is not useful).
I know this is an old question, but I have some insights to add for the next searcher who finds this thread hoping to speed up bootstrap times on Amazon EMR.
For a while I have wondered why my clusters took so long to start, usually about 15 minutes. That is a pretty big chunk of time for a job that usually completes in under 1 hour. Sometimes it pushes the job past 1 hour, though thankfully AWS does not charge for the full bootstrap time.
The last couple of days I noticed my startup times had improved. You see, the spot market became very volatile during April and the first week of May. Normally I start my cluster entirely on spot instances, as failure is an option and the cost savings justify the technique in my case. However, after waiting 14 hours for clusters to start, I had to switch to On-Demand; I only have so much patience, and overnight usually exceeds it. The On-Demand clusters start in about 5 minutes. Now, having switched back to spot as the madness seems to have abated, I am back to 15 minutes for a cluster.
So if you are using spot instances for your core or master nodes, expect a longer startup time. I will be experimenting with using a small set of On-Demand instances in the core and augmenting with a large number of spot instances, to see if it helps startup and deals better with spot market volatility.
This is pretty normal and there is little you can do about it. I'm starting 100+ node clusters and I've seen them take 15+ minutes before they start processing. Given the amount of work that's going on in the background, I'm pretty happy to allow them the 15 minutes or so to get the cluster configured and read in whatever data may be required. Nature of the beast, I'm afraid.
Where is your data source hosted?
If it's on S3 (probably) and you have many tiny files, it's the latency of each connection (per file) that is taking the time.
If that's the only reason, then your 7 minutes of start-up time would translate to roughly 5 minutes of reading from S3, which corresponds to roughly 1 GB of input files on S3.
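If you want to check whether per-file latency really is the culprit, a quick probe along these lines (boto3 assumed; bucket and prefix are placeholders) gives you a rough seconds-per-object number.

# Rough probe of per-object S3 read latency: many tiny files mean many round trips.
import time

import boto3

s3 = boto3.client("s3")
bucket, prefix = "my-input-bucket", "input/"  # hypothetical names

objects = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=50).get("Contents", [])
start = time.time()
for obj in objects:
    s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
elapsed = time.time() - start
print(f"read {len(objects)} objects in {elapsed:.1f}s "
      f"(~{elapsed / max(len(objects), 1):.2f}s per object)")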