Sampling rule depending on duration - aws-xray

Is it possible in AWS X-Ray to create a sampling rule that samples all calls to a given service whose duration exceeds some value?
Right now the only way to find the lagging sub-service is to sample 100% and then filter by service name and duration. That is pretty expensive at 10K+ segments per second.

AWS X-Ray dev here. Biasing sampling towards longer durations can skew your service graph statistics and cause false negatives. If you are OK with this data skew, then depending on which X-Ray SDK you are using, you might be able to achieve conditional sampling by making explicit sampling decisions on segment close. This would require you to override certain parts of the SDK's behavior.
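As a rough, unofficial sketch of that approach with the Python SDK (aws-xray-sdk): the segment's start_time and sampled attributes are internal details that may differ between SDK versions, and the 500 ms threshold and service name are just examples.

    # Sketch only: conditional sampling on segment close (Python X-Ray SDK).
    # Assumes the segment exposes `start_time` and `sampled` attributes;
    # these are SDK internals and may vary between versions.
    import time
    from aws_xray_sdk.core import xray_recorder

    SLOW_THRESHOLD_SECONDS = 0.5  # example threshold, tune for your service

    def handle_request(do_work):
        # sampling=1 asks the SDK to record this segment locally so the
        # real decision can be made at close (parameter may vary by version).
        segment = xray_recorder.begin_segment('my-service', sampling=1)
        try:
            return do_work()
        finally:
            elapsed = time.time() - segment.start_time
            # Keep slow calls, drop fast ones before the segment is emitted.
            segment.sampled = elapsed >= SLOW_THRESHOLD_SECONDS
            xray_recorder.end_segment()

Again, this biases your statistics towards slow calls, so treat the resulting service graph numbers accordingly.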
We know this is a popular feature request and we are working on improving it. Please use our AWS X-Ray public forum https://forums.aws.amazon.com/forum.jspa?forumID=241&start=0 for the latest feature launches or to provide additional comments.

Related

Is there a way to make Google Text to Speech, speak text for a desired duration?

I went through the documentation of Google Text-to-Speech SSML.
https://developers.google.com/assistant/actions/reference/ssml#prosody
There is a tag called <prosody> which, as per the W3 specification, can accept an attribute called duration: a value in seconds or milliseconds for the desired time it should take to read the contained text.
So <speak><prosody duration='6s'>Hello, How are you?</prosody></speak> should take 6 seconds for Google Text-to-Speech to speak. But when I try it here https://cloud.google.com/text-to-speech/ it's not working, and I also tried it in the REST API.
Does Google Text-to-Speech not take the duration attribute into account? If it doesn't, is there a way to achieve the same result?
There are two ways I know of to solve this:
First Option: call Google's API twice: use the first call to measure the time of the spoken audio, and the second call to adjust the rate parameter accordingly.
Pros: Better audio quality? (this is subjective and depends on taste as well as the application's requirements)
Cons: Doubles the cost and processing time.
Second option: post-process the audio using a specialized tool such as ffmpeg (see the sketch after this list).
Pros: Cost effective and can be fast if implemented correctly.
Cons: Some knowledge of the concepts and the usage of an audio post-processing library is required (no need to become an expert though).
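As a rough illustration of the second option, the sketch below measures the synthesized clip with ffprobe and time-stretches it with ffmpeg's atempo filter. The file names and the 6-second target are placeholders, and atempo only accepts factors between 0.5 and 2.0, so larger adjustments would need chained filters.

    # Sketch of option 2: fit synthesized audio to a target duration.
    # Requires ffmpeg/ffprobe on PATH; file names and target are placeholders.
    import subprocess

    def get_duration_seconds(path):
        # Read the clip length using ffprobe.
        out = subprocess.run(
            ["ffprobe", "-v", "error", "-show_entries", "format=duration",
             "-of", "default=noprint_wrappers=1:nokey=1", path],
            capture_output=True, text=True, check=True)
        return float(out.stdout.strip())

    def fit_to_duration(src, dst, target_seconds):
        # atempo speeds playback up by `factor`, so output = input / factor.
        factor = get_duration_seconds(src) / target_seconds
        # atempo accepts 0.5-2.0; chain filters for larger adjustments.
        subprocess.run(
            ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={factor:.3f}", dst],
            check=True)

    fit_to_duration("tts_output.mp3", "tts_6s.mp3", 6.0)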
As Mr Lister already mentioned, the documentation clearly says:
<prosody>
Used to customize the pitch, speaking rate, and volume of text
contained by the element. Currently the rate, pitch, and volume
attributes are supported.
The rate and volume attributes can be set according to the W3
specifications.
You can test this using the UI.
In particular you can use things like
rate="low"
or
rate="80%"
to adjust the speed. However, that is as far as you can go with Google TTS.
AWS Polly does support what you need, but only on Standard voices (not Neural).
Here is the documentation.
Setting a Maximum Duration for Synthesized Speech
Polly also has a UI to do a quick test.
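For reference, a minimal Polly request with a maximum duration might look like the sketch below (boto3). The voice, text, and the "6s" value are placeholders, and as noted above this only works with standard voices; double-check the amazon:max-duration syntax against the linked documentation.

    # Sketch: Amazon Polly with a maximum duration (standard voices only).
    # VoiceId, text, and the "6s" limit are placeholders.
    import boto3

    polly = boto3.client("polly")

    ssml = (
        '<speak>'
        '<prosody amazon:max-duration="6s">Hello, how are you?</prosody>'
        '</speak>'
    )

    response = polly.synthesize_speech(
        Text=ssml,
        TextType="ssml",
        OutputFormat="mp3",
        VoiceId="Joanna",      # a standard voice
        Engine="standard",
    )

    with open("speech.mp3", "wb") as f:
        f.write(response["AudioStream"].read())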

Tracing only error traces in Jaeger (finding only the error traces)

I was actually trying to sample only the error traces in my application, but I already have a probabilistic sampler configured, which makes the sampling decision at the very beginning on the initial span, and the rest of the spans then follow that decision. I tried using the force-sampling option in Jaeger, but it doesn't seem to override the original decision made at the initial span about whether to sample or not. Kindly help me out here.
Jaeger clients implement so-called head-based sampling, where a sampling decision is made at the root of the call tree and propagated down the tree along with the trace context. This is done to guarantee consistent sampling of all spans of a given trace (or none of them), because we don't want to make the coin flip at every node and end up with partial/broken traces.
Implementing on-error sampling in a head-based sampling system is not really possible. Imagine that your service is calling service A, which returns successfully, and then service B, which returns an error. Let's assume the root of the trace was not sampled (because otherwise you'd catch the error normally). That means by the time you know of an error from B, the whole sub-tree at A has already been executed and all its spans discarded because of the earlier decision not to sample. The sub-tree at B has also finished executing. The only thing you can sample at this point is the spans in the current service.
You could also implement reverse propagation of the sampling decision via the response to your caller. So in the best case you could end up with a sub-branch of the whole trace sampled, and possibly future branches if the trace continues from above (e.g. via retries). But you can never capture the full trace, and sometimes the reason B failed was because A (successfully) returned some data that caused the error later.
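To make the head-based mechanism concrete: with Jaeger's propagation format the decision rides in the uber-trace-id header as a flags field. The sketch below is illustrative parsing only, not the client's actual code.

    # Illustration of head-based sampling: the decision travels in the context.
    # uber-trace-id format: {trace-id}:{span-id}:{parent-span-id}:{flags}
    # (hex fields; the lowest bit of flags is the "sampled" flag).
    def is_sampled(uber_trace_id_header):
        _trace_id, _span_id, _parent_id, flags = uber_trace_id_header.split(":")
        return int(flags, 16) & 0x01 == 0x01

    # Every downstream service honors the inherited flag instead of flipping
    # its own coin, which is why a later error cannot retroactively sample
    # the spans that were already discarded upstream.
    print(is_sampled("7f2e9b6c5d1a4e3f:1a2b3c4d5e6f7a8b:0:1"))  # True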
Note that reverse propagation is not supported by OpenTracing or OpenTelemetry today, but it has been discussed in recent meetings of the W3C Trace Context working group.
The alternative way to implement sampling is tail-based sampling, a technique employed by some of the commercial vendors today, such as Lightstep and Datadog. It is also on the roadmap for Jaeger (we're working on it right now at Uber). With tail-based sampling, 100% of spans are captured from the application, but they are only stored in memory in a collection tier until the full trace is gathered and a sampling decision is made. The decision-making code has a lot more information at that point, including errors, unusual latencies, etc. If we decide to sample the trace, only then does it go to disk storage; otherwise we evict it from memory, so we only need to keep spans in memory for a few seconds on average. Tail-based sampling imposes a heavier performance penalty on the traced applications, because 100% of traffic needs to be profiled by the tracing instrumentation.
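As a purely conceptual sketch (not Jaeger's actual collector code), a tail-based collector buffers spans per trace and only decides once the trace is complete; the policy knobs and the storage call are made up for illustration.

    # Conceptual sketch of tail-based sampling, not Jaeger's implementation.
    from collections import defaultdict

    buffer = defaultdict(list)          # trace_id -> spans held in memory
    LATENCY_THRESHOLD_MS = 500          # example policy knob

    def keep_trace(spans):
        has_error = any(s.get("error") for s in spans)
        too_slow = max(s["duration_ms"] for s in spans) > LATENCY_THRESHOLD_MS
        return has_error or too_slow

    def store_to_disk(spans):
        # placeholder for whatever storage backend the collector tier uses
        print(f"persisting {len(spans)} spans")

    def on_span_received(span):
        buffer[span["trace_id"]].append(span)

    def on_trace_complete(trace_id):
        spans = buffer.pop(trace_id)    # evict from memory either way
        if keep_trace(spans):
            store_to_disk(spans)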
You can read more about head-based and tail-based sampling either in Chapter 3 of my book (https://www.shkuro.com/books/2019-mastering-distributed-tracing/) or in the awesome paper "So, you want to trace your distributed system? Key design insights from years of practical experience" by Raja R. Sambasivan, Rodrigo Fonseca, Ilari Shafer, Gregory R. Ganger (http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf).

Using a cloud service to transform a picture using a neural algorithm?

Yesterday I tried to transform a picture into an artistic style using CNNs, based on A Neural Algorithm of Artistic Style by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge, using a recent Torch implementation, as explained here:
https://github.com/mbartoli/neural-animation
It started the conversion correctly; the problem is that the process is very time consuming: after 1 hour of processing, a simple picture was still not fully transformed. And I have to transform 1615 pictures. What's the solution here? Can I use the Google Cloud Platform to make this operation faster? Or some other kind of cloud service? Using my home PC is not the right solution. If I can use the cloud's power, how can I configure everything? Let me know, thanks.
Using Google Cloud Platform (GCP) here would seem to be a good use case. If we were to boil it down, what you have is an application which is CPU intensive and takes a long time to run. Depending on the nature of the application, any single given instance may run faster with more CPUs and/or more RAM. GCP allows YOU to choose the size of the machine on which your application runs. You can choose from VERY small to VERY large. The distinction is how much you are willing to pay. Remember, you only pay for what you use. If an application takes an hour to run on a machine with price X but takes 30 minutes on a different machine with price 2X, then the cost will still only be X, but you will have a result in 30 minutes rather than an hour. You would switch off the machine after the 30 minutes to stop incurring charges.
Since you also said that you have MANY images to process, this is where you can take advantage of horizontal scale. Instead of having just one machine, where the application takes an hour per image and all results are serialized, you can create an array of machines where each machine processes one picture. So if you had 50 machines, at the end of one hour you would have 50 images processed instead of one.
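As a very rough sketch of the horizontal approach (not a complete recipe): each worker VM can be created with a shard index in its instance metadata, so its startup script knows which slice of the 1615 images to process. The zone, machine type, instance names, and metadata keys below are placeholders.

    # Rough sketch: create N worker VMs, each told via instance metadata which
    # image shard to process. Zone, machine type, and metadata keys are
    # placeholders; a startup script on the VM would read the metadata and run
    # the neural-style job on its slice.
    import subprocess

    NUM_WORKERS = 50

    for shard in range(NUM_WORKERS):
        subprocess.run([
            "gcloud", "compute", "instances", "create", f"style-worker-{shard}",
            "--zone", "us-central1-a",
            "--machine-type", "n1-highmem-8",
            "--metadata", f"shard-index={shard},shard-count={NUM_WORKERS}",
        ], check=True)

Remember to delete the instances when the shards finish, since you pay for as long as they run.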
As for how to get this all going ... I'm afraid that is a much bigger story and one where a read of the GCP documentation will help tremendously. I suggest you read and have a play and THEN if you have specific questions, the community can try and provide specific answers.

AWS S3 ObjectCreated triggers lambda with delay (Lambda Cold-start)

I've configured a simple trigger for a Lambda function, which processes an image upon its arrival in S3.
In general, the Lambda is triggered with minimal delay, often within the same second that S3 receives the image.
But occasionally, in around 7% of cases, there is a delay between the image being received and the ObjectCreated event; this delay can be up to 19 seconds (9-10 seconds on average).
Any idea how to avoid this delay?
This delay makes it impossible for me to use S3->Lambda triggers for high-performance real-time apps.
After a while of investigating and googling, and in parallel asking AWS support about the case, I finally got this answer from AWS:
--
.. Lambda invoked the function pretty much immediately after we received
the event, but the specific request id you shared was for an invoke
that had to coldstart, which added nearly 10 seconds of extra latency.
The function is in the VPC, where cold starts tend to take a few
seconds longer. Coldstarts cannot be eliminated but for high volume
functions the incidence of cold start should be lower once you scale
up and more containers are available for reuse.
As you can see from the answer, if you are trying to build a high-performance / high-traffic real-time app, S3->Lambda will not fit your requirements.
My next question would be: if I trigger the Lambda directly from the script that uploads the image, will it help?
Or should I avoid using Lambda altogether for this kind of application and leave it only for background data processing?
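For what it's worth, invoking the function directly from the upload script would look roughly like the sketch below (boto3; bucket, key, and function name are placeholders). Note that this only removes the S3 event delivery from the path; it does not eliminate cold starts.

    # Sketch: upload the image and invoke the Lambda asynchronously yourself,
    # instead of waiting for the S3 ObjectCreated notification.
    # Bucket, key, and function name are placeholders.
    import json
    import boto3

    s3 = boto3.client("s3")
    lambda_client = boto3.client("lambda")

    def upload_and_process(local_path, bucket, key):
        s3.upload_file(local_path, bucket, key)
        lambda_client.invoke(
            FunctionName="image-processor",
            InvocationType="Event",      # asynchronous invoke
            Payload=json.dumps({"bucket": bucket, "key": key}),
        )

    upload_and_process("photo.jpg", "my-images-bucket", "incoming/photo.jpg")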
Hope this answer will help someone else..
Since the 28th of November 2019, cold starts for Lambdas that are inside a VPC no longer cause such long delays, due to AWS Lambda service improvements. You can find out more about it here:
https://aws.amazon.com/blogs/compute/announcing-improved-vpc-networking-for-aws-lambda-functions/
There are a lot of other ways to reduce cold starts in Lambda, but it mostly depends on the use case. The most common is to reduce the code size of the Lambda or to use another runtime, Node or Python for example.

OptaPlanner partitioned search strategy

I'm considering OptaPlanner's partitioned search feature because of a large-scale VRPTW problem I need to deal with.
As far as I know, a custom implementation of SolutionPartitioner has to be provided, according to OptaPlanner's documentation. The example of a partitioner for the cloud balancing problem is straightforward, but I wonder how to partition planning entities in a VRPTW-class problem.
Should I use some kind of clustering algorithm in order to do cluster-based partitioning, or should I just divide the input data like in the cloud balancing example? Sometimes a lot of customers are placed in a relatively small area, but more workers are scheduled to service them. On the other hand, there can be a service area where two clearly disjoint sub-areas are visible.
Thank you in advance!