Tracing only error traces in Jaeger

I am trying to sample only the error traces in my application. I already have a probabilistic sampler configured, which makes the sampling decision at the first span, and all later spans follow that decision. I tried the force-sampling option in Jaeger, but it doesn't seem to override the original decision made at the initial span. Kindly help me out here.

Jaeger clients implement so-called head-based sampling, where a sampling decision is made at the root of the call tree and propagated down the tree along with the trace context. This is done to guarantee consistent sampling of all spans of a given trace (or none of them), because we don't want to make the coin flip at every node and end up with partial/broken traces.
Implementing on-error sampling in a head-based sampling system is not really possible. Imagine that your service is calling service A, which returns successfully, and then service B, which returns an error. Let's assume the root of the trace was not sampled (because otherwise you'd catch the error normally). That means by the time you know of an error from B, the whole sub-tree at A has already been executed and all its spans discarded because of the earlier decision not to sample. The sub-tree at B has also finished executing. The only thing you can sample at this point is the spans in the current service.
You could also implement a reverse propagation of the sampling decision via the response to your caller. So in the best case you could end up with a sub-branch of the whole trace sampled, and possibly future branches if the trace continues from above (e.g. via retries). But you can never capture the full trace, and sometimes the reason B failed was that A (successfully) returned some data that caused the error later.
Note that reverse propagation is not supported by OpenTracing or OpenTelemetry today, but it has been discussed in the last meetings of the W3C Trace Context working group.
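To make the head-based decision concrete, here is a minimal sketch in Python. It is not Jaeger client code: `SpanContext` and `start_span` are hypothetical names used only for illustration. The point it shows is that the coin is flipped once at the root and every child simply copies the result.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical trace context that travels with every request."""
    trace_id: int
    sampled: bool  # decided once at the root, then copied to every child


def start_span(parent: Optional[SpanContext], sample_rate: float,
               rng: random.Random) -> SpanContext:
    """Start a span, flipping the coin only when there is no parent."""
    if parent is None:
        # Root span: the only place where a sampling decision is made.
        return SpanContext(trace_id=rng.getrandbits(64),
                           sampled=rng.random() < sample_rate)
    # Child span: inherit the decision so the trace is all-or-nothing.
    return SpanContext(trace_id=parent.trace_id, sampled=parent.sampled)


rng = random.Random(0)
root = start_span(None, sample_rate=0.0, rng=rng)      # never sampled
child_a = start_span(root, sample_rate=1.0, rng=rng)   # rate is ignored for children
child_b = start_span(child_a, sample_rate=1.0, rng=rng)
```

Because `child_a` and `child_b` never re-evaluate the rate, an error discovered in `child_b` arrives too late to bring back the spans that `child_a` already dropped.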
The alternative way to implement sampling is tail-based sampling, a technique employed by some of the commercial vendors today, such as Lightstep and DataDog. It is also on the roadmap for Jaeger (we're working on it right now at Uber). With tail-based sampling, 100% of spans are captured from the application, but they are only stored in memory in a collection tier until the full trace is gathered and a sampling decision is made. The decision-making code has a lot more information now, including errors, unusual latencies, etc. Only if we decide to sample the trace does it go to disk storage; otherwise we evict it from memory, so that we only need to keep spans in memory for a few seconds on average. Tail-based sampling imposes a heavier performance penalty on the traced applications because 100% of traffic needs to be profiled by tracing instrumentation.
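The buffering step described above might look roughly like the following sketch. It is an illustration under stated assumptions, not how Jaeger or any vendor implements it: `TailSampler`, `Span`, the latency threshold, and the explicit `finish_trace` call are all hypothetical, and a real collector would have to decide on its own (for example with a timeout) when a trace is complete.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    error: bool = False


class TailSampler:
    """Buffers every span in memory and decides once the trace is complete."""

    def __init__(self, latency_threshold_ms: float) -> None:
        self.latency_threshold_ms = latency_threshold_ms
        self.pending: Dict[str, List[Span]] = defaultdict(list)  # in-memory buffer
        self.stored: Dict[str, List[Span]] = {}  # stands in for disk storage

    def add_span(self, span: Span) -> None:
        self.pending[span.trace_id].append(span)

    def finish_trace(self, trace_id: str) -> bool:
        """Evict the trace from memory; keep it only if it looks interesting."""
        spans = self.pending.pop(trace_id, [])
        keep = any(s.error or s.duration_ms > self.latency_threshold_ms
                   for s in spans)
        if keep:
            self.stored[trace_id] = spans
        return keep


sampler = TailSampler(latency_threshold_ms=500)
sampler.add_span(Span("t1", "service-A", 20))
sampler.add_span(Span("t1", "service-B", 35))
sampler.add_span(Span("t2", "service-A", 20))
sampler.add_span(Span("t2", "service-B", 35, error=True))
kept_t1 = sampler.finish_trace("t1")  # no error, not slow: dropped
kept_t2 = sampler.finish_trace("t2")  # error in B: whole trace kept, including A
```

Unlike the head-based case, the successful call to service A is still available when the error from B shows up, because nothing was discarded before the decision.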
You can read more about head-based and tail-based sampling either in Chapter 3 of my book (https://www.shkuro.com/books/2019-mastering-distributed-tracing/) or in the awesome paper "So, you want to trace your distributed system? Key design insights from years of practical experience" by Raja R. Sambasivan, Rodrigo Fonseca, Ilari Shafer, Gregory R. Ganger (http://www.pdl.cmu.edu/PDL-FTP/SelfStar/CMU-PDL-14-102.pdf).

Related

What is the difference between Symbolic and Concrete model checking when the search is bounded in time?

Could someone please spend a few words to explain to someone who does not come from a formal methods background what is the difference between verifying a specification using Symbolic Model Checking and doing the same using Concrete Model Checking, when the search is bounded in time? I am referring to the definition of SMC and Concrete MC made in UPPAAL.
In particular, I wrote a program that uses UPPAAL Java API to verify a query against a network of timed automata. If the query is verified, UPPAAL returns a symbolic trace to parse or something else if it is not. If the verification takes too long I decided to halt the verification process, return a message and move on with the next query to verify. Everything is good so far.
Recently, I have been playing around with UPPAAL Stratego, which natively offers the possibility of choosing a maximum time or depth of exploration to bound the search. However, these options can be applied only when the verification is carried out using Concrete Model Checking.
My question is: is there any difference between halting the symbolic verification process, as I am doing in my Java program, and what UPPAAL Stratego does natively? In both cases I don't get an answer (or a trace), but what about the "reliability" of the exploration?
Which would be better (i.e. more complete) between the two options? Halting the symbolic verification or halting the concrete verification?
My understanding so far is that in Symbolic Model Checking the possible states are defined using intervals of variables, whilst in Concrete Model Checking variables assume actual values. My view is that, in terms of "completeness", halting the SMC after some time is more "complete", since the exploration of the state space happens systematically using a BFS or DFS algorithm and, if I use BFS, I can be "sure" that within N steps nothing bad happens. But again, my background in model checking is not rich, so I might have gotten it completely wrong.
Also, if you could drop any reference to the strategies, it would be really appreciated.
Thanks!

How does a proposer know its proposal was not approved by a quorum of acceptors?

I am reading the Paxos article on Wikipedia, and it says:
"Rounds fail when multiple Proposers send conflicting Prepare messages, or when the Proposer does not receive a Quorum of responses (Promise or Accepted). In these cases, another round must be started with a higher proposal number."
But I don't understand how the proposer tells the difference between its proposal not being approved and the messages simply taking longer to arrive.
One of the tricky parts to understanding Paxos is that the original paper and most others, including the wiki, do not describe a full protocol capable of real-world use. They only focus on the algorithmic necessities. For example, they say that a proposer must choose a number "n" higher than any previously used number. But they say nothing about how to actually go about doing that, the kinds of failures that can happen, or how to resolve the situation if two proposers simultaneously try to use the same proposal number (as in both choosing n=2). That actually completely breaks the protocol and would lead to incorrect results but I'm not sure I've ever seen that specifically called out. I guess it's just supposed to be "obvious".
Specifically to your question, there's no perfect way to tell the difference using the raw algorithm. Practical implementations typically go the extra mile by sending a Nack message to the Proposer rather than just silently ignoring it. There are plenty of other tricks that can be used but all of them, including the nacks, come with varying downsides. Which approach is best generally depends on both the kind of application employing Paxos and the environment it's intended to run in.
If you're interested, I put together a much longer-winded description of Paxos that includes many of the issues practical implementations must address in addition to the core components. It covers this issue along with several others.
Specific to your question, it isn't possible for a proposer to distinguish between lost messages, delayed messages, crashed acceptors or stalled acceptors. In each case you get no response. Typically an implementation will time out when it receives fewer than a quorum of responses and resend the proposal, on the assumption that messages were dropped or acceptors are rebooting.
Often implementations add "nack" messages (negative acknowledgements) as an optimisation to speed up recovery. The proposer only gets "nack" responses from reachable nodes that have already made a higher promise. The "nack" can show both the highest promise and the highest instance known to be fixed. How this helps will be outlined below.
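As a rough sketch of the timeout and nack handling just described, the function below classifies the outcome of one round from whatever arrived before the timeout. All names (`Promise`, `Nack`, `classify_round`) and the three outcome labels are hypothetical and are not taken from TRex or any other implementation. Treating a single nack as a reason to retry with a higher number is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass(frozen=True)
class Promise:
    acceptor: str


@dataclass(frozen=True)
class Nack:
    acceptor: str
    highest_promised: int    # highest proposal number this acceptor has promised
    highest_fixed_slot: int  # highest instance this acceptor knows to be fixed


# None stands for "nothing arrived from this acceptor before the timeout".
Response = Union[Promise, Nack, None]


def classify_round(responses: Iterable[Response], cluster_size: int) -> str:
    """Decide what the proposer does once its timeout fires."""
    quorum = cluster_size // 2 + 1
    received = list(responses)
    promises = sum(isinstance(r, Promise) for r in received)
    nacks = [r for r in received if isinstance(r, Nack)]
    if promises >= quorum:
        return "proceed"        # enough promises: continue with the accept phase
    if nacks:
        return "retry_higher"   # someone promised a higher number: outbid it
    return "resend"             # silence only: lost, delayed, crashed or stalled


ok = classify_round([Promise("a"), Promise("b"), None], cluster_size=3)
silent = classify_round([Promise("a"), None, None], cluster_size=3)
refused = classify_round([Promise("a"), Nack("b", 7, 3), None], cluster_size=3)
```

The `"resend"` case is exactly the one the proposer cannot diagnose further: silence looks the same whether the message was lost or the acceptor is down.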
I wrote an implementation of Paxos called TRex that uses some of these techniques while sticking as closely as possible to the description of the algorithm in the paper Paxos Made Simple. I wrote up a description of the practical considerations of timeouts and nacks in a blog post.
One of the interesting techniques it uses is for a timed-out node to make its first proposal with a very low number. This will always get "nack" messages. Why? Consider a three-node cluster where one network link breaks between a stable proposer and one other node. The other node will time out and issue a prepare. If it issues a high prepare, it will get a promise from the third node. This will interrupt the stable leader. You then have a symmetric situation where the two nodes that cannot message one another fight over the leadership, which keeps swapping with no forward progress.
To avoid this, a timed-out node can start with a low prepare. It can then look at the "nack" messages to learn from the third node that there is a leader who is making progress. It will see this because the highest instance known to be fixed in the nack will be greater than its local value. The timed-out node can then refrain from issuing a high prepare and instead ask the third node to send it the latest fixed and accepted values. With that enhancement, a timed-out node can distinguish between the stable proposer crashing and the connection failing. Such "nack"-based techniques don't affect the correctness of the implementation; they are only an optimisation to ensure fast failover and forward progress.
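The decision a timed-out node makes after its low prepare can be sketched as below. This is an illustration of the idea only, with hypothetical names (`Nack`, `after_low_prepare`), and is not code from TRex. The behaviour when no nack arrives at all is an assumption, since the description above does not cover that case.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Nack:
    highest_promised: int    # highest proposal number the sender has promised
    highest_fixed_slot: int  # highest instance the sender knows to be fixed


def after_low_prepare(nacks: List[Nack], local_fixed_slot: int) -> str:
    """Decide the next step for a timed-out node that probed with a low prepare."""
    if not nacks:
        # Assumption: nobody is reachable, so a high prepare cannot win either.
        return "probe_again"
    if any(n.highest_fixed_slot > local_fixed_slot for n in nacks):
        # Another node has seen values fixed that we have not: a leader is
        # making progress and only our link to it is broken.
        return "catch_up"
    # Reachable nodes have seen no more progress than we have: the stable
    # proposer has probably crashed, so try to take over.
    return "high_prepare"


link_broken = after_low_prepare([Nack(highest_promised=5, highest_fixed_slot=42)],
                                local_fixed_slot=40)
leader_gone = after_low_prepare([Nack(highest_promised=5, highest_fixed_slot=40)],
                                local_fixed_slot=40)
```

In the `"catch_up"` case the node would ask the reachable peer for the latest fixed and accepted values instead of interrupting the leader.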

Sampling rule depending on duration

Is it possible in AWS X-Ray to create a sampling rule that samples all calls to a given service whose duration is greater than some value?
The only way right now to find the lagging sub-service is to sample 100% and then filter by service name and duration. This is pretty expensive at 10K+ segments per second.
AWS X-Ray dev here. Biased sampling on longer duration can skew your service graph statistics and cause false negatives. If you are OK with this data skew, then depending on which X-Ray SDK you are using, you might be able to achieve conditional sampling by making explicit sampling decisions on segment close. This would require you to override certain parts of the SDK's behavior.
We know this is a popular feature request and we are working on improving it. Please use our AWS X-Ray public forum https://forums.aws.amazon.com/forum.jspa?forumID=241&start=0 for the latest feature launches or to provide any additional comments.
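The general shape of a decision made on segment close is sketched below. This deliberately does not use the X-Ray SDK: `Segment` and `keep_on_close` are hypothetical stand-ins, and how such a hook would be wired into a particular SDK is not shown because it depends on the SDK and on which parts of it you override.

```python
import random
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a finished segment, not an SDK type."""
    service: str
    duration_s: float


def keep_on_close(segment: Segment, slow_threshold_s: float,
                  base_rate: float, rng: random.Random) -> bool:
    """Keep every slow segment, plus a random fraction of the fast ones."""
    if segment.duration_s > slow_threshold_s:
        return True
    return rng.random() < base_rate


rng = random.Random(0)
slow_kept = keep_on_close(Segment("checkout", 2.5), slow_threshold_s=1.0,
                          base_rate=0.0, rng=rng)
fast_kept = keep_on_close(Segment("checkout", 0.1), slow_threshold_s=1.0,
                          base_rate=0.0, rng=rng)
```

Because slow segments are kept with probability 1 and fast ones with `base_rate`, any latency statistics computed from the kept data are biased toward slow calls, which is the skew mentioned above.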

PyMC change backend after sampling

I have been using PyMC in an analysis of some high energy physics data. It has worked to perfection, the analysis is complete, and we are working on the paper.
I have a small problem, however. I ran the sampler with the RAM database backend. The traces have been sitting around in memory in an IPython kernel process for a couple of months now. The problem is that the workstation support staff want to perform a kernel upgrade and reboot that workstation. This will cause me to lose the traces. I would like to keep these traces (as opposed to just generating new ones), since they are what I've made all the plots with. I'd also like to include a portion of the traces (only the parameters of interest) as supplemental material with the publication.
Is it possible to take an existing chain in a pymc.MCMC object created with the RAM backend, change to a different backend, and write out the traces in the chain?
The trace values are stored as NumPy arrays, so you can use numpy.savetxt to send the values of each parameter to a file. (This is what the text backend does under the hood.)
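A minimal sketch of that approach follows. The helper `save_traces` is a hypothetical name, and it assumes you have already collected the arrays into a dict, for example with something like `mcmc.trace(name)[:]` for each parameter of interest (that accessor is stated from memory of the PyMC 2 API, so check it against your installed version). `numpy.savetxt` only handles 1-D and 2-D arrays, so higher-dimensional traces would need reshaping first.

```python
import os

import numpy as np


def save_traces(traces, out_dir):
    """Write each parameter's trace to <out_dir>/<name>.txt and return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for name, values in traces.items():
        path = os.path.join(out_dir, name + ".txt")
        # One sample per row; np.savetxt accepts 1-D and 2-D arrays only.
        np.savetxt(path, np.asarray(values))
        paths[name] = path
    return paths


# Example (names are placeholders for your own parameters):
# traces = {name: mcmc.trace(name)[:] for name in ["alpha", "beta"]}
# save_traces(traces, "traces_out")
```

The files can be read back with `numpy.loadtxt`, which also makes them easy to ship as supplemental material.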
While saving your current traces is a good idea, I'd suggest taking the time to make your analysis repeatable before publishing.

Travelling Salesman and Map/Reduce: Abandon Channel

This is an academic rather than practical question. In the Traveling Salesman Problem, or any other which involves finding a minimum optimization ... if one were using a map/reduce approach it seems like there would be some value to having some means for the current minimum result to be broadcast to all of the computational nodes in some manner that allows them to abandon computations which exceed that.
In other words if we map the problem out we'd like each node to know when to give up on a given partial result before it's complete but when it's already exceeded some other solution.
One approach that comes immediately to mind would be if the reducer had a means to provide feedback to the mapper. Consider if we had 100 nodes, and millions of paths being fed to them by the mapper. If the reducer feeds the best result to the mapper, then that value could be included as an argument along with each new path (problem subset). In this approach the granularity is fairly rough ... the 100 nodes will each keep grinding away on their partition of the problem to completion and only get the new minimum with their next request from the mapper. (For a small number of nodes and a huge number of problem partitions/subsets to work across, this granularity would be inconsequential; also it's likely that one could apply heuristics to the sequence in which the possible routes or problem subsets are fed to the nodes to get a rapid convergence towards the optimum and thus minimize the amount of "wasted" computation performed by the nodes).
Another approach that comes to mind would be for the nodes to be actively subscribed to some sort of channel, or multicast or even broadcast from which they could glean new minimums from their computational loop. In that case they could immediately abandon a bad computation when notified of a better solution (by one of their peers).
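The second approach can be illustrated on a single machine with threads standing in for compute nodes and a shared object standing in for the broadcast channel. This is only a sketch of the idea, not a map/reduce job: `SharedBound`, `search`, and `solve` are hypothetical names, and the pruning test assumes non-negative distances.

```python
import threading
from typing import List, Sequence, Tuple


class SharedBound:
    """Best complete tour found so far, visible to every worker."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.cost = float("inf")
        self.tour: Tuple[int, ...] = ()

    def offer(self, cost: float, tour: Tuple[int, ...]) -> None:
        with self._lock:
            if cost < self.cost:
                self.cost, self.tour = cost, tour


def search(dist: Sequence[Sequence[float]], path: List[int],
           cost: float, bound: SharedBound) -> None:
    # Abandon this partial tour once it cannot beat the shared minimum.
    if cost >= bound.cost:
        return
    if len(path) == len(dist):
        # Close the tour by returning to the start city.
        bound.offer(cost + dist[path[-1]][path[0]], tuple(path))
        return
    for city in range(len(dist)):
        if city not in path:
            search(dist, path + [city], cost + dist[path[-1]][city], bound)


def solve(dist: Sequence[Sequence[float]]) -> Tuple[float, Tuple[int, ...]]:
    bound = SharedBound()
    # One worker per choice of second city: these are the problem partitions.
    workers = [threading.Thread(target=search,
                                args=(dist, [0, c], dist[0][c], bound))
               for c in range(1, len(dist))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return bound.cost, bound.tour


dist = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
best_cost, best_tour = solve(dist)
```

A worker that reads a slightly stale bound only prunes less than it could; it never discards the optimum, which is why the unlocked read in `search` is tolerable here. On a real cluster the same trade-off appears as the latency of propagating the current minimum.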
So, my questions are:
Is this concept covered by any terms of art in relation to existing map/reduce discussions?
Do any of the current map/reduce frameworks provide features to support this sort of dynamic feedback?
Is there some flaw with this idea ... some reason why it's stupid?
That's a cool topic, and one that doesn't have much literature on it yet. So this is pretty much a brainstorming post, rather than an answer to all your problems ;)
Every TSP can be expressed as a graph, which might look like this one: (taken from the German Wikipedia)
Now you can run a graph algorithm on it. MapReduce can be used for graph processing quite well, although it has a lot of overhead.
You need a paradigm that is called "Message Passing". It was described in this paper here: Paper.
I also blogged about it in terms of graph exploration; the post explains quite simply how it works. My Blogpost
This is how you can tell the mapper what the current minimum result is (maybe just for the vertex itself).
With all that in the back of your mind, it is fairly natural to think of a branch and bound algorithm (which is what you described) to get to the goal. For example, pick a random start vertex and branch to every adjacent vertex. This causes a message to be sent to each of these adjacent vertices with the cost at which it can be reached from the start vertex (Map Step). The vertex itself only updates its cost if the new cost is lower than the currently stored cost (Reduce Step). Initially the stored cost should be set to infinity.
You do this over and over again until you've reached the start vertex again (obviously after you have visited every other one). So you have to somehow keep track of the currently best way to reach a vertex; this can be stored in the vertex itself, too. And every now and then you have to bound this branching and cut off branches that are too costly; this can be done in the reduce step after reading the messages.
Basically this is just a mix of graph algorithms in MapReduce and a kind of shortest paths.
Note that this won't yield the optimal route between the nodes; it is still a heuristic. And you're just parallelizing the NP-hard problem.
BUT, a little self-advertising again (maybe you've read it already in the blog post I've linked): there exists an abstraction over MapReduce that has far less overhead for this kind of graph processing. It is called BSP (Bulk Synchronous Parallel). It is freer in its communication and in its computing model. So I'm sure that this can be implemented a lot better with BSP than with MapReduce. You can realize the channels you've spoken about better with it.
I'm currently involved in a Summer of Code project which targets these SSSP problems with BSP. Maybe you want to visit if you're interested. This could then be a partial solution; it is described very well in my blog, too. SSSP's in my blog
I'm excited to hear some feedback ;)
It seems that Storm implements what I was thinking of. It's essentially a computational topology (think of how each compute node might be routing results based on a key/hashing function to the specific reducers).
This is not exactly what I described, but might be useful if one had a sufficiently low-latency way to propagate current bounding (i.e. local optimum information) which each node in the topology could update/receive in order to know which results to discard.