Exportable Zipkin/Brave/Spring-Cloud-Sleuth Span cannot be found in Zipkin - spring-cloud-sleuth

I can't find a below exportable Span in Zipkin neither by it's traceId nor by spanId (some other spans appear, so Zipkin server seems to work)
{"timestamp":"2020-08-13 00:48:52.471","level":"INFO","thread":"xxx dispatcher: xxx","mdc":{"traceId":"481bef72295477ac","spanId":"509cdbbac8833590",
"spanExportable":"true","X-Span-Export":"true","X-B3-SpanId":"509cdbbac8833590","X-B3-ParentSpanId":"37eca1021fd5241c","X-B3-TraceId":"481bef72295477ac",
"parentId":"37eca1021fd5241c"},"logger":"xxxService","message":"Sending response xxxMsg to RabbitMQ channel","context":"default"}
I also can't find it's parent "parentId":"37eca1021fd5241c" in Zipkin.
Where can be a problem? How can I bite/debug it?
Possibly this span is in a flow, that was triggered by a rabbit message, not a rest request. Spans from a trace that were triggered by http rest request are correctly visible in Zipkin. But I can't find traces from flows triggered by rabbit message. What problem could be here?

I was using it with default storage which is discouraged in production use, it can handle only small amount of data and can be treated only as a demo version.
What helped a little was setting
spring.sleuth.sampler.probability: 0.01
-- by default it logs all spans.

Related

Disable sleuth for from storing some traces

I am using spring cloud finchley.rc2 with spring boot version 2 along with sleuth and zipkin.
I have a facade layer which uses reactor project. Facade calls services parallel and each service store some tracing info in rabbit mq.
Issue is I am seeing in zipkin some spans like
facade.async
service.publish > Because of mq
How can i stop such traces from being captured
Can you follow the guidelines described here https://stackoverflow.com/help/how-to-ask and the next question you ask, ask it with more details? E.g. I have no idea how exactly you use Sleuth? Anyways I'll try to answer...
You can create a SpanAdjuster bean, that will analyze the span information (e.g. span tags) and basing on that information you will change the sampling decision so as not to send it to Zipkin.
Another option is to wrap the default span reporter in a similar logic.
Yet another option is to verify what kind of a thread it is that is creating this span and toggle it off (assuming that it's a #Scheduled method) - https://cloud.spring.io/spring-cloud-static/Finchley.RC2/single/spring-cloud.html#__literal_scheduled_literal_annotated_methods

Immediate flag in RabbitMQ

I have a clients that uses API. The API sends messeges to rabbitmq. Rabbitmq to workers.
I ought to reply to clients if somethings went wrong - message wasn't routed to a certain queue and wasn't obtained for performing at this time ( full confirmation )
A task who is started after 5-10 seconds does not make sense.
Appropriately, I must use mandatory and immediate flags.
I can't increase counts of workers, I can't run workers on another servers. It's a demand.
So, as I could find the immediate flag hadn't been supporting since rabbitmq v.3.0x
The developers of rabbitmq suggests to use TTL=0 for a queue instead but then I will not be able to check status of message.
Whether any opportunity to change that behavior? Please, share your experience how you solved problems like this.
Thank you.
I'm not sure, but after reading your original question in Russian, it might be that using both publisher and consumer confirms may be what you want. See last three paragraphs in this answer.
As you want to get message result for published message from your worker, it looks like RPC pattern is what you want. See RabbitMQ RPC tuttorial. Pick a programming language section there you most comfortable with, overall concept is the same. You may also find Direct reply-to useful.
It's not the same as immediate flag functionality, but in case all your publishers operate with immediate scenario, it might be that AMQP protocol is not the best choice for such kind of task. Immediate mean "deliver this message right now or burn in hell" and it might be a situation when you publish more than you can process. In such cases RPC + response timeout may be a good choice on application side (e.g. socket timeout). But it doesn't work well for non-idempotent RPC calls while message still be processed, so you may want to use per-queue or per-message TTL (or set queue length limit). In case message will be dead-lettered, you may get it there (in case you need that for some reason).
TL;DR
As to "something" can go wrong, it can go so on different levels which we for simplicity define as:
before RabbitMQ, like sending application failure and network problems;
inside RabbitMQ, say, missed destination queue, message timeout, queue length limit, some hard and unexpected internal error;
after RabbitMQ, in most cases - messages processing application error or some third-party services like data persistence or caching layer outage.
Some errors like network outage or hardware error are a bit epic and are not a subject of this q/a.
Typical scenario for guaranteed message delivery is to use publisher confirms or transactions (which are slower). After you got a confirm it mean that RabbitMQ got your message and if it has route - placed in a queue. If not it is dropped OR if mandatory flag set returned with basic.return method.
For consumers it's similar - after basic.consumer/basic.get, client ack'ed message it considered received and removed from queue.
So when you use confirms on both ends, you are protected from message loss (we'll not run into a situation that there might be some bug in RabbitMQ itself).
Bogdan, thank you for your reply.
Seems, I expressed my thought enough clearly.
Scheme may looks like this. Each component of system must do what it must do :)
The an idea is make every component more simple.
How to task is performed.
Clients goes to HTTP-API with requests and must obtain a respones like this:
Positive - it have put to queue
Negative - response with error and a reason
When I was talking about confirmation I meant that I must to know that a message is delivered ( there are no free workers - rabbitmq can remove a message ), a client must be notified.
A sent message couldn't be delivered to certain queue, a client must be notified.
How to a message is handled.
Messages is sent for performing.
Status of perfoming is written into HeartBeat
Status.
Clients obtain status from HeartBeat by itself and then decide that
it's have to do.
I'm not sure, that RPC may be useful for us i.e. RPC means that clients must to wait response from server. Tasks may works a long time. Excess bound between clients and servers, additional logic on client-side.
Limited size of queue maybe not useful too.
Possible situation when a size of queue maybe greater than counts of workers. ( problem in configuration or defined settings ).
Then an idea with 5-10 seconds doesn't make sense.
TTL doesn't usefull because of:
Setting the TTL to 0 causes messages to be expired upon reaching a
queue unless they can be delivered to a consumer immediately. Thus
this provides an alternative to basic.publish's immediate flag, which
the RabbitMQ server does not support. Unlike that flag, no
basic.returns are issued, and if a dead letter exchange is set then
messages will be dead-lettered.
direct reply-to :
The RPC server will then see a reply-to property with a generated
name. It should publish to the default exchange ("") with the routing
key set to this value (i.e. just as if it were sending to a reply
queue as usual). The message will then be sent straight to the client
consumer.
Then I will not be able to route messages.
So, I'm sorry. I may flounder in terms i.e. I'm new in AMQP and rabbitmq.

Multiple acknowledge for the same delivery tag

In my project I saw that there is a chance of acknowledging the same delivery tag twice. When this happens, the consumer gets unbound from the queue and no further messages come to the consumer (Observed using the RabbitMQ management dashboard).
How can I check that a given delivery tag has already been acknowledged? Is there a recommended way to handle such scenario using the RabbitMQ API?
I tried to avoid acknowledging twice in my code but unfortunately it is not possible due to some design issues.
As the AMQP protocol reference is pretty clear about this:
A message MUST not be acknowledged more than once. The receiving peer MUST validate that a non-zero delivery-tag refers to a delivered message, and raise a channel exception if this is not the case. ...
A quick test reveals that, at least in current versions, this does not cause a consumer to stop working, but that behavior might be implementation-dependent.
In short, you would have to review your design to avoid this situation.

Implementing the reliability pattern in CloudHub with VM queues

I have more-or-less implemented the Reliability Pattern in my Mule application using persistent VM queues CloudHub, as documented here. While everything works fine, it has left me with a number of questions about actually ensuring reliable delivery of my messages. To illustrate the points below, assume I have http-request component within my "application logic flow" (see the diagram on the link above) that is throwing an exception because the endpoint is down, and I want to ensure that the in flight message will eventually get delivered to the endpoint:
As detailed on the link above, I have observed that when the exception is thrown within my "application logic flow", and I have made the flow transactional, the message is put back on the VM queue. However all that happens is the message then repeatedly taken off the queue, processed by the flow, and the exception is thrown again - ad infinitum. There appears to be no way of configuring any sort of retry delay or maximum number of retries on VM queues as is possible, for example, with ActiveMQ. The best work around I have come up with is to surround the http-request message processor with the until-successful scope, but I'd rather have these sorts of things apply to my whole flow (without having to wrap the whole flow in until-successful). Is this sort of thing possible using only VM queues and CloudHub?
I have configured my until-successful to place the message on another VM queue which I want to use as a dead-letter-queue. Again, this works fine, and I can login to CloudHub and see the messages populated on my DLQ - but then it appears to offer no way of moving messages from this queue back into the flow when the endpoint comes back up. All it seems you can do in CloudHub is clear your queue. Again, is this possible using VM queues and CloudHub only (i.e. no other queueing tool)?
VM queues are very basic, whether you use them in CloudHub or not.
VM queues have no capacity for delaying redelivery (like exponential back-offs). Use JMS queues if you need such features.
You need to create a flow for processing the DLQ, for example one that regularly consumes the queue via the requester module and re-injects the messages into the main queue. Again, with JMS, you would have better control.
Alternatively to JMS, you could consider hosted queues like CloudAMQP, Iron.io or AWS SQS. You would lose transaction support on the inbound endpoint but would gain better control on the (re)delivery behaviour.

How to detect alarm-based blocking RabbitMQ producer?

I have a producer sending durable messages to a RabbitMQ exchange. If the RabbitMQ memory or disk exceeds the watermark threshold, RabbitMQ will block my producer. The documentation says that it stops reading from the socket, and also pauses heartbeats.
What I would like is a way to know in my producer code that I have been blocked. Currently, even with a heartbeat enabled, everything just pauses forever. I'd like to receive some sort of exception so that I know I've been blocked and I can warn the user and/or take some other action, but I can't find any way to do this. I am using both the Java and C# clients and would need this functionality in both. Any advice? Thanks.
Sorry to tell you but with RabbitMQ (at least with 2.8.6) this isn't possible :-(
had a similar problem, which centred around trying to establish a channel when the connection was blocked. The result was the same as what you're experiencing.
I did some investigation into the actual core of the RabbitMQ C# .Net Library and discovered the root cause of the problem is that it goes into an infinite blocking state.
You can see more details on the RabbitMQ mailing list here:
http://rabbitmq.1065348.n5.nabble.com/Net-Client-locks-trying-to-create-a-channel-on-a-blocked-connection-td21588.html
One suggestion (which we didn't implement) was to do the work inside of a thread and have some other component manage the timeout and kill the thread if it is exceeded. We just accepted the risk :-(
The Rabbitmq uses a blocking rpc call that listens for a reply indefinitely.
If you look the Java client api, what it does is:
AMQChannel.BlockingRpcContinuation k = new AMQChannel.SimpleBlockingRpcContinuation();
k.getReply(-1);
Now -1 passed in the argument blocks until a reply is received.
The good thing is you could pass in your timeout in order to make it return.
The bad thing is you will have to update the client jars.
If you are OK with doing that, you could pass in a timeout wherever a blocking call like above is made.
The code would look something like:
try {
return k.getReply(200);
} catch (TimeoutException e) {
throw new MyCustomRuntimeorTimeoutException("RabbitTimeout ex",e);
}
And in your code you could handle this exception and perform your logic in this event.
Some related classes that might require this fix would be:
com.rabbitmq.client.impl.AMQChannel
com.rabbitmq.client.impl.ChannelN
com.rabbitmq.client.impl.AMQConnection
FYI: I have tried this and it works.