RabbitMQPublisher sometimes fails to recover

I am using Vert.x 4.2.1 with the RabbitMQ client, and I just noticed that sometimes, when the RabbitMQ client loses its connection and reconnects, the RabbitMQPublisher can no longer publish messages: my call to publisherClient.rxPublish(...) never completes, and it does not throw any error either.
My client settings are:
new RabbitMQOptions().setAutomaticRecoveryEnabled(true)
.setReconnectAttempts(0)
.setNetworkRecoveryInterval(1000L);
Is there a setting, or anything else, that prevents this situation?
For now, I am trying to resolve the issue with the following workaround:
publisherClient.rxPublish(......)
.timeout(5, TimeUnit.SECONDS)
.doOnError(err -> {
if (err instanceof TimeoutException) {
LOG.warn("Publisher did not recover, so it will be restarted");
publisherClient.restart();
}
})
.retry(1L, err -> err instanceof TimeoutException)
A small update on the issue:
It seems to be reproducible. If we try to publish a message while the connection to RabbitMQ is down, we can't publish any message afterwards, even once the connection has recovered and everything seems to be fine. The call to publisherClient.rxPublish(......) never completes.
Thanks for your help.

How are you testing this? And how are you creating the exchange and queue that you are publishing to?
If the objects are not there when the server comes back then all messages will silently disappear.
In order to create the objects you need to use the connection established callbacks, which are not currently accessible via the Rx wrapper.

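Try this (the difference is the trailing .subscribe(); an Rx chain is lazy and does nothing until something subscribes to it):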
publisherClient.rxPublish(......)
.timeout(5, TimeUnit.SECONDS)
.doOnError(err -> {
if (err instanceof TimeoutException) {
LOG.warn("Publisher did not recover, so it will be restarted");
publisherClient.restart();
}
})
.retry(1L, err -> err instanceof TimeoutException)
.subscribe()
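The same timeout, restart and single-retry logic can be sketched without Vert.x or RxJava, using only CompletableFuture. Publisher below is a hypothetical stand-in for the Vert.x RabbitMQPublisher (publish returns a future that may never complete, restart rebuilds the channel). It is not the Vert.x API. orTimeout plays the role of Rx's .timeout(...).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the Vert.x RabbitMQPublisher (not its real API).
interface Publisher {
    CompletableFuture<Void> publish(String message); // may hang forever after a bad reconnect
    void restart();                                   // rebuilds the underlying channel
}

final class PublishWithRecovery {
    // Publish once; if no confirmation arrives within the timeout, restart the
    // publisher and try exactly one more time (like .timeout/.doOnError/.retry(1)).
    static void publish(Publisher publisher, String message, long timeoutMs) {
        try {
            publisher.publish(message).orTimeout(timeoutMs, TimeUnit.MILLISECONDS).join();
        } catch (CompletionException e) {
            if (!(e.getCause() instanceof TimeoutException)) {
                throw e; // only a hang triggers the restart-and-retry path
            }
            System.err.println("Publisher did not recover, so it will be restarted");
            publisher.restart();
            // Second and last attempt: a timeout here propagates to the caller.
            publisher.publish(message).orTimeout(timeoutMs, TimeUnit.MILLISECONDS).join();
        }
    }
}
```

This is a workaround rather than a fix: it hides the hang but does not explain why the publisher stopped confirming.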

Related

Is there a way to make the Kafka consumer poll-function throw errors, rather than the library handling them internally?

I'm working on a Kafka consumer in Kotlin/Javalin, using the standard Kafka library (org.apache.kafka.clients.consumer), and I'm struggling a bit with the poll function: it never seems to throw any errors that can be caught, it just writes warnings/errors to the console. For example, when it can't reach the broker, it logs the warning "Connection to node -1 could not be established. Broker may not be available.":
{
"timestamp": "2022-12-14T13:30:58.673+01:00",
"level": "WARN",
"thread": "main",
"logger": "org.apache.kafka.clients.NetworkClient",
"message": "[Consumer clientId=xxx, groupId=xxx] Connection to node -1 (localhost/127.0.0.1:1000) could not be established. Broker may not be available."
}
But it doesn't actually throw anything, so it's practically impossible to handle the error if you want to do anything other than keep polling forever. Does anyone know if there is a way to configure this behavior? Or am I missing something?
The relevant code:
consumer = createConsumer() // This returns a Consumer<String?, String?>
consumer.subscribe(listOf(TOPIC))
while (true) {
val records = consumer.poll(Duration.ofSeconds(1))
records.iterator().forEach {
println(it.key())
}
consumer.commitSync() // Commit offset after finished processing entries
}
I can trigger a timeout error by calling the partitionsFor function on the consumer, so this could work as a liveness probe, but it feels more like a hack than the intended way to do it.
try {
val partitions = consumer.partitionsFor(TOPIC) // throws (e.g. a timeout) if the broker is unreachable
} catch (e: Exception) {
println(e)
}
Thanks!
The client is dumb, and expects you to provide the correct values.
You can use AdminClient.describeCluster() with the same address to verify the connection, and then catch (or throw) a RuntimeException from that.
Otherwise, the consumer will retry and update the metadata for your bootstrap.servers until it can connect.
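Here is a minimal sketch of the "verify, then throw" idea, assuming only that the probe is exposed as a java.util.concurrent.Future. With Kafka, the probe would presumably be AdminClient.create(props).describeCluster().nodes(), whose KafkaFuture implements Future. That wiring is not shown here, so the helper stays stdlib-only. The name requireReachable is hypothetical. It can be called from Kotlin as well.

```java
import java.time.Duration;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BrokerProbe {
    // Waits at most `timeout` for a connectivity probe and turns every failure
    // mode into an unchecked exception the caller can catch, or let crash the app.
    static <T> T requireReachable(Future<T> probe, Duration timeout) {
        try {
            return probe.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            probe.cancel(true); // stop waiting on a probe we no longer care about
            throw new IllegalStateException("Broker not reachable within " + timeout, e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Broker probe failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while probing broker", e);
        }
    }
}
```

Calling this once before entering the poll loop (and perhaps periodically as a health check) gives you an exception to act on. The consumer itself keeps its retry-forever behavior.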

Ratchet PHP server establishes connection, but Kotlin never receives acknowledgement

I have a Ratchet server that I'm trying to access via WebSocket. It is similar to the tutorial: it logs when there is a new client or when it receives a message. The Ratchet server reports that it has successfully established a connection, but the Kotlin client does not (the connect event in Kotlin is never fired). I am using the socket.io Java client (v2.0.1). The client reports a timeout after the configured timeout, gets detached at the server, and attaches again after a short while, as if it thinks the connection was never properly established (because of a missing connection response?).
The successful connection is reported to the client when the client is a WebSocket client in Chrome's JS console, but not to my Kotlin app. Even an Android emulator running on the same computer doesn't get a response, so I don't think the problem is Wi-Fi related.
The connection works fine with JS, completing the full handshake, but from the Android app it only reaches the server; nothing ever makes it back to the client.
This is my server code:
<?php
namespace agroSMS\Websockets;
use Ratchet\ConnectionInterface;
use Ratchet\MessageComponentInterface;
class SocketConnection implements MessageComponentInterface
{
protected \SplObjectStorage $clients;
public function __construct() {
$this->clients = new \SplObjectStorage;
}
function onOpen(ConnectionInterface $conn)
{
$this->clients->attach($conn);
error_log("New client attached");
}
function onClose(ConnectionInterface $conn)
{
$this->clients->detach($conn);
error_log("Client detached");
}
function onError(ConnectionInterface $conn, \Exception $e)
{
echo "An error has occurred: {$e->getMessage()}\n";
$conn->close();
}
function onMessage(ConnectionInterface $from, $msg)
{
error_log("Received message: $msg");
// TODO: Implement onMessage() method.
}
}
And the script that I run in the terminal:
<?php
use Ratchet\Server\IoServer;
use agroSMS\Websockets\SocketConnection;
use Ratchet\WebSocket\WsServer;
use Ratchet\Http\HttpServer;
require dirname(__DIR__) . '/vendor/autoload.php';
$server = IoServer::factory(
new HttpServer(
new WsServer(
new SocketConnection()
)
)
);
$server->run();
What I run in the browser for testing (it logs "Connection established!" in Chrome, but for some reason not in the Brave browser):
var conn = new WebSocket('ws://<my-ip>:80');
conn.onopen = function(e) {
console.log("Connection established!");
};
conn.onmessage = function(e) {
console.log(e.data);
};
What my Kotlin code looks like:
try {
val uri = URI.create("ws://<my-ip>:80")
val options = IO.Options.builder()
.setTimeout(60000)
.setTransports(arrayOf(WebSocket.NAME))
.build()
socket = IO.socket(uri, options)
socket.connect()
.on(Socket.EVENT_CONNECT) {
Log.d(TAG, "[INFO] Connection established")
socket.send(jsonObject)
}
.once(Socket.EVENT_CONNECT_ERROR) {
val itString = gson.toJson(it)
Log.d(TAG, itString)
}
}catch(e : Exception) {
Log.e(TAG, e.toString())
}
After a minute the Kotlin code logs a "timeout" error, detaches from the server, and attaches again.
When I stop the script on the server, the client reports "connection reset, websocket error" (which makes sense, but why doesn't it get the connection in the first place?).
I also tried simply changing the protocol in the URL to "wss", in case that was the problem, even though my server doesn't use SSL at all, but this just gave me another error:
[{"cause":{"bytesTransferred":0,"detailMessage":"Read timed out","stackTrace":[],"suppressedExceptions":[]},"detailMessage":"websocket error","stackTrace":[],"suppressedExceptions":[]}]
And this time the connection isn't even established at the server, so that attempt made things worse.
I went to the GitHub page of socket.io-java-client to look for a solution, and it turned out that the whole problem was that I had misunderstood a very important concept:
The fact that socket.io uses WebSockets doesn't mean it is compatible with WebSockets.
In plain words:
If you use socket.io on the client side, you also need to use it on the server side, and vice versa. Because socket.io sends a lot of metadata with its packets, a pure WebSocket server will accept the connection, but the socket.io client will not accept what comes back as an acknowledgement.
You have to use either socket.io on both ends or plain WebSockets on both ends.
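To show why a plain WebSocket server cannot satisfy a socket.io client, here is a simplified frame classifier. It assumes the Engine.IO v4 / Socket.IO v5 text framing used by socket.io servers 3.x/4.x and, as far as I know, by socket.io-client-java 2.x. In that framing, the first digit is the Engine.IO packet type, and for "message" packets the second digit is the Socket.IO packet type. After the WebSocket upgrade, the client waits for an Engine.IO open packet (0{"sid":...}) and then for the reply to its 40 (CONNECT) packet. A plain Ratchet server sends neither, which matches the timeout in the question. Namespaces, ack ids and binary attachments are ignored in this sketch.

```java
final class SocketIoFrame {
    // Engine.IO v4 packet types, indexed by their leading digit.
    private static final String[] ENGINE_IO_TYPES =
        {"open", "close", "ping", "pong", "message", "upgrade", "noop"};
    // Socket.IO v5 packet types, carried inside Engine.IO "message" packets.
    private static final String[] SOCKET_IO_TYPES =
        {"CONNECT", "DISCONNECT", "EVENT", "ACK", "CONNECT_ERROR", "BINARY_EVENT", "BINARY_ACK"};

    // Describes a text frame the way a socket.io client would interpret it.
    static String describe(String frame) {
        if (frame.isEmpty() || frame.charAt(0) < '0' || frame.charAt(0) > '6') {
            return "not a socket.io frame: " + frame; // e.g. anything a plain WebSocket server sends
        }
        String engineType = ENGINE_IO_TYPES[frame.charAt(0) - '0'];
        String rest = frame.substring(1);
        if (!engineType.equals("message") || rest.isEmpty()) {
            return rest.isEmpty() ? engineType : engineType + " " + rest;
        }
        char s = rest.charAt(0);
        if (s < '0' || s > '6') {
            return "message with unknown socket.io type: " + rest;
        }
        String payload = rest.substring(1);
        String type = "message/" + SOCKET_IO_TYPES[s - '0'];
        return payload.isEmpty() ? type : type + " " + payload;
    }
}
```

For example, "40" is a CONNECT and 42["chat","hi"] is an EVENT. A free-form frame such as "Connection established" from a plain Ratchet server is not recognized at all. If you want to keep Ratchet, use a plain WebSocket client on Android instead of socket.io.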

RabbitMQ and Webjob SDK. Intentionally failing function

I have a WebJob that listens to a RabbitMQ queue, picks messages off the queue and posts them to another service.
However, if the service fails (returns 5xx errors or similar) I would like to put the message back into the queue and try again later.
I am using a RabbitMQ extension to the WebJobs SDK (https://github.com/Sarmaad/WebJobs.Extensions.RabbitMQ/tree/master/WebJobs.Extensions.RabbitMQ) and, if I understand correctly, this will happen if the WebJob "function" fails. Is there a way of failing it intentionally? (Other than throwing exceptions)
From the extension source code:
var result = _executor.TryExecuteAsync(new TriggeredFunctionData{TriggerValue = triggerValue}, CancellationToken.None).Result;
if (result.Succeeded)
_channel.BasicAck(args.DeliveryTag, false);
else
_channel.BasicNack(args.DeliveryTag, false, false);
According to the Azure WebJobs SDK, a WebJob's status depends only on whether your WebJob/function executed without throwing an exception. We can't set the final status of a running WebJob programmatically.
Code from the TriggeredFunctionExecutor class:
public async Task<FunctionResult> TryExecuteAsync(TriggeredFunctionData input, CancellationToken cancellationToken)
{
IFunctionInstance instance = _instanceFactory.Create((TTriggerValue)input.TriggerValue, input.ParentId);
IDelayedException exception = await _executor.TryExecuteAsync(instance, cancellationToken);
FunctionResult result = exception != null ?
new FunctionResult(exception.Exception)
: new FunctionResult(true);
return result;
}
Is there a way of failing it intentionally? (Other than throwing exceptions)
So the answer to your question is no. Throwing exceptions is the only way to do it.
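The control flow can be sketched in Java as follows: an exception is the only signal that turns a run into a failed result, and a failed result into a nack. dispatch mirrors TryExecuteAsync plus the extension's ack/nack branch, and forward is a hypothetical downstream call that throws on purpose on a 5xx status. Neither is a real SDK API. Note that the extension calls BasicNack with requeue set to false. In RabbitMQ that means the broker does not put the message back on the queue: it is dead-lettered if the queue has a dead-letter exchange, otherwise dropped. "Try again later" therefore needs a dead-letter setup or a change to requeue.

```java
import java.util.function.Consumer;

final class TriggerDispatch {
    // Runs the function; any exception counts as failure, mirroring TryExecuteAsync.
    // Returns what the extension would do with the delivery.
    static String dispatch(long deliveryTag, String message, Consumer<String> function) {
        boolean succeeded;
        try {
            function.accept(message);
            succeeded = true;
        } catch (RuntimeException e) {
            succeeded = false;
        }
        // requeue=false, as in the extension: the message is not put back on the queue.
        return succeeded ? "ack " + deliveryTag : "nack " + deliveryTag + " requeue=false";
    }

    // Hypothetical downstream call: throwing is how the function fails intentionally.
    static void forward(String message, int downstreamStatus) {
        if (downstreamStatus >= 500) {
            throw new IllegalStateException("Downstream returned " + downstreamStatus);
        }
    }
}
```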

Redis Timeout Expired message on GetClient call

I hate questions that don't give enough info, so I will try to give detailed information, which in this case means code.
Server:
the 64-bit build from https://github.com/MSOpenTech/redis/tree/2.6/bin/release
There are three classes:
DbOperationContext.cs: https://gist.github.com/glikoz/7119628
PerRequestLifeTimeManager.cs: https://gist.github.com/glikoz/7119699
RedisRepository.cs: https://gist.github.com/glikoz/7119769
We are using Redis with the Unity container.
In this case we are getting this strange message:
"Redis Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use.";
We checked these:
Is it a configuration issue?
Are we using the wrong RedisServer.exe?
Is there an architectural problem?
Any idea? Any similar story?
Thanks.
Extra Info 1
There are no rejected connections in the server stats (I checked via the redis-cli.exe info command).
I have continued to debug this problem, and have fixed numerous things on my platform to avoid this exception. Here is what I have done to solve the issue:
Executive summary:
People encountering this exception should check:
That the PooledRedisClientsManager (IRedisClientsManager) is registered in a singleton scope
That the RedisMqServer (IMessageService) is registered in a singleton scope
That any utilized RedisClient returned from either of the above is properly disposed of, to ensure that the pooled clients are not left stale.
The solution to my problem:
First of all, this exception is thrown by the PooledRedisClient because it has no more pooled connections available.
I'm registering all the required Redis stuff in the StructureMap IoC container (not Unity, as in the asker's case). Thanks to this post, I was reminded that the PooledRedisClientManager should be a singleton; I also decided to register the RedisMqServer as a singleton:
ObjectFactory.Configure(x =>
{
// register the message queue stuff as Singletons in this AppDomain
x.For<IRedisClientsManager>()
.Singleton()
.Use(BuildRedisClientsManager);
x.For<IMessageService>()
.Singleton()
.Use<RedisMqServer>()
.Ctor<IRedisClientsManager>().Is(i => i.GetInstance<IRedisClientsManager>())
.Ctor<int>("retryCount").Is(2)
.Ctor<TimeSpan?>().Is(TimeSpan.FromSeconds(5));
// Retrieve a new message factory from the singleton IMessageService
x.For<IMessageFactory>()
.Use(i => i.GetInstance<IMessageService>().MessageFactory);
});
My BuildRedisClientsManager function looks like this:
private static IRedisClientsManager BuildRedisClientsManager()
{
var appSettings = new AppSettings();
var redisClients = appSettings.Get("redis-servers", "redis.local:6379").Split(',');
var redisFactory = new PooledRedisClientManager(redisClients);
redisFactory.ConnectTimeout = 5;
redisFactory.IdleTimeOutSecs = 30;
redisFactory.PoolTimeout = 3;
return redisFactory;
}
Then, when it comes to producing messages it's very important that the utilized RedisClient is properly disposed of, otherwise we run into the dreaded "Timeout Expired" (thanks to this post). I have the following helper code to send a message to the queue:
public static void PublishMessage<T>(T msg)
{
try
{
using (var producer = GetMessageProducer())
{
producer.Publish<T>(msg);
}
}
catch (Exception ex)
{
// TODO: Log or whatever... I'm not throwing to avoid showing users that we have a broken MQ
}
}
private static IMessageQueueClient GetMessageProducer()
{
var producer = ObjectFactory.GetInstance<IMessageService>() as RedisMqServer;
var client = producer.CreateMessageQueueClient();
return client;
}
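To make the "dispose or run out" point concrete, here is a toy model of a pooled client manager. It has a fixed number of slots and a pool timeout after which getting a client fails with a "Timeout expired"-style error. It is a Java sketch, not ServiceStack's implementation. TinyPool and PooledClient are hypothetical, and try-with-resources plays the role of C#'s using. Sharing one TinyPool instance corresponds to the singleton registration. Creating a new pool per request would defeat the pooling entirely.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model of a pooled clients manager: fixed size plus a pool timeout.
final class TinyPool {
    private final Semaphore available;
    private final long poolTimeoutMs;

    TinyPool(int size, long poolTimeoutMs) {
        this.available = new Semaphore(size);
        this.poolTimeoutMs = poolTimeoutMs;
    }

    PooledClient getClient() {
        boolean acquired;
        try {
            acquired = available.tryAcquire(poolTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a pooled connection", e);
        }
        if (!acquired) {
            throw new IllegalStateException("Timeout expired: no pooled connection was available");
        }
        return new PooledClient();
    }

    // Closing (disposing) the client is what hands its slot back to the pool.
    final class PooledClient implements AutoCloseable {
        private boolean closed;

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                available.release();
            }
        }
    }
}
```

With a pool of one, clients used inside try-with-resources can be obtained again and again. A single client that is never closed makes the next getClient() fail after the pool timeout.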
I hope this helps solve your issue too.

What WCF Exceptions should I retry on failure for? (such as the bogus 'xxx host did not receive a reply within 00:01:00')

I have a WCF client that has thrown this common error, which was resolved simply by retrying the HTTP call to the server. For what it's worth, this exception was not raised after 1 minute; it was raised after 3 seconds.
The request operation sent to xxxxxx did not receive a reply within the configured timeout (00:01:00). The time allotted to this operation may have been a portion of a longer timeout. This may be because the service is still processing the operation or because the service was unable to send a reply message. Please consider increasing the operation timeout (by casting the channel/proxy to IContextChannel and setting the OperationTimeout property) and ensure that the service is able to connect to the client
How are professionals handling these common WCF errors? What other bogus errors should I handle?
For example, I'm considering timing the WCF call and, if the (bogus) error above is thrown in under 55 seconds, retrying the entire operation (using a while() loop). I believe I have to reset the entire channel, but I'm hoping you can tell me the right thing to do.
I make all of my WCF calls through a custom "using"-style statement that handles exceptions and potential retries. My code optionally lets me pass a policy object to the statement, so I can easily change the behavior, for example if I don't want to retry on error.
The gist of the code is as follows:
[MethodImpl(MethodImplOptions.NoInlining)]
public static void ProxyUsing<T>(ClientBase<T> proxy, Action action)
where T : class
{
try
{
proxy.Open();
using(OperationContextScope context = new OperationContextScope(proxy.InnerChannel))
{
//Add some headers here, or whatever you want
action();
}
}
catch(FaultException fe)
{
//Handle stuff here
}
finally
{
try
{
if(proxy != null
&& proxy.State != CommunicationState.Faulted)
{
proxy.Close();
}
else
{
proxy.Abort();
}
}
catch
{
if(proxy != null)
{
proxy.Abort();
}
}
}
}
You can then use the call like follows:
ProxyUsing<IMyService>(myService = GetServiceInstance(), () =>
{
myService.SomeMethod(...);
});
The NoInlining attribute probably isn't important for you. I need it because I have some custom logging code that logs the call stack after an exception, so it's important to preserve that method hierarchy in that case.
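The "policy object" idea can be sketched by combining the close/abort cleanup above with a bounded retry. Each attempt uses a fresh proxy, because a faulted channel cannot be reused. This is a Java sketch with a hypothetical ServiceProxy interface standing in for ClientBase<T>, not WCF code. The isTransient predicate is the policy. In WCF terms, the candidates would typically be TimeoutException and CommunicationException. Whether retrying is safe still depends on the operation being idempotent.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical stand-in for a WCF ClientBase<T> proxy.
interface ServiceProxy {
    boolean isFaulted();
    void close(); // graceful shutdown; may itself throw
    void abort(); // immediate teardown
}

final class ProxyRetry {
    // Runs `action` on a new proxy per attempt and always closes or aborts that proxy.
    static void proxyUsing(Supplier<? extends ServiceProxy> factory,
                           Consumer<ServiceProxy> action,
                           int maxAttempts,
                           Predicate<RuntimeException> isTransient) {
        for (int attempt = 1; ; attempt++) {
            ServiceProxy proxy = factory.get();
            try {
                action.accept(proxy);
                return;
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !isTransient.test(e)) {
                    throw e; // non-retryable error, or out of attempts
                }
            } finally {
                // Same rule as ProxyUsing: Close a healthy channel, Abort a faulted one.
                try {
                    if (proxy.isFaulted()) {
                        proxy.abort();
                    } else {
                        proxy.close();
                    }
                } catch (RuntimeException closeFailure) {
                    proxy.abort();
                }
            }
        }
    }
}
```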