Kafka Connection handling - error-handling

I am consuming data from Kafka with the createDirectStream() API. I would like to catch ClosedChannelException and handle disconnection from ZooKeeper and the Kafka topics.
Through tests, I was only able to do this:
try { myStreamingContext.awaitTermination }
catch { case foo:SparkException => if (foo.getMessage contains """ClosedChannelException""") /*do something*/ }
It appears the exception is only "catchable" with the above, or at least, not with try { val stream = createDirectStream([...]) } [...] as I expected.
Is there any method to handle this (and other connection errors) application-side?
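One application-side option, sketched here in Java rather than the question's Scala, is to inspect the exception rethrown by awaitTermination more robustly than a single message match. Whether Spark attaches ClosedChannelException as a cause is an assumption (the question only confirms that the name appears in the message), so the hypothetical helper below checks both the cause chain and the message text. ConnectionErrors and isClosedChannel are illustrative names, not part of Spark or Kafka.

```java
import java.nio.channels.ClosedChannelException;

// Hypothetical helper, not part of Spark or Kafka.
final class ConnectionErrors {
    private ConnectionErrors() {}

    /** Returns true if t, or any exception in its cause chain, looks like a closed channel. */
    static boolean isClosedChannel(Throwable t) {
        // Bound the walk so an unusual cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof ClosedChannelException) {
                return true;
            }
            String message = t.getMessage();
            if (message != null && message.contains("ClosedChannelException")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException("job aborted", new ClosedChannelException());
        System.out.println(isClosedChannel(wrapped));
    }
}
```

In the catch block around awaitTermination, the message test from the question could then be replaced by a call such as isClosedChannel(foo).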

Related

Is there a way to make the Kafka consumer poll-function throw errors, rather than the library handling them internally?

I'm working on a Kafka consumer in Kotlin/Javalin, using the standard Kafka library org.apache.kafka.clients.consumer, and I'm struggling a bit with the poll function: it never seems to throw any errors that can be caught, it just writes warnings/errors to the console. For example, when it's not able to reach the broker, it logs a warning that "Connection to node -1 could not be established. Broker may not be available.":
{
"timestamp": "2022-12-14T13:30:58.673+01:00",
"level": "WARN",
"thread": "main",
"logger": "org.apache.kafka.clients.NetworkClient",
"message": "[Consumer clientId=xxx, groupId=xxx] Connection to node -1 (localhost/127.0.0.1:1000) could not be established. Broker may not be available."
}
But it doesn't actually throw any errors, so it's pretty much impossible to handle the error, if you would like to do anything other than just continue to poll forever. Does anyone know if there is some way to configure this behavior? Or am I missing something?
The relevant code
consumer = createConsumer() // This returns a Consumer<String?, String?>
consumer.subscribe(listOf(TOPIC))
while (true) {
    val records = consumer.poll(Duration.ofSeconds(1))
    records.iterator().forEach {
        println(it.key())
    }
    consumer.commitSync() // Commit offset after finished processing entries
}
I can trigger a timeout-error if I call the partitionsFor-function from the consumer, so this can work as a liveness-probe, but this feels more like a hack than the intended way to do it.
try {
    var committed = consumer.partitionsFor(TOPIC)
} catch (e: Exception) {
    println(e)
}
Thanks!
The client is dumb, and expects you to provide the correct values.
You can use AdminClient.describeCluster() with the same address to verify connection, then catch/throw RuntimeException from that.
Otherwise, the consumer will retry and update the metadata for your bootstrap.servers until it can connect.
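As a lighter-weight variant of the connection check, the sketch below does not use AdminClient at all: it is a plain TCP probe written with the Java standard library only. It confirms that something accepts connections at the bootstrap address, not that it is a healthy Kafka broker, so treat it as a pre-flight check under that assumption. TcpProbe, isReachable, and the localhost:9092 address are hypothetical names and values chosen for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: a plain TCP reachability check, not a Kafka API.
final class TcpProbe {
    private TcpProbe() {}

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable: all treated as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        // localhost:9092 is only an example bootstrap address.
        boolean up = isReachable("localhost", 9092, 2000);
        System.out.println(up ? "Address accepts TCP connections" : "Address is not reachable");
    }
}
```

A caller can run the probe before subscribing and throw its own exception when it returns false, which gives the application the catchable error that poll does not provide.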

Need Elastic APM support for Ktor backend server

We are trying to monitor the performance of our Ktor backend application and were able to attach the Elastic APM agent to it. The server is visible in the Kibana dashboard as a service, but it does not create transactions automatically for each incoming request. Only when we manually start and end a transaction in a specific route does it record performance for that request. Is there another way to solve this?
We tried the following approaches:
We intercepted each request in the setup phase and started a transaction, but could not end it, because we ran into problems intercepting the same call at the end.
We added the code below to each request in the controller/route, and this works.
get("/api/path") {
    val transaction: Transaction = ElasticApm.startTransaction()
    try {
        transaction.setName("MyTransaction#getApi")
        transaction.setType(Transaction.TYPE_REQUEST)
        // do business logic and response
    } catch (e: java.lang.Exception) {
        transaction.captureException(e)
        throw e
    } finally {
        transaction.end()
    }
}
Adding below line for better search result for other developers.
How to add interceptor on starting and ending on each request in ktor. Example of ApplicationCallPipeline.Monitoring and proceed()
You can use the proceed method that executes the rest of a pipeline to catch any occurred exceptions and finish a transaction:
intercept(ApplicationCallPipeline.Monitoring) {
    val transaction: Transaction = ElasticApm.startTransaction()
    try {
        transaction.setName("MyTransaction#getApi")
        transaction.setType(Transaction.TYPE_REQUEST)
        proceed() // This will call the rest of a pipeline
    } catch (e: Exception) {
        transaction.captureException(e)
        throw e
    } finally {
        transaction.end()
    }
}
Also, you can use attributes to store a transaction for a call duration (begins when the request has started and ends when the response has been sent).

How to catch error when message have been sent from JMS

I am sending a message from my standalone application, which uses an EJB MDB, to my other application running on a JBoss server. The application server is connected to an MSSQL server. In certain scenarios, the connection to the database is lost on the application server side and we get the following error:
Connection is reset.
Later, when I try to send a message, I don't get any error in my standalone EJB MDB logs and the process just stops executing. I get an error in the application server logs, but it is not propagated to my EJB MDB error logs.
As I understand it, when the DB connection is lost, all the EJB beans in the JBoss container get nullified too. (I could be wrong here, I am new to EJB.)
I tried adding the code below to the code that sends the message:
QueueConnection qcon = null;
@PostConstruct
public void initialize() {
    System.out.println("In PostConstruct");
    try {
        qcon = qconFactory.createQueueConnection();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
@PreDestroy
public void releaseResources() {
    System.out.println("In PreDestroy");
    try {
        if(qcon != null)
        {
            qcon.close();
        }
        if(qcon== null){
            throw new Exception(" new exception occured.");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
I was under the impression that the QueueConnection object would be nullified when the DB connection was lost (as we create the bean and make the connection for the message). But it doesn't seem to work.
I did find a way to call back my application after sending the message. I used a separate temporary queue and the setJMSReplyTo method to set the reply destination. More info can be obtained from this link. Hope this helps others.

How can my WCF service recover from unavailable message queue?

I have a WCF service that receives messages from the Microsoft Message Queue (netMsmqBinding).
I want my service to recover if the message queue is unavailable. My code should fail to open the service, but then try again after a delay.
I have code to recognize the error when the queue is unavailable:
static bool ExceptionIsBecauseMsmqNotStarted(TypeInitializationException ex)
{
    MsmqException msmqException = ex.InnerException as MsmqException;
    return ((msmqException != null) && msmqException.HResult == (unchecked((int)(0xc00e000b))));
}
So this should be straightforward: I call ServiceHost.Open(), catch this exception, wait for a second or two, then repeat until my Open call is successful.
The problem is, if this exception gets thrown once, it continues to be thrown. The message queue might have become available, but my running process is in a bad state and I continue to get the TypeInitializationException until I shut down my process and restart it.
Is there a way around this problem? Can I make WCF forgive the queue and genuinely try to listen to it again?
Here is my service opening code:
public async void Start()
{
    try
    {
        _log.Debug("Starting the data warehouse service");
        while(!_cancellationTokenSource.IsCancellationRequested)
        {
            try
            {
                _serviceHost = new ServiceHost(_dataWarehouseWriter);
                _serviceHost.Open();
                return;
            }
            catch (TypeInitializationException ex)
            {
                _serviceHost.Abort();
                if(!ExceptionIsBecauseMsmqNotStarted(ex))
                {
                    throw;
                }
            }
            await Task.Delay(1000, _cancellationTokenSource.Token);
        }
    }
    catch (Exception ex)
    {
        _log.Error("Failed to start the service host", ex);
    }
}
And here is the stack information. The first time it is thrown the stack trace of the inner exception is:
at System.ServiceModel.Channels.MsmqQueue.GetMsmqInformation(Version& version, Boolean& activeDirectoryEnabled)
at System.ServiceModel.Channels.Msmq..cctor()
And the top entries of the outer exception stack:
at System.ServiceModel.Channels.MsmqChannelListenerBase`1.get_TransportManagerTable()
at System.ServiceModel.Channels.TransportManagerContainer..ctor(TransportChannelListener listener)
Microsoft have made the source code to WCF visible, so now we can work out exactly what's going on.
The bad news: WCF is implemented in such a way that if the initial call to ServiceHost.Open() triggers a queueing error, there is no way to recover within the same process.
The WCF framework includes an internal class called Msmq. This class has a static constructor. The static constructor invokes MsmqQueue.GetMsmqInformation, which can throw an exception (this matches the stack trace above, where the inner exception originates in Msmq..cctor()).
Reading the C# Programming Guide on static constructors:
If a static constructor throws an exception, the runtime will not invoke it a second time, and the type will remain uninitialized for the lifetime of the application domain in which your program is running.
There is a programming lesson here: Don't put exception throwing code in a static constructor!
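One way to act on that lesson is to move fallible initialization out of the type initializer and into a lazily evaluated holder whose initialization can be attempted again after a failure. The sketch below is in Java rather than C# (Java class initializers have the same one-shot failure behavior: once a static initializer throws, later uses of the class fail with NoClassDefFoundError). RetryableLazy is a hypothetical name, and the simulated "queue service not started" failure stands in for the real MSMQ lookup.

```java
import java.util.function.Supplier;

// Hypothetical holder: initialization may fail and be retried on the next call.
final class RetryableLazy<T> {
    private final Supplier<T> factory;
    private T value;
    private boolean initialized;

    RetryableLazy(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the cached value, computing it on first successful call. */
    synchronized T get() {
        if (!initialized) {
            // If the factory throws, 'initialized' stays false and the next call retries.
            value = factory.get();
            initialized = true;
        }
        return value;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        RetryableLazy<String> queueInfo = new RetryableLazy<>(() -> {
            // Simulate a dependency that is unavailable on the first attempt only.
            if (++attempts[0] == 1) {
                throw new IllegalStateException("queue service not started");
            }
            return "queue info";
        });
        try {
            queueInfo.get();
        } catch (IllegalStateException e) {
            System.out.println("first attempt failed: " + e.getMessage());
        }
        System.out.println("second attempt: " + queueInfo.get());
    }
}
```

Unlike a static constructor, a failed get() leaves no permanent damage, so a retry loop around it can succeed once the dependency comes up.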
The obvious solution lies outside of the code. When I create my hosting service, I could add a service dependency on the message queue service. However, I would rather fix this problem with code than configuration.
Another solution is to manually check that the queue is available using non-WCF code.
The method System.Messaging.MessageQueue.Exists returns false if the message queue service is unavailable. Knowing this, the following works:
private const string KNOWN_QUEUE_PATH = @".\Private$\datawarehouse";
private static string GetMessageQueuePath()
{
    // We can improve this by extracting the queue path from the configuration file
    return KNOWN_QUEUE_PATH;
}
public async void Start()
{
    try
    {
        _log.Debug("Starting the data warehouse service");
        string queuePath = GetMessageQueuePath();
        while(!_cancellationTokenSource.IsCancellationRequested)
        {
            if (!(System.Messaging.MessageQueue.Exists(queuePath)))
            {
                _log.Warn($"Unable to find the queue {queuePath}. Will try again shortly");
                await Task.Delay(60000, _cancellationTokenSource.Token);
            }
            else
            {
                _serviceHost = new ServiceHost(_dataWarehouseWriter);
                _serviceHost.Open();
                return;
            }
        }
    }
    catch(System.OperationCanceledException)
    {
        _log.Debug("The service start operation was cancelled");
    }
    catch (Exception ex)
    {
        _log.Error("Failed to start the service host", ex);
    }
}

What WCF Exceptions should I retry on failure for? (such as the bogus 'xxx host did not receive a reply within 00:01:00')

I have a WCF client that has thrown this common error, just to be resolved with retrying the HTTP call to the server. For what it's worth this exception was not generated within 1 minute. It was generated in 3 seconds.
The request operation sent to xxxxxx did not receive a reply within the configured timeout (00:01:00). The time allotted to this operation may have been a portion of a longer timeout. This may be because the service is still processing the operation or because the service was unable to send a reply message. Please consider increasing the operation timeout (by casting the channel/proxy to IContextChannel and setting the OperationTimeout property) and ensure that the service is able to connect to the client
How are professionals handling these common WCF errors? What other bogus errors should I handle?
For example, I'm considering timing the WCF call and if that above (bogus) error is thrown in under 55 seconds, I retry the entire operation (using a while() loop). I believe I have to reset the entire channel, but I'm hoping you guys will tell me what's right to do.
I make all of my WCF calls from a custom "using" statement which handles exceptions and potential retries. My code optionally allows me to pass a policy object to the statement so I can easily change the behavior, for example if I don't want to retry on error.
The gist of the code is as follows:
[MethodImpl(MethodImplOptions.NoInlining)]
public static void ProxyUsing<T>(ClientBase<T> proxy, Action action)
    where T : class
{
    try
    {
        proxy.Open();
        using(OperationContextScope context = new OperationContextScope(proxy.InnerChannel))
        {
            //Add some headers here, or whatever you want
            action();
        }
    }
    catch(FaultException fe)
    {
        //Handle stuff here
    }
    finally
    {
        try
        {
            if(proxy != null
                && proxy.State != CommunicationState.Faulted)
            {
                proxy.Close();
            }
            else
            {
                proxy.Abort();
            }
        }
        catch
        {
            if(proxy != null)
            {
                proxy.Abort();
            }
        }
    }
}
You can then use the call as follows:
ProxyUsing<IMyService>(myService = GetServiceInstance(), () =>
{
myService.SomeMethod(...);
});
The NoInlining call probably isn't important for you. I need it because I have some custom logging code that logs the call stack after an exception, so it's important to preserve that method hierarchy in that case.
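The policy object mentioned above is not shown in the answer. As a rough illustration of the idea, here is a minimal sketch in Java (not C#, and not the answerer's actual code) of a retry helper driven by a policy. RetryPolicy, Retry, and their fields are hypothetical names; the choice of TimeoutException as the retryable error is only an example. In a WCF client, the retried action would presumably need to create a fresh proxy on each attempt, since a faulted channel cannot be reused.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

// Hypothetical policy: how many attempts, how long to wait, and which errors to retry.
final class RetryPolicy {
    final int maxAttempts;
    final long delayMillis;
    final Predicate<Exception> isRetryable;

    RetryPolicy(int maxAttempts, long delayMillis, Predicate<Exception> isRetryable) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
        this.isRetryable = isRetryable;
    }
}

final class Retry {
    private Retry() {}

    /** Runs the action, retrying per the policy; rethrows the last error when giving up. */
    static <T> T call(Callable<T> action, RetryPolicy policy) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Give up on the last attempt or on errors the policy does not cover.
                if (attempt >= policy.maxAttempts || !policy.isRetryable.test(e)) {
                    throw e;
                }
                Thread.sleep(policy.delayMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        RetryPolicy policy = new RetryPolicy(3, 100, e -> e instanceof TimeoutException);
        int[] calls = {0};
        String reply = call(() -> {
            // Simulate two timeouts followed by a successful reply.
            if (++calls[0] < 3) {
                throw new TimeoutException("no reply yet");
            }
            return "reply";
        }, policy);
        System.out.println(reply + " after " + calls[0] + " attempts");
    }
}
```

Passing a policy with maxAttempts set to 1 disables retries without changing the calling code, which is the kind of flexibility the answer describes.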