Akka.Remote - cannot send messages to remote actor after disassociation - akka.net

I am using Akka.Remote to communicate between a server-side service application and multiple desktop client applications. The clients send a request message to the server (using Akka.NET) and wait for the server to reply with a response message. The client applications are transient, meaning that they often connect to the server, stay connected for some time, disconnect and then reconnect again.
The problem I encountered is that sometimes when a client disconnects from the server actor (by shutting down its ActorSystem) and then reconnects back to the server, it does not receive any replies from the server for some time. After a few minutes the communication works without any problems. I found out that this issue occurs when the server sends a reply to a client that has disconnected during the request and is no longer reachable. The server cannot deliver the response message and it somehow marks the client endpoint as invalid.
In the log (on the server side) I am getting the following messages when the client is disconnected.
[DEBUG] 2016-01-21 13:04:58.6151 received AutoReceiveMessage <Terminated>: [akka.tcp://qb#client:8090/user/qb] - ExistenceConfirmed=True ServerActor
[DEBUG] 2016-01-21 13:04:58.6550 Stopped Akka.Remote.Transport.ProtocolStateActor
[ INFO] 2016-01-21 13:04:58.6550 Quarantined address [akka.tcp://qb#client:8090] is still unreachable or has not been restarted. Keeping it quarantined. Akka.Event.DummyClassForStringSources
[DEBUG] 2016-01-21 13:04:58.6725 Stopped Akka.Remote.ReliableDeliverySupervisor
[DEBUG] 2016-01-21 13:04:58.6725 no longer watched by [akka://myservice/system/endpointManager/reliableEndpointWriter-akka.tcp%3a%2f%2fqb%40client%3a8090-2] Akka.Remote.EndpointWriter
[DEBUG] 2016-01-21 13:04:58.6725 Disassociated [akka.tcp://myservice#server:8081] <- akka.tcp://qb#client:8090 Akka.Remote.EndpointWriter
[DEBUG] 2016-01-21 13:04:58.6725 Stopped Akka.Remote.EndpointWriter
And then when the client attempts to reconnect, I get:
[DEBUG] 2016-01-21 13:05:15.5883 ConnectResponse [akka.tcp://qb#client:8090/user/qb] ServerActor
[DEBUG] 2016-01-21 13:05:16.0467 Started (Akka.Remote.Transport.ProtocolStateActor) Akka.Remote.Transport.ProtocolStateActor
[DEBUG] 2016-01-21 13:05:16.0467 Stopped Akka.Remote.Transport.ProtocolStateActor
[ WARN] 2016-01-21 13:05:16.0467 AssociationError [akka.tcp://myservice#server:8081] -> akka.tcp://qb#client:8090: Error [Invalid address: akka.tcp://qb#client:8090] [] Akka.Remote.EndpointWriter
[ INFO] 2016-01-21 13:05:16.0467 Quarantined address [akka.tcp://qb#client:8090] is still unreachable or has not been restarted. Keeping it quarantined. Akka.Event.DummyClassForStringSources
[DEBUG] 2016-01-21 13:05:16.0643 Stopped Akka.Remote.ReliableDeliverySupervisor
[DEBUG] 2016-01-21 13:05:16.0711 no longer watched by [akka://myservice/system/endpointManager/reliableEndpointWriter-akka.tcp%3a%2f%2fqb%40client%3a8090-4] Akka.Remote.EndpointWriter
[DEBUG] 2016-01-21 13:05:16.0711 Disassociated [akka.tcp://myservice#server:8081] -> akka.tcp://qb#client:8090 Akka.Remote.EndpointWriter
[DEBUG] 2016-01-21 13:05:16.0711 Stopped Akka.Remote.EndpointWriter
[DEBUG] 2016-01-21 13:05:16.0867 received AutoReceiveMessage <Terminated>: [akka://myservice/system/endpointManager/reliableEndpointWriter-akka.tcp%3a%2f%2fqb%40client%3a8090-4] - ExistenceConfirmed=True Akka.Remote.EndpointManager
[DEBUG] 2016-01-21 13:05:16.0867 Terminated [akka.tcp://qb#client:8090/user/qb] ServerActor
I suspect that this behavior is a feature of Akka.NET. However, I need to implement my system so that clients can disconnect and then reconnect to the server without having to wait. Is there any way to disable the quarantine mechanism, or to gracefully close the client endpoint on the server so that it doesn't get quarantined?

This log line says it all:
[ INFO] 2016-01-21 13:04:58.6550 Quarantined address [akka.tcp://qb#client:8090] is still unreachable or has not been restarted. Keeping it quarantined.
The node was quarantined, which requires a restart of the actor system.
However, IMHO, just upgrade to Akka.NET 1.0.6, which we released on Monday. We made the remoting policy manager much less brittle than it has been historically.

Related

RabbitMQ ignores the "heartbeat" config rule

RabbitMQ 3.10.1
rabbitmq-diagnostics status
...
Config files
* /etc/rabbitmq/rabbitmq.config
...
rabbitmq.config:
[
{rabbit,
[
{heartbeat, 90}
]
}
].
RabbitMQ Management shows a 5s heartbeat.
And the log:
2022-05-13 19:56:43.235925+03:00 [error] <0.5979.0> closing AMQP connection <0.5979.0> (xxx.xxx.xxx.xxx:3555 -> xxx.xxx.xxx.xxx:5672):
2022-05-13 19:56:43.235925+03:00 [error] <0.5979.0> missed heartbeats from client, timeout: 5s
How to fix this?
Set the heartbeat to 90s in the client. Most client libraries let you configure the heartbeat timeout, and RabbitMQ will respect the value suggested by the client. More about that here: https://www.rabbitmq.com/heartbeats.html#heartbeats-timeout
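To see why a server-side value of 90 can still end up as 5s, here is a minimal sketch of the negotiation rule as I understand it from the RabbitMQ heartbeat guide: the lower of the two values wins when both are non-zero, and the greater one wins when either side uses 0. The function name `negotiate_heartbeat` is hypothetical, and individual client libraries may deviate from this rule, so treat it as an illustration rather than the broker's actual code.

```python
def negotiate_heartbeat(client_timeout: int, server_timeout: int) -> int:
    """Return the effective heartbeat timeout in seconds (0 means disabled).

    Assumed rule: if either side proposes 0, the greater value is used;
    otherwise the lower of the two values is used.
    """
    if client_timeout == 0 or server_timeout == 0:
        return max(client_timeout, server_timeout)
    return min(client_timeout, server_timeout)


# Client proposes 5s while rabbitmq.config says 90s.
print(negotiate_heartbeat(5, 90))
# Client raised to 90s to match the server setting.
print(negotiate_heartbeat(90, 90))
```

Under this assumption, changing only the server setting cannot raise the timeout above what the client proposes, which matches the advice to change the client.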

SignalR server in console host not sending keep-alives

I have an ASP.NET Core application using a SignalR hub. When running via a console application (development mode), no keep-alive requests are sent by the server to the client. Consequently, the connection is re-established every 30 seconds or so.
However, when running the same application via Service Fabric, keep-alive requests are sent and everything works as expected.
Here are the server logs when running under the console app:
dbug: Microsoft.AspNetCore.Http.Connections.Internal.HttpConnectionManager[1]
New connection T2NKQg0jyrm7QAPz4p0ZWA created.
dbug: Microsoft.AspNetCore.Http.Connections.Internal.HttpConnectionDispatcher[4]
Establishing new connection.
dbug: Microsoft.AspNetCore.SignalR.HubConnectionHandler[5]
OnConnectedAsync started.
dbug: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[1]
Socket opened using Sub-Protocol: '(null)'.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[9]
Message received. Type: Text, size: 32, EndOfMessage: True.
dbug: Microsoft.AspNetCore.SignalR.Internal.DefaultHubProtocolResolver[2]
Found protocol implementation for requested protocol: json.
dbug: Microsoft.AspNetCore.SignalR.HubConnectionContext[1]
Completed connection handshake. Using HubProtocol 'json'.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[11]
Sending payload: 3 bytes.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[9]
Message received. Type: Text, size: 11, EndOfMessage: True.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[9]
Message received. Type: Text, size: 11, EndOfMessage: True.
dbug: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[4]
Waiting for the application to finish sending data.
dbug: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[2]
Socket closed.
trce: Microsoft.AspNetCore.Http.Connections.Internal.HttpConnectionContext[1]
Disposing connection T2NKQg0jyrm7QAPz4p0ZWA.
trce: Microsoft.AspNetCore.Http.Connections.Internal.HttpConnectionContext[2]
Waiting for application to complete.
dbug: Microsoft.AspNetCore.SignalR.HubConnectionHandler[6]
OnConnectedAsync ending.
trce: Microsoft.AspNetCore.Http.Connections.Internal.HttpConnectionContext[3]
Application complete.
dbug: Microsoft.AspNetCore.Http.Connections.Internal.HttpConnectionManager[2]
Removing connection T2NKQg0jyrm7QAPz4p0ZWA from the list of connections.
dbug: Microsoft.AspNetCore.Http.Connections.Internal.HttpConnectionManager[1]
New connection JW_1AnoGvhvNJ6MdWGb5RA created.
dbug: Microsoft.AspNetCore.Http.Connections.Internal.HttpConnectionDispatcher[4]
Establishing new connection.
dbug: Microsoft.AspNetCore.SignalR.HubConnectionHandler[5]
OnConnectedAsync started.
dbug: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[1]
Socket opened using Sub-Protocol: '(null)'.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[9]
Message received. Type: Text, size: 32, EndOfMessage: True.
dbug: Microsoft.AspNetCore.SignalR.Internal.DefaultHubProtocolResolver[2]
Found protocol implementation for requested protocol: json.
dbug: Microsoft.AspNetCore.SignalR.HubConnectionContext[1]
Completed connection handshake. Using HubProtocol 'json'.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[11]
Sending payload: 3 bytes.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[9]
Message received. Type: Text, size: 11, EndOfMessage: True.
And the client logs:
Microsoft.AspNetCore.Http.Connections.Client.HttpConnection: Debug: Transport 'WebSockets' started.
Microsoft.AspNetCore.Http.Connections.Client.HttpConnection: Information: HttpConnection Started.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Information: Using HubProtocol 'json v1'.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Debug: Sending Hub Handshake.
Microsoft.AspNetCore.Http.Connections.Client.Internal.WebSocketsTransport: Debug: Received message from application. Payload size: 32.
Microsoft.AspNetCore.Http.Connections.Client.Internal.WebSocketsTransport: Debug: Message received. Type: Text, size: 3, EndOfMessage: True.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Debug: Handshake with server complete.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Debug: Receive loop starting.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Debug: Sending PingMessage message.
Microsoft.AspNetCore.Http.Connections.Client.Internal.WebSocketsTransport: Debug: Received message from application. Payload size: 11.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Debug: Sending PingMessage message completed.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Information: HubConnection started.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Trace: The HubConnection is attempting to transition from the Connecting state to the Connected state.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Trace: Releasing Connection Lock in StartAsyncInner (/_/src/SignalR/clients/csharp/Client.Core/src/HubConnection.cs:280).
The thread 0x8184 has exited with code 0 (0x0).
Microsoft.AspNetCore.SignalR.Client.HubConnection: Trace: Acquired the Connection Lock in order to ping the server.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Debug: Sending PingMessage message.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Debug: Sending PingMessage message completed.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Trace: Releasing Connection Lock in RunTimerActions (/_/src/SignalR/clients/csharp/Client.Core/src/HubConnection.cs:1881).
Microsoft.AspNetCore.Http.Connections.Client.Internal.WebSocketsTransport: Debug: Received message from application. Payload size: 11.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Trace: Waiting on Connection Lock in HandleConnectionClose (/_/src/SignalR/clients/csharp/Client.Core/src/HubConnection.cs:1279).
Microsoft.AspNetCore.Http.Connections.Client.HttpConnection: Debug: Disposing HttpConnection.
Microsoft.AspNetCore.Http.Connections.Client.Internal.WebSocketsTransport: Information: Transport is stopping.
Microsoft.AspNetCore.Http.Connections.Client.Internal.WebSocketsTransport: Debug: Send loop stopped.
Microsoft.AspNetCore.Http.Connections.Client.Internal.WebSocketsTransport: Debug: Transport stopped.
Microsoft.AspNetCore.Http.Connections.Client.HttpConnection: Information: HttpConnection Disposed.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Debug: Canceling all outstanding invocations.
Microsoft.AspNetCore.Http.Connections.Client.Internal.WebSocketsTransport: Debug: Receive loop canceled.
Microsoft.AspNetCore.Http.Connections.Client.Internal.WebSocketsTransport: Debug: Receive loop stopped.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Trace: The HubConnection is attempting to transition from the Connected state to the Reconnecting state.
Microsoft.AspNetCore.SignalR.Client.HubConnection: Error: HubConnection reconnecting due to an error.
I won't include them here, but the logs when running under Service Fabric show that the server is correctly sending keep-alives to the client ("Sent a ping message to the client").
It might seem obvious that there is some difference in configuration between my console and Service Fabric hosts, but I've gone through it carefully and cannot see anything that would explain this. In fact, the SignalR integration differed only in that the development host configured detailed errors to be enabled, but even if I remove that the behavior remains the same.
Short of running my own build of ASP.NET Core (something I'm perhaps lazily attempting to avoid only because it was looking far from trivial to build), is there anything I might be missing that would explain this situation?

Erlang fails to resolve IPv6 addresses using parameters from RabbitMQ

I'm running a RabbitMQ cluster in k8s, which has only pure IPv6 addresses. inet returns an nxdomain error when resolving the k8s service name.
The parameters passed to Erlang from RabbitMQ are:
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+A 128 -kernel inetrc '/etc/rabbitmq/erl_inetrc' -proto_dist inet6_tcp"
RABBITMQ_CTL_ERL_ARGS="-proto_dist inet6_tcp"
erl_inetrc: |-
{inet6, true}.
When RabbitMQ uses its rabbit_peer_discovery_k8s plugin to invoke the k8s API:
2019-10-15 07:33:55.000 [info] <0.238.0> Peer discovery backend does not support locking, falling back to randomized delay
2019-10-15 07:33:55.000 [info] <0.238.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2019-10-15 07:33:55.000 [debug] <0.238.0> GET https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/tazou/endpoints/zt4-crmq
2019-10-15 07:33:55.015 [debug] <0.238.0> Response: {error,{failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},{inet,[inet],nxdomain}]}}
2019-10-15 07:33:55.015 [debug] <0.238.0> HTTP Error {failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},{inet,[inet],nxdomain}]}
2019-10-15 07:33:55.015 [info] <0.238.0> Failed to get nodes from k8s - {failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},
{inet,[inet],nxdomain}]}
2019-10-15 07:33:55.016 [error] <0.237.0> CRASH REPORT Process <0.237.0> with 0 neighbours exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 167 in application_master:init/4 line 138
2019-10-15 07:33:55.016 [info] <0.43.0> Application rabbit exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n
In the k8s console, the address can be resolved:
[rabbitmq]# nslookup -type=AAAA kubernetes.default.svc.cluster.local
Server: 2019:282:4000:2001::6
Address: 2019:282:4000:2001::6#53
kubernetes.default.svc.cluster.local has AAAA address fd01:abcd::1
inet can also return the IPv6 address:
kubectl exec -ti zt4-crmq-0 rabbitmqctl eval 'inet:gethostbyname("kubernetes.default.svc.cluster.local").'
{ok,{hostent,"kubernetes.default.svc.cluster.local",[],inet6,16,
[{64769,43981,0,0,0,0,0,1}]}}
As far as I know, the plugin calls httpc:request to invoke the k8s API. I don't know what the difference is between httpc:request and inet:gethostbyname, nor what httpc:request uses to resolve the hostname.
I asked the RabbitMQ plugin maintainers and was told that the plugin is not aware of how Erlang resolves the address: https://github.com/rabbitmq/rabbitmq-peer-discovery-k8s/issues/55.
Is there anything else I could set in erl_inetrc so that Erlang can resolve the IPv6 address? What did I miss in the configuration? Or how could I debug this from the Erlang side? I'm new to Erlang.
B.R,
Tao

IO thread error : 1595 (Relay log write failure: could not queue event from master)

Slave status:
Last_IO_Errno: 1595
Last_IO_Error: Relay log write failure: could not queue event from master
Last_SQL_Errno: 0
From the error log:
[ERROR] Slave I/O for channel 'db12': Unexpected master's heartbeat data: heartbeat is not compatible with local info; the event's data: log_file_name toku10-bin.000063<D1> log_pos 97223067, Error_code: 1623
[ERROR] Slave I/O for channel 'db12': Relay log write failure: could not queue event from master, Error_code: 1595
I tried restarting the slave IO thread many times, but the result is still the same.
We have to keep starting the IO thread manually whenever it stops. I suspect this is a bug in Percona.
As a workaround, I wrote a simple shell script, scheduled every 10 minutes, that checks whether the IO thread is running and, if it is not, runs start slave io_thread for channel 'db12';. It is working for now.
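The decision logic of such a watchdog could look roughly like the sketch below. It assumes the script has already captured the text of SHOW SLAVE STATUS FOR CHANNEL 'db12'\G (for example via the mysql command-line client) and only decides whether to issue the restart statement. The function names and the sample status text are hypothetical. The Slave_IO_Running line in the sample is an assumption about what the full output contained, since the post shows only the error fields.

```python
from typing import Optional


def io_thread_stopped(status_text: str) -> bool:
    """Return True if the status output reports Slave_IO_Running: No."""
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Slave_IO_Running":
            return value.strip() == "No"
    # Field not found: do nothing rather than act on unreadable status.
    return False


def restart_statement(status_text: str, channel: str) -> Optional[str]:
    """Return the SQL statement to run, or None if no action is needed."""
    if io_thread_stopped(status_text):
        return f"START SLAVE IO_THREAD FOR CHANNEL '{channel}';"
    return None


# Illustrative status text; only the Last_* fields come from the post.
sample_status = """
Slave_IO_Running: No
Last_IO_Errno: 1595
Last_IO_Error: Relay log write failure: could not queue event from master
Last_SQL_Errno: 0
"""

print(restart_statement(sample_status, "db12"))
```

This only papers over the symptom, like the original cron job. The heartbeat error in the log (Error_code: 1623) is still the thing to investigate.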

Configuration of CloudAMQP Connection

I'm having difficulty configuring my connection to CloudAMQP in my deployed Grails application. I can run the application locally against a locally installed RabbitMQ instance, but I can't figure out how to correctly configure my application to run on CloudBees using the CloudAMQP service.
In my Config.groovy, I'm defining my connection info and a queue:
rabbitmq {
connectionfactory {
username = 'USERNAME'
password = 'PASSWORD'
hostname = 'lemur.cloudamqp.com'
}
queues = {
testQueue autoDelete: false, durable: false, exclusive: false
}
}
When the application starts and tries to connect, I see the following log messages:
2013-08-23 21:29:59,195 [main] DEBUG listener.SimpleMessageListenerContainer - Starting Rabbit listener container.
2013-08-23 21:29:59,205 [SimpleAsyncTaskExecutor-1] DEBUG listener.BlockingQueueConsumer - Starting consumer Consumer: tag=[null], channel=null, acknowledgeMode=AUTO local queue size=0
2013-08-23 21:30:08,405 [SimpleAsyncTaskExecutor-1] WARN listener.SimpleMessageListenerContainer - Consumer raised exception, processing can restart if the connection factory supports it
org.springframework.amqp.AmqpIOException: java.io.IOException
at org.springframework.amqp.rabbit.connection.RabbitUtils.convertRabbitAccessException(RabbitUtils.java:112)
at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:163)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:228)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils$1.createConnection(ConnectionFactoryUtils.java:119)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.doGetTransactionalResourceHolder(ConnectionFactoryUtils.java:163)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.getTransactionalResourceHolder(ConnectionFactoryUtils.java:109)
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:199)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:524)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException
at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:106)
at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:102)
at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:124)
at com.rabbitmq.client.impl.AMQConnection.start(AMQConnection.java:381)
at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:516)
at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:545)
Caused by: com.rabbitmq.client.ShutdownSignalException: connection error; reason: java.net.SocketException: Connection reset
at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:67)
at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:33)
at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:343)
at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:216)
at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:118)
... 3 more
Caused by: java.net.SocketException: Connection reset
at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:95)
at com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:131)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:508)
2013-08-23 21:30:08,406 [SimpleAsyncTaskExecutor-1] INFO listener.SimpleMessageListenerContainer - Restarting Consumer: tag=[null], channel=null, acknowledgeMode=AUTO local queue size=0
2013-08-23 21:30:08,406 [SimpleAsyncTaskExecutor-1] DEBUG listener.BlockingQueueConsumer - Closing Rabbit Channel: null
2013-08-23 21:30:08,407 [SimpleAsyncTaskExecutor-2] DEBUG listener.BlockingQueueConsumer - Starting consumer Consumer: tag=[null], channel=null, acknowledgeMode=AUTO local queue size=0
Aug 23, 2013 9:30:11 PM org.apache.catalina.core.ApplicationContext log
INFO: Initializing Spring FrameworkServlet 'grails'
Aug 23, 2013 9:30:11 PM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8634
Aug 23, 2013 9:30:11 PM org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8634
According to https://developer.cloudbees.com/bin/view/RUN/CloudAMQP, when you bind your CloudAMQP service to your app, some config params are provided in the pattern of CLOUDAMQP_URL_. These are the values you need to put in your config files so they can be wired in when the app launches.
Make sure to specify the virtualHost for CloudAMQP connections. That worked for me.
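As an illustration of where the virtual host comes from, here is a small sketch that splits an AMQP URL into the fields used in the connectionfactory block. It assumes the bound parameter is a URL of the usual amqp://user:password@host/vhost form. The function name `parse_amqp_url` and the placeholder VHOST are hypothetical, and the exact shape of the value CloudBees injects is not shown in the question.

```python
from urllib.parse import unquote, urlparse


def parse_amqp_url(url: str) -> dict:
    """Split an amqp:// URL into connection settings.

    Assumption: a missing or empty path is mapped to the default
    virtual host "/", which is how most clients behave.
    """
    parts = urlparse(url)
    vhost = unquote(parts.path[1:]) if len(parts.path) > 1 else "/"
    return {
        "hostname": parts.hostname,
        "username": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "virtualHost": vhost,
    }


# Placeholder credentials, matching the style of the Config.groovy above.
settings = parse_amqp_url("amqp://USERNAME:PASSWORD@lemur.cloudamqp.com/VHOST")
print(settings)
```

On shared CloudAMQP plans the virtual host is typically not the default "/", which would explain why a config with only username, password and hostname gets its connection reset.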