I'm seeing strange behavior in ActiveMQ with network connectors. Here is the setup:
Broker A listening for connections via a nio transport connector on 61616
Broker B establishing a duplex connection to broker A
A producer on A sends messages to a known queue, say q1
A consumer on B subscribes to the queue q1
I can clearly see that the duplex connection is established, but the consumer on B doesn't receive any messages.
In JConsole I can see that broker A is sending messages, up to the prefetch limit (1000 messages), to the network consumer, which seems fine. The "DispatchedQueue", "DispatchedQueueSize", and, more importantly, "MessageCountAwaitingAck" counters all have the same value: they are stuck at 1000.
On broker B, the queue size is 0.
At the system level, I can clearly see an established connection between broker A and broker B:
# On broker A (192.168.x.x)
$ netstat -t -p -n
tcp 89984 135488 192.168.x.x:61616 172.31.x.x:57270 ESTABLISHED 18591/java
# On broker B (172.31.x.x)
$ netstat -t -p -n
tcp 102604 101144 172.31.x.x:57270 192.168.x.x:61616 ESTABLISHED 32455/java
Weird thing: the Recv-Q and Send-Q on both brokers A and B show data that has not been read by the other side. The values don't increase or decrease; they are just stuck at these numbers.
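For readers less familiar with the netstat columns, here is a minimal Python sketch that pulls Recv-Q and Send-Q out of the two rows above. The helper name parse_netstat_line is made up for illustration, and it assumes the column order shown in the output above (proto, Recv-Q, Send-Q, local address, remote address, state, PID/program).

```python
def parse_netstat_line(line):
    """Split one `netstat -t -p -n` row into named fields (hypothetical helper)."""
    fields = line.split()
    return {
        "proto": fields[0],
        "recv_q": int(fields[1]),  # bytes received but not yet read by the local process
        "send_q": int(fields[2]),  # bytes sent but not yet acknowledged by the remote host
        "local": fields[3],
        "remote": fields[4],
        "state": fields[5],
    }


# The two rows quoted above, copied verbatim.
broker_a = parse_netstat_line(
    "tcp 89984 135488 192.168.x.x:61616 172.31.x.x:57270 ESTABLISHED 18591/java")
broker_b = parse_netstat_line(
    "tcp 102604 101144 172.31.x.x:57270 192.168.x.x:61616 ESTABLISHED 32455/java")

for name, row in (("A", broker_a), ("B", broker_b)):
    print("broker", name,
          "unread by local process:", row["recv_q"],
          "unacknowledged by peer:", row["send_q"])
```

A non-zero Recv-Q that never drains suggests the local Java process has stopped reading from the socket, which fits the symptom that acks never arrive.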
The ActiveMQ logs on both sides don't say much, even at TRACE level.
It seems that neither broker A nor broker B is sending acks for the messages to the other side.
How is that possible? What's a potential cause and fix?
EDIT: I should add that I'm using an embedded ActiveMQ 5.13.4 on both sides.
Thanks.
I am using the ActiveMQ Artemis Broker and publishing to it through a client application.
Behavior observed:
When my client uses IPv4, a TLS handshake is established and data is published as expected, with no problems.
When my client uses IPv6, I see frequent reconnections between the client and the server (broker), and no data is published.
Details:
When using IPv6, the client completes a three-way handshake and attempts to send data. It also receives a Server Hello and sends application data.
But the connection then terminates and the client reconnects. This loop keeps repeating.
The client library, network infrastructure, and broker are all completely the same when using IPv4 and IPv6.
The client logs say:
Idle network reply timeout.
The broker logs show an incoming connection request and a CONNACK for it from the broker, e.g.:
MQTT(): IN << CONNECT protocol=(MQTT, 4), hasPassword=false, isCleanSession=false, keepAliveTimeSeconds=60, clientIdentifier=b_001, hasUserName=false, isWillFlag=false
MQTT(): OUT >> CONNACK connectReturnCode=0, sessionPresent=true
What Wireshark (tcpdump) shows:
Before every reconnection (three-way handshake included) I see this:
Id  Src                Dest
1   Broker (App Data)  Client
2   Broker (App Data)  Client
3   Client (ACK)       Broker
4   Client (ACK)       Broker
5   Broker (FIN,ACK)   Client
6   Client (FIN,ACK)   Broker
7   Broker (ACK)       Client
8   Client (SYN)       Broker
9   Broker (SYN/ACK)   Client
10  Client (ACK)       Broker
Then the TLS handshake (Client Hello, Server Hello, Change Cipher Spec) follows, and the above repeats.
Based on packets 5, 6, and 7, I have concluded that the connection is being terminated by the broker (server). The client acknowledges the termination and then attempts to reconnect, since it runs an infinite loop of reconnecting and publishing.
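The reasoning about packets 5 to 7 can be sketched in a few lines of Python: whoever sends the first FIN is the side that started the close, at least as seen from the capture point. The tuple layout and the function name first_fin_sender are assumptions made for illustration; the packet data is copied from the table above.

```python
# Packets 1-10 from the capture above, as (id, sender, flags).
packets = [
    (1, "Broker", "App Data"),
    (2, "Broker", "App Data"),
    (3, "Client", "ACK"),
    (4, "Client", "ACK"),
    (5, "Broker", "FIN,ACK"),
    (6, "Client", "FIN,ACK"),
    (7, "Broker", "ACK"),
    (8, "Client", "SYN"),
    (9, "Broker", "SYN/ACK"),
    (10, "Client", "ACK"),
]


def first_fin_sender(packets):
    """Return the sender of the first packet carrying a FIN flag, or None."""
    for _packet_id, sender, flags in packets:
        if "FIN" in flags.split(","):
            return sender
    return None


# The first FIN comes from the broker's address, so the close is initiated
# from that side. Note that a capture only shows source addresses: a device
# sitting between client and broker could be the real origin of that FIN.
print(first_fin_sender(packets))  # Broker
```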
This is my first time doing network-level analysis, and my first time using Wireshark, so I'm not sure my analysis is right.
I have also hit a wall: I'm not sure why the reconnection occurs only when the device uses IPv6. I also don't see any RST indicating that the connection was terminated.
The broker is also sending a CONNACK (per the broker logs), but still no data is sent; the client just keeps trying to reconnect, and I'm not sure why.
Also, I see a few of these:
Out-of-Order TCP (when src is broker)
Spurious Re-transmission
DUP ACK (src is client)
Not sure if this is important.
Any pointers on what is going on?
The issue was caused by a load balancer (LB) setting: its default connection timeout was 30 seconds, which is less than the connection timeout set by the client.
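As a rough illustration of that mismatch, here is a small Python check. It assumes (this is not confirmed above) that the LB timeout is an idle timeout and that the client-side value in question is the MQTT keep-alive of 60 seconds seen in the broker log; the function name is hypothetical.

```python
def idle_cutoff_risk(idle_timeout_s, keepalive_s):
    """True if the connection may stay silent for at least as long as the
    intermediary's idle timeout, so the intermediary can close it first."""
    return keepalive_s >= idle_timeout_s


# 30 s: LB default from the answer above.
# 60 s: keepAliveTimeSeconds from the CONNECT log line (assumed to be the
# "connection timeout set by the client").
print(idle_cutoff_risk(idle_timeout_s=30, keepalive_s=60))  # True
```

Under those assumptions, either raising the LB timeout above the client's interval or lowering the client's interval below 30 seconds would remove the mismatch.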
I was trying to run Apache ActiveMQ. The broker ran successfully on localhost, and the JMS producer and consumer Java applications ran successfully on the same machine. But then I changed the URI to tcp://192.168.1.1:61616 in activemq.xml and ran the broker on machine 1 (192.168.1.1). I ran the consumer on machine 1 and the producer on machine 2, on the same LAN. The producer failed with a JMS exception: ConnectException, Connection refused. As a result, the producer and consumer cannot communicate over the LAN. Please guide.
If I understand this correctly you have this setup:
Machine1: ActiveMQ Broker and Consumer
Machine2: Producer
Then you need to set up your configuration like this:
ActiveMQ Broker: in activemq.xml, set the URI to tcp://192.168.1.1:61616
Consumer: tcp://192.168.1.1:61616
Producer: tcp://192.168.1.1:61616
Thank you very much. It was the firewall that was preventing the connection. I disabled the firewall and things are running fine. Regards.
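For anyone hitting the same error, a quick way to tell a firewall or listener problem apart from a JMS problem is to test the TCP port directly from the producer machine. This is a minimal Python sketch; the helper name can_connect is made up for illustration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" as well as timeouts, which is what a
        # firewall that silently drops packets typically produces.
        return False
```

Running can_connect("192.168.1.1", 61616) on machine 2 should return True once the broker is listening on that address and the firewall allows the port.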
I have an ActiveMQ installation with master/slave failover.
The master and slave are synchronized using the lease-database-locker.
The master and slave run on two different machines, and the database is located on a third machine.
Failover and client reconnection work properly on a forced shutdown of the master broker. The slave takes over properly and the clients reconnect thanks to their failover setting.
The problems start when I simulate a network outage on the master broker only. I do this with an iptables DROP rule for packets going from the master to the database.
The master now realizes that it can no longer connect to the database. The slave starts up, since its network connection is still alive.
From the logs, it seems that the clients still try to reconnect to the non-responding master.
As I understand it, the master should inform the clients that there is no connection anymore. The clients should then fail over and reconnect to the slave.
But this is not happening.
The clients do reconnect to the slave if I re-establish the DB connection by re-enabling the master's network connection to the DB. The master then gives up being the master.
I have set a queryTimeout on the lease-database-locker.
I have set updateClusterClients=true for the transport connector.
I have set a validationQueryTimeout of 10s on the db connection.
I have set testOnBorrow for the DB connection.
Is there a way to force the master to tell the clients to fail over in this particular case?
After some digging I found the trick.
The broker was not informing the clients due to a missing ioExceptionHandler configuration.
The documentation can be found here
http://activemq.apache.org/configurable-ioexception-handling.html
I needed to specify
<bean id="ioExceptionHandler" class="org.apache.activemq.util.LeaseLockerIOExceptionHandler">
  <property name="stopStartConnectors"><value>true</value></property>
  <property name="resumeCheckSleepPeriod"><value>5000</value></property>
</bean>
and tell the broker to use the Handler
<broker xmlns="http://activemq.apache.org/schema/core" ....
ioExceptionHandler="#ioExceptionHandler" >
In order to produce an error on network outages I also had to set a queryTimeout on the lease query:
<jdbcPersistenceAdapter dataDirectory="${activemq.base}/data" dataSource="#mysql-ds-db01-st" lockKeepAlivePeriod="3000">
  <locker>
    <lease-database-locker lockAcquireSleepInterval="10000" queryTimeout="8" />
  </locker>
</jdbcPersistenceAdapter>
This will produce a SQL exception if the query takes too long due to a network outage.
I tested the network outage by dropping packets to the database using an iptables rule:
/sbin/iptables -A OUTPUT -p tcp --destination-port 13306 -j DROP
Sounds like your client doesn't have the address of the slave in its URI, so it doesn't know where to reconnect. The master broker doesn't tell the client where the slave is, because it doesn't know whether there are any slaves or where they might be on the network. Even if it did, that information would be unreliable, depending on the conditions that caused the master broker to drop in the first place.
You need to provide the client with the connection information for the master and the slave in the failover URI.
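As a sketch of what such a URI looks like, the following Python helper assembles an ActiveMQ failover transport URI from a list of broker URIs. The function name and the host names master-host and slave-host are placeholders, not values from the question; randomize=false is an optional failover transport setting that makes the client try the brokers in the listed order.

```python
def failover_uri(broker_uris, **options):
    """Build a failover transport URI (hypothetical helper for illustration)."""
    uri = "failover:(" + ",".join(broker_uris) + ")"
    if options:
        uri += "?" + "&".join(f"{key}={value}" for key, value in options.items())
    return uri


# Placeholder hosts: replace with the real master and slave addresses.
print(failover_uri(
    ["tcp://master-host:61616", "tcp://slave-host:61616"],
    randomize="false",
))
# failover:(tcp://master-host:61616,tcp://slave-host:61616)?randomize=false
```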
I'm trying to set up a cluster of RabbitMQ servers to get highly available queues using an active/passive server architecture. I'm following these guides:
http://www.rabbitmq.com/clustering.html
http://www.rabbitmq.com/ha.html
http://karlgrz.com/rabbitmq-highly-available-queues-and-clustering-using-amazon-ec2/
My requirement for high availability is simple: I have two nodes (CentOS 6.4) with RabbitMQ (v3.2) and Erlang R15B03. Node1 must be the "active" node, responding to all requests, and Node2 must be the "passive" node that has all the queues and messages replicated from Node1.
To do that, I have configured the following:
Node1 with RabbitMQ working fine in non-cluster mode
Node2 with RabbitMQ working fine in non-cluster mode
The next thing I did was create a cluster of both nodes by joining Node2 to Node1 (guide 1). After that I configured a policy to mirror the queues (guide 2), replicating all queues and messages across all nodes in the cluster. This works: I can connect to either node and publish or consume messages while both nodes are available.
The problem occurs when a queue "queueA" was created on Node1 (the master for queueA) and Node1 is then stopped: I can't connect to queueA on Node2 to produce or consume messages. Node2 throws an error saying that Node1 is not accessible (I think queueA is not replicated to Node2, so Node2 can't be promoted to master of queueA).
The error is:
{"The AMQP operation was interrupted: AMQP close-reason, initiated by
Peer, code=404, text=\"NOT_FOUND - home node 'rabbit@node1' of durable
queue 'queueA' in vhost 'app01' is down or inaccessible\", classId=50,
methodId=10, cause="}
The sequence of steps used is:
Node1:
1. rabbitmq-server -detached
2. rabbitmqctl start_app
Node2:
3. Copy .erlang.cookie from Node1 to Node2
4. rabbitmq-server -detached
Join the cluster (Node2):
5. rabbitmqctl stop_app
6. rabbitmqctl join_cluster rabbit@node1
7. rabbitmqctl start_app
Configure Queue mirroring policy:
8. rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
Note: The pattern used for queue names is "" (all queues).
When I run 'rabbitmqctl list_policies' and 'rabbitmqctl cluster_status', everything looks OK.
Why can't Node2 respond when Node1 is unavailable? Is there something wrong with this setup?
You haven't specified the virtual host (app01) in your set_policy call, thus the policy will only apply to the default virtual host (/). This command line should work:
rabbitmqctl set_policy -p app01 ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
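Since policies are scoped to a virtual host, the command has to be repeated for each vhost that needs mirroring. A minimal Python sketch that builds the argument list for one vhost follows; the helper name set_policy_argv is made up, and only the rabbitmqctl arguments already shown above are used.

```python
import json
import shlex


def set_policy_argv(vhost, name, pattern, definition):
    """Build the rabbitmqctl argument list for one vhost (hypothetical helper)."""
    return [
        "rabbitmqctl", "set_policy", "-p", vhost,
        name, pattern,
        json.dumps(definition, separators=(",", ":")),
    ]


definition = {"ha-mode": "all", "ha-sync-mode": "automatic"}

# One command per vhost; "app01" is the vhost from the error message.
for vhost in ["app01"]:
    argv = set_policy_argv(vhost, "ha-all", "", definition)
    print(shlex.join(argv))  # shell-quoted form, for copy and paste
```

The list form can be passed to subprocess.run(argv, check=True) on a node where rabbitmqctl is available.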
In the web management console, is queueA listed as Node1 +1?
It sounds like there might be some issue with your setup. I've got a set of vagrant boxes that are pre-configured to work in a cluster, might be worth trying that and identifying issues in your setup?
Only mirrors that are synchronized with the master are promoted to master after a failure. This is the default behavior, but it can be changed by setting ha-promote-on-shutdown to always.
Read your reference carefully:
http://www.rabbitmq.com/ha.html
You could use a cluster of RabbitMQ nodes to construct your RabbitMQ
broker. This will be resilient to the loss of individual nodes in
terms of the overall availability of service, but some important
caveats apply: whilst exchanges and bindings survive the loss of
individual nodes, queues and their messages do not. This is because a
queue and its contents reside on exactly one node, thus the loss of a
node will render its queues unavailable.
Make sure that your queue is not durable or exclusive.
From the documentation (https://www.rabbitmq.com/ha.html):
Exclusive queues will be deleted when the connection that declared them is closed. For this reason, it is not useful for an exclusive
queue to be mirrored (or durable for that matter) since when the node
hosting it goes down, the connection will close and the queue will
need to be deleted anyway.
For this reason, exclusive queues are never mirrored (even if they
match a policy stating that they should be). They are also never
durable (even if declared as such).
From your error message:
{"The AMQP operation was interrupted: AMQP close-reason, initiated by
Peer, code=404, text=\"NOT_FOUND - home node 'rabbit@node1' of
durable queue 'queueA' in vhost 'app01' is down or inaccessible\", classId=50, methodId=10, cause="}
It looks like you created a durable queue.
I'm tunneling all of my internet traffic through a remote computer running Debian, using sshd. But my internet connection becomes very slow (around 5 to 10 kbps!). Could anything in the default configuration be causing this problem?
Thanks in advance,
Tunneling TCP within another TCP stream can sometimes work -- but when things go wrong, they go wrong very quickly.
Consider what happens when the "real world" loses one of your TCP packets: after a certain amount of not getting an ACK packet back in response to new data packets, the sending side realizes a packet has gone missing and re-sends the data.
If that packet happens to be a TCP packet whose payload is another TCP packet, then you have two TCP stacks that are upset about their missing packet. The tunneled TCP layer will re-send packets and the outer TCP layer will also resend packets. This causes a giant pileup of duplicate packets that will eventually be delivered and must be dropped on the floor -- because the outer TCP reliably delivered the packet, eventually.
I believe you would be much better served by a more dedicated tunneling method such as GRE tunnels or IPSec.
Yes, tunneling traffic over a TCP connection is not a good idea. See http://sites.inka.de/bigred/devel/tcp-tcp.html