Weblogic : node in cluster shuts down, no JMS message is sent - weblogic

I have a weblogic cluster which has 4 nodes (managed servers). Today I found two of them are down, and I found in suprise that some JMS messages are not sent.
I wonder if it's the normal behaviour ? Shouldn't the cluster continue to deliver JMS using the two available nodes ?

In order to reach high availability for JMS you should configure two things
Migratable targets.
Persistance based either on shared storage or a database.
Why migratable targets? This is because messages produced by i.e. JMSServer01 can only be processed by JMSServer01. Thus, when you configure migratable targets the JMSServer01 will be migrated automatically into another Weblogic server.
Why persistance based on shared storage or a database? This is because once the JMS Server is migrated into another server, it will try to process the messages, which must be in a shared storage or database that can be seen by all your Weblogic servers.
You can find more information here https://docs.oracle.com/middleware/1213/core/ASHIA/jmsjta.htm#ASHIA4396

Related

How does a GlassFish cluster find active IIOP endpoints?

I have a curiosity and I was searching for it without any result. In GlassFish documentation it is written:
If the GlassFish Server instance on which the application client is
deployed participates in a cluster, the GlassFish Server finds all
currently active IIOP endpoints in the cluster automatically. However,
a client should have at least two endpoints specified for
bootstrapping purposes, in case one of the endpoints has failed.
but I am asking myself how this list is created.
I've done some tests with a stand-alone client that is executed in a JVM and does some RMI calls on an application that is deployed in a GlassFish cluster and I can see from the logs that the IIOP endpoints list is completed automatically and it is set as com.sun.appserv.iiop.endpoints system property but if I stop a server instance or start another during the execution of the client the list remains the one that was created when the JVM was started.
GlassFish clustering is managed by the GMS (Group Management Service) which usually uses UDP Multicast, but can use TCP where that is not available.
See section 4 "Administering GlassFish Server Clusters" in the HA Administration Guide (PDF)
The Group Management Service (GMS) enables instances to participate in a cluster by
detecting changes in cluster membership and notifying instances of the changes. To
ensure that GMS can detect changes in cluster membership, a cluster's GMS settings
must be configured correctly.

Best Practice for setting up RabbitMQ cluster in production with NServiceBus

Currently we have 2 load balanced web servers. We are just starting to expose some functionality over NSB. If I create two "app" servers would I create a cluster between all 4 servers? Or should I create 2 clusters?
i.e.
Cluster1: Web Server A, App Server A
Cluster2: Web Server B, App Server B
Seems like if it is one cluster, how do I keep a published message from being handled by the same logical subscriber more than once if that subscriber is deployed to both app server A and B?
Is the only reason I would put RabbitMQ on the web servers for message durability (assuming I didn't have any of the app services running on the web server as well)? In that case my assumption is that I am then using the cluster mirroring to get the message to the app server. Is this correct?
Endpoints vs Servers
NServiceBus uses the concept of endpoints. An endpoint is related to a queue on which it receives messages. If this endpoint is scaled out for either high availability or performance then you still have one queue (with RabbitMQ). So if you would have an instance running on server A and B they both (with RabbitMQ) get their messages from the same queue.
I wouldn't think in app servers but think in endpoints and their non functional requirements in regards to deployment, availability and performance.
Availability vs Performance vs Deployment
It is not required to host all endpoints on server A and B. You can also run service X and Y on server A and services U and V on server B. You then scale out for performance but not for availability but availability is already less of an issue because of the async nature of messaging. This can make deployment easier.
Pubsub vs Request Response
If the same logical endpoint has multiple instances deployed then it should not matter which instance processes an event. If it is then it probably isn't pub sub but async request / response. This is handled by NServiceBus by creating a queue for each instance (with RabbitMQ) on where the response can be received if that response requires affinity to requesting instance.
Topology
You have:
Load balanced web farm cluster
Load balanced RabbitMQ cluster
NServiceBus Endpoints
High available multiple instances on different machines
Spreading endpoints on various machines ( could even be a machine per endpoint)
A combination of both
Infrastructure
You could choose to run the RabbitMQ cluster on the same infrastructure as your web farm or do it separate. It depends on your requirements and available resources. If the web farm and rabbit cluster are separate then you can more easily scale out independently.

Weblogic migratable JMS consumer doesn't follow the service to the new managed server if the old server remains running

I have a JMS service targeted at a migratable target (using an Auto-Migrate Exactly-Once policy) in a cluster which consists of 2 managed servers, at any point of time the service is hosted at one of them and the consumer (which is targeted at the cluster) is supposed to receive messages seamlessly no matter where the service is hosted.
When I manually switch the host of the migratable target (clicking migrate), without turning the hosting managed server off, the consumer fails to receive messages sent to the queues, unless I turn off the previous hosting managed server forcing the consumer to the new host.
I can rule out sender problems, I can see the messages in the queue right after them being sent.
I'll be grateful if anyone can advice on how to configure either the consumer or the migratable service to work seamlessly when migration happens.
I think that may just be a misunderstanding of how migration works. The docs state Auto-Migrate Exactly-Once:
indicates that if at least one Managed Server in the candidate list
is running, then the JMS service will be active somewhere in the
cluster if servers should fail or are shut down (either gracefully or
forcibly). For example, a migratable target hosting a path service
should use this option so if its hosting server fails or is shut down,
the path service will automatically migrate to another server and so
will always be active in the cluster. Note that this value can lead to
target grouping. For example, if you have five exactly-once migratable
targets and only one server member is started, then all five
migratable targets will be activated on that server member.
The docs also state:
Manual Service Migration—the manual migration of pinned JTA and
JMS-related services (for example, JMS server, SAF agent, path
service, and custom store) after the host server instance fails
Your server/service has neither failed or shut down, you are forcing it to migrate with a healthy host still running, so it has not met the criteria for migration.
See more here as well.
I have some experience that sounds reminiscent of what you're looking at. There was some WLS-specific capability around recognizing reconfiguration in JMS destinations as part of their clustered server design.
In one case I had to call a WLS-specific method: weblogic.jms.extensions.WLSession.setExceptionListener(). This was on their implementation of the JMS Session interface. This is analogous to the standard JMS Connection.setExceptionListener().
With this WLS-specific capability, the WLSession.setExceptionListener() callback would occur at a point where the consuming client should tear down and re-establish the connection / session / consumer in reaction to a reconfiguration (migration) that had happened.

rabbitMQ federation VS ActiveMQ Master/Slave

I am trying to set up cluster of brokers, which should have same feature like rabbitMQ cluster, but over WAN (my machines are in different locations), so rabbitMQ cluster does not work.
I am looking to alternatives, rabbitMQ federation is just backup the messages in the downstream, can not make sure they have exactly the same messages available at any time (downstream still keeps the old messages already consumed in the upstream)
how about ActiveMQ Master/Slave, I have found :
http://activemq.apache.org/how-do-distributed-queues-work.html
"queues and topics are all replicated between each broker in the cluster (so often to a master and maybe a single slave). So each broker in the cluster has exactly the same messages available at any time so if a master fails, clients failover to a slave and you don't loose a message."
My concern is that if it can automatically update to make sure Master/Slave always have the same messages, which means the consumed messages in Master will also disappear in Slaves.
Thanks :)
ActiveMQ has various clustering features.
First there is High Availability - "Master/Slave". The idea is that several physical servers act as a single logical ActiveMQ broker. If one goes down, another takes it place without losing data. You can do that by sharing the message store (shared file system or shared JDBC), or you could setup a replicated cluster, which replicates read/writes to the master down to all slaves (you need three+ servers). ActiveMQ is using LevelDB and Apache Zookeeper to achieve this.
The other format of cluster available in ActiveMQ is to be able to distribute load and separate security over several logical brokers. Brokers are then connected in a network of brokers. Messages are by default passed around to the broker with available consumers for that message. However, there is a rich toolbox of features in ActiveMQ to tweak a network of brokers to do things as always send a copy of a message to specific broker etc. It takes some messing with the more advanced features though (static network connectors and queue mirroring, maybe more).
Maybe there is a better way to solve your requirements, which is not really specified in the question?

Is the RavenDB subscription storage a central point of failure for NServiceBus?

I am evaluating using NServiceBus as a SOA mechanism in our product. I'm looking into using the publish/subscribe pattern and my understanding is that the subscription service will store all subscriptions.
Does that mean that if my RavenDB server goes down then my publishers lose the ability to send to subscribers? Or is there a way for the publishers to cache the subscribers it has and if RavenDB were to go down then it would deliver to its known subscribers?
You can run the RavenDB server as a replicated node, to avoid this being a single point of failure.
The general pattern is for an endpoint to have a master node that acts as worker and distributor, and then the master node uses a Raven installation on that same server to store its subscriptions and saga storage.
So, it is a point of failure for that one endpoint, but other endpoints in the distributed system will use the Raven installs on their own servers. Thus, the system is kept distributed and the entire system does not have a single point of failure. RavenDB enables this because it is fairly easy to install it on any server.
Contrast this to SQL Server, which is frequently centralized, scaled up to the max, and even clustered in order to provide high availability. (Read: expensive!)
You can also run RavenDB in a Windows failover cluster where the nodes use a shared SAN for the RavenDB data files. If the active node dies, another takes over. Since the data is stored on the SAN, you shouldn't even notice it except the time it takes to start up the RavenDB windows service on the new node. Check out http://ravendb.net/docs/server/administration/fmc_configuration
This is also the recommended setup for High Availability when running with Distributors. http://docs.particular.net/nservicebus/scalability-and-ha/distributor/