I've spent quite a bit of time trying to figure out whether I should use the RabbitMQ federation plugin or shovel.
Basically I have two microservices, and I want one of them to send a message to the other. Each microservice has a different RabbitMQ cluster, so I need to use federation or a shovel.
I read the post "When to use RabbitMQ shovels and when Federation plugin?" and still couldn't figure it out or make a decision.
I want to satisfy the following:
Loose coupling
Microservices don't know about each other, i.e. the first microservice emits a message saying "I'm done doing X", and the second microservice just listens for that event and acts accordingly.
In the future I might want to add more microservices, each with its own RabbitMQ cluster/vhost.
Based on this information - what do you recommend, shovel or federation?
Why not just have one cluster for everything? RabbitMQ is built to handle 10k+ exchanges and queues; in fact there is no upper limit except memory or disk space. Setting up a cluster for each microservice is too much work and creates unnecessary overhead. Vhosts shouldn't be used per microservice either; use them per business area instead.
I only use shovels, and I use them to transfer messages from my production environment to test so I can test with real data. They are very easy to set up with scripts. And yes, you should only do this with scripts; using the UI is too slow.
I know this doesn't answer your question directly, but I hope it has given you some food for thought.
Happy messaging!
We have multiple web and Windows applications deployed to different servers that we are planning to integrate using NServiceBus, so that all the apps can publish and subscribe to messages between them. I think the pub/sub pattern with the MSMQ transport will work well for this. One thing I'm not clear on is whether there is a way to avoid hard-coding the subscriber's endpoint as the MSMQ address QueueName#ServerName, which contains the server name directly, when the publisher is on another server. In the version 6 pre-release I saw the idea of setting an endpoint name and then using routing to map it to a transport-level address. Is that a solution for this, or is the gateway the only solution? Is a broker a good idea? What is the best practice for this scenario?
When using pub/sub, the subscriber currently needs to know the location of the queue of the publisher. The subscriber then sends a subscription-message to that queue, every single time it starts up. It cannot know if it subscribed already and if it subscribed for all the messages, since you might have added/configured some new ones.
The publisher reads these subscriptions messages and stores the subscription in storage. NServiceBus does this for you, so there's no need to write code for this. The only thing you need is configuration in the subscriber as to where the (queue of the) publisher is.
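To make the publisher side concrete, here is a minimal in-memory sketch of what happens to incoming subscription messages. It is not the NServiceBus API: the class `SubscriptionStore`, its methods and the `Billing@server1` style addresses are hypothetical. Because a subscriber resends its subscription on every startup, recording a subscription has to be idempotent, which a set gives for free.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical publisher-side subscription storage (not the NServiceBus API).
class SubscriptionStore {
    // message type -> addresses of the subscribers' input queues
    private final Map<String, Set<String>> subscriptions = new HashMap<>();

    // Called when a subscription message arrives in the publisher's queue.
    // Returns true only for a new subscription; a repeat (e.g. after a
    // subscriber restart) leaves the store unchanged.
    boolean subscribe(String messageType, String subscriberAddress) {
        return subscriptions
                .computeIfAbsent(messageType, t -> new LinkedHashSet<>())
                .add(subscriberAddress);
    }

    // On publish, one copy of the message is sent to each stored address.
    Set<String> subscribersFor(String messageType) {
        return Collections.unmodifiableSet(
                subscriptions.getOrDefault(messageType, Collections.emptySet()));
    }
}
```

This is why only the subscriber needs configuration: it must know where the publisher's queue is, while the publisher learns subscriber addresses from the messages themselves.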
I wrote a tutorial myself, which you can find here: http://dennis.bloggingabout.net/2015/10/28/nservicebus-publish-subscribe-tutorial/
That being said, you should take special care with websites that publish messages. More information on that can be found here: http://docs.particular.net/nservicebus/hosting/publishing-from-web-applications
In a scale-out situation with MSMQ, you can also use the distributor: http://docs.particular.net/nservicebus/scalability-and-ha/distributor/
As a final note: It depends on the situation, but I would not worry too much about knowing locations of endpoints (or their queues). I would most likely not use pub/sub just for this 'technical issue'. But again, it completely depends on the situation. I can understand that rich-clients which spawn randomly might want this. But there are other solutions as well, with a more centralized storage and an API that is accessed by all the rich clients.
I'm in the process of implementing various remote methods/RPCs on top of AMQP (RabbitMQ in particular). When a worker (or a client) comes online, it could, in theory, declare (create) a queue on the broker. The other approach is to just start using a queue and assume that it already exists on the broker.
Which approach is more common? Creating queues manually has a higher administrative cost, maybe; however, it can result in a more consistent environment if we decouple queue management from queue usage.
It depends on the requirements. If you have a fixed number of queues and don't need them to be generated dynamically, then go for manual creation. For example, in an integration application where I know I have three consumers A, B and C, I will create the three queues manually. On the other hand, in a chat application where I want a queue for every logged-in user, the queues should be created programmatically. Also, with manual creation you have more control over permissions and ACLs.
Meanwhile I found out that, according to RabbitMQ, applications should take care of managing the queues they use.
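Letting each application declare its own queues on startup works because declaring a queue in RabbitMQ is idempotent: redeclaring an existing queue with the same properties is a no-op, while redeclaring it with different properties fails with a 406 PRECONDITION_FAILED channel error. The toy model below (hypothetical `ToyBroker` class, not the RabbitMQ Java client) sketches just those semantics.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of RabbitMQ-style queue declaration (not the real client API).
class ToyBroker {
    // The properties that must match on every redeclaration.
    record QueueSpec(boolean durable, boolean exclusive, boolean autoDelete) {}

    private final Map<String, QueueSpec> queues = new HashMap<>();

    // Creates the queue if missing; identical redeclaration is a no-op,
    // a conflicting one is rejected (RabbitMQ closes the channel with 406).
    void declareQueue(String name, QueueSpec spec) {
        QueueSpec existing = queues.putIfAbsent(name, spec);
        if (existing != null && !existing.equals(spec)) {
            throw new IllegalStateException("PRECONDITION_FAILED: queue '" + name
                    + "' already declared with " + existing);
        }
    }

    int queueCount() { return queues.size(); }
}
```

In practice this means every worker can safely declare the queue it consumes from each time it starts, as long as all of them agree on the queue's properties.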
I have a middleware based on Apache Camel which does a transaction like this:
from("amq:job-input")
    .to("inOut:businessInvoker-one") // into the first business processor
    .to("inOut:businessInvoker-two") // into the second business processor
    .to("amq:job-out");
It currently works perfectly, but I can't scale it up, say from 100 TPS to 500 TPS. To speed up the transaction I have already:
raised the concurrent consumers setting and used an empty businessProcessor
configured JAVA_XMX and PERMGEN
According to the ActiveMQ web console, a great many messages are waiting to be processed in the 500 TPS scenario. I guess one solution is to scale ActiveMQ up, so I want to use multiple brokers in a cluster.
According to http://fuse.fusesource.org/mq/docs/mq-fabric.html (section "Topologies"), configuring ActiveMQ in clustering mode is suitable only for non-persistent messages. In my opinion that is true, because all running brokers use the same store file. But what about separating the store files? That's possible now, right?
Could anybody explain this? If it's not possible, what is the best way to load balance persistent message?
Thanks
You can share the load of persistent messages by creating two master/slave pairs. The master and slave share their state either through a database or a shared filesystem, so you need to duplicate that setup.
Create two master/slave pairs and configure so-called "network connectors" between them. This can roughly double your throughput without the risk of losing messages.
See http://activemq.apache.org/networks-of-brokers.html
This answer relates to a version of the question from before the Camel details were added.
It is not immediately clear what exactly it is that you want to load balance and why. Messages across consumers? Producers across brokers? What sort of concern are you trying to address?
In general you should avoid using networks of brokers unless you are addressing a geographical use case, have too many connections for a single broker to handle, or find that a single broker (which could be a pair of brokers configured for HA) does not give you the throughput you require (in 90% of cases it will).
In a broker network, each node has its own store and passes messages around by way of a mechanism called store-and-forward. Have a read of Understanding broker networks for an explanation of how this works.
ActiveMQ already works as a kind of load balancer by distributing messages evenly, in round-robin fashion, among the subscribers on a queue. So if you have two subscribers on a queue and send it a stream of messages A, B, C, D, one subscriber will receive A and C, while the other receives B and D.
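A minimal sketch of that idealized round-robin dispatch, with the hypothetical class name `RoundRobinDispatch`. Real ActiveMQ dispatch is also affected by consumer prefetch limits and by how quickly each consumer acknowledges, so actual distribution can be less even than this.

```java
import java.util.ArrayList;
import java.util.List;

// Idealized model of round-robin dispatch to the consumers on one queue.
class RoundRobinDispatch {
    // Returns, for each consumer, the messages it receives in order.
    static List<List<String>> dispatch(List<String> messages, int consumers) {
        List<List<String>> received = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            received.add(new ArrayList<>());
        }
        for (int i = 0; i < messages.size(); i++) {
            // message i goes to consumer i mod N
            received.get(i % consumers).add(messages.get(i));
        }
        return received;
    }
}
```

With the stream A, B, C, D and two consumers, this yields [A, C] for the first consumer and [B, D] for the second.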
If you want to take this a step further and group related messages on a queue so that they are processed consistently by only one subscriber, you should consider Message Groups.
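Message groups change the dispatch rule from "next consumer" to "the consumer that owns this group". In ActiveMQ the group is identified by the JMSXGroupID message property. The sketch below is a simplified model with hypothetical names: the first message of a group picks a consumer round-robin, and every later message with the same group id goes to that same consumer.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of sticky message-group dispatch (hypothetical names).
class MessageGroupDispatch {
    private final int consumers;
    private final Map<String, Integer> owner = new HashMap<>(); // group id -> consumer index
    private int next = 0;

    MessageGroupDispatch(int consumers) { this.consumers = consumers; }

    // New groups are assigned round-robin; known groups keep their consumer.
    int consumerFor(String groupId) {
        return owner.computeIfAbsent(groupId, g -> next++ % consumers);
    }
}
```

The trade-off is that all messages of one group are processed in order by one consumer, so a very "hot" group cannot be spread across consumers.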
Adding consumers might help up to a point (it depends on the number of cores/CPUs your server has). Adding threads beyond the point where your "Camel server" is using all available CPU for the business processing makes no sense and can be counterproductive.
Adding more ActiveMQ machines is probably needed. You can use an ActiveMQ "network" to communicate between instances that have separate persistence files. It should be straightforward to add more brokers and put them into a network.
Performance test along the way to find out what load the broker can handle and what load the Camel processor can handle (if they are on different machines).
When you do persistent messaging - you likely also want transactions. Make sure you are using them.
If all running brokers use the same store file or tx-supported database for persistence, then only the first broker to start will be active, while others are in standby mode until the first one loses its lock.
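A toy model of why a shared store gives failover rather than load balancing: only the broker holding the store's exclusive lock is active. Real brokers lock a file or a database row; this sketch uses an `AtomicReference` and the hypothetical class `SharedStoreLock`.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of shared-store master/slave: one lock holder is the active master.
class SharedStoreLock {
    private final AtomicReference<String> holder = new AtomicReference<>();

    // Returns true if this broker is (now) the active master; otherwise it
    // stays in standby and would retry periodically.
    boolean tryBecomeMaster(String brokerName) {
        return holder.compareAndSet(null, brokerName) || brokerName.equals(holder.get());
    }

    // Called when the master dies or shuts down and loses its lock.
    void release(String brokerName) {
        holder.compareAndSet(brokerName, null);
    }
}
```

Since the standby broker does no work until the master releases the lock, adding brokers on the same store adds availability, not throughput.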
If you want to load-balance your persistence, there are two approaches you could try:
configure several brokers in network-bridge mode, then send messages to any one of them and consume messages from more than one of them. This load-balances both the brokers and the persistence.
override the persistenceAdapter and use database-sharding middleware (such as TDDL: https://github.com/alibaba/tb_tddl) to store the messages by partition.
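The core idea behind the second approach is partitioning: each message is written to one of N independent stores, chosen from a stable hash of a partition key, so no single store carries the whole load. The sketch below shows only that idea; it is not TDDL's API or ActiveMQ's persistenceAdapter interface, and `ShardedStore` and the key names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of hash-partitioned message storage.
class ShardedStore {
    private final List<List<String>> shards = new ArrayList<>();

    ShardedStore(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
    }

    // The same key always maps to the same shard; floorMod keeps the index
    // non-negative even when hashCode() is negative.
    int shardFor(String partitionKey) {
        return Math.floorMod(partitionKey.hashCode(), shards.size());
    }

    void store(String partitionKey, String message) {
        shards.get(shardFor(partitionKey)).add(message);
    }

    int size(int shard) { return shards.get(shard).size(); }
}
```

Keeping related messages on one shard preserves their relative order, at the cost of making rebalancing harder if the shard count changes.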
Your first step is to increase the number of workers that are processing from ActiveMQ. The way to do this is to add the ?concurrentConsumers=10 attribute to the starting URI. The default behaviour is that only one thread consumes from that endpoint, leading to a pile up of messages in ActiveMQ. Adding more brokers won't help.
Secondly, what you appear to be doing could benefit from a Staged Event-Driven Architecture (SEDA). In a SEDA, processing is broken down into a number of stages, which can have different numbers of consumers on them to even out throughput. Your threads consuming from ActiveMQ only do one step of the process, hand off the Exchange to the next phase and go back to pulling messages from the input queue.
Your route can therefore be rewritten as two smaller routes:
from("activemq:input?concurrentConsumers=10").routeId("FirstPhase")
    .process(businessInvokerOne)
    .to("seda:invokeSecondProcess");

from("seda:invokeSecondProcess?concurrentConsumers=20").routeId("SecondPhase")
    .process(businessInvokerTwo)
    .to("activemq:output");
The two stages can have different numbers of concurrent consumers so that the rate of message consumption from the input queue matches the rate of output. This is useful if one of the invokers is much slower than another.
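Outside Camel, the same two-stage idea can be sketched with plain JDK thread pools: each stage has its own pool, and stage one hands work to stage two and returns immediately. This is an illustration of the pattern, not how Camel's seda component is implemented; `TwoStagePipeline` is hypothetical and the two functions stand in for `businessInvokerOne` and `businessInvokerTwo`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

// Plain-Java sketch of a two-stage SEDA pipeline with separately sized pools.
class TwoStagePipeline implements AutoCloseable {
    private final ExecutorService stageOne;
    private final ExecutorService stageTwo;
    private final UnaryOperator<String> invokerOne;
    private final UnaryOperator<String> invokerTwo;
    private final BlockingQueue<String> output = new LinkedBlockingQueue<>();

    TwoStagePipeline(int stageOneThreads, int stageTwoThreads,
                     UnaryOperator<String> invokerOne, UnaryOperator<String> invokerTwo) {
        this.stageOne = Executors.newFixedThreadPool(stageOneThreads);
        this.stageTwo = Executors.newFixedThreadPool(stageTwoThreads);
        this.invokerOne = invokerOne;
        this.invokerTwo = invokerTwo;
    }

    // Stage 1 does its step, then hands off to stage 2 without waiting.
    void submit(String message) {
        stageOne.execute(() -> {
            String intermediate = invokerOne.apply(message);
            stageTwo.execute(() -> output.add(invokerTwo.apply(intermediate)));
        });
    }

    // Waits up to timeoutMs per result; returns early if a result times out.
    List<String> collect(int count, long timeoutMs) {
        List<String> results = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                String r = output.poll(timeoutMs, TimeUnit.MILLISECONDS);
                if (r == null) break;
                results.add(r);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }

    // Drain stage 1 before stopping stage 2 so no hand-off is rejected.
    @Override
    public void close() {
        try {
            stageOne.shutdown();
            stageOne.awaitTermination(10, TimeUnit.SECONDS);
            stageTwo.shutdown();
            stageTwo.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Giving the slower stage more threads (here, for example, 2 and 4) is the plain-Java counterpart of the different `concurrentConsumers` values on the two routes. Results can complete out of order.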
The seda: endpoint can be replaced with another intermediate activemq: endpoint if you want message persistence.
Finally to increase throughput, you can focus on making the processing itself faster, by profiling the invokers themselves and optimising that code.
I am a newbie to real-time application development and am trying to wrap my head around the myriad options out there. I have read as many of the blog posts, notes and essays that people have been kind enough to share as I could. Yet a simple problem remains unanswered in my tiny brain. I thought a number of other people might have the same issue, so I might as well sign up and post here on SO. Here goes:
I am building a tiny real-time app which is asynchronous chat + another fun feature. I boiled my choices down to the following two options:
LAMP + RabbitMQ
Node.JS + Redis + Pub-Sub
I believe that I get the basics to start learning and building this out. However, my (seriously n00b) questions are:
How do I communicate with the end user (client to/from server) in each of those? Would that be simple JavaScript long/infinite polling?
Of the two, which might be more efficient to build out and manage from a single Slice (assuming 100-1,000 users)?
Should I just build everything out with jQuery in the 'old school' paradigm and then identify which stack might make more sense? Just so that I can get the product fleshed out as a prototype and then 'optimize' it. Or is writing in one over the other more than mere optimization? (I feel so, but I am not 100% sure on this personally.)
I hope this isn't a crazy question and won't get flamed right away. Would love some constructive feedback, love this community!
Thank you.
Architecturally, both of your choices are the same as storing data in an Oracle database server for another application to retrieve.
Both the RabbitMQ and the Redis solution require your apps to connect to an intermediary server that handles the data communications. Redis is most like Oracle, because it can be used simply as a persistent database with a network API. But RabbitMQ is a little different because the MQ Broker is not really responsible for persisting data. If you configure it right and use the right options when publishing a message, then RabbitMQ will actually persist the data for you but you can't get the data out except as part of the normal message queueing process. In other words, RabbitMQ is for communicating messages and only offers persistence as a way of recovering from network problems or system crashes.
I would suggest using RabbitMQ and whatever programming languages you are already familiar with. Since the M in LAMP is usually interpreted as MySQL, this means that you would either not use MySQL at all, or only use it for long term storage of data, not for the realtime communications.
The RabbitMQ site has a huge amount of documentation about building apps with AMQP. I suggest that after you install RabbitMQ, you read through the docs for rabbitmqctl and then create a vhost to experiment in. That way it is easy to clean up your experiments without resetting everything. I also suggest using only topic exchanges, because you can emulate the behavior of direct and fanout exchanges by using wildcards in the binding key.
Remember, you only publish messages to exchanges, and you only receive messages from queues. The exchange is responsible for pattern matching the message's routing_key to the queue's binding_key to determine which queues should receive a copy of the message. It is worthwhile learning the whole AMQP model even if you only plan to send messages to one queue with the same name as the routing_key.
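To see why a topic exchange can stand in for the other two types, here is a sketch of the topic matching rule: keys are dot-separated words, and in a binding key `*` matches exactly one word while `#` matches zero or more words. Wildcards belong in the binding key on the queue's binding, not in the message's routing key. This is an independent implementation for illustration, not RabbitMQ's code; `TopicMatcher` is a hypothetical name.

```java
// Sketch of AMQP topic-exchange matching (independent implementation).
final class TopicMatcher {
    private TopicMatcher() {}

    static boolean matches(String bindingKey, String routingKey) {
        return match(bindingKey.split("\\.", -1), 0, routingKey.split("\\.", -1), 0);
    }

    private static boolean match(String[] pattern, int p, String[] words, int w) {
        if (p == pattern.length) {
            return w == words.length;          // both exhausted -> match
        }
        if (pattern[p].equals("#")) {
            // "#" may swallow any number of words, including none
            for (int k = w; k <= words.length; k++) {
                if (match(pattern, p + 1, words, k)) return true;
            }
            return false;
        }
        if (w == words.length) return false;   // pattern has words left, key does not
        if (pattern[p].equals("*") || pattern[p].equals(words[w])) {
            return match(pattern, p + 1, words, w + 1);
        }
        return false;
    }
}
```

A binding key with no wildcards (e.g. `invoice.created`) behaves like a direct exchange binding, and a binding key of `#` receives everything, like a fanout exchange.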
If you are building your client in the browser and want to build a prototype, then you should consider just using XHR today, and later moving to something like Kamaloka-js, which is a pure JavaScript implementation of AMQP (the AMQ Protocol), the standard protocol used to communicate with a RabbitMQ message broker. In other words, build it with what you know today, and then speed it up later with something (AMQP) that has a long-term future in your toolbox.
Should I just build everything out with jQuery in the 'old school' paradigm and then identify which stack might make more sense? Just so that I can get the product fleshed out as a prototype and then 'optimize' it. Or is writing in one over the other more than mere optimization? (I feel so, but I am not 100% sure on this personally.)
This is usually called RAD (rapid application design/development) and it is what I would recommend right now. This lets you build the proof of concept that you can use to work off of later to get what you want to happen.
As for how to talk to the clients from the server, and vice versa, have you read up on WebSockets at all?
Given the choice between LAMP or event based programming, for what you're suggesting, I would tell you to go with the event based programming, so nodejs. But that's just one man's opinion.
Well,
LAMP: Apache (in its classic prefork configuration) dedicates a whole process to each request. RabbitMQ can be useful and has many features.
Node.js: uses a single process to handle all requests asynchronously with the help of an event loop, so there is none of the per-request process overhead that Apache has.
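A minimal single-threaded event loop, sketched in Java to show the idea behind Node.js (not its implementation; `EventLoop` is a hypothetical name). Handlers never block: when one needs to wait for I/O it registers a callback, and the loop runs other requests' work in the meantime, all on one thread.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Minimal single-threaded event loop (illustrative sketch only).
final class EventLoop {
    private final Queue<Runnable> ready = new ArrayDeque<>();

    // Schedule a callback to run later on the loop thread.
    void enqueue(Runnable callback) { ready.add(callback); }

    // Runs callbacks one at a time, on the calling thread, until none are left.
    void run() {
        Runnable next;
        while ((next = ready.poll()) != null) {
            next.run();
        }
    }
}
```

If two requests A and B each start work and then wait on a (simulated) I/O completion, the loop runs A's start, B's start, then A's completion and B's completion, interleaving both requests without a second thread or process.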
For asynchronous chat application,
socket.io + Node.js + Redis pub/sub is the best stack.
I have already implemented real-time notification using above stack.
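One property of Redis pub/sub worth knowing for a chat app: a published message is delivered only to clients subscribed to the channel at that moment and is not stored, and PUBLISH returns the number of clients that received it. The in-memory sketch below (hypothetical `PubSub` class, not a Redis client) models those semantics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-memory model of Redis-style publish/subscribe semantics.
final class PubSub {
    private final Map<String, List<Consumer<String>>> channels = new HashMap<>();

    void subscribe(String channel, Consumer<String> listener) {
        channels.computeIfAbsent(channel, c -> new ArrayList<>()).add(listener);
    }

    // Fire-and-forget: delivers to current subscribers only and returns how
    // many received the message (0 means it was simply dropped).
    int publish(String channel, String message) {
        List<Consumer<String>> listeners = channels.getOrDefault(channel, List.of());
        listeners.forEach(l -> l.accept(message));
        return listeners.size();
    }
}
```

In such a stack, each Node.js process would typically subscribe to the chat channels and forward what it receives to its socket.io clients; chat history has to be stored separately, since pub/sub keeps nothing.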
I'm thinking of adding a queue function to a product based on a bunch of WCF services. I've read a bit about MSMQ; at first I thought that was what I needed, but I'm not sure and am considering just putting the queue in a database table. I wonder if someone here has some feedback on which way to go.
Basically, I'm planning to have a facade WCF service called over HTTP. The facade service should only write incoming messages to a queue, to give a fast response to the calling system. The messages in the queue should then be processed by another component, either a WCF service or a Windows service depending on my choice of queue.
The product runs in a load-balanced environment with 2 to n web servers.
The options I'm considering and the questions I got are:
1. Let the facade WCF service write to an MSMQ queue, and then have another WCF service read from this queue to process the messages. What I don't feel confident about with this alternative, from what I've read, is how it will work in a load-balanced environment.
1A. Where should the MSMQ queue(s) be placed? One on each web server? One on a separate server? Multiple on a separate server? (Not considering the need for redundancy, and accepting that data could in rare cases be lost and re-sent.)
1B. How is the design affected if I want the system to be redundant? I'd like to be able to lose a server holding the MSMQ (one that never comes back online) without losing the data in that queue. From what I've read about MSMQ, that leaves me with only one option: placing MSMQ on a Windows cluster. Is that correct? (I'd like to avoid using a Windows cluster for this.)
2. The second design alternative is to let the facade WCF service write the queue to a database, and then have two or more Windows services process the queue. I don't have any questions about this alternative. If you wonder why I don't pick it, since it seems simpler to me, it is because I'd like to build this without introducing any Windows services into the solution, because I believe MSMQ has functionality I don't want to code myself, and because I'm curious about MSMQ as I've never used it before.
Best Regards
Håkan
OK, so you're not using WCF with MSMQ integration, you're using WCF to create MSMQ messages as an end-product. That simplifies things to "how do I load balance MSMQ?"
The arrangement you use is based on what works best for you.
You could have multiple webservers sending messages to a remote queue on a central machine.
Alternatively, you could have the webservers put messages in local queues, with a central machine polling those queues for new arrivals.
You don't need to cluster MSMQ to make it resilient. You can instead make your code resilient so that it copes with lost messages using dead letter queues, transactional queues, journaling, and so on. Hardware clustering is the easy option :-)
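Making the code resilient usually means the consumer never silently loses a message: a message that keeps failing is retried a bounded number of times and then parked for inspection. The sketch below shows that generic retry-then-dead-letter pattern at the application level; it is not MSMQ's API (MSMQ's own dead-letter queues work at the transport level), and `RetryingConsumer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Generic retry-then-dead-letter consumer (application-level sketch).
final class RetryingConsumer {
    private final int maxAttempts;
    private final List<String> deadLetters = new ArrayList<>();

    RetryingConsumer(int maxAttempts) { this.maxAttempts = maxAttempts; }

    // The handler returns true on success; an exception counts as a failure.
    // Returns true if the message was processed, false if it was dead-lettered.
    boolean consume(String message, Predicate<String> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean ok;
            try {
                ok = handler.test(message);
            } catch (RuntimeException e) {
                ok = false;
            }
            if (ok) return true;
        }
        deadLetters.add(message);   // park it instead of losing or looping forever
        return false;
    }

    List<String> deadLetters() { return List.copyOf(deadLetters); }
}
```

Combined with a transactional receive, this keeps a "poison" message from blocking the queue while still keeping it around for someone to look at.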
Load-balancing MSMQ - a brief discussion
Oil and water - MSMQ transactional messages and load balancing
After reading some more on the subject, I have decided not to use MSMQ. I don't really have a reason to go down this road. I need this to be non-transactional, and as I understand it none of the journaling or dead-letter techniques will help me with my redundancy requirement.
All my components will be online most of the time (with maybe a couple of hours per year when they have access problems).
MSMQ would only add complexity to the existing solution: another technology, and maybe another server to keep track of.
To get full redundancy and prevent data loss in MSMQ, I would need a Windows cluster or would have to implement send/receive to multiple identical queues. I don't want to do either of those.
All this led me to front my receiving application with a WCF facade that accepts HTTP calls and writes to a database queue. This database is already protected from data loss. The queue will be polled by multiple active instances of a Windows Service containing all the heavy business logic. With low process priority, these services could be hosted on the existing nodes used by the load-balanced web application. If I had time to use MSMQ, or if I needed it for another reason in my application, I might change my decision.
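With several active pollers on one database queue, the key detail is that a worker claims a row atomically before processing it, typically with a conditional update such as `UPDATE queue SET status = 'CLAIMED' WHERE id = ? AND status = 'PENDING'` and a check that exactly one row was affected. The in-memory stand-in below sketches that claim step with `ConcurrentMap.replace`; the table layout, statuses and `TableQueue` name are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for a database-table queue with competing pollers.
final class TableQueue {
    enum Status { PENDING, CLAIMED, DONE }

    private final ConcurrentSkipListMap<Long, Status> rows = new ConcurrentSkipListMap<>();
    private final Map<Long, String> payloads = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    long enqueue(String payload) {
        long id = nextId.incrementAndGet();
        payloads.put(id, payload);      // payload first, so a claimed row always has one
        rows.put(id, Status.PENDING);
        return id;
    }

    // Claims the oldest pending row. replace(...) succeeds for exactly one
    // worker, like the conditional UPDATE affecting one row.
    Optional<Long> claimNext() {
        for (Map.Entry<Long, Status> row : rows.entrySet()) {
            if (row.getValue() == Status.PENDING
                    && rows.replace(row.getKey(), Status.PENDING, Status.CLAIMED)) {
                return Optional.of(row.getKey());
            }
        }
        return Optional.empty();
    }

    String payload(long id) { return payloads.get(id); }

    void markDone(long id) { rows.put(id, Status.DONE); }
}
```

Because the claim is atomic, any number of Windows Service instances can poll the same table and each message is still processed once; a real implementation would also need to reset rows left in CLAIMED by a crashed worker.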