Inter process(service) communication without message queue - rabbitmq

We want to develop an application based on micro services architecture.
To communicate between various services asynchronously, we plan to use message queues(like RabbitMQ,ActiveMQ,JMS etc.,) . Is there any approach other than message queue is available to achieve inter process communication?
Thanks.

You should use Queues to handle the tasks that needs not to be completed in real time.
Append the tasks in queue and when there is a room, processor will take tasks from queue and will handle & will remove from queue.
Example :
Assuming your application deals with images, users are uploading so many images. Upload the tasks in a queue to compress the images. And when processor is free it will compress the queued images.
When you want to write some kind of logs of your system, give it to the queue and one process will take logs from queue and write that to disk. So the main process will not waste its time for the I/O operations.
Suggestion :
If you want the real time responses, you should not use the queue. You need to ping the queue constantly to read the incomings, and that is bad practice. And there is no guarantee that queue will handle your tasks immediately.
So the solutions are :
Redis cache - You can put your messages into cache and other process will read that message. Redis is "In memory data-structure". It is very fast and easy to use. Too much libraries and good resources available on the Internet, as it is open source. Read more about Redis. But here you also need to keep check whether there is some kind of message available and if available read from it, process and give response. But to read from Redis, is not very much costlier. With redis, you do not need to worry about memory management, it is well managed by open source community.
Using Sockets. Socket is very much faster, you can make this lightweight(if you want) as it is event based. One process will ping on port and other process will listen and give response. But you need to manage memory. If the buffered memory gets full, you can not put more messages here. If there are so many users producing messages, you need to manage to whom to you want to respond.
So it depends upon your requirement, like do you want to read messages constantly?, do you want to make one to one communication or many to one communication?

Related

How to load balancing ActiveMQ with persistent message

I have a middleware based on Apache Camel which does a transaction like this:
from("amq:job-input")
to("inOut:businessInvoker-one") // Into business processor
to("inOut:businessInvoker-two")
to("amq:job-out");
Currently it works perfectly. But I can't scale it up, let say from 100 TPS to 500 TPS. I already
Raised the concurrent consumers settings and used empty businessProcessor
Configured JAVA_XMX and PERMGEN
to speed up the transaction.
According to Active MQ web Console, there are so many messages waiting for being processed on scenario 500TPS. I guess, one of the solution is scale the ActiveMQ up. So I want to use multiple brokers in cluster.
According to http://fuse.fusesource.org/mq/docs/mq-fabric.html (Section "Topologies"), configuring ActiveMQ in clustering mode is suitable for non-persistent message. IMHO, it is true that it's not suitable, because all running brokers use the same store file. But, what about separating the store file? Now it's possible right?
Could anybody explain this? If it's not possible, what is the best way to load balance persistent message?
Thanks
You can share the load of persistent messages by creating 2 master/slave pairs. The master and slave share their state either though a database or a shared filesystem so you need to duplicate that setup.
Create 2 master slave pairs, and configure so called "network connectors" between the 2 pairs. This will double your performance without risk of loosing messages.
See http://activemq.apache.org/networks-of-brokers.html
This answer relates to an version of the question before the Camel details were added.
It is not immediately clear what exactly it is that you want to load balance and why. Messages across consumers? Producers across brokers? What sort of concern are you trying to address?
In general you should avoid using networks of brokers unless you are trying to address some sort of geographical use case, have too many connections for a signle broker to handle, or if a single broker (which could be a pair of brokers configured in HA) is not giving you the throughput that you require (in 90% of cases it will).
In a broker network, each node has its own store and passes messages around by way of a mechanism called store-and-forward. Have a read of Understanding broker networks for an explanation of how this works.
ActiveMQ already works as a kind of load balancer by distributing messages evenly in a round-robin fashion among the subscribers on a queue. So if you have 2 subscribers on a queue, and send it a stream of messages A,B,C,D; one subcriber will receive A & C, while the other receives B & D.
If you want to take this a step further and group related messages on a queue so that they are processed consistently by only one subscriber, you should consider Message Groups.
Adding consumers might help to a point (depends on the number of cores/cpus your server has). Adding threads beyond the point your "Camel server" is utilizing all available CPU for the business processing makes no sense and can be conter productive.
Adding more ActiveMQ machines is probably needed. You can use an ActiveMQ "network" to communicate between instances that has separated persistence files. It should be straight forward to add more brokers and put them into a network.
Make sure you performance test along the road to make sure what kind of load the broker can handle and what load the camel processor can handle (if at different machines).
When you do persistent messaging - you likely also want transactions. Make sure you are using them.
If all running brokers use the same store file or tx-supported database for persistence, then only the first broker to start will be active, while others are in standby mode until the first one loses its lock.
If you want to loadbalance your persistence, there were two way that we could try to do:
configure several brokers in network-bridge mode, then send messages
to any one and consumer messages from more than one of them. it can
loadbalance the brokers and loadbalance the persistences.
override the persistenceAdapter and use the database-sharding middleware
(such as tddl:https://github.com/alibaba/tb_tddl) to store the
messages by partitions.
Your first step is to increase the number of workers that are processing from ActiveMQ. The way to do this is to add the ?concurrentConsumers=10 attribute to the starting URI. The default behaviour is that only one thread consumes from that endpoint, leading to a pile up of messages in ActiveMQ. Adding more brokers won't help.
Secondly what you appear to be doing could benefit from a Staged Event-Driven Architecture (SEDA). In a SEDA, processing is broken down into a number of stages which can have different numbers of consumer on them to even out throughput. Your threads consuming from ActiveMQ only do one step of the process, hand off the Exchange to the next phase and go back to pulling messages from the input queue.
You route can therefore be rewritten as 2 smaller routes:
from("activemq:input?concurrentConsumers=10").id("FirstPhase")
.process(businessInvokerOne)
.to("seda:invokeSecondProcess");
from("seda:invokeSecondProcess?concurentConsumers=20").id("SecondPhase")
.process(businessInvokerTwo)
.to("activemq:output");
The two stages can have different numbers of concurrent consumers so that the rate of message consumption from the input queue matches the rate of output. This is useful if one of the invokers is much slower than another.
The seda: endpoint can be replaced with another intermediate activemq: endpoint if you want message persistence.
Finally to increase throughput, you can focus on making the processing itself faster, by profiling the invokers themselves and optimising that code.

How to handle long asynchronous requests with pyramid and celery?

I'm setting up a web service with pyramid. A typical request for a view will be very long, about 15 min to finish. So my idea was to queue jobs with celery and a rabbitmq broker.
I would like to know what would be the best way to ensure that bad things cannot happen.
Specifically I would like to prevent the task queue from overflow for example.
A first mesure will be defining quotas per IP, to limit the number of requests a given IP can submit per hour.
However I cannot predict the number of involved IPs, so this cannot solve everything.
I have read that it's not possible to limit the queue size with celery/rabbitmq. I was thinking of retrieving the queue size before pushing a new item into it but I'm not sure if it's a good idea.
I'm not used to good practices in messaging/job scheduling. Is there a recommended way to handle this kind of problems ?
RabbitMQ has flow control built into the QoS. If RabbitMQ cannot handle the publishing rate it will adjust the TCP window size to slow down the publishers. In the event of too many messages being sent to the server it will also overflow to disk. This will allow your consumer to be a bit more naive although if you restart the connection on error and flood the connection you can cause problems.
I've always decided to spend more time making sure the publishers/consumers could work with multiple queue servers instead of trying to make them more intelligent about a single queue server. The benefit is that if you are really overloading a single server you can just add another one (or another pair if using RabbitMQ HA. There is a useful video from Pycon about Messaging at Scale using Celery and RabbitMQ that should be of use.

How can I tell a WAS service polling an MSMQ to wait when busy?

I'm working on a system which amongst other things, runs payroll, a heavy load process. It is likely that soon, there may be so many requests to run payroll at peak times that the batch servers will be overwhelmed.
I'm looking to put together a proof of concept to cope with this by using MSMQ (probably replacing this with a commercial solution like nservicebus later). I using this this example as a basis. I can see how to set up the bindings and stick it together, but I still need a way to tell the subscribers hosted by WAS to only process the 'run heavy payroll process' message if they are not busy. Otherwise the messages on the queue will get picked up straightaway and we have the same problem as before.
Can I set up the subscribing service to say, "I'm busy, I can't take the message, leave it on the queue"? Does the queue need to be transactional?
If you're using WCF then there's no way to conditionally activate the channel thereby leaving the messages on the queue for later.
A better solution is to host the message receiver in a completely different process, for example as a windows service. These can then be enabled/disabled according to your service window requirement.
You also get the additional benefit of being able to very easily scale out the message receivers to handle greater loads (by hosting more instances of your receiver).
One way to do this is to have 2 queues, your polling always checks the high priority queue first, only if there are no items in that queue does it take an item from the other

Maximize throughput with RabbitMQ

In our project, we want to use the RabbitMQ in "Task Queues" pattern to pass data.
On the producer side, we build a few TCP server(in node.js) to recv
high concurrent data and send it to MQ without doing anything.
On the consumer side, we use JAVA client to get the task data from
MQ, handle it and then ack.
So the question is:
To get the maximum message passing throughput/performance( For example, 400,000 msg/second) , How many queues is best? Does that more queue means better throughput/performance? And is there anything else should I notice?
Any known best practices guide for using RabbitMQ in such scenario?
Any comments are highly appreciated!!
For best performance in RabbitMQ, follow the advice of its creators. From the RabbitMQ blog:
RabbitMQ's queues are fastest when they're empty. When a queue is
empty, and it has consumers ready to receive messages, then as soon as
a message is received by the queue, it goes straight out to the
consumer. In the case of a persistent message in a durable queue, yes,
it will also go to disk, but that's done in an asynchronous manner and
is buffered heavily. The main point is that very little book-keeping
needs to be done, very few data structures are modified, and very
little additional memory needs allocating.
If you really want to dig deep into the performance of RabbitMQ queues, this other blog entry of theirs goes into the data much further.
According to a response I once got from the rabbitmq-discuss mailing group there are other things that you can try to increase throughput and reduce latency:
Use a larger prefetch count. Small values hurt performance.
A topic exchange is slower than a direct or a fanout exchange.
Make sure queues stay short. Longer queues impose more processing
overhead.
If you care about latency and message rates then use smaller messages.
Use an efficient format (e.g. avoid XML) or compress the payload.
Experiment with HiPE, which helps performance.
Avoid transactions and persistence. Also avoid publishing in immediate
or mandatory mode. Avoid HA. Clustering can also impact performance.
You will achieve better throughput on a multi-core system if you have
multiple queues and consumers.
Use at least v2.8.1, which introduces flow control. Make sure the
memory and disk space alarms never trigger.
Virtualisation can impose a small performance penalty.
Tune your OS and network stack. Make sure you provide more than enough
RAM. Provide fast cores and RAM.
You will increase the throughput with a larger prefetch count AND at the same time ACK multiple messages (instead of sending ACK for each message) from your consumer.
But, of course, ACK with multiple flag on (http://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.ack) requires extra logic on your consumer application (http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2013-August/029600.html). You will have to keep a list of delivery-tags of the messages delivered from the broker, their status (whether your application has handled them or not) and ACK every N-th delivery-tag (NDTAG) when all of the messages with delivery-tag less than or equal to NDTAG have been handled.

Does the redis pub/sub model require persistent connections to redis?

In a web application, if I need to write an event to a queue, I would make a connection to redis to write the event.
Now if I want another backend process (say a daemon or cron job) to process the or react the the publishing of the event in redis, do I need a persistant connection?
Little confused on how this pub/sub process works in a web application.
Basically in Redis there are two different messaging models:
Fire and Forget / One to Many: Pub/Sub. At the time a message is PUBLISH-ed all the subscribers will receive it, but this message is then lost forever. If a client was not subscribed there is no way it can get it back.
Persisting Queues / One to One: Lists, possibly used with blocking commands such as BLPOP. With lists you have a producer pushing into a list, and one or many consumers waiting for elements, but one message will reach only one of the waiting clients. With lists you have persistence, and messages will wait for a client to pop them instead of disappearing. So even if no one is listening there is a backlog (as big as your available memory, or you can limit the backlog using LTRIM).
I hope this is clear. I suggest you studying the following commands to understand more about Redis and messaging semantics:
LPUSH/RPUSH, RPOP/LPOP, BRPOP/BLPOP
PUBLISH, SUBSCRIBE, PSUBSCRIBE
Doc for this commands is available at redis.io
I'm not totally sure, but I believe that yes, pub/sub requires a persistent connection.
For an alternative I would take a peek at resque and how it handles that. Instead of using pub/sub it simply adds an item to a list in redis, and then whatever daemon or cron job you have can use the lpop command to get the first one.
Sorry for only giving a pseudo answer and then a plug.