Is there a limit to the number of exchanges for rabbitmq? - rabbitmq

Could not find anything about this in either the docs or on google, except that it should be bound to the available resources of the server.
Does anyone have experience with really large numbers of exchanges at a time in a working environment? Just creating exchanges should not be the issue (simply until the memory limit is reached) but to use it in a working project with high-message throughput and mostly dynamic exchange creation/deletion.

Given how everything else in RabbitMQ is built (and knowing that it's written in Erlang and uses services like Mnesia internally) there probably isn't any hardcoded limit. You'll probably hit a resource limit on your broker's machine before anything else.
If you plan on using non-persistent exchanges (that is, ones that don't survive a broker restart) you could likely create very large numbers of them. Why don't you use the HTTP management API to write a script that just keeps creating exchanges using curl and see how far you get?

Related

Highly available and load balanced ActiveMQ cluster

Please be aware that I am a relative newbie to ActiveMQ.
I am currently working with a small cluster of ActiveMQ (version 5.15.x) nodes (< 5). I recently experimented with setting up the configuration to use "Shared File System Master Slave" with kahadb, in order to provide high availability to the cluster.
Having done so and seeing how it operates, I'm now considering whether this configuration provides the level of throughput required for both consumers/producers, as only one broker's ports are available at one time.
My question is basically two part. First, does it make sense to configure the cluster as highly available AND load balanced (through Network of Brokers)? Second, is the above even technically viable, or do I need to review my design consideration to favor one aspect over the other?
I had some discussions with the ActiveMQ maintainers in IRC on this topic a few months back.
It seems that they would recommend using ActiveMQ Artemis instead of ActiveMQ 5.
Artemis has a HA solution:
https://activemq.apache.org/artemis/docs/latest/clusters.html
https://activemq.apache.org/artemis/docs/latest/ha.html
The idea is to use Data Replication to allow for failover, etc:
When using replication, the live and the backup servers do not share the same data directories, all data synchronization is done over the network. Therefore all (persistent) data received by the live server will be duplicated to the backup.
And, I think you want to have at least 3 nodes (or some odd number) to avoid issues with split brain during network partitions.
It seems like Artemis can mostly be used as a drop-in replacement for ActiveMQ; it can still speak the OpenWire protocol, etc.
However, I haven't actually tried this yet, so YMMV.

How to handle resource limits for apache in kubernetes

I'm trying to deploy a scalable web application on google cloud.
I have kubernetes deployment which creates multiple replicas of apache+php pods. These have cpu/memory resources/limits set.
Lets say that memory limit per replica is 2GB. How do I properly configure apache to respect this limit?
I can modify maximum process count and/or maximum memory per process to prevent memory overflow (thus the replicas will not be killed because of OOM). But this does create new problem, this settings will also limit maximum number of requests that my replica could handle. In case of DDOS attack (or just more traffic) the bottleneck could be the maximum process limit, not memory/cpu limit. I think that this could happen pretty often, as these limits are set to worst case scenario, not based on average traffic.
I want to configure autoscaler so that it will create multiple replicas in case of such event, not only when the cpu/memory usage is near limit.
How do I properly solve this problem? Thanks for help!
I would recommend doing the following instead of trying to configuring apache to limit itself internally:
Enforce resource limits on pods. i.e let them OOM. (but see NOTE*)
Define an autoscaling metric for your deployment based on your load.
Setup a namespace wide resource-quota. This enforeces a clusterwide limit on the resources pods in that namespace can use.
This way you can let your Apache+PHP pods handle as many requests as possible until they OOM, at which point they respawn and join the pool again, which is fine* (because hopefully they're stateless) and at no point does your over all resource utilization exceed the resource limits (quotas) enforced on the namespace.
* NOTE: This is only true if you're not doing fancy stuff like websockets or stream based HTTP, in which case an OOMing Apache instance takes down other clients that are holding an open socket to the instance. If you want, you should always be able to enforce limits on apache in terms of the number of threads/processes it runs anyway, but it's best not to unless you have solid need for it. With this kind of setup, no matter what you do, you'll not be able to evade DDoS attacks of large magnitudes. You're either doing to have broken sockets (in the case of OOM) or request timeouts (not enough threads to handle requests). You'd need far more sophisticated networking/filtering gear to prevent "good" traffic from taking a hit.

server-to-server multicast messaging with Google Cloud PubSub?

I have a cluster of backend servers on GCP, and they need to send messages to each other. All the servers need to receive every message, but I can tolerate a low error rate. I can deal with receiving the message more than once on a given server. Packet ordering doesn't matter.
I don't need much of a persistence layer. A message becomes stale within a couple of seconds after sending it.
I wired up Google Cloud PubSub and pretty quickly realized that for a given subscription, you can have any number of subscribers but only one of them is guaranteed to get the message. I considered making the subscribers all fail to ack it, but that seems like a gross hack that probably won't work well.
My server cluster is sized dynamically by an autoscaler. It spins up VM instances as needed, with dynamic hostnames and IP addresses. There is no convenient way to map the dynamic hosts to static subscriptions, but it feels like that's my only real option: Create more subscriptions than my max server pool size, and then use some sort of paxos system (runtime config, zookeeper, whatever) to allocate servers to subscriptions.
I'm starting to feel that even though my use case feels really simple ("Every server can multicast a message to every other server in my group"), it may not be a good fit for Cloud PubSub.
Should I be using GCM/FCM? Or some other technology?
Cloud Pub/Sub may or may not be a fit for you, depending on the size of your server cluster. Failing to ack the messages certainly won't work because you can't be sure each instance will get the message; it could just be redelivered to the same instance over and over again.
You could use multiple subscriptions and have each instance create a new subscription when it starts up. This only works if you don't plan to scale beyond 10,000 instances in your cluster, as that is the maximum number of subscriptions per topic allowed. The difficulty here is in cleaning up subscriptions for instances that go down. Ones that cleanly shut down could probably delete their own subscriptions, but there will always be some that don't get cleaned up. You'd need some kind of external process that can determine if the instance for each subscription is still up and running and if not, delete the subscription. You could use GCE shutdown scripts to catch this most of the time, though there will still be edge cases where deletes would have to be done manually.

RabbitMQ - federated queues Vs exchange federation

I have set up a rabbit cluster and I publish messages into a fanout exchange every time something changes in a database.
I have dedicated queues bound to this exchange for some of my microservices that consume these updates and I also originally set up a dedicated queue for an external client so that they can federate it with their own rabbit infrastructure and consume a copy of every message.
Now I'm wondering whether allowing exchange federation rather than creating a new dedicated queue for each new external consumer would be a better approach since more and more users will come.
What are the pros and cons?
Thanks
As long as you manage permissions properly, the final decision is up to you. You can give a try to all variants first and find what will fit your actual needs.
Having local queue may have it pros and cons: it allows end-user to survive some outage with their infrastructure or network issue at the cost of your disk/memory, however, you may limit queue length and/or size.
I'd suggest you to take a look at Shovel plugin and Dynamic shovels. With local queue it may server a good job.
Comparing to federation, shovel is much simpler, e.g. it doesn't sync content between upstream and downstream but simply moves message from one queue to another in a reliable manner. As long as you don't need what federation provides, shovel could be a good choice.
Also, you may find this q/a useful (however, it might be a bit outdated) - https://stackoverflow.com/a/19357272.

Gathering distributed data into central database

I was assigned to update existing system of gathering data coming from points of sale and inserting it into central database. The one that is working now is based on FTP/SFTP transmission, where the information is sent once a day, usually at night. Unfortunately, because of unstable connection links (low quality 2G/3G modems), some of the files appear to be broken. With just a few shops connected that way everything was working smooth, but along with increasing number of shops, errors became more often. What is worse, the time needed to insert data into central database is taking up to 12 - 14h (including waiting for the data to be downloaded from all of the shops) and that cannot happen during the working day as it would block the process of creating sale reports and other activities with the database - so we are really tight with processing time here.
The idea my manager suggested is to send the data continuously, during the day. Data packages would be significantly smaller, so their transmission and insertion would be much faster, central server would contain actual (almost real time) data and night could be used for long running database activities like creating backups, rebuilding indexes etc.
After going through many websites, I found that:
using ASMX web service is now obsolete and WCF should be used instead
WCF with MSMQ or System Messaging could be used to safely transmit data, where I don't have to care that much about acknowledging delivery of data, consistency, nodes going offline etc.
according to http://blogs.msdn.com/b/motleyqueue/archive/2007/09/22/system-messaging-versus-wcf-queuing.aspx WCF queuing is better
there are also other technologies for implementing message queue, like RabbitMQ, ZeroMQ etc.
And that is where I become confused. With so many options, do you have any pros and cons of these technologies?
We were using .NET with Windows Forms and SQL Server, but if it would be necessary, we could change to something more suitable. I am also a bit afraid of server efficiency. After some calculations, server would be receiving about 15 packages of data per second (peak). Is it much? I know there are many websites without serious server infrastructure, that handle hundreds of visitors online and still run smooth, but the website mainly uploads data to the client, and here we would download it from the client.
I also found somewhat similar SO question: Middleware to build data-gathering and monitoring for a distributed system
where DDS was mentioned. What do you think about introducing some middleware servers that would cope with low quality links to points of sale, so the main server would not be clogged with 1KB/s transmission?
I'd be grateful with all your help. Thank you in advance!
Rabbitmq can easily cope with thousands of 1kb messages per second.
As your use case is not about processing real time data, I'd say you should combine few messages and send them as a batch. That would be good enough in order to spread load over the day.
As the motivation here is not to process the data in real time, then any transport layer would do the job. Even ftp/sftp. As rabbitmq will work fine here, it's not the typical use case for it.
As you mentioned that one of your concerns is slow/unreliable network, I'd suggest to compress the files before sending them, and on the receiving end, immediately verify their integrity. Rsync or similar will probably do great job in doing that.
From what I understand, you have basically two problems:
Potential for loss/corruption of call data
Database write performance
The potential for loss/corruption of call data is being caused by a lack of reliability in the transmission of data from client to service.
And it's not clear what is causing the database contention/performance issues, beyond a vague reference to high volumes, so this answer will be more geared towards solving the first problem.
You have correctly identified the need for reliable asynchronous communication transport as a way to address the reliability issues in your current setup.
Looking at MSMQ to deliver this is a valid first step. MSMQ provides reliable communication via a store and forward messaging semantic which comes out of the box and requires very little in the way of configuration.
Unfortunately, while suitable for your needs, MSMQ relies on 2 things:
A reliable network protocol, and
A client service running on both sending and receiving machine.
From your description above, I don't believe 1 exists (the internet is not a reliable network), and you might well struggle with 2 - MSMQ only ships with Windows Server or business/enterprise versions of Windows on the desktop.(*see below...)
As a possible solution to the network reliability problem, you could use a WCF or a RESTful endpoint (using Nancy or WebApi) to expose a service operation(s) exposed over HTTP, which would accept the incoming calls from the client machines. These technologies are quite different, so you'll need to make sure you're making the correct choice early on.
WCF supports WS-ReliableMessaging from the SOAP 1.2 specification out of the box, which allows for reliable web service calls over http, however it's very config-heavy and not generally a nice framework to work with.
REST much simpler than WCF in .Net, is very lightweight and easy to use. However, for reliable delivery you would have to expose some kind of GET operation (in addition to a POST to allow the client to send data) to be called (within a reasonable time-frame) to verify the data was committed. The client would have to implement some kind of retry semantic if the result of the GET "acknowledgement" was negative.
Despite requiring two operations rather than one for the WCF route, I would favour the REST approach. I've done plenty of both and find REST services way nicer to work with.
(*) That's not to say that MSMQ wouldn't work in your ultimate solution, just that it would not be used to address the transmission reliability issue. However it could still be used to address another of your problems, that of database write contention. If you were to queue incoming requests once they came into the server, then these could be processed by an "offline" process, which could then perform the required database operations in a reliable manner. This could be done by using MSMQ transactional queues.
In response to comments:
99% messages are passed from shop to main server, but if some change
is needed (price correction, discounts etc.), that data has to be sent
to shop.
This kind of changes things. Had I understood from the beginning that you had a bidirectional requirement, and seeing as how you have managed to establish msmq communication, I would have nudged you towards NServiceBus, which is a really, really cool wrapper around MSMQ. The reason I would have done this is that you appear to have both a one way, and a publish-subscribe requirement, which is supported really nicely by NServiceBus.