Spring cloud stream rabbitmq performance testing - rabbitmq

I have set up Spring Cloud Stream with the RabbitMQ binder and want to performance-test it. Is there any way to do that?

You can use any available performance tools, as you would with any other Java application.
Do you have some SLA that you are trying to target or you just want to compare it to some other approach? Do you know message sizes, network/hardware infrastructure, bandwidth etc.?
All I am trying to say is that "performance" testing only makes sense if you know your targets, otherwise what is fast/slow?
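To make the point about targets concrete, here is a minimal sketch (in Python for brevity, the same idea applies to a Java harness). The `send` callable is a hypothetical stand-in for whatever publishes one message through your binder, and the target value you pass to `meets_target` is something you have to pick from your own SLA; nothing here is specific to Spring Cloud Stream.

```python
import time


def measure(send, n_messages, payload):
    """Call send(payload) n_messages times; return throughput and latency stats."""
    latencies = []
    start = time.perf_counter()
    for _ in range(n_messages):
        t0 = time.perf_counter()
        send(payload)  # hypothetical: publish one message
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    # Nearest-rank 99th percentile, computed with integer arithmetic.
    p99_index = (99 * len(latencies) + 99) // 100 - 1
    return {
        "throughput_per_s": n_messages / elapsed,
        "p99_latency_s": latencies[p99_index],
        "max_latency_s": latencies[-1],
    }


def meets_target(stats, max_p99_s):
    """True if the measured p99 latency is within the stated target."""
    return stats["p99_latency_s"] <= max_p99_s
```

Without the `max_p99_s` argument the numbers from `measure` are just numbers; with it, the run either passes or fails.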

Related

zookeeper vs redis server sync

I have a small cluster of servers I need to keep in sync. My initial thought on this was to have one server be the "master" and publish updates using redis's pub/sub functionality (since we are already using redis for storage) and letting the other servers in the cluster, the slaves, poll for updates in a long running task. This seemed to be a simple method to keep everything in sync, but then I thought of the obvious issue: What if my "master" goes down? That is where I started looking into techniques to make sure there is always a master, which led me to reading about ideas like leader election. Finally, I stumbled upon Apache Zookeeper (through python binding, "pettingzoo"), which apparently takes care of a lot of the fault tolerance logic for you. I may be able to write my own leader selection code, but I figure it wouldn't be close to as good as something that has been proven and tested, like Zookeeper.
My main issue with using zookeeper is that it is just another component that I may be adding to my setup unnecessarily when I could get by with something simpler. Has anyone ever used redis in this way? Or is there any other simple method I can use to get the type of functionality I am trying to achieve?
More info about pettingzoo (slideshare)
I'm afraid there is no simple method to achieve high availability. It is usually tricky to set up and tricky to test. There are multiple ways to achieve HA, which fall into two categories: physical clustering and logical clustering.
Physical clustering is about using hardware, network, and OS level mechanisms to achieve HA. On Linux, you can have a look at Pacemaker which is a full-fledged open-source solution coming with all enterprise distributions. If you want to directly embed clustering capabilities in your application (in C), you may want to check the Corosync cluster engine (also used by Pacemaker). If you plan to use commercial software, Veritas Cluster Server is a well established (but expensive) cross-platform HA solution.
Logical clustering is about using fancy distributed algorithms (like leader election, PAXOS, etc ...) to achieve HA without relying on specific low level mechanisms. This is what things like Zookeeper provide.
Zookeeper is a consistent, ordered, hierarchical store built on top of the ZAB protocol (quite similar to PAXOS). It is quite robust and can be used to implement some HA facilities, but it is not trivial, and you need to install the JVM on all nodes. For good examples, you may have a look at some recipes and the excellent Curator library from Netflix. These days, Zookeeper is used well beyond the pure Hadoop contexts, and IMO, this is the best solution to build a HA logical infrastructure.
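To give an idea of what the usual leader election recipe boils down to, here is a minimal in-memory sketch. It does not talk to Zookeeper at all: `ElectionRegistry` is a hypothetical stand-in for a parent znode whose children are ephemeral sequential nodes, and the candidate holding the lowest sequence number is treated as the leader.

```python
import itertools


class ElectionRegistry:
    """In-memory stand-in for a znode holding ephemeral sequential children."""

    def __init__(self):
        self._counter = itertools.count()
        self._nodes = {}  # sequence number -> server name

    def join(self, server):
        """Register a candidate and return its sequence number."""
        seq = next(self._counter)
        self._nodes[seq] = server
        return seq

    def leave(self, seq):
        # Models the ephemeral node vanishing when the server's session ends.
        self._nodes.pop(seq, None)

    def leader(self):
        """The candidate with the lowest sequence number leads."""
        if not self._nodes:
            return None
        return self._nodes[min(self._nodes)]
```

When the current leader leaves (or crashes, which in the real thing expires its session), the next lowest candidate takes over with no extra coordination. The hard parts that Zookeeper actually solves, such as agreeing on the node list across machines and detecting dead sessions, are exactly what this sketch leaves out.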
Redis pub/sub mechanism is not reliable enough to implement a logical cluster, because unread messages will be lost (there is no queuing of items with pub/sub). To achieve HA of a collection of Redis instances, you can try Redis Sentinel, but it does not extend to your own software.
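The difference between pub/sub and a queue can be illustrated without a Redis server. The two classes below are hypothetical in-memory models, not the Redis client API: the channel delivers only to subscribers attached at publish time, while the queue holds items until someone reads them.

```python
from collections import deque


class PubSubChannel:
    """Fire-and-forget: only currently attached subscribers get a message."""

    def __init__(self):
        self.subscribers = []  # each subscriber is a plain list acting as inbox

    def publish(self, message):
        for inbox in self.subscribers:
            inbox.append(message)
        # Number of receivers; 0 means the message is gone for good.
        return len(self.subscribers)


class BacklogQueue:
    """Messages wait in the queue until a consumer pops them."""

    def __init__(self):
        self._items = deque()

    def push(self, message):
        self._items.append(message)

    def pop(self):
        return self._items.popleft() if self._items else None
```

A slave that is restarting or briefly disconnected while the master publishes never sees that update with the first model, which is why it is a weak basis for keeping a cluster in sync.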
If you are ready to program in C, an HA framework which is often forgotten (but can be quite useful IMO) is the one coming with BerkeleyDB. It is quite basic but supports off-the-shelf leader elections, and can be integrated in any environment. Documentation can be found here and here. Note: you do not have to store your data with BerkeleyDB to benefit from the HA mechanism (only the topology data - the same data you would put in Zookeeper).

Replicated message queue

I am looking for a message queue which would replicate messages across a cluster of servers. I am aware that this will cause a performance hit, but that's what the requirements are - message persistence is very important.
The replication can be asynchronous, but it should be there - if there's a large backlog of messages waiting for processing, they shouldn't be lost.
So far I didn't manage to find anything from the well-known MQs. HornetQ for example supported message replication in 2.0 but in 2.2 it seems to be removed. RabbitMQ doesn't replicate messages at all, etc.
Is there anything out there that could meet my requirements?
There are at least three ways of tackling this that come to mind, depending upon how robust you need the solution to be.
One: pick any messaging tech, then replicate your disk-storage. Using something like DRBD you can have the file-backed storage copied to another machine under the covers. If your primary box dies, you should be able to restart on your second machine from the replicated files.
Two: Keep looking. There are various commercial systems that definitely do this, two such (no financial benefit on my part) are Informatica Ultra Messaging (formerly 29West) and Solace. These are commonly used in the financial community.
Three: build your own. ZeroMQ is one such toolkit that you could use to roll-your-own system from pre-built messaging blocks. Even a system that does not officially support it could fairly easily be configured to publish all messages to two queues. Your reader would have to drain both somehow, so this may well be a non-starter, but possible in any case.
Overall: do test your performance assumptions, as all of these will have various performance implications in various scenarios.
Amazon SQS is designed with this very thing in mind, but because of the consistency model (which is a part of messaging anyway), you're responsible for de-duplicating messages on the consumer side. Granted, SQS may be somewhat slow and the costs can add up for lots of messages, but if you want to guarantee that no messages are lost, then it's a pretty solid way to go.
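De-duplicating on the consumer side usually means remembering which message IDs have already been handled. A minimal sketch, assuming every message carries a unique ID; the class and method names are made up for illustration and this is not the SQS client API.

```python
class DeduplicatingConsumer:
    """Runs the handler at most once per message ID."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # IDs of messages already processed

    def receive(self, message_id, body):
        """Process the message unless its ID was seen before.

        Returns True if the handler ran, False for a duplicate.
        """
        if message_id in self._seen:
            return False
        self._handler(body)
        # Record the ID only after the handler succeeded, so a failed
        # message can be redelivered and retried.
        self._seen.add(message_id)
        return True
```

In a real deployment the set of seen IDs would need to be persisted and expired, otherwise it grows without bound and is lost on restart.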
The new Kafka 0.8.1 offers replication!

Experiences with message based master-worker frameworks (Java/Python/.Net)

I am designing a distributed master-worker system which, from 10,000 feet, consists of:
Web-based UI
a master component, responsible for generating jobs according to a configurable set of algorithms
a set of workers running on regular PCs, an HPC cluster, or even the cloud
a digital repository
messaging based middleware
different categories of tasks, with running times ranging from < 1s to ~6hrs. Tasks are computation heavy, rather than data/IO heavy. The volume of tasks is not expected to be great (as far as I can see now). Probably maxing around 100/min.
Strictly speaking there is no need to move outside of the Windows ecosystem but I would be more comfortable with a cross-platform solution to keep options open (nb. some tasks are Windows only).
I have pretty much settled on RabbitMQ as a messaging layer and Fedora-commons seems to be the most mature off-the-shelf repository. As for the master/worker logic I am evaluating:
Java-based: Grails + Postgres + DOSGi or GridGain with Zookeeper
Python-based: Django + Postgres + Celery
.net-based: ASP.NET MVC + SQL Server + NServiceBus + Sharepoint or Zentity as the repository
I have looked at various IoC/DI containers but doubt they are really the best fit for a task execution container and add extra layers/complexity. But maybe I'm wrong.
Currently I am leaning towards the python solution (keep it lightweight) but I would be interested in any experiences/suggestions people have to share, particularly with the .net stack. Open source/scalability/resilience features are plus points.
PS: A more advanced future requirement will be the ability for the user to connect directly to a running task (using a web UI) and influence its behaviour (real-time steering). A direct communication channel will be needed to do this (doing this over AMQP does not seem like a good idea).
Dirk
With respect to the master / worker logic and the Java option.
Nimble (see http://www.paremus.com/products/products_nimble.html) with its OSGi Remote Services stack might provide an interesting / agile pure OSGi approach. You still have to decide on a specific distribution mechanism. But given that the use case is computationally heavy and data-light, using the Essence RMI transport that ships with Nimble RSA with a simple front-end load balancer function might work really well.
A good approach to the 'direct communication channel' would be to leverage DDS. This is a low-latency publish/subscribe peer-to-peer messaging standard, used in distributed command/control type environments. I think there is a bare-bones OSS project somewhere but we (Paremus) work with RTI in this area.
Hope the above is of background interest.

Win CE 6.0 client using WCF Services - Reduce Bandwidth

We have a Win CE 6.0 device that is required to consume services that will be provided using WCF. We are attempting to reduce bandwidth usage as much as possible and with a simple test we have found that using UDP instead of HTTP saved significant data usage.
I understand there are limitations regarding WCF on .NET Compact Framework 3.5 devices and was curious what people thought would be the appropriate way forward. Would it make sense to develop a custom UDP binding, and would that work for both sides?
Any feedback would be appreciated. Thanks.
While http does have some overhead, if this is becoming a significant part of your data usage, then I would suspect that your API is too "chatty", and maybe fewer messages (each carrying more payload) should be considered.
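A quick back-of-the-envelope calculation shows why chattiness matters. The figures below are hypothetical (10 kB of total payload and 400 bytes of per-request overhead); plug in whatever you actually measure on the wire.

```python
def bytes_on_wire(total_payload_bytes, n_messages, overhead_per_message):
    """Total bytes sent when a payload is split across n_messages requests."""
    return total_payload_bytes + n_messages * overhead_per_message


# Hypothetical figures: 10 kB of payload, 400 bytes of headers per request.
chatty = bytes_on_wire(10_000, n_messages=100, overhead_per_message=400)
batched = bytes_on_wire(10_000, n_messages=5, overhead_per_message=400)
```

With these made-up numbers the chatty variant sends 50,000 bytes and the batched one 12,000 bytes for the same payload.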
The next point would be; how can we reduce the bandwidth for a given amount of payload? Compression is an option, but can be a problem on some platforms. Another is to use a serialization format that is inherently dense and efficient to process (in terms of CPU cycles, since you are using low-power devices). For that purpose, something like "protocol buffers" would be ideal.
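To illustrate how much a dense binary encoding can save over a text format, here is a small comparison using only the Python standard library. Note that `struct` is just a stand-in for "a compact binary format": it is not the protocol buffers wire format, and the record fields are invented for the example.

```python
import json
import struct

# Hypothetical sensor reading sent by the device.
reading = {"device_id": 1234, "timestamp": 1700000000, "temperature": 21.5}

# Text encoding: field names and digits are all sent as characters.
as_json = json.dumps(reading).encode("utf-8")

# Binary encoding. "<IId" means little-endian, no padding:
# unsigned 32-bit id, unsigned 32-bit timestamp, 64-bit float.
as_binary = struct.pack(
    "<IId", reading["device_id"], reading["timestamp"], reading["temperature"]
)

# The receiver must know the same layout to decode the fields.
decoded = struct.unpack("<IId", as_binary)
```

The binary form is 16 bytes here, several times smaller than the JSON text. The price is that both sides must agree on the layout up front, which is the job a schema does in protocol buffers.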
protobuf-net is a CF-compatible implementation of protocol buffers for .NET; the CF build doesn't have all the nice WCF features (because CF doesn't support them), but it can work very effectively.
Additionally, if you do go http, then MTOM should be considered, as this reduces the encoding overhead of binary data (i.e. what protobuf-net would use).
Moving to UDP can be an option, but I would try something like http + protobuf-net + MTOM first (combined with a less "chatty" API), and see how it stacks up.
I should also note that the current (downloadable) version of protobuf-net has some "kinks" with CF; it works, but it isn't as fast etc as it could be (due to limitations in meta-programming on CF). The "v2" product (not yet released) addresses all these points, allowing fully static (and fast) execution on CF. And best of all, it is free.

Spread vs MPI vs zeromq?

In one of the answers to Broadcast like UDP with the Reliability of TCP, a user mentions the Spread messaging API. I've also run across one called ØMQ. I also have some familiarity with MPI.
So, my main question is: why would I choose one over the other? More specifically, why would I choose to use Spread or ØMQ when there are mature implementations of MPI to be had?
MPI was designed for tightly-coupled compute clusters with fast, reliable networks. Spread and ØMQ are designed for large distributed systems. If you're designing a parallel scientific application, go with MPI, but if you are designing a persistent distributed system that needs to be resilient to faults and network instability, use one of the others.
MPI has very limited facilities for fault tolerance; the default error handling behavior in most implementations is a system-wide fail. Also, the semantics of MPI require that all messages sent eventually be consumed. This makes a lot of sense for simulations on a cluster, but not for a distributed application.
I have not used any of these libraries, but I may be able to give some hints.
MPI is a communication protocol while Spread and ØMQ are actual implementations.
MPI comes from "parallel" programming while Spread comes from "distributed" programming.
So, it really depends on whether you are trying to build a parallel system or a distributed system. They are related to each other, but the implied connotations/goals are different. Parallel programming deals with increasing computational power by using multiple computers simultaneously. Distributed programming deals with keeping a group of computers reliable (consistent, fault-tolerant and highly available).
The concept of "reliability" is slightly different from that of TCP. TCP's reliability is "give this packet to the end program no matter what." Distributed programming's reliability is "even if some machines die, the system as a whole continues to work in a consistent manner." To really guarantee that all participants got the message, one would need something like two-phase commit or one of its faster alternatives.
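For readers who have not met two-phase commit, here is a minimal single-process sketch of the idea. The `Participant` class and its `can_commit` flag are hypothetical and exist only to simulate a node that votes no; a real implementation also needs durable logging, timeouts, and recovery when the coordinator itself fails.

```python
class Participant:
    """One machine taking part in the commit decision."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulated vote
        self.state = "init"

    def prepare(self, message):
        # Phase 1: vote yes or no. A real participant would first
        # write the message to stable storage.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants, message):
    """Commit on every participant, or on none of them."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare(message) for p in participants]
    # Phase 2: commit only if the vote was unanimous.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

The point is the all-or-nothing outcome: a single "no" vote aborts the message everywhere, so no subset of machines ends up with it while the others do not.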
You're addressing very different APIs here, with different notions about the kind of services provided and infrastructure for each of them. I don't know enough about MPI and Spread to answer for them, but I can help a little more with ZeroMQ.
ZeroMQ is a simple messaging library. It does nothing other than send messages to different peers (including local ones) based on a restricted set of common messaging patterns (PUSH/PULL, REQUEST/REPLY, PUB/SUB, etc.). It handles client connection, retrieval, and basic congestion strictly based on those patterns, and you have to do the rest yourself.
Although it appears very restricted, this simple behavior is mostly what you would need for the communication layer of your application. It lets you scale very quickly from a simple prototype, all in memory, to more complex distributed applications in various environments, using simple proxies and gateways between nodes. However, don't expect it to do node deployment, network discovery, or server monitoring; you will have to do that yourself.
Briefly, use ZeroMQ if you have an application that you want to scale from a simple multithreaded process to a distributed and variable environment, or if you want to experiment and prototype quickly and no existing solution seems to fit your model. Expect, however, to put some effort into deploying and monitoring your network if you want to scale to a very large cluster.