AMQP (RabbitMQ) and passing data in work-flow situations to other consumers - rabbitmq

I'm working with RabbitMQ 3.7, and I'm finding that my microservice architecture is starting to feel tangled and coupled.
I'm finding that I'm publishing messages from within my consumer's received event to other queues. This feels wrong. But I'm not sure what the alternative is, since I benefit from the efficiency in passing the data from the consumer directly to the next queue/task.
Note that the above is just an example; the services I'm running are similar and fairly workflow-dependent (although they can be run independently!).
Questions:
How is data normally passed from process to process (or consumer to publisher) in situations where the microservices are fairly dependent on each other? Not that they can't be run individually, but that they work best in a workflow scenario?
If the solution involves not publishing new messages from within the received event of a consumer, then what is the proper way to get the data to that microservice/process?

I find that chaining workflows across queues can create more complex workflows than desired, whereas creating simpler consumer applications can make for more maintainable code.
Do you gain or lose any scalability or simplicity in your code by splitting the first two steps? Without more detailed info to consider, I probably would not split up the first two parts of the functionality. I don't see anything wrong with directly storing the scraping results.
I like your isolated consumer for sending email, though you might consider making a generic email-sending consumer that any of your applications could use: have the message format contain the proper mail parts, and have the consumer construct the mail and deliver it.
I don't think there's a "right" answer to your architecture here other than to think about finding the right balance of simplicity/complexity, scalability, and maintainability.

Related

Where to create queues and exchanges?

I'm using RabbitMQ as a message broker for the first time, and now I have a question: when should I declare queues and exchanges using Rabbit's own management tool, and when should I do it in the code of the software? My opinion is that it is much better to create queues and exchanges using the management tool, because it's a centralized place to add new queues or remove useless ones without the need to modify the actual software. I am asking for some advice and opinions.
Thank you.
The short answer is: whatever works best for you.
I've worked with message brokers that required external tools for defining the topology (exchanges, queues, bindings, etc) and with RabbitMQ that allows me to define them at runtime, as needed.
I don't think either scenario is "the right way". Rather, it depends entirely on your situation.
Personally, I see a lot of value in letting my software define the topology at runtime with RabbitMQ. But there are still times when it gets frustrating because I often end up duplicating my definitions between producers and consumers.
But then, moving from development to production is easier when the software itself defines the topology. No need to pre-configure things before moving code to production.
It's all tradeoffs.
Try it however you're comfortable. Then try it the other way. See what happens, and learn which you prefer and when. Just remember that you don't have to do one or the other. You can do both if you want.
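For what it's worth, declaring topology at runtime is safe to repeat, because RabbitMQ declarations are idempotent. A minimal sketch using the Python pika client (the exchange and queue names here are invented for illustration); putting the declarations in one shared function also cuts down on the duplication between producers and consumers mentioned above:

```python
def declare_topology(channel):
    """Declare this app's exchanges, queues and bindings at startup.

    Declarations are idempotent: re-declaring with identical arguments
    is a no-op, so producers and consumers can both run this safely.
    """
    channel.exchange_declare(exchange="orders", exchange_type="topic",
                             durable=True)
    channel.queue_declare(queue="order-emails", durable=True)
    channel.queue_bind(queue="order-emails", exchange="orders",
                       routing_key="order.created")


def connect_and_declare(host="localhost"):
    # Requires a running broker; not called automatically here.
    import pika  # pip install pika
    connection = pika.BlockingConnection(pika.ConnectionParameters(host))
    declare_topology(connection.channel())
    return connection
```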

Why use Queueing systems such as RabbitMQ

I am not a senior programmer, but I have been deploying applications for a while and have developed small complete systems.
I am starting to hear about queueing systems such as RabbitMQ. Maybe I have simply never developed a system that needed one. But I am worried that I am not using one because I have no idea what to do with it. I have read the RabbitMQ tutorial on their site, but I am not sure what I would use it for. I am not sure whether any of it could not be achieved with conventional programming, no additional components, and a regular database or similar.
Can someone please explain why I would use a queueing system, with a small example? I mean not a hello-world example, but a practical scenario.
Thanks a lot for your time
RM
One of the key uses of middleware like message queues is to be able to send data between non-homogeneous systems. The messages themselves can be many things. Strings are the easiest to understand across different languages on different systems, but are often less useful for transferring more meaningful data. As a result, JSON and XML are very popular formats for the messages. These are just structured strings that can be converted into objects in the language of choice at the consumer end.
Additional useful features:
Some MQ systems, such as RabbitMQ (though not all), provide client libraries that handle the communication side of things very nicely.
Messaging can be asynchronous: if the consumer goes down, the messages remain queued until the consumer is back online.
The MQ system can be set up with varying degrees of message durability. Messages can be removed from the queue once read, or remain until they are acknowledged. They can be persistent, so that even if the MQ system goes down, messages will not be lost.
Here are some possibly contrived examples. A Java program on a local system wants to send a message to a system connected through the internet. The local system has a server connected to the internet, and everything coming from the internet is blocked except connections to the MQ. The Java program can publish the message to the MQ without needing direct access to the internet. The message sits on the queue until the external system picks it up. The Java program publishes a message, let's say XML, and the consumer could be a Perl program: as long as they have a predefined way of serializing and deserializing the XML, it will work fine.
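To make the cross-language point concrete, here is a hedged sketch in Python (the queue name and message fields are invented): the producer encodes the message as JSON, which a consumer in Perl, Java, or anything else can decode into its own native objects, and marks it persistent so it survives a broker restart:

```python
import json


def build_order_message(order_id, items):
    # JSON is just a structured string: any language can parse it
    # back into its own native data structures.
    return json.dumps({"order_id": order_id, "items": items}).encode("utf-8")


def publish_order(channel, body):
    # Assumes the pika client and a durable queue named "orders".
    import pika  # pip install pika
    channel.basic_publish(
        exchange="",
        routing_key="orders",
        body=body,
        properties=pika.BasicProperties(
            content_type="application/json",
            delivery_mode=2,  # persistent: written to disk by the broker
        ),
    )
```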
MQ systems tend to work best in "fire-and-forget" scenarios. If an event happens and others need to be notified of it, but the source system has no need for feedback from the other systems, then MQ might be a good fit.
If you understand the pros and cons of MQ and still don't understand why it would be a good fit for a particular system, then it probably isn't. I've seen systems where MQ was used but not needed, and the result was not pretty.
Most of the scenarios I've seen where it worked out well involve integration between unrelated systems (usually out-of-the-box type systems). Let's say you have one system that takes orders, and a different system that fills the orders and ships them. In that scenario, the order system can use an MQ to notify the fulfillment system of the order, but the order system has no interest in waiting until the fulfillment system receives it. So it puts a message in a queue and keeps going.
This is a very simplified answer, but it gives the general ideas.
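The order/fulfillment flow above can be illustrated with a toy in-memory queue (Python's stdlib `queue` module standing in for the broker, names invented): the order system drops a message and keeps going, and the fulfillment system drains the queue whenever it runs.

```python
import queue

order_queue = queue.Queue()  # stand-in for a broker queue


def take_order(order_id):
    # Fire-and-forget: enqueue and return immediately, without
    # waiting for (or caring about) fulfillment.
    order_queue.put({"order_id": order_id})
    return "order accepted"


def fulfillment_worker(processed):
    # Drains the queue until it sees a None sentinel, recording
    # each order id it handles.
    while True:
        msg = order_queue.get()
        if msg is None:
            break
        processed.append(msg["order_id"])
```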
Let's think about this in terms of telephone vs. email. Pretend for a minute that email does not exist. To get work done, you must phone everyone. When you communicate with someone via telephone, you need to have them at their desk in order to reach them (assume they are in a factory and can't hear their cell phone ring) :-) If the person you wish to reach isn't at the desk, you are stuck waiting until they return your call (or far more likely, you call them back later). It's the same with you - you don't have any work to do until someone calls you up. If multiple people call at once, you don't know about it because you can only handle one person at a time.
However, if we have email, it is possible for you to "queue" your requests with someone else, to answer (but more likely ignore) at their convenience. If they do ignore your email, you can always re-send it. You don't have to wait for them to be at the desk, and they don't have to wait until you are off the phone. The workload evens out and things run much more smoothly. As an added bonus, you can forward messages that you don't want to deal with to your peons.
In systems engineering, we use the term "closely coupled" to define programs (or parts of programs) that work like the telephone scenario above. They depend very closely upon each other, often sharing implementations among various parts of the program. In these programs, data is processed in serial order, one at a time. These systems are typically easy to build, but there are a few important drawbacks to consider: (1) changing any part of the program likely will cause cascading changes throughout the code, and this introduces bugs; (2) the system is not very scalable, and typically must be scrapped and rebuilt as needs grow; (3) all parts of the system must be functioning simultaneously or the whole system will not work.
Basically, closely-coupled programs are good if the program is very simple or if there is some specialized reason to use a closely-coupled program.
In the real world, things are much more complex. Programs cannot be that simple, and it becomes a nightmare to develop enterprise applications in a closely-coupled manner. Therefore, we use the term "loosely coupled" to define large systems that are composed of many smaller pieces. The pieces have very well-defined boundaries and functions, so that changing the system may be accomplished more easily. It is the essence of object-oriented design. Message queues (like RabbitMQ) allow email-like communication to take place among various programs and parts of programs, thus making workflow much more like it would be with people. Adding extra capacity then becomes a simple matter of starting up an additional computer wherever you need it.
Obviously, this is a gross simplification, but I think it conveys the general idea. Building applications that use message queuing enables you to deploy massively scalable applications leveraging cloud service providers. Here is an article that talks about designing for the cloud:
http://blogs.msdn.com/b/silverlining/archive/2011/08/23/designing-and-building-applications-for-the-cloud.aspx

Does a CQRS project need a messaging framework like NServiceBus?

The last 6 months' learning curve has been challenging, with CQRS and DDD the main culprits.
It has been fun, and we are halfway through our project; the area I have not had time to delve into is a messaging framework.
Currently I don't use DTC, so there is a very good likelihood that if my read model is not updated then I will have inconsistency between the read and write databases. Also, my read and write databases will be on the same machine. I doubt we will ever put them on separate machines.
I don't have a large volume of messages in my system so my concern is more to do with consistency and reliability of the system.
So, do I have to put in a messaging framework like NServiceBus (even though both read and write databases are on the same machine), or do I have other options? Yes, there is a learning curve, but I suppose there would be a hell of a lot to learn if I don't use it.
Also, I don't want to put in a layer if it is not necessary.
Thoughts?
Currently I don't use DTC so there is a very good likelihood that if my read model is not updated then I will have inconsistency between the read and write databases.
Personally, I dislike the DTC and try to avoid it. Instead, it is often possible to implement a compensation mechanism, especially for something like a read model where eventual consistency is already acceptable and updates are idempotent. For example, you could implement a version on entities and have a background task which ensures versions are in-sync. Having a DTC will provide transactional retry functionality, but it still won't solve cases where failure occurs after retries - you still have to watch the error log and have procedures in place to deal with errors.
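A hypothetical sketch of such a compensation task (the data shapes are invented): each entity carries a version in the write model, and a background job finds entities whose read-model version lags behind and re-projects them. Because the projection is idempotent, running it again is always safe.

```python
def find_stale(write_versions, read_versions):
    """Return ids whose read model is behind the write model."""
    return [eid for eid, v in write_versions.items()
            if read_versions.get(eid, -1) < v]


def resync(write_versions, read_versions, project):
    """Re-project each stale entity, then record the caught-up version.

    `project` is the idempotent re-projection callback; running it
    twice for the same entity must be harmless.
    """
    for eid in find_stale(write_versions, read_versions):
        project(eid)
        read_versions[eid] = write_versions[eid]
```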
So, do I have to put in a messaging framework like NServiceBus (even though both read and write databases are on the same machine) or do I have other options?
It depends on a few things. What you often encounter in a CQRS system is a need for pub/sub, where several sub-systems publish events to which the query/caching system subscribes. If you see a need for pub/sub beyond basic point-to-point messaging, then go with something like NServiceBus. Also, I wouldn't immediately shy away from using NServiceBus even if you don't need it for scalability purposes, because I think the logical partitioning is beneficial on its own. On the other hand, as you point out, adding layers of complexity is costly; therefore first try to see if the simplest possible thing will work.
Another question to ask is whether you need a separate query store at all. If all you have is a single machine, why bother? You could use something simpler like the read-model pattern and still reap a lot of the benefits of CQRS.
Does a CQRS project need a messaging framework like NServiceBus?
The short answer: no.
This is the first time I have heard of the 'read-model pattern' mentioned by eulerfx. It is a nice enough name, but there is a bit more to it:
The general idea behind the 'query' part is to query a denormalized view of your data. In the 'read-model pattern' link you will notice that the query used to populate the read model is doing some lifting. In the mentioned example the required data manipulation is not that complex, but what if it does become more complex? This is where denormalization comes in. When you perform your 'command' part, the next action is to denormalize the data and store the results for easy reading. All the heavy lifting should be done by your domain.
This is why you are asking about the messaging. There are several techniques here:
denormalized data in same database, same table, different columns
denormalized data in same database, different table
denormalized data in different database
That's the storage. How about the consistency?:
immediately consistent
eventually consistent
The simplest solution (quick win) is to denormalize your data in your domain and then, after saving your domain objects through the repository, immediately save the denormalized data to the same data store, same table(s), different columns. 100% consistent, and you can start reading the denormalized data immediately.
If you really want to you can create a separate bunch of objects to transport that data but it is simpler to just write a simple query layer that returns some data carrying object provided by your data-access framework (in the case of .Net that would be a DataRow/DataTable). Absolutely no reason to get fancy. There will always be exceptions but then you can go ahead and write a data container.
For eventual consistency you will need some form of queuing and related processing. You can roll your own solution or you may opt for a service bus. That is up to you and your time / technical constraints :)
BTW: I have a free open-source service bus here:
Shuttle.Esb
documentation
Any feedback would be welcome. But any old service bus will do (MassTransit / NServiceBus / etc.).
Hope that helps.

API Versioning and long running processes with nServiceBus and REST API

We are building a web API and using nServiceBus for messaging under the hood for all asynchronous and long running processes.
Question is when we spin off a new version of the API should we use a new set of queues?
Like, for the API version 1,
blobstore.v1.inbound
blobstore.v1.outbound
blobstore.v1.timeout
blobstore.v1.audit
and for the API version 2,
blobstore.v2.inbound
blobstore.v2.outbound
blobstore.v2.timeout
blobstore.v2.audit
Or should we strive to use the same set of queues with multiple message formats and handlers (assuming change of requirements and evolving message formats)?
I am trying to understand the pros and cons in the long run from an architecture standpoint. Having a separate set of queues gives the flexibility of building, deploying and managing different API versions in isolation, without worrying about compatibility and serviceability.
Personally I am leaning towards the latter, but the challenges around compatibility and upgrades are not clearly understood.
If you have dealt with a similar scenario in the past, please share your experiences, thoughts, suggestions and recommendations.
Your time is much appreciated!
The more frequent your releases, the less appropriate a queue-per-version strategy becomes, and the more important backwards-compatibility becomes (both in structure and in behavior).
The decision between going with a different set of queues or a single queue to support different versions of messages depends on the extent of the difference between messages. In the versioning sample, the V2 message is a pure extension of the V1 message, which can be represented by interface inheritance. Subscribers of V1 messages can receive V2 messages, which are proper supersets of V1 messages. In this case, it makes sense to keep the same queue and only update subscribers as needed. If the messages are drastically different, it may be easier to deploy a second set of queues. This has the benefits you described, namely isolation: you don't have to worry about messing up dependent components. However, it will have a bigger impact on your system because you have to consider everything that may depend on the queues. It may be that you have to deploy multiple endpoints and services at once to make the V2 rollout complete.
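The "pure extension" case can be sketched in Python with plain dicts standing in for messages (all field names invented): V2 only adds fields, so a handler written against V1 keeps working on the same queue, while a V2-aware handler can opt in to the new field.

```python
# V2 is a strict superset of V1: it adds a field, changes nothing.
V1_MSG = {"version": 1, "blob_id": "abc", "size": 1024}
V2_MSG = {"version": 2, "blob_id": "abc", "size": 1024,
          "checksum": "d41d8cd9"}  # new, additive-only field


def handle_v1(msg):
    # Touches only V1 fields; unknown fields are simply ignored,
    # so V2 messages flow through unchanged.
    return f"stored blob {msg['blob_id']} ({msg['size']} bytes)"


def handle_v2(msg):
    # Builds on the V1 behavior and uses the new field if present.
    base = handle_v1(msg)
    checksum = msg.get("checksum")
    return base + (f", checksum {checksum}" if checksum else "")
```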

Why use AMQP/ZeroMQ/RabbitMQ

as opposed to writing your own library.
We're working on a project here that will be a self-dividing server pool: if one section grows too heavy, the manager would divide it and put it on another machine as a separate process. It would also alert all affected clients to connect to the new server.
I am curious about using ZeroMQ for inter-server and inter-process communication. My partner would prefer to roll his own. I'm looking to the community to answer this question.
I'm a fairly novice programmer myself and just learned about message queues. As I've googled and read, it seems everyone is using message queues for all sorts of things, but why? What makes them better than writing your own library? Why are they so common, and why are there so many?
what makes them better than writing your own library?
When rolling out the first version of your app, probably nothing: your needs are well defined and you will develop a messaging system that fits them: small feature list, small source code, etc.
Those tools are very useful after the first release, when you actually have to extend your application and add more features to it.
Let me give you a few use cases:
your app will have to talk to a big-endian machine (SPARC/PowerPC) from a little-endian machine (x86, Intel/AMD): your messaging system had some endianness assumptions, so go and fix it
you designed your app around a text-based protocol/messaging system and now it is very slow because you spend most of your time parsing messages (the number of messages increased and parsing became a bottleneck): adapt it so it can transport binary/fixed encodings
at the beginning you had 3 machines inside a LAN, no noticeable delays, and everything got to every machine. Your client/boss/pointy-haired-devil-boss shows up and tells you that you will install the app on a WAN you do not manage, and then you start having connection failures and bad latency: you need to store messages and retry sending them later, so go back to the code and plug this stuff in (and enjoy)
messages sent need to have replies, but not all of them: you send some parameters in and expect a spreadsheet as a result instead of just a send-and-acknowledge: go back to the code and plug this stuff in (and enjoy)
some messages are critical and their reception/sending needs proper backup/persistence. Why, you ask? Auditing purposes
And many other use cases that I forgot ...
You can implement it yourself, but do not spend much time doing so: you will probably replace it later on anyway.
That's very much like asking: why use a database when you can write your own?
The answer is that using a tool that has been around for a while and is well understood in lots of different use cases, pays off more and more over time and as your requirements evolve. This is especially true if more than one developer is involved in a project. Do you want to become support staff for a queueing system if you change to a new project? Using a tool prevents that from happening. It becomes someone else's problem.
Case in point: persistence. Writing a tool to store one message on disk is easy. Writing a persistor that scales and performs well and stably, in many different use cases, and is manageable, and cheap to support, is hard. If you want to see someone complaining about how hard it is then look at this: http://www.lshift.net/blog/2009/12/07/rabbitmq-at-the-skills-matter-functional-programming-exchange
Anyway, I hope this helps. By all means write your own tool. Many many people have done so. Whatever solves your problem, is good.
I'm considering using ZeroMQ myself - hence I stumbled across this question.
Let's assume for the moment that you have the ability to implement a message queuing system that meets all of your requirements. Why would you adopt ZeroMQ (or other third party library) over the roll-your-own approach? Simple - cost.
Let's assume for a moment that ZeroMQ already meets all of your requirements. All that needs to be done is integrating it into your build, read some doco and then start using it. That's got to be far less effort than rolling your own. Plus, the maintenance burden has been shifted to another company. Since ZeroMQ is free, it's like you've just grown your development team to include (part of) the ZeroMQ team.
If you ran a Software Development business, then I think that you would balance the cost/risk of using third party libraries against rolling your own, and in this case, using ZeroMQ would win hands down.
Perhaps you (or rather, your partner) suffer, as so many developers do, from "Not Invented Here" syndrome? If so, adjust your attitude and reassess the use of ZeroMQ. Personally, I much prefer the benefits of a "Proudly Found Elsewhere" attitude. I'm hoping I can be proud of finding ZeroMQ... time will tell.
EDIT: I came across this video from the ZeroMQ developers that talks about why you should use ZeroMQ.
what makes them better than writing your own library?
Message queuing systems are transactional, which is conceptually easy to use as a client, but hard to get right as an implementor, especially considering persistent queues. You might think you can get away with writing a quick messaging library, but without transactions and persistence, you'd not have the full benefits of a messaging system.
Persistence in this context means that the messaging middleware keeps unhandled messages in permanent storage (on disk) in case the server goes down; after a restart, the messages can be handled and no retransmit is necessary (the sender does not even know there was a problem). Transactional means that you can read messages from different queues and write messages to different queues in a transactional manner, meaning that either all reads and writes succeed or (if one or more fail) none succeeds. This is not really much different from the transactionality known from interfacing with databases and has the same benefits (it simplifies error handling; without transactions, you would have to assure that each individual read/write succeeds, and if one or more fail, you have to roll back those changes that did succeed).
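A toy illustration of that transactional behavior (not a real broker, just plain Python): reads and writes are staged inside a transaction object, then either all applied on commit or all undone on rollback, so a consumer never ends up having consumed an input without its outputs being published.

```python
class TxQueueBroker:
    """In-memory queues with all-or-nothing read/write transactions."""

    def __init__(self):
        self.queues = {}

    def transaction(self):
        return _Tx(self)


class _Tx:
    def __init__(self, broker):
        self.broker = broker
        self.reads = []   # (queue, message) consumed inside this tx
        self.writes = []  # (queue, message) published inside this tx

    def consume(self, name):
        msg = self.broker.queues[name].pop(0)
        self.reads.append((name, msg))
        return msg

    def publish(self, name, msg):
        # Staged only; invisible to other consumers until commit.
        self.writes.append((name, msg))

    def commit(self):
        for name, msg in self.writes:
            self.broker.queues.setdefault(name, []).append(msg)
        self.reads.clear()
        self.writes.clear()

    def rollback(self):
        # Put consumed messages back in order; drop staged writes.
        for name, msg in reversed(self.reads):
            self.broker.queues[name].insert(0, msg)
        self.reads.clear()
        self.writes.clear()
```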
Before writing your own library, read the 0MQ Guide here: http://zguide.zeromq.org/page:all
Chances are that you will either decide to install RabbitMQ, or else you will make your library on top of ZeroMQ since they have already done all the hard parts.
If you have a little time, give it a try and roll out your own implementation! The lessons of this exercise will convince you of the wisdom of using an already-tested library.