Is Message Queuing the right strategy for a high-bandwidth data feed?

Is Message Queuing the right strategy for a high-bandwidth data feed? - api

I have a huge network of data-collection servers which generate a large volume of real-time data.
In the past I've provided partners with the ability to get this data in near-real-time using HTTP GET's. But for many reasons I'm eager to ditch this.
So yeah... I'm eager to build out a new distribution system and I was thinking that a Message Queuing System was the way to go.
I need to be able to distribute data from my sources to a number of different partners. Some partners receive all of it, others just get a portion. And, if a partner gets disconnected, they need to be able to reconnect and not miss any data. (Although, for the sake of disk and memory I'd like their queued messages to expire after hour or so)
Lastly I need the system to be able to handle tens of thousands of enqueue's per minute.
Do you think Message Queuing is an appropriate scheme?
I was looking at using RabbitMQ. Is it difficult to maintain?
Thanks Very Much!
-Z

I cannot tell you if it is the right strategy in your specific case, but message products are indeed used in high message rate systems every day.
Much of the investment world uses various products, both commercial (Tibco) and Open source (ZeroMQ) to name just two, to handle market data from exchanges and other sources. These are likely at least as active as your data sensors are.
The publish/subscribe model, where some receivers want some messages and some receivers want all, along with late-join or other so-called guaranteed messaging are indeed standard features on most of these products.
So do go ahead and investigate products, I have not used RabbitMQ myself, so cannot comment on it specifically, however with a minimal abstraction layer, you should be able to insulate yourself from too many platform specific calls, and therefore allow you to swap message-bus implementers if the need arises. (You may even want to build such a shim as part of a proof-of-concept to test out more than one product for your specific purpose. You get experience in multiple products, flesh out the facade layer, and get up to speed on the products)
Good Luck

Related

Hashgraph, what's it and how does it work?

Does anyone know what's a hashgraph and the difference from blockchain? If you could explain it to me how it works, I would be pleased!
Thank you in advance!

Hedera Hashgraph is very much different from a regular Blockchain (which is a chain of blocks as its name suggests), it is a DAG (Directed Acyclic Graph).
You can read more about what a DAG is in the link provided but apart from that how Hedera is different is that it is solving the below problems that any DLT should overcome to become Scalable, Secure and Fast without losing Trust:
(1) Performance,
(2) Security,
(3) Governance,
(4) Stability,
(5) Regulatory Compliance.
How is it solving all this simultaneously without losing Trust is something that you can read more in their whitepaper which explains beautifully or refer this article from Rejolut.
One primary problem with any Public Blockchain is the TPS or Transaction per second it can handle and Hedera Hashgraph is miles ahead of the others with being able to clock 100,000 TPS which makes it valuable and preferred option as it offers services like Crypto Service, File Service, Smart Contract Service and Consensus Service.
You can have a look at some of the explorers like Kabuto and Dragonglass to get an idea of the transaction speed and other details.
If you want to play around with Hedera Hashgraph, their docs are very well written or you can use console playground to work on Smart contract and Consensus services directly and maybe use fairorder to do the account management for you to begin with.

Hedera does not use blockchain. It uses DAG, "Directed Acyclic Graph. it keeps the records of communication of all the nodes in Dag and time stamps all these messages. Sending messages between nodes is called gossiping`. Its source code is not open source.
How does the gossip protocol work
In hashgraph, when sending information between nodes, "Alice" will
choose another member at random, such as "Bob". Then Alice will tell
Bob all of the information she knows so far. Alice then repeats this
information with a different random member. Bob repeatedly does the
same, and all other members do the same. In this way, if a single
member becomes aware of new information, it will spread exponentially
fast through the community, until every member is aware of it.
The synchronization of information between two members through the
gossip protocol is called a gossip sync. Upon completion of a gossip
sync, each participating member commemorates the gossip sync with an
event. An event is stored in memory as a data structure composed of a
timestamp, an array of zero or more transactions, two parent hashes,
and a cryptographic signature. The two parent hashes are the hash of
the last event created by the self-parent prior to the gossip sync and
the hash of the last event created by the other-parent prior to the
gossip sync.
Gossiping is the basis of the hashgrap consensus algorithm which uses asynchronous byzantine fault tolerant (abft) . this indicates that there is no specific time when all the nodes reach consensus. In other words abft consensus mechanism has no block time. (there is no blockchain here. neither a block reward)
Since gossiping the fastest way of sperading information, it takes 3-5 seconds for the entire network to reach a consensus.
Hedera claims that the network achieves 10,000 HBAR cryptocurrency transactions per second. (i personally do not believe those claims)
You can also write smart contracts on Hedera. But you cannot execute 10000 smart contract operations per second because Hedera uses EVM and EVM is very slow.
Hedera has no slashing. If validators misbehave, they will not lose any of their crypto holdings. because they claim that hedera hashgraph gives validators less opportunity to misbehave. validators receive the 90% of the network fees.

ISO-8583 message processing(defining priority of messages)

I need to get an understanding of ISO-8583 message platform,lets say i want to perform a authorization of a card transaction,so in real time at a particular instance lets say i got 100000 requests from network(VISA/MASTERCARD) all for authorization,how do i define priority of there request and the response,can the connection pool handle it(in my case its HIKARI),how is it done banks/financial institutions for authorizing a request.Please provide me some insights on how to manage all these requests.Should i go for a MQ?
Tech used are:-spring boot,hibernate,spring-tcp-starter

Your question doesn't seem to be very well researched as there are a ton of switch platforms out there that due this today and many of their technology guides can be found on the web including for major vendors like ACI, FIS, AJB,.. etc if you look yard enough.
I have worked with several iso-interface specifications, commercial switches, and home grown platforms and it is actually pretty consistent in how they do the core realtime processing.
This information on prioritization is generally in each ISO-8583 message processing specification and is made explicitly clear in almost every specification I've ever read written by someone who is familar with ISO-8533 and not just making up their own variant or copying someone elses.
That said.. in general at a high level authorizations / financials (0100, 0200) requests always have high priority than force posts (0x20) messages.
Administrative messages in the 05xx and 06xx and 08xx sometimes also get bumped up above other advices.. but these are still advices and almost always auths/financials are always processed first as they A) Impact the customer B) have much tighter timers than any advice by usually more than double or more.
Most switches I have seen do it entirely in memory without going to MQ and or some other disk for core authorization process to manage these.. but not to say there is not some sort of home grown middle ware sometimes involved.. but non-realtime processes regularly use a MQ process to queue or disk queuing these up into processes not in-line of the approval for this Store-and-forward (SAF) processing.. but many of these still use memory only processing for the front of their queue.
It is important to also differentiate between 100000 requests and 100000 transactions.. the various exchanges both internal and external make a big difference in the number of actual requests/responses in flight at even given time.. a basic transaction can be accomplished in like two messages.. but some of the more complex ones can easily exceed 20 messages just for a pre-authorization or a completion component.
If you are dealing with largely batch transaction bursts.. I can see the challenge of queuing but almost every application I have seen has a max in flight for advices and requests separate of each other.. and sometimes even with different timers.. and the apps pumping the transactions almost always wait for the response back before sending more.. and this tends to work fine for just about everyone.. including big posting batches from retailers and card networks. So if your app doesn't have them.. you probably need to add them.

In fact your 100000 requests should be sorted by (Terminal ID and/or Merchant ID) + (timestamp/local timestamp) + (STAN and/or RRN).
Duplicated transaction requests expected to be rejected.
If you simulating multiple requests from single terminal (or host) with same test card details the increasing of STAN/RRN would be a case.
Please refer to previous answers about STAN and RRN ISO 8583 fields.
In ISO message, what's the use of stan and rrn ?

Is there a RabbitMQ pattern for a client election

Is there a way to have a pub/sub queue in RabbitMq in which any of the subscribers could vote and all give a thumbs up (or more importantly a thumbs down) before processing continues?
I am not sure what to call this so It is very hard to research.
I am trying to make subscribers that can be added and have the ability to veto a process without knowing about them up front.
{edit below}
I am essentially trying to build a distributed set of services that could filter in very specific use cases as they are discovered. I am trying to do this so I do not have to down my service and version every time one of these new use cases is discovered.
Here is a contrived example but it gets the point across:
Lets say I want to calculate if a number is Prime.
I would like to have a service that has general easy rules, (is factor of 2? is factor of 3?)
But lets say that we are getting into very large numbers, and I find a new algorithm that is faster for finding specific cases.
I would like to be able to add this service, have it subscribe to the feed and be able to trigger "NotPrime" and have all other subscribers abandon their work as a result of the veto.
In a monolithic world I would look at some sort of plug in framework and have that implement the filters. That seems like mixing strategies in a bad way if were to do the same within a micro service.

processing messages

I use a queue per message type. I have tended to create a windows service per queue to process those messages. Is this the best use of resources? I suspect not. How do you decide how many processes should service a queue(s)?

One thing to consider here is service levels. Does all of the data represented by the message types require identical processing service levels? Are some messages more important than others? Do some messages have latency requirement for delivery? Are some messages critical to the business whereas others not? Are the expected volumes of all message types different?
Currently the way you have things set up means that you can manage each of your message type channels as a separate concern, which allows you maximum flexibility to support all possible service level scenarios. However this comes as a cost of higher resource cost/more moving parts.
I would say that unless resource usage is a concern, then your set up is the best possible as you decouple your data processing channels from one another very effectively in this way.

What is an MQ and why do I want to use it?

On my team at work, we use the IBM MQ technology a lot for cross-application communication. I've seen lately on Hacker News and other places about other MQ technologies like RabbitMQ. I have a basic understanding of what it is (a commonly checked area to put and get messages), but what I want to know what exactly is it good at? How will I know where I want to use it and when? Why not just stick with more rudimentary forms of interprocess messaging?

All the explanations so far are accurate and to the point - but might be missing something: one of the main benefits of message queueing: resilience.
Imagine this: you need to communicate with two or three other systems. A common approach these days will be web services which is fine if you need an answers right away.
However: web services can be down and not available - what do you do then? Putting your message into a message queue (which has a component on your machine/server, too) typically will work in this scenario - your message just doesn't get delivered and thus processed right now - but it will later on, when the other side of the service comes back online.
So in many cases, using message queues to connect disparate systems is a more reliable, more robust way of sending messages back and forth. It doesn't work well for everything (if you want to know the current stock price for MSFT, putting that request into a queue might not be the best of ideas) - but in lots of cases, like putting an order into your supplier's message queue, it works really well and can help ease some of the reliability issues with other technologies.

MQ stands for messaging queue.
It's an abstraction layer that allows multiple processes (likely on different machines) to communicate via various models (e.g., point-to-point, publish subscribe, etc.). Depending on the implementation, it can be configured for things like guaranteed reliability, error reporting, security, discovery, performance, etc.
You can do all this manually with sockets, but it's very difficult.
For example: Suppose you want to processes to communicate, but one of them can die in the middle and later get reconnected. How would you ensure that interim messages were not lost? MQ solutions can do that for you.

Message queueuing systems are supposed to give you several bonuses. Among most important ones are monitoring and transactional behavior.
Transactional design is important if you want to be immune to failures, such as power failure. Imagine that you want to notify a bank system of ATM money withdrawal, and it has to be done exactly once per request, no matter what servers failed temporarily in the middle. MQ systems would allow you to coordinate transactions across multiple database, MQ and other systems.
Needless to say, such systems are very slow compared to named pipes, TCP or other non-transactional tools. If high performance is required, you would not allow your messages to be written thru disk. Instead, it will complicate your design - to achieve exotic reliable AND fast communication, which pushes the designer into really non-trivial tricks.
MQ systems normally allow users to watch the queue contents, write plugins, clear queus, etc.

MQ simply stands for Message Queue.
You would use one when you need to reliably send a inter-process/cross-platform/cross-application message that isn't time dependent.
The Message Queue receives the message, places it in the proper queue, and waits for the application to retrieve the message when ready.

reference: web services can be down and not available - what do you do then?
As an extension to that; what if your local network and your local pc is down as well?? While you wait for the system to recover the dependent deployed systems elsewhere waiting for that data needs to see an alternative data stream.
Otherwise, that might not be good enough 'real time' response for today's and very soon in the future Internet of Things (IOT) requirements.
if you want true parallel, non volatile storage of various FIFO streams(at least at some point along the signal chain) use an FPGA and FRAM memory. FRAM runs at clock speed and FPGA devices can be reprogrammed on the fly adding and taking away however many independent parallel data streams are needed(within established constraints of course).

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas