I am looking for a high-performance inter-process communication system on Mac OS X.
Which is the better option: AppleEvents or NSNotifications?
Distributed notifications (i.e. notifications sent through NSDistributedNotificationCenter) are most likely not a good option if your goal is high performance and/or reliability. Here is Apple's own take on this subject:
Posting a distributed notification is an expensive operation. The notification gets sent to a system-wide server that distributes it to all the tasks that have objects registered for distributed notifications. The latency between posting the notification and the notification's arrival in another task is unbounded. In fact, when too many notifications are posted and the server's queue fills up, notifications may be dropped.
Source: http://developer.apple.com/mac/library/documentation/Cocoa/Reference/Foundation/Classes/NSDistributedNotificationCenter_Class/Reference/Reference.html
Depending on what you mean by "high performance", you might want to look into distributed objects, or plain old Unix IPC mechanisms (sockets, pipes, shared memory, etc.).
If you control both the sender and the recipient, you can open a socket pair between the two processes (man socketpair), which gives quite high performance. You can also open a file in a shared location (like /tmp), write to it from one process and read it from the other, which is quite speedy. You can also open two TCP/IP ports on the local machine, one in each process, and send from one to the other "over the network".
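To make that concrete, here is a minimal sketch of the socketpair approach, assuming Python on a POSIX system (the same idea applies in C via socketpair(2) and fork(2)); the message contents are illustrative only:

# Parent and child exchange bytes over a connected socket pair.
import os
import socket

parent_sock, child_sock = socket.socketpair()  # pair of connected UNIX-domain sockets

pid = os.fork()
if pid == 0:                                   # child process
    parent_sock.close()
    child_sock.sendall(b"hello from the child")
    child_sock.close()
    os._exit(0)
else:                                          # parent process
    child_sock.close()
    data = parent_sock.recv(1024)
    print("parent received:", data)
    os.waitpid(pid, 0)
    parent_sock.close()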
If your only two choices are NSNotifications or AppleEvents, well, AppleEvents will likely perform better.
We want to develop an application based on a microservices architecture.
To communicate between the various services asynchronously, we plan to use message queues (like RabbitMQ, ActiveMQ, JMS, etc.). Is there any approach other than message queues available to achieve inter-process communication?
Thanks.
You should use queues to handle tasks that do not need to be completed in real time.
Append tasks to the queue; when there is room, a worker process takes a task from the queue, handles it, and removes it from the queue.
Example:
Assume your application deals with images and users are uploading many of them. Put the compression work into a queue, and when a worker is free it will compress the queued images.
When you want to write some kind of log from your system, hand the entry to the queue and one process will take log entries from the queue and write them to disk. That way the main process does not waste its time on I/O operations.
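A minimal in-process sketch of the logging example, assuming Python (a real deployment would typically put a broker such as RabbitMQ between the producer and the worker, but the pattern is the same; the log path and messages are illustrative):

# Offload slow log I/O to a worker thread fed by a queue,
# so the main flow never blocks on disk writes.
import queue
import threading

log_queue = queue.Queue()

def log_writer():
    with open("/tmp/app.log", "a") as f:
        while True:
            line = log_queue.get()   # blocks until an entry is available
            if line is None:         # sentinel: shut the worker down
                break
            f.write(line + "\n")
            f.flush()

worker = threading.Thread(target=log_writer, daemon=True)
worker.start()

# The main flow just enqueues entries and moves on:
log_queue.put("user 42 uploaded an image")
log_queue.put("compression job queued")
log_queue.put(None)
worker.join()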
Suggestion:
If you want real-time responses, you should not use a queue. You would need to poll the queue constantly for incoming messages, which is bad practice, and there is no guarantee that the queue will handle your tasks immediately.
So the solutions are:
Redis cache - You can put your messages into Redis and another process will read them. Redis is an in-memory data structure store; it is very fast and easy to use, and since it is open source there are plenty of client libraries and good resources available on the Internet. Read more about Redis. You still need to check whether a message is available and, if so, read it, process it and respond, but reading from Redis is not very costly. With Redis you also do not need to worry about memory management; it is well managed by the open source community. (A minimal sketch follows at the end of this answer.)
Using sockets. Sockets are very fast, and you can keep this lightweight (if you want) since it is event based. One process sends to a port and the other process listens and responds. But you need to manage memory yourself: if the buffer fills up, you cannot put more messages in, and if many users are producing messages, you need to keep track of whom you want to respond to.
So it depends upon your requirements: do you want to read messages constantly? Do you want one-to-one or many-to-one communication?
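To make the Redis option concrete, here is a minimal sketch, assuming Python and the redis-py client; the host, key name and payload are illustrative:

# Process A pushes a message onto a Redis list; process B blocks until one
# arrives, which avoids busy polling.
import redis

r = redis.Redis(host="localhost", port=6379)

# Process A: push a message for the other process to pick up
r.lpush("messages", '{"action": "compress", "image_id": 42}')

# Process B: block until a message is available, then process it
key, payload = r.brpop("messages")
print("received:", payload)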
I am using RabbitMQ with Spring AMQP
large message (>100MB, 102400KB)
small bandwidth (<512Kbps)
low heartbeat interval (10 seconds)
single broker
It will take at least 1600 seconds (102400 KB x 8 bits / 512 Kbps) to consume the message, which is far more than my heartbeat interval. From https://stackoverflow.com/a/42363685/418439
If the message transfer time between nodes (60 seconds?) > heartbeat time between nodes, it will cause the cluster to disconnect and lose the message
Will I also face the disconnection issue even if I am using a single broker?
Do the heartbeat and the consumer use the same thread, such that while the consumer is consuming it is not possible to send a heartbeat?
If so, what can I do to consume the message without increasing the heartbeat interval or reducing my message size?
Update:
I have received another answer and comments after I posted my own answer. Thanks for the feedback. Just to clarify, I do not use AMQP for file transfer. The data is actually JSON messages; some are simple and small, but some contain complex information, including some freehand drawing. Besides saving the data at the Data Center, we also save a copy of each message at the branch level via AMQP, in case connectivity to the Data Center is not available.
So, the real questions here are a bit more fundamental, and those are: (1) is it appropriate to perform a large file transfer via AMQP, and (2) what purpose does the heartbeat serve?
Heartbeats
First off, let's address the heartbeat question. As the RabbitMQ documentation clearly states, the purpose of the heartbeat is "to ensure that the application layer promptly finds out about disrupted connections."
The reason for this is simple. In an ordinary AMQP usage, there may be several seconds, even minutes between the arrival of successive messages. Without data being exchanged across a TCP session, many firewalls and other networking equipment automatically close ports to lower exposure to the enterprise network. Heartbeats further help mitigate a fundamental weakness in TCP, which is the difficulty of detecting a dropped connection. Networks experience failure, and TCP is not always able to detect that on its own.
So, the bottom line here is that, while you're transferring a large message, the connection is active and the heartbeat function serves no useful purpose, and can cause you trouble. It's best to turn it off in such cases.
AMQP For Moving Large Files?
The second issue, and I believe the more important question, is how large files should be dealt with. To answer this, let's first consider what a message queue does: it sends messages, small bits of data which communicate something to another computer system. The operative word here is small. Messages typically contain one of four things: 1. commands (go do something), 2. events (something happened), 3. requests (give me some data), and 4. responses (here is your data). A full discussion of these is beyond the scope of this answer, but suffice it to say that each of these can generally be expressed in a small message of less than 100 kB.
Indeed, the AMQP protocol, which underlies RabbitMQ, is a fairly chatty protocol. It requires that large messages be divided into multiple segments of no more than 131 kB. This can add a significant amount of overhead to a large file transfer, especially when compared to other file transfer mechanisms (FTP, for instance). Secondly, the message has to be fully processed by the broker before it is made available in a queue, and it ties up valuable resources on the broker while this is being done; for one, due to its architecture, the whole message must fit into RAM on the broker. This approach may work for one client and one broker, but it will break quickly when you attempt to scale out.
Finally, compression is often desirable when transferring files: HTTP supports gzip compression automatically; AMQP does not. It is quite common in message-oriented applications to send a message containing a resource locator (e.g. a URL) pointing to the larger data file, which is then accessed via appropriate means.
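To illustrate that last pattern, here is a minimal sketch, assuming Python and the pika client; the queue name, URL and host are made up for the example, and the large file itself is assumed to live in an object store or on an HTTP server:

# Publish a small "claim check" message that points at the large file,
# instead of pushing the file itself through the broker.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="file-events", durable=True)

event = {
    "type": "file.available",
    "url": "https://files.example.com/exports/report-2017-04.zip",
    "size_bytes": 104857600,
}
channel.basic_publish(
    exchange="",
    routing_key="file-events",
    body=json.dumps(event).encode(),
    properties=pika.BasicProperties(delivery_mode=2),  # persistent message
)
connection.close()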
The moral of the story
As the adage goes, "to the man with a hammer, everything looks like a nail." AMQP is not a hammer; it's a precision scalpel. It has a very specific purpose and narrow applicability within that purpose. Using it for something other than its intended purpose will lead to stability and reliability problems in whatever you are designing, and overall dissatisfaction with your end product.
Will I also face the disconnection issue even if I am using a single broker?
Yes
Do the heartbeat and the consumer use the same thread, such that while the consumer is consuming it is not possible to send a heartbeat?
I can't confirm the threading, but from what I observe, when the Java RabbitMQ consumer is consuming a message it won't send heartbeats. If the time to consume is longer than 3 x the heartbeat timeout (due to a large message and/or low bandwidth), the MQ server will close the AMQP connection.
If so, what can I do to consume the message without increasing the heartbeat interval or reducing my message size?
I resolved my issue by increasing the heartbeat value. No further code change was required.
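For reference, this is roughly what adjusting the heartbeat looks like on the client side, shown as a minimal Python/pika sketch (the question itself uses Spring AMQP, where a similar requested-heartbeat setting exists on the connection factory; the values here are illustrative):

# Request a longer heartbeat timeout when opening the connection,
# so a slow, large transfer does not outlive the heartbeat window.
import pika

params = pika.ConnectionParameters(
    host="localhost",
    heartbeat=600,                     # requested heartbeat timeout, in seconds
    blocked_connection_timeout=300,
)
connection = pika.BlockingConnection(params)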
I'm looking to write a toy application for my own personal use (and possibly to share with friends) for peer-to-peer shared status on a local network. For instance, let's say I wanted to implement it for the name of the current building you're in (let's pretend the network topology is weird, and multiple buildings occupy the same LAN). The idea is if you run the application, you can set what building you're in, and you can see the buildings of every other user running the application on the local network.
The question is, what's the best transport/network layer technology to use to implement this?
My initial inclination was to use UDP Multicast, but the more research I do about it, the more I'm scared off by it: while the technology is great and seems easy to use, if the application is not tailored for a particular site deployment, it also seems most likely to get you a visit from an angry network admin.
I'm wondering, therefore, since this is a relatively low bandwidth application — probably max one update every 4–5 minutes or so from each client, with likely no more than 25–50 clients — whether it might be "cheaper" in many ways to use another strategy:
Multicast: find a way to pick a well-known multicast address from 239.255/16 and have interested applications join the group when they start up.
Broadcast: send out a single UDP Broadcast message every time someone's status changes (and one "refresh" broadcast when the app launches, after which every client replies directly to the requesting user with their current status).
Unicast: send a UDP Broadcast at application start to announce interest, and when a client's status changes, it sends a UDP packet directly to every client who has announced. This results in the highest traffic, but might be less likely to annoy other systems with needless broadcast packets. It also introduces potential complications when apps crash (in terms of generating unnecessary traffic).
Multicast is most certainly the best technology for the job, but I'm wondering if the associated hassles are worth avoiding since this is just a "toy application," not a business-critical service intended for professional network admin deployment and configuration.
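For what it's worth, here is a minimal sketch of the multicast option in Python; the group address and port are arbitrary picks from 239.255.0.0/16, the status string is made up, and the sender and receiver halves would normally run in separate processes:

# Sender: announce a status change to the multicast group
import socket

GROUP, PORT = "239.255.42.99", 4242

send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # keep it on the LAN
send_sock.sendto(b"alice:Building 7", (GROUP, PORT))

# Receiver: join the group and listen for announcements
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
recv_sock.bind(("", PORT))
mreq = socket.inet_aton(GROUP) + socket.inet_aton("0.0.0.0")  # struct ip_mreq
recv_sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
data, addr = recv_sock.recvfrom(1024)
print("status update from", addr, ":", data)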
I've been trying to read up on the DDS standard, and OpenSplice in particular and I'm left wondering about the architecture.
Does DDS require that a broker be running, or any particular daemon to manage message exchange and coordination between different parties?
If I just launch a single process publishing data for a topic, and launch another process subscribing for the same topic, is this sufficient? Is there any reason one might need another process running?
Alternatively, does it use UDP multicasting for some sort of automated discovery between publishers and subscribers?
In general, I'm trying to contrast this to traditional queue architectures such as MQ Series or EMS.
I'd really appreciate it if anybody could help shed some light on this.
Thanks,
Faheem
DDS doesn't have a central broker; it uses a multicast-based discovery protocol. OpenSplice has a model with a service on each node, but that is an implementation detail; if you look at RTI DDS, for example, it doesn't have that.
The DDS specification is designed so that implementations are not required to have any central daemons, but of course that is a choice left to each implementation.
Implementations like RTI DDS, MilSOFT DDS and CoreDX DDS have decentralized architectures, which are peer-to-peer and do not need any daemons. (Discovery is done with multicast on LANs.) This design has many advantages, such as fault tolerance, low latency and good scalability. It also makes the middleware really easy to use, since there is no need to administer daemons: you just run the publishers and subscribers and the rest is handled automatically by DDS.
OpenSplice DDS used to require daemon services running on each node, but a new feature was added in v6 so that you no longer need the daemons. (The daemon option is still supported.)
OpenDDS is also peer-to-peer, but it needs a central daemon running for discovery as far as I know.
I think it's indeed good to differentiate between a 'centralized broker' architecture (where that broker could be/become a single point of failure) and a service/daemon on each machine that manages the traffic flows based on DDS QoS policies such as importance (DDS: transport-priority) and urgency (DDS: latency-budget).
It is interesting to notice that most people think it is absolutely necessary to have a (real-time) process scheduler on a machine that manages the CPU as a critical/shared resource (based on time slicing, priority classes, etc.), yet when it comes to DDS, which is all about distributing information (rather than processing application code), people often find it 'strange' that a 'network scheduler' would come in handy, to say the least: one that manages the network (interface) as a shared resource and schedules traffic based on QoS-policy-driven 'packing' and utilization of multiple traffic-shaped priority lanes.
And this is exactly what OpenSplice does when utilizing its (optional) federated-architecture mode, where multiple applications running on a single machine share data through a shared-memory segment, and where there is a networking service (daemon) for each physical network interface that schedules the inbound and outbound traffic based on the actual QoS policies with respect to urgency and importance. The fact that such a service has 'access' to all nodal information also facilitates combining samples from different topics and different applications into (potentially large) UDP frames, maybe even exploiting some of the available latency budget for this 'packing', and thus allows a proper balance between efficiency (throughput) and determinism (latency/jitter). End-to-end determinism is further facilitated by scheduling the traffic over pre-configured traffic-shaped 'priority lanes' with 'private' Rx/Tx threads and DiffServ settings.
So having a network-scheduling daemon per node certainly has some advantages, also because it decouples the network from faulty applications that could be either 'over-productive' (blowing up the system) or 'under-reactive' (causing system-wide retransmissions). That aspect is often forgotten when arguing that a network-scheduling daemon could be viewed as a 'single point of failure', whereas the other view is that, without any arbitration, any 'standalone' application that talks directly to the wire could be viewed as a potential threat to the system when it starts misbehaving as described above, for any reason.
Anyhow, it is always a controversial discussion; that is why OpenSplice DDS (as of v6) supports both deployment modes: federated and non-federated (also called 'standalone' or 'single process').
Hope this is somewhat helpful.
On my team at work, we use IBM MQ a lot for cross-application communication. Lately I've seen on Hacker News and other places mentions of other MQ technologies like RabbitMQ. I have a basic understanding of what an MQ is (a commonly checked place to put and get messages), but what I want to know is: what exactly is it good at? How will I know where and when I want to use it? Why not just stick with more rudimentary forms of inter-process messaging?
All the explanations so far are accurate and to the point, but they might be missing something: one of the main benefits of message queueing is resilience.
Imagine this: you need to communicate with two or three other systems. A common approach these days is web services, which is fine if you need an answer right away.
However, web services can be down and unavailable - what do you do then? Putting your message into a message queue (which has a component on your machine/server, too) typically works in this scenario: your message just doesn't get delivered, and thus processed, right now, but it will be later on, when the other side of the service comes back online.
So in many cases, using message queues to connect disparate systems is a more reliable, more robust way of sending messages back and forth. It doesn't work well for everything (if you want to know the current stock price for MSFT, putting that request into a queue might not be the best of ideas) - but in lots of cases, like putting an order into your supplier's message queue, it works really well and can help ease some of the reliability issues with other technologies.
MQ stands for message queue.
It's an abstraction layer that allows multiple processes (likely on different machines) to communicate via various models (e.g., point-to-point, publish/subscribe, etc.). Depending on the implementation, it can be configured for things like guaranteed reliability, error reporting, security, discovery, performance, etc.
You can do all this manually with sockets, but it's very difficult.
For example: suppose you want two processes to communicate, but one of them can die in the middle and later get reconnected. How would you ensure that the interim messages were not lost? MQ solutions can do that for you.
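As a minimal sketch of that guarantee, assuming Python and the pika client against RabbitMQ (the queue name, host and message are illustrative): the queue is declared durable, the message is marked persistent, and the consumer acknowledges only after the work is done, so an unacknowledged message is redelivered if the consumer dies.

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="work", durable=True)          # queue survives a broker restart

# Producer: mark the message persistent so the broker writes it to disk
channel.basic_publish(
    exchange="",
    routing_key="work",
    body=b"do the thing",
    properties=pika.BasicProperties(delivery_mode=2),
)

# Consumer: acknowledge only after processing; if this process dies first,
# the broker redelivers the message to another consumer.
def handle(ch, method, properties, body):
    print("processing", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="work", on_message_callback=handle)
channel.start_consuming()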
Message queueing systems are supposed to give you several bonuses. Among the most important ones are monitoring and transactional behavior.
Transactional design is important if you want to be immune to failures, such as a power failure. Imagine that you want to notify a bank system of an ATM money withdrawal, and it has to be done exactly once per request, no matter which servers failed temporarily in the middle. MQ systems allow you to coordinate transactions across multiple databases, MQ and other systems.
Needless to say, such systems are very slow compared to named pipes, TCP or other non-transactional tools. If high performance is required, you would not let your messages be written through to disk, but that complicates your design: achieving communication that is both reliable and fast pushes the designer into really non-trivial tricks.
MQ systems normally allow users to watch the queue contents, write plugins, clear queues, etc.
MQ simply stands for Message Queue.
You would use one when you need to reliably send an inter-process/cross-platform/cross-application message that isn't time-dependent.
The Message Queue receives the message, places it in the proper queue, and waits for the application to retrieve the message when ready.
Reference: "web services can be down and not available - what do you do then?"
As an extension to that: what if your local network and your local PC are down as well? While you wait for the system to recover, the dependent systems deployed elsewhere that are waiting for that data need to see an alternative data stream.
Otherwise, the response might not be "real-time" enough for today's Internet of Things (IoT) requirements, let alone those of the very near future.
If you want truly parallel, non-volatile storage of various FIFO streams (at least at some point along the signal chain), use an FPGA and FRAM memory. FRAM runs at clock speed, and FPGA devices can be reprogrammed on the fly, adding and removing however many independent parallel data streams are needed (within established constraints, of course).