How do I subscribe to a reactive streams implementation running on a different JVM?

Let's assume we have two Akka Stream flows, each running on its own JVM.
// A reactive streams publisher running on JVM 1:
val stringPublisher: Publisher[String] = Source(() => "Lorem Ipsum".split("\\s").iterator).runWith(Sink.publisher[String])
// A reactive streams subscriber running on JVM 2:
def subscriber: Subscriber[String] = Sink.foreach[String](println(_)).runWith(Source.subscriber[String])
// Subscribe the second stream to the first stream
stringPublisher.subscribe(subscriber)
This example runs fine on one JVM, but how can I subscribe to a publisher running on a different JVM?
Do I have to use messaging/queueing middleware or can I use the reactive streams API to connect the two together?

The reactive-streams spec does not address distributed (network-crossing) streams, and none of its current implementations (Akka Streams, for example) implement streams that cross network boundaries. It's a bit tricky to do (but can be done, and possibly will be) as it requires transparent re-delivery in case of message loss.
Short answer: you (currently) can't. However, since Akka HTTP is stream-based and applies back-pressure via TCP, you can connect streams via stream-based TCP or HTTP and the back-pressure will work as expected.
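For illustration, here is a minimal sketch (the host, port and word list are made up, not from the question) of bridging the two JVMs with Akka Streams TCP; the TCP window is what carries the back-pressure, so the publisher on JVM 1 only produces as fast as the subscriber on JVM 2 consumes:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Framing, Sink, Source, Tcp}
import akka.util.ByteString
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
// JVM 1: expose the words on a TCP server socket
Tcp().bind("127.0.0.1", 8888).runForeach { connection =>
  Source("Lorem Ipsum".split("\\s").toList)
    .map(word => ByteString(word + "\n"))   // newline-delimited framing
    .via(connection.flow)                   // bytes are written only as fast as JVM 2 reads them
    .runWith(Sink.ignore)                   // we don't expect data back from the subscriber side
}
// JVM 2: connect, re-frame the byte stream and consume
Source.maybe[ByteString]                    // keep the connection open, send nothing
  .via(Tcp().outgoingConnection("127.0.0.1", 8888))
  .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 256, allowTruncation = true))
  .map(_.utf8String)
  .runWith(Sink.foreach(println))
This is only a sketch of the TCP approach; the same idea applies to an Akka HTTP endpoint streaming a chunked response.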

Related

Kotlin Flow with a buffer and no replay

I am confused about Flow: is there any way to have a Flow that behaves in this fashion?
It buffers data until delivered; with no subscriber, it keeps buffering until full (which won't ever happen in practice).
When a subscriber comes in, it delivers everything in the buffer, removing items from the buffer as they are delivered.
The subscriber can unsubscribe and resubscribe, and it won't replay; it just delivers items that were emitted but not yet delivered to a subscriber.
It doesn't have to multicast; there should only be one subscriber. I tried MutableSharedFlow(extraBufferCapacity = 10), but from what I have found, if there is no subscriber when an event comes in, it just disposes of it.
I am using x.onEach{}.collect(), if that is the correct way.
You can have this behaviour by using a single-consumer hot flow - a Flow based on a Channel.
Declare a Channel(BUFFERED) as the variable/property you will use to send elements. Then you may expose it as a flow using channel.receiveAsFlow() so a consumer can start/stop collecting as it sees fit. Elements will be buffered as long as no consumer collects elements.
If you want to avoid confusion for users of your API, you may want to expose a ReceiveChannel directly. In some cases however, this makes things non-composable, so using receiveAsFlow() allows you to declare a Flow API despite the hotness, which is more composable.
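A minimal sketch of that recipe (the EventBus name is just illustrative): elements sent while nobody collects sit in the channel's buffer, and the single collector receives each element exactly once, with no replay.
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.receiveAsFlow
class EventBus {
    // Channel.BUFFERED uses the library's default buffer size; elements wait here while no one collects
    private val channel = Channel<String>(Channel.BUFFERED)
    // Single-consumer hot flow: collecting removes each buffered element exactly once, nothing is replayed
    val events: Flow<String> = channel.receiveAsFlow()
    suspend fun emit(event: String) = channel.send(event)
}
A consumer can then run bus.events.onEach { println(it) }.collect(), cancel, and later collect again without anything already delivered being replayed.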

AspNetCore SignalR Streaming Clarifications

I have been going through the recent SignalR documentation and stumbled across the new feature called Streaming, and I managed to get it running with a JS client. However, I am still not clear on when to use it.
1- Does ChannelReader stream data to a single client?
2- If yes, what is the difference compared to calling this.Clients.Caller.Invoke()?
3- Let's say I am listening to an external real-time feed, e.g. a stock exchange; is it recommended to use a SignalR stream?
4- According to this post, the writer lives within a Task.Run(). So how is this scalable if I need to push a real-time feed using streams to, let's say, 1,000 clients? Are there any scalability concerns with using SignalR streams generally?
1- Does ChannelReader stream data to a single client?
Yes.
2- If yes, what is the difference compared to calling this.Clients.Caller.Invoke()?
You can only invoke a single method at a time (sequentially). As long as you are in an invocation, the rest will be queued for that connection until the previous one is finished. With streaming methods, you can start a stream and pump data to the client while still invoking other methods on the same hub.
3- Let's say I am listening to an external real-time feed, e.g. a stock exchange; is it recommended to use a SignalR stream?
Streams are for streaming data triggered from a client action. You can still do unsolicited (not from the client) streaming by just calling a method on the IHubContext.
4- According to this post, the writer lives within a Task.Run(). So how is this scalable if I need to push a real-time feed using streams to, let's say, 1,000 clients? Are there any scalability concerns with using SignalR streams generally?
It scales fine. The Task.Run kicks off the Stream but you're never holding a thread hostage.

How to customize Akka Http Client execution context

When calling singleRequest, how can one customize the execution context that is used by the connection pool?
I took a brief look at the code, and a call to singleRequest results in a message being sent to the PoolMasterActor, which in turn sends a message to the pool interface actor.
Is each connection blocking or non-blocking?
Which context is used for the connection pool? (I want to make sure that my HTTP requests don't block all the threads)
If you check out the singleRequest signature, it requires an implicit Materializer (and therefore an ActorSystem and its dispatchers) to run the underlying HTTP infrastructure, which is based on Akka Streams. More detail on how materializers spawn threads under the hood can be found in the docs and this blog post.
Going back to your questions:
The whole Akka HTTP infrastructure is inherently non-blocking (as it's based on Akka Streams, which adheres to the Reactive Streams spec and is built on Akka actors).
The threading used by the singleRequest call inherits from the ActorSystem dispatcher used down the line. Unless you do anything specific, you will end up using your system's default dispatcher. This is a reasonable choice in many cases when you are writing an Akka HTTP client.
In case you really need your materializer to use a custom dispatcher, you can achieve this by customizing your ActorMaterializerSettings, e.g.
implicit val materializer = ActorMaterializer(
  ActorMaterializerSettings(actorSystem).withDispatcher("my-custom-dispatcher")
)
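Putting it together, a sketch of a client whose materializer runs on such a dispatcher (the my-custom-dispatcher entry is an assumed block you would add to application.conf):
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.stream.{ActorMaterializer, ActorMaterializerSettings}
import scala.concurrent.Future
// Assumed application.conf entry (tune the executor and parallelism to your needs):
// my-custom-dispatcher {
//   type = Dispatcher
//   executor = "fork-join-executor"
// }
implicit val system = ActorSystem("client")
implicit val materializer = ActorMaterializer(
  ActorMaterializerSettings(system).withDispatcher("my-custom-dispatcher")
)
// Per the answer above, the materialized stream stages now run on my-custom-dispatcher rather than the default dispatcher
val response: Future[HttpResponse] = Http().singleRequest(HttpRequest(uri = "http://example.com/"))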

Rabbitmq - Should multithreaded application use single or multi channels

My app has multiple threads that publish messages to a single RabbitMQ cluster.
Reading the RabbitMQ docs, I read the following:
For applications that use multiple threads/processes for processing, it is very common to open a new channel per thread/process and not share channels between them.
And I understand that instead of opening multiple connections (expensive), it is better to open multiple channels.
But why not use a single channel to all threads?
What are the benefits of using multiple channels over a single channel?
AMQP has the concept of a Channel to provide more flexibility on top of reliable TCP connections. Opening a TCP connection per message would be extremely expensive, so they came up with the idea of logical Channels within a connection.
It is not a good idea to use a single Channel for all the threads, because if anything fails in a particular thread and the Channel dies, the rest of the threads will get an AlreadyClosedException. A channel can die for multiple reasons: for example, declaring something that is already declared with different parameters, cancelling a consumer that doesn't exist, publishing to an exchange that doesn't exist, etc.
My best advice would be to have an object that holds a Channel in a local variable and also implements the ShutdownListener interface, so every time the channel fails, it is able to recover and create a new one from the connection. So I would say that the main benefits are failure tolerance and scalability, since if a Channel dies it won't affect the rest.
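A sketch of that pattern against the RabbitMQ Java client (written in Scala here; the class and method names are illustrative): one shared Connection, and a small per-thread holder that re-opens its Channel when a channel-level error kills it.
import com.rabbitmq.client.{Channel, Connection, ConnectionFactory, ShutdownListener, ShutdownSignalException}
// One shared Connection for the whole app; one ChannelHolder per publishing thread
val factory = new ConnectionFactory()
factory.setHost("localhost")
val connection: Connection = factory.newConnection()
class ChannelHolder(connection: Connection) extends ShutdownListener {
  private var channel: Channel = open()
  private def open(): Channel = {
    val ch = connection.createChannel()
    ch.addShutdownListener(this)
    ch
  }
  // Called when the channel (or its connection) shuts down; re-open only for channel-level errors
  override def shutdownCompleted(cause: ShutdownSignalException): Unit =
    if (!cause.isHardError) channel = open()
  def publish(exchange: String, routingKey: String, body: Array[Byte]): Unit =
    channel.basicPublish(exchange, routingKey, null, body)
}
Each publishing thread would then create its own ChannelHolder from the shared connection and call publish on it; a dead channel in one thread does not affect the others.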

Message broker vs. MOM (Message-Oriented Middleware)

I'm a little confused as to what the difference is between a message broker, e.g. RabbitMQ, and message-oriented middleware. I can't find much info apart from what's on Wikipedia. When searching for MOM I find info on AMQP, which states it is a protocol for MOM. What does this mean? What is MOM then? I also have read that RabbitMQ implements the AMQP protocol, so why does that make RabbitMQ a message broker? Are a message broker and MOM the same thing?
Hope someone can unravel my confusion. Thanks.
An overview -
A protocol - A set of rules.
AMQP - AMQP is an open internet protocol for reliably sending and receiving messages.
MOM (message-oriented middleware) - an approach, an architecture for distributed systems: a middle layer for the whole distributed system where there is a lot of internal communication (one component queries data and then needs to send it to another component, which will do some processing on the data), so components have to share info/data among them.
Message broker - any system (in MOM) which handles messages (sending as well as receiving), or, to be more precise, which routes messages to the specific consumer/recipient. A message broker is typically built upon a MOM. The MOM provides the base communication among the applications, plus things like message persistence and guaranteed delivery. "Message brokers are a building block of message-oriented middleware."
RabbitMQ - a message broker; a MOM implementation; an open-source implementation of AMQP; as per Wikipedia:
RabbitMQ is an open source message broker software (sometimes
called message-oriented middleware) that implements the Advanced
Message Queuing Protocol (AMQP).
As you asked:
When searching for MOM I find info on AMQP, which states it is a protocol for MOM. What does this mean?
MOM is about having a messaging middleware (middle layer) between (distributed) system components, and AMQP is a protocol (set of rules) for reliably sending and receiving messages. So, a MOM implementation (e.g. RabbitMQ) may use AMQP.
What is MOM then?
Message-oriented middleware - as described above, an approach/architecture for distributed systems: a middle layer for the whole distributed system, handling the internal communication (one component queries data and sends it to another component, which does some processing on it), so components can share info/data among them.
In short, it's a way to design a system. For example, depending upon the overall requirements we need to develop a distributed system with some internal communication. The biggest advantage of the MOM architecture/decision is decoupling of the components: if we're going to change the data-query component, it will have no effect on the data-processing components, as they communicate via the MOM (e.g. a RabbitMQ cluster) - the data-processing component gets the data in the form of messages, which it then parses and processes.
In the end, MOM is just a design decision: we use a middleware for gluing our (distributed) system components together, a middleware for handling the communication between them in the form of messages (e.g. JSON). To implement message-oriented middleware we need more - a set of specific rules, i.e. how messages will be published and consumed, how acknowledgement works, how long a message lives (until it is consumed), whether messages are persisted, etc. AMQP is basically this set of rules, i.e. a standard/protocol for implementing a MOM; a messaging system using AMQP confines itself to the stated rules. From Wikipedia:
AMQP mandates the behavior of the messaging provider and client to the
extent that implementations from different vendors are inter-operable,
in the same way as SMTP, HTTP, FTP, etc. have created inter-operable
systems.
I also have read that RabbitMQ implements the AMQP protocol, so why does that make RabbitMQ a message broker?
Yes, RabbitMQ is a message broker (publisher -> exchange -> queue -> consumer). It's an open-source AMQP implementation, i.e. a messaging system/broker which conforms to AMQP (the AMQP rules) - one can use RabbitMQ as the middleware, hence MOM.
AMQP - just a set of rules, i.e. how messages will be published, kept (in queues), consumed, how delivery acknowledgement works, etc.
Are a message broker and MOM the same thing?
In simple words, yes. If we need to go with a MOM design for our distributed system, we can simply use RabbitMQ (a message broker; an AMQP implementation) as the middleware.
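To make the publisher -> exchange -> queue -> consumer flow mentioned above concrete, here is a minimal round trip using the RabbitMQ Java client's callback API (Scala syntax; the "orders" names are made up):
import com.rabbitmq.client.{CancelCallback, ConnectionFactory, DeliverCallback}
val channel = new ConnectionFactory().newConnection().createChannel()
// Broker-side wiring: an exchange, a queue, and a binding between them
channel.exchangeDeclare("orders", "direct")
channel.queueDeclare("orders-queue", false, false, false, null)
channel.queueBind("orders-queue", "orders", "new")
// Publisher: sends to the exchange with a routing key; the broker routes it into the bound queue
channel.basicPublish("orders", "new", null, "order #42".getBytes("UTF-8"))
// Consumer: receives whatever the broker has queued and auto-acknowledges it
val onDeliver: DeliverCallback = (_, delivery) => println(new String(delivery.getBody, "UTF-8"))
val onCancel: CancelCallback = (_: String) => ()
channel.basicConsume("orders-queue", true, onDeliver, onCancel)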
"MOM" broadly means any technology that can deliver "messages" from one user-space application to another. A message is usually understood to be a discrete piece of information, as compared to a stream.
MOM products used to be quite large and complex: CORBA, JMS, TIBCO, WebsphereMQ, etc. and tried to do a lot more than simply deliver messages.
A broker is a particular set of routing and queuing patterns, and we usually use the term "broker" specifically in MOM (as compared to HTTP, email, XMPP, etc.). Routing means one message goes to one peer, to one of many peers, or to all of many peers, etc. Queuing means messages are held in memory or on disk until they can be delivered (and in some cases, acknowledged).
AMQP used to specify those broker patterns, so an application could rely on consistent behavior from any AMQP-compatible broker (thus RabbitMQ and OpenAMQ looked much the same to a client app, like two HTTP or two XMPP servers would look the same). AMQP/1.0 specifies just the connection between nodes, so you don't have guarantees of behavior. This makes AMQP/1.0 much easier for firms to implement, but doesn't deliver interoperability.
ZeroMQ is message-oriented middleware that defines, like AMQP/1.0, the connections between pieces rather than the behaviour of a central broker. However, it's relatively easy to write MOM brokers using 0MQ, and we've done a few of these (like Majordomo).
Message brokers are one (quite popular) kind of MOM. Another kind of MOM would be brokerless MOM, like ZeroMQ. With broker-based MOM, all messages go to one central place, the broker, and get distributed from there. Brokerless MOM usually allows for peer-to-peer messaging (but does not exclude the option of a central server as well).
AMQP is a broker-based MOM protocol definition (at least all versions prior to 1.0, which drifts into more general MOM), and there are several different message brokers implementing that protocol; RabbitMQ is just one of them.