Akka.net: Child per entity pattern in a cluster

We are developing a project to process financial transactions. We would like to have a "distributed memory" across three machines where these transactions live. Every machine would hold its own copy of each transaction, because a transaction update request could arrive at any of those machines.
We were thinking about using an Akka.net cluster to solve this problem. Is there a way to use the child per entity pattern in an Akka cluster? All transactions are accessed by their own transaction id, and we don't want the transaction actor to have to load anything from disk.

I think cluster sharding can help: it creates one actor per entity, addressed by its entity id, and groups those actors into shards that are spread across the cluster. https://getakka.net/articles/clustering/cluster-sharding.html
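To illustrate the routing idea behind sharding, here is a minimal Python sketch. It is not the Akka.NET API: the names `NUM_SHARDS`, `NODES`, `shard_id` and `owner_node` are hypothetical, and the static modulo placement is a simplification (a real sharding implementation decides where each shard lives at runtime). It only shows that a transaction id can be resolved to the same location no matter which machine receives the request.

```python
import zlib

NUM_SHARDS = 10                          # hypothetical shard count
NODES = ["node-a", "node-b", "node-c"]   # stand-ins for the three machines


def shard_id(transaction_id: str) -> int:
    """Map a transaction id to a shard with a stable hash."""
    return zlib.crc32(transaction_id.encode("utf-8")) % NUM_SHARDS


def owner_node(transaction_id: str) -> str:
    """Pick the node that hosts the shard (static placement, for illustration only)."""
    return NODES[shard_id(transaction_id) % len(NODES)]
```

Any node can call `owner_node("tx-42")` and get the same answer, so an update request arriving anywhere can be forwarded to the single in-memory entity for that transaction.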

Related

Akka.NET: Restrict child actor creation in akka.net cluster to a single machine

We have a particular scenario in our application: all the child actors in this application deal with a huge volume of data (around 50 - 200 MB).
Due to this, we have decided to create the child actors in the same machine (worker process) in which parent actor was created.
Currently, this is achieved by the use of Roles. We also use .NET memory cache to transfer the data (Several MBs) between child actors.
Question : Is it ok to turn off clustering in the child actors to achieve the result we are expecting?
Edit: To be more specific, I have explained our application setup in detail below.
The whole process happens inside an Akka.NET cluster of around 5 machines
Worker processes (which contain both parent and child actors) are deployed on each of those machines
Both parent and child actors are cluster enabled, in this setup
When we found out the network overhead caused by distributing the child actors across machines, we decided to restrict child actor creation to the corresponding machines which received the primary request, and distribute only the parent actor across machines.
While approaching an Akka.NET expert with this problem, we were advised to use "Roles" in order to restrict the child actor creation to a single machine in a cluster system. (E.g., Worker1Child, Worker2Child instead of "Child" role)
Question (Contd.) : I just want to know whether simply disabling the cluster option in the child actors will achieve the same result, and whether it is a best practice to do so.
Please advise.
Sounds to me like you've been using a clustered pool router to remotely deploy worker actors across the cluster - you didn't explicitly mention this in your description, but that's what it sounds like.
It also sounds like, what you're really trying to do here is take advantage of local affinity: have child worker actors for the same entities all work together inside the same process.
Here's what I would recommend:
Have all worker actors created as children of parents, locally, inside the same process, but either using something like the child-per-entity pattern or a LOCAL pool router.
Distribute work between the worker nodes using a clustered group router, using roles etc.
Any of the work in that high volume workload should all flow directly from parent to children, without needing to round-trip back and forth between the rest of the cluster.
Given the information that you've provided here, this is as close to a "general" answer as I can provide - hope you find it helpful!
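A minimal sketch of the child-per-entity idea from the recommendation above, written in plain Python rather than Akka.NET. The `Parent` and `Worker` classes are hypothetical stand-ins for actors; the point is only that the parent creates one child per entity id on first use and hands work to it inside the same process.

```python
class Worker:
    """Stand-in for a child actor that owns the state of one entity."""

    def __init__(self, entity_id):
        self.entity_id = entity_id
        self.processed = []

    def handle(self, payload):
        # Keep the payload locally and report how many items this child has seen.
        self.processed.append(payload)
        return len(self.processed)


class Parent:
    """Stand-in for the parent actor: one local child per entity id."""

    def __init__(self):
        self._children = {}

    def tell(self, entity_id, payload):
        child = self._children.get(entity_id)
        if child is None:
            # First message for this entity: create the child in this process.
            child = Worker(entity_id)
            self._children[entity_id] = child
        return child.handle(payload)

    def child_count(self):
        return len(self._children)
```

Only the message that reaches the parent would need to cross the network; everything between parent and children stays in-process, which is the local affinity described above.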

Redis: Using lua and concurrent transactions

Two issues
Do lua scripts really solve all cases for redis transactions?
What are best practices for asynchronous transactions from one client?
Let me explain, first issue
Redis transactions are limited, with an inability to unwatch specific keys, and all keys being unwatched upon exec; we are limited to a single ongoing transaction on a given client.
I've seen threads where many redis users claim that lua scripts are all they need. Even the redis official docs state they may remove transactions in favour of lua scripts. However, there are cases where this is insufficient, such as the most standard case: using redis as a cache.
Let's say we want to cache some data from a persistent data store, in redis. Here's a quick process:
1. Check cache -> miss
2. Load data from database
3. Store in redis
However, what if, between step 2 (loading data), and step 3 (storing in redis) the data is updated by another client?
The data stored in redis would be stale. So... we use a redis transaction right? We watch the key before loading from db, and if the key is updated somewhere else before storage, storage would fail. Great! However, within an atomic lua script, we cannot load data from an external database, so lua cannot be used here. Hopefully I'm simply missing something, or there is something wrong with our process.
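The watch-then-store flow described above can be sketched with a small in-memory stand-in. This is a simulation, not a Redis client: `FakeRedis`, `set_if_unchanged` and `fill_cache` are hypothetical names, and the version counter only mimics the optimistic check that WATCH/EXEC performs. It assumes the other client writes the same cache key when it updates the data.

```python
class FakeRedis:
    """In-memory stand-in that mimics only the optimistic check of WATCH/EXEC."""

    def __init__(self):
        self._data = {}
        self._versions = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        self._versions[key] = self._versions.get(key, 0) + 1

    def watch(self, key):
        # Return a token describing the key's state at watch time.
        return self._versions.get(key, 0)

    def set_if_unchanged(self, key, value, token):
        # Plays the role of MULTI/SET/EXEC: abort if the key changed since watch.
        if self._versions.get(key, 0) != token:
            return False
        self.set(key, value)
        return True


def fill_cache(cache, key, load_from_db):
    token = cache.watch(key)          # watch before touching the database
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db()            # slow load happens outside the cache
    if cache.set_if_unchanged(key, value, token):
        return value
    return cache.get(key)             # someone else stored newer data; use theirs
```

The database load sits between the watch and the conditional store, which is why this step cannot be moved into a single atomic script.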
Moving on to the 2nd issue (asynchronous transactions)
Let's say we have a socket.io cluster which processes various messages, and requests for a game, for high speed communication between server and client. This cluster is written in node.js with appropriate use of promises and asynchronous concepts.
Say two requests hit a server in our cluster, which require data to be loaded and cached in redis. Using our transaction from above, multiple keys could be watched, and multiple multi->exec transactions would run in overlapping order on one redis connection. Once the first exec is run, all watched keys will be unwatched, even if the other transaction is still running. This may allow the second transaction to succeed when it should have failed.
These overlaps could happen in totally separate requests happening on the same server, or even sometimes in the same request if multiple data types need to load at the same time.
What is best practice here? Do we need to create a separate redis connection for every individual transaction? It seems like we would lose a lot of speed, and we would see many connections created from just one server if this is the case.
As an alternative we could use redlock / mutex locking instead of redis transactions, but this is slow by comparison.
Any help appreciated!
I have received the following, after my query was escalated to redis engineers:
Hi Jeremy,
Your method using multiple backend connections would be the expected way to handle the problem. We do not see anything wrong with multiple backend connections, each using an optimistic Redis transaction (WATCH/MULTI/EXEC) - there is no chance that the “second transaction will succeed where it should have failed”.
Using LUA is not a good fit for this problem.
Best Regards,
The Redis Labs Team
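To make the difference between one shared connection and separate connections concrete, here is a small simulation. It is not a Redis client: `Server`, `Connection` and `run` are hypothetical, and the model captures only two behaviours discussed in this thread, namely that watches belong to a connection and that an exec clears every watch on that connection.

```python
class Server:
    """Holds the data plus a per-key version counter."""

    def __init__(self):
        self.data = {}
        self.versions = {}

    def write(self, key, value):
        self.data[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1


class Connection:
    """Watches are tracked per connection and dropped on every exec."""

    def __init__(self, server):
        self.server = server
        self.watched = {}  # key -> version seen at watch time

    def watch(self, key):
        self.watched[key] = self.server.versions.get(key, 0)

    def exec_set(self, key, value):
        unchanged = all(
            self.server.versions.get(k, 0) == v for k, v in self.watched.items()
        )
        self.watched.clear()  # exec unwatches everything on this connection
        if unchanged:
            self.server.write(key, value)
        return unchanged


def run(shared):
    server = Server()
    conn_a = Connection(server)
    conn_b = conn_a if shared else Connection(server)
    conn_a.watch("a")                      # transaction A starts
    conn_b.watch("b")                      # transaction B starts
    a_ok = conn_a.exec_set("a", "A1")      # A finishes first
    server.write("b", "other")             # another client changes b
    b_ok = conn_b.exec_set("b", "B1")      # B should now fail
    return a_ok, b_ok
```

In this model `run(shared=True)` returns `(True, True)`, because transaction B lost its watch when A executed, while `run(shared=False)` returns `(True, False)`, which matches the advice to give each optimistic transaction its own connection.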

Can Infinispan be forced to fully replicate to a new cluster member

Looking through the Infinispan getting started guide it states [When in replication mode]
Infinispan only replicates data to nodes which are already in the
cluster. If a node is added to the cluster after an entry is added, it
won’t be replicated there.
Which I read as any cluster member will always be ignorant of any data that existed in the cluster before it became a cluster member.
Is there a way to force Infinispan to replicate all existing data to a new cluster member?
I see two options currently but I'm hoping I can just get Infinispan to do the work.
Use a distributed cache and live with the increase in access times inherent in the model, but this at least leaves Infinispan to handle its own state.
Create a Listener to listen for a new cache member joining and iterate through the existing data, pushing it into the new member. Unfortunately this would in effect cause every entry to replicate out to the existing cluster members again. I don't think this option will fly.
This information sounds misleading or outdated. When a node joins a cluster, a rebalance process is initiated. If you query the new node for an entry during the rebalance, before that entry has been delivered to the node, the entry is fetched by remote RPC.

Many distributors to many workers

We're a licensed product using NServiceBus as the messaging framework in our federated system.
We are looking for opportunities to use it in a new feature:
Is there a way to build a multi-site (scaled-out) system in which each site/node produces and distributes messages to workers located on several nodes/sites?
Each distributor and worker can be hosted at its own site (same LAN), and each site can go down at any point. All distributors and workers should be symmetric.
At first it looked like a classical "many producers to many competing consumers" problem, but I can't find a built-in way to achieve it with NServiceBus because, from what I saw, each worker can send health signals to only one distributor (I might be wrong about that).
Another issue I came up against is having a centralized RavenDB instance holding the distributor subscriptions. Having the RavenDB in its own "availability group" requires additional resources. Is there a way to host the RavenDB instances at the same sites, with their data replicated, while each site uses its local DB instance? This would also bind the HA of the distributors to their subscription DBs.
Reading this discussion: http://tech.groups.yahoo.com/group/nservicebus/message/18412
It seems that NServiceBus requires a cluster to keep the published data highly available. But why can't the distributor wait for an acknowledgement that the message was successfully consumed and processed by the worker, and keep retrying to publish it to the same or a different worker? This way, even if the VM went down with data in the queues, the data would be sent to another node that is available at that time.
Edit: Tried to ask the same at NServiceBus official Yahoo group but keep getting pythonError within the yahoo group.
Thanks in advance,
Rami Prilutsky, dbMotion
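The acknowledge-and-retry behaviour asked about in this question can be sketched as follows. This is a plain Python illustration of the proposal, not something NServiceBus is stated to provide: `dispatch` is a hypothetical helper, and each worker is modelled as a callable that returns `True` when it has processed the message or raises `ConnectionError` when its site is down.

```python
def dispatch(message, workers, max_attempts=5):
    """Try workers in rotation until one acknowledges the message.

    Returns the number of attempts used; raises if nobody acknowledges.
    """
    for attempt in range(max_attempts):
        worker = workers[attempt % len(workers)]
        try:
            if worker(message):
                return attempt + 1
        except ConnectionError:
            # The worker's site is unreachable; move on to the next one.
            pass
    raise RuntimeError("no worker acknowledged the message")
```

A sketch like this delivers at least once rather than exactly once: if a worker processes the message but its acknowledgement is lost, the message is handed to another worker, so handlers would need to tolerate duplicates.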

NServiceBus: Nitty-Gritty Deployment Issues

Please consider the following questions in the context of multiple publications from a scaled out publisher (using DB subscription storage) and multiple subscriptions with scaled out subscribers (using distributors) where installs and uninstalls happen regularly for initial deployments, upgrades, etc. using automated MSIs.
Using DB subscription storage, what happens if the DB goes down? If access to the Subscription DB is required in order to Publish a message, how will it be delivered? Will it get lost? Will the call to Bus.Publish throw an exception?
Assuming you need to have no down-time deployments: What if you want to move your subscription DB for a particular publication to a different server? How do you manage a transition like this?
Same question goes for a distributor on the subscriber side: What if you want to move your distributor endpoint? One scenario I can think of is if you have multiple subscriptions utilizing a single distributor machine, it might be hard if you want to move some of them to another distributor server to reduce load.
What would the install/uninstall scenarios look like for a setup like this (both initially, and for continuous upgrades)? It seems like you would want to have some special install/uninstall scripts for deployment of the "logical publication" and subscription DB, as well as for the "logical subscriptions" and the distributors. The publisher instances wouldn't need any special install/uninstall logic (since they just start publishing messages using the configured subscription DB, and then stop when they are uninstalled). The subscriber worker nodes wouldn't need anything special on install other than the correct configuration of the distributor endpoint, but would need uninstall logic to make sure they are removed from the distributor's list of worker nodes.
Eventually the publisher will fail and the messages will build up in the internal queue. You will have to plan the size of disk you need to handle this based on the message size and how long you want to wait for a DB to come up. From there it depends on how much downtime you can handle. You can use DB mirroring or clustering to reduce the DB's downtime.
Mirroring and clustering technologies can also help with this. It depends on whether you want manual or automatic failover and where you're doing it (remote sites?).
Clustering MSMQ could help you here. If you want to drop a distributor and move it within a cluster you'd be ok. Another possibility is to expose your distributors via HTTP and load balance them behind either a software or hardware load balancing solution. Behind the load balancer you'd be more free to move things around.
Sounds like you have a good grasp on this one already :)
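For the disk sizing mentioned in the first point above, a back-of-the-envelope calculation might look like this. The function name, the `headroom` factor, and all the numbers in the usage note are hypothetical; substitute your own message size, publish rate and tolerated outage.

```python
def queue_disk_bytes(message_bytes, messages_per_second, outage_seconds, headroom=2.0):
    """Estimate the disk needed to buffer messages while the subscription DB is down."""
    return int(message_bytes * messages_per_second * outage_seconds * headroom)
```

For example, with 10 KB messages (10240 bytes), 50 messages per second, a one hour outage (3600 seconds) and a headroom factor of 2, `queue_disk_bytes(10240, 50, 3600)` gives 3,686,400,000 bytes, or roughly 3.4 GiB.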
To your first question, about the high availability of the subscription DB, you can use a cluster for failover. If the DB is down, then the Bus.Publish will throw an exception, yes. It is recommended to keep the subscription DB separate from your applicative DB to avoid having to bring it down when upgrading your app. This doesn't have to be a separate DB server, a separate DB on the same DB server will be fine.
About moving servers, this is usually managed at a DNS level where for a certain period of time you'll have both running, until communication moves over.
On your third question about distributors - don't share a distributor between different publishers or subscribers.
As a rule of thumb, it is recommended not to add or remove subscribers when doing these kinds of maintenance activities. This usually simplifies things quite a bit.