At What Point is a Persisted Actor Dehydrated/Terminated? - akka.net

Is there documentation of the lifecycle of a ReceivePersistentActor? I'm interested in the circumstances a persistent actor is killed/stopped/dehydrated to allocate resources for other actors. Our application creates a lot of persistent actors and I'm seeing that some are Terminated. Is there a timeframe that a persistent actor has to be "inactive" before it is terminated? What other conditions are considered?

Actors are only terminated automatically when:
Their parent is shut down, or
You're running Akka.Cluster.Sharding, in which case actors created via the sharding system are automatically passivated after two minutes of inactivity: https://getakka.net/articles/clustering/cluster-sharding.html#passivation
Normal persistent actors don't shut down on their own - they'll hang around for as long as the ActorSystem and their parent actor are alive.
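For a persistent actor outside of sharding's default passivation, you can implement a similar policy yourself with a receive timeout. A minimal sketch, assuming a sharded entity (the class name and the five-minute window are made up):

```csharp
using System;
using Akka.Actor;
using Akka.Cluster.Sharding;
using Akka.Persistence;

public class SessionActor : ReceivePersistentActor
{
    public override string PersistenceId => $"session-{Self.Path.Name}";

    public SessionActor()
    {
        // Deliver a ReceiveTimeout message if nothing arrives for 5 minutes.
        Context.SetReceiveTimeout(TimeSpan.FromMinutes(5));

        Command<ReceiveTimeout>(_ =>
        {
            // Sharded entity: ask the parent shard to stop us gracefully.
            // A non-sharded actor could simply call Context.Stop(Self) here.
            Context.Parent.Tell(new Passivate(PoisonPill.Instance));
        });

        Command<string>(cmd => Persist(cmd, evt => { /* update state */ }));
        Recover<string>(evt => { /* rebuild state during replay */ });
    }
}
```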

Related

How can I force Akka.net Postgres persistence to reconnect

I am having some issues with persistent actors which use the Postgres plugin, where it seems the actors never manage to reconnect to the database after a database outage.
The persistent actors are stopped after 1 minute of inactivity, so I am getting new actors all the time, but they never seem to be able to reconnect.
Restarting the pod the actor system is running on fixes the problem.
I can kind of replicate this locally by:
Stopping the database
Starting the actor system
Sending a message which should force recovery
Recovery then fails because there is no database connection
I then start the database without restarting the actor system and send a new message which spawns a new persistent actor which fails with the same database error.
Is there some way of forcing Akka.Persistence to reconnect?
There are two ways you can go about solving this problem:
Recreate the entity actors using a BackoffSupervisor, which will recreate the actor according to an exponential backoff schedule (sketched below).
Or recreate the entity actors using the third-party Akka.Persistence.Extras NuGet package, which works similarly to the BackoffSupervisor but is able to save messages that weren't successfully persisted, provided that you implement the messaging protocol it expects. The documentation for that feature is here: https://devops.petabridge.com/articles/state-management/akkadotnet-persistence-failure-handling.html
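A rough sketch of option 1, assuming MyPersistentEntity is your own entity actor (names and timings are illustrative; exact Backoff overloads vary slightly between Akka.NET versions):

```csharp
using System;
using Akka.Actor;
using Akka.Pattern;

var system = ActorSystem.Create("my-system");

// Props for your own persistent entity actor (hypothetical class).
var childProps = Props.Create(() => new MyPersistentEntity("entity-1"));

// If the child stops (e.g. because journal recovery failed), recreate it on an
// exponential backoff schedule instead of immediately hammering the database.
var supervisorProps = BackoffSupervisor.Props(
    Backoff.OnStop(
        childProps,
        childName: "entity-1",
        minBackoff: TimeSpan.FromSeconds(3),
        maxBackoff: TimeSpan.FromSeconds(30),
        randomFactor: 0.2,    // jitter so all entities don't retry at the same instant
        maxNrOfRetries: -1)); // keep retrying indefinitely

// Send commands to the supervisor; it forwards them to the child while it is running.
var supervisor = system.ActorOf(supervisorProps, "entity-1-supervisor");
```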

Best practice for cleaning up EntityStoppedManifest journal entries for permanently terminated actors?

In our actor system, using sharding and persistence, the concrete instances of one of our ReceivePersistentActor implementations are not re-used once they are terminated (passivated), as they represent client sessions identified by a GUID that is generated for each new session.
When a session ends, the ReceivePersistentActor is responsible for cleaning up its own persistence data and will call DeleteSnapshots and DeleteMessages, which works fine. Once these calls have been processed, the actor calls Context.Parent.Tell(new Passivate(PoisonPill.Instance)); to terminate.
After that, the event journal will still contain an EntityStoppedManifest entry ("CD"), as this is generated through the Passivate message.
Over time this will lead to many "CD" entries remaining in the event journal.
Is there a recommended approach for cleaning up such residue entries?
Maybe a separate Janitor actor that cleans up these entries manually?
Or is this even a design flaw on our end?
Looks like I came here too hastily, as those events have been mostly cleaned up by now automagically.
What might have caused those events to accumulate in such high numbers in the first place is that they were generated during actor recovery rather than during normal operation. But this is just an assumption.
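For reference, the cleanup-then-passivate flow described in the question looks roughly like this (a sketch only; the EndSession command and the class names are hypothetical):

```csharp
using Akka.Actor;
using Akka.Cluster.Sharding;
using Akka.Persistence;

public class ClientSessionActor : ReceivePersistentActor
{
    public override string PersistenceId => $"session-{Self.Path.Name}";

    private bool _messagesDeleted;
    private bool _snapshotsDeleted;

    public ClientSessionActor()
    {
        Command<EndSession>(_ =>
        {
            // The session is over and will never be replayed again, so wipe its data.
            DeleteMessages(LastSequenceNr);
            DeleteSnapshots(SnapshotSelectionCriteria.Latest);
        });

        Command<DeleteMessagesSuccess>(_ => { _messagesDeleted = true; PassivateWhenCleanedUp(); });
        Command<DeleteSnapshotsSuccess>(_ => { _snapshotsDeleted = true; PassivateWhenCleanedUp(); });
    }

    private void PassivateWhenCleanedUp()
    {
        // Only ask the shard to stop us once both deletions have been confirmed.
        if (_messagesDeleted && _snapshotsDeleted)
            Context.Parent.Tell(new Passivate(PoisonPill.Instance));
    }

    public sealed class EndSession { }
}
```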

Is it right to create an actor instance for each new process managed by FSM

I'm trying to design an application which will manage multi-state processes - something like money transfer processes from one account to another. I have decided to use Akka.NET FSM. But then I got stuck when I found out that each new process (each new transfer) needs a new actor instance, because FSM state is stored in the "running" actor. For me that means that if I have 1000 simultaneous transfer requests, then I should create 1000 instances. Keeping in mind that, according to the documentation, each actor works in its own thread, how realistic is this approach? Or did I misunderstand something?
Actors don't work "in their own threads"; they work on one thread at a time, which is a different thing - you can have millions of actors working perfectly on 2 OS threads, but at any given time a single actor will only ever be executing on one of them (unless you explicitly escape that barrier, e.g. by running a task inside an actor). A single actor by itself occupies less than 1 kB of memory and doesn't have any inherent requirements on operating system resources (like threads).
In general, having one actor per transfer work as that transfer's coordinator is OK, and it's quite a common pattern in Akka.NET.
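A minimal sketch of that per-transfer pattern, assuming made-up message and state types: the coordinator spawns one lightweight FSM child per transfer and routes follow-up messages to it by id, so 1000 concurrent transfers mean 1000 small in-memory actors, not 1000 threads.

```csharp
using Akka.Actor;

public sealed record StartTransfer(string TransferId, decimal Amount);
public sealed record DebitConfirmed(string TransferId);
public sealed record CreditConfirmed(string TransferId);

public enum TransferState { AwaitingDebit, AwaitingCredit }
public sealed record TransferData(decimal Amount);

public class TransferProcess : FSM<TransferState, TransferData>
{
    public TransferProcess(StartTransfer start)
    {
        StartWith(TransferState.AwaitingDebit, new TransferData(start.Amount));

        When(TransferState.AwaitingDebit, evt =>
            evt.FsmEvent is DebitConfirmed
                ? GoTo(TransferState.AwaitingCredit) // money has left the source account
                : Stay());

        When(TransferState.AwaitingCredit, evt =>
            evt.FsmEvent is CreditConfirmed
                ? Stop()                             // transfer finished, actor goes away
                : Stay());

        Initialize();
    }
}

public class TransferCoordinator : ReceiveActor
{
    public TransferCoordinator()
    {
        Receive<StartTransfer>(cmd =>
            Context.ActorOf(Props.Create(() => new TransferProcess(cmd)),
                            $"transfer-{cmd.TransferId}"));

        // Later messages are routed to the right child by transfer id.
        Receive<DebitConfirmed>(msg => Context.Child($"transfer-{msg.TransferId}").Tell(msg));
        Receive<CreditConfirmed>(msg => Context.Child($"transfer-{msg.TransferId}").Tell(msg));
    }
}
```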

Akka.NET: Restrict child actor creation in akka.net cluster to a single machine

We have a particular scenario in our application - all the child actors in this application deal with huge volumes of data (around 50-200 MB).
Because of this, we have decided to create the child actors on the same machine (worker process) on which the parent actor was created.
Currently, this is achieved by the use of Roles. We also use the .NET memory cache to transfer the data (several MBs) between child actors.
Question : Is it ok to turn off clustering in the child actors to achieve the result we are expecting?
Edit: To be more specific, I have explained our application setup in detail below.
The whole process happens inside an Akka.NET cluster of around 5 machines
Worker processes (which contains both parent and child actors) are deployed in each of those machines
Both parent and child actors are cluster enabled, in this setup
When we found out about the network overhead caused by distributing the child actors across machines, we decided to restrict child actor creation to the machine which received the primary request, and to distribute only the parent actors across machines.
While approaching an Akka.NET expert with this problem, we were advised to use "Roles" in order to restrict child actor creation to a single machine in the cluster (e.g. Worker1Child, Worker2Child instead of a single "Child" role).
Question (contd.): I just want to know whether simply disabling the cluster option on child actors will achieve the same result, and whether it is a best practice to do so.
Please advise.
Sounds to me like you've been using a clustered pool router to remotely deploy worker actors across the cluster - you didn't explicitly mention this in your description, but that's what it sounds like.
It also sounds like what you're really trying to do here is take advantage of local affinity: have child worker actors for the same entities all work together inside the same process.
Here's what I would recommend:
Have all worker actors created as children of their parents, locally, inside the same process, using either something like the child-per-entity pattern or a LOCAL pool router.
Distribute work between the worker nodes using a clustered group router, using roles etc.
Any work in that high-volume workload should then flow directly from parents to children, without needing to round-trip back and forth through the rest of the cluster.
Given the information that you've provided here, this is as close to a "general" answer as I can provide - hope you find it helpful!
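A rough sketch of that local-affinity layout (all names and pool sizes are illustrative): the heavy payload only ever flows from a parent to its local children, while only the initial work item travels across the cluster to reach a parent (e.g. via a clustered group router over the parents).

```csharp
using Akka.Actor;
using Akka.Routing;

public sealed record WorkItem(string EntityId, byte[] Payload);

public class ChildWorker : ReceiveActor
{
    public ChildWorker()
    {
        // Heavy per-item processing happens here, in the same process as the parent,
        // so the multi-MB payload never crosses the network.
        Receive<WorkItem>(work => { /* process work.Payload */ });
    }
}

public class ParentWorker : ReceiveActor
{
    private readonly IActorRef _localChildren;

    public ParentWorker()
    {
        // LOCAL round-robin pool: all children are created on this node.
        _localChildren = Context.ActorOf(
            Props.Create<ChildWorker>().WithRouter(new RoundRobinPool(8)),
            "children");

        // Only this (comparatively small) incoming message travels between machines.
        Receive<WorkItem>(work => _localChildren.Forward(work));
    }
}
```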

Paxos: How are proposers, accepters and learners selected?

I want to elect a leader from a number of identical processes. All the explanations of Paxos say that some processes are Proposers, some are Voters and some are Acceptors. Do I need to assign these roles to my processes when I launch them?
What if all my proposers die? Can I switch existing learners/voters to proposers?
The way I thought about this is that all processes come up as voters and then wait, with a random timeout, for messages. If they don't get any message before the timeout expires, they assume the role of a proposer. If they do receive a message, they kick off another timeout and can then become a proposer again if they don't receive any more messages before that timeout ends (or they reach an agreement).
Is that a valid approach?
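This is not Paxos itself, just the timeout-based promotion idea sketched in isolation (all names are hypothetical); the randomized interval reduces the chance of several nodes promoting themselves to proposer at the same moment.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class ElectionTimer
{
    private readonly Random _rng = new Random();
    private readonly Action _becomeProposer;
    private CancellationTokenSource _cts;

    public ElectionTimer(Action becomeProposer) => _becomeProposer = becomeProposer;

    // Call this on start-up and whenever a message from an active proposer arrives.
    public void ResetTimeout()
    {
        _cts?.Cancel();
        _cts = new CancellationTokenSource();
        var token = _cts.Token;

        // Randomized interval: nodes time out at different moments, so usually
        // only one of them promotes itself and starts proposing.
        var delay = TimeSpan.FromMilliseconds(_rng.Next(150, 300));

        _ = Task.Delay(delay, token).ContinueWith(t =>
        {
            if (!t.IsCanceled)
                _becomeProposer(); // heard nothing for the whole interval: try to lead
        }, TaskScheduler.Default);
    }
}
```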
I am a maintainer of several production Paxos systems. I'll detail what I have seen in mine and other practical/production systems.
All the explanations of Paxos say that some processes are Proposers, some are Voters and some are Acceptors. Do I need to assign these roles to my processes when I launch them?
In a practical system the proposers and acceptors are the same set of nodes. That is, proposer and acceptor are two roles taken on by the same process.
What if all my proposers die?
Many Paxos systems (such as those used as kv stores) work on a sequence of paxos instances (like a transaction log). It is infeasible to assume that the same set of processes will live forever, so there must be mechanisms to change the set of nodes in the quorum.
In some of the systems I maintain there are two parts to the proposed/chosen value: the payload from the customer and the Paxos membership. The Paxos round is run by the quorum chosen in the prior value. Doing it this way does impede the ability of your system to pipeline the chosen values; you'll have to batch them instead if you want to choose multiple values at once. Or you can look at the way Raft chooses membership.
In Raft, choosing quorum members is a two-phase process. First, the new quorum is proposed and committed; both quorums are then used for a while; and finally the new quorum takes over. Specifically, in the interim a majority of both the old and the new quorums is required to commit anything, including the takeover by the new quorum.