Best practice for cleaning up EntityStoppedManifest journal entries for permanently terminated actors? (Akka.NET)

In our actor system, which uses sharding and persistence, the concrete instances of one of our ReceivePersistentActor implementations are never reused once they are terminated (passivated), because they represent client sessions identified by a GUID that is generated for each new session.
When a session ends, the ReceivePersistentActor is responsible for cleaning up its own persistence data and calls DeleteSnapshots and DeleteMessages, which works fine. Once these calls have been processed, the actor calls Context.Parent.Tell(new Passivate(PoisonPill.Instance)); to terminate.
After that, the event journal still contains an EntityStoppedManifest entry ("CD"), because that entry is generated as a result of the Passivate message.
Over time, this leads to many "CD" entries remaining in the event journal.
Is there a recommended approach for cleaning up such residue entries?
Maybe a separate Janitor actor that cleans up these entries manually?
Or is this even a design flaw on our end?
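The cleanup sequence described above can be sketched as a small state tracker. This is plain Java rather than Akka.NET, and SessionCleanup and its enum are hypothetical names; in Akka.NET the two confirmations would typically arrive as DeleteMessagesSuccess and DeleteSnapshotsSuccess messages. The point it illustrates is that passivation should wait for both confirmations, in whichever order they arrive. It only covers the entity's own events and snapshots, not the "CD" entries written as a result of passivation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the session-cleanup sequence: request both deletions,
 * and passivate only once both have been confirmed.
 */
final class SessionCleanup {
    enum Confirmation { MESSAGES_DELETED, SNAPSHOTS_DELETED }

    private boolean messagesDeleted;
    private boolean snapshotsDeleted;
    private boolean passivated;
    // Records what the actor would send, so the sequence can be inspected.
    final List<String> outbox = new ArrayList<>();

    /** Corresponds to calling DeleteMessages and DeleteSnapshots in the Akka.NET actor. */
    void endSession(long lastSequenceNr) {
        outbox.add("DeleteMessages(" + lastSequenceNr + ")");
        outbox.add("DeleteSnapshots");
    }

    /** Confirmations may arrive in either order; passivate exactly once, after both. */
    void onConfirmation(Confirmation confirmation) {
        if (confirmation == Confirmation.MESSAGES_DELETED) {
            messagesDeleted = true;
        } else {
            snapshotsDeleted = true;
        }
        if (messagesDeleted && snapshotsDeleted && !passivated) {
            passivated = true;
            // Stands in for Context.Parent.Tell(new Passivate(PoisonPill.Instance))
            outbox.add("Passivate(PoisonPill)");
        }
    }

    boolean isPassivated() {
        return passivated;
    }
}
```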

It looks like I asked too hastily: by now, most of those events have been cleaned up automagically.
My guess is that they accumulated in such numbers in the first place because they were generated during actor recovery rather than during normal operation, but this is just an assumption.

Related

Is it right to create an actor instance for each new process managed by an FSM?

I'm trying to design an application that will manage multi-state processes, something like money transfers from one account to another. I decided to use Akka.NET FSM, but then I got stuck when I found out that each new process (new transfer) needs a new actor instance, because the FSM state is stored in the "running" actor. For me, this means that if I have 1000 simultaneous transfer requests, I need to create 1000 instances. Keeping in mind that, according to the documentation, each actor works in its own thread, how realistic is this approach? Or have I misunderstood something?
Actors don't work "in their own threads"; they work on one thread at a time, which is a different thing. You can have millions of actors working perfectly well on 2 OS threads, but at any given time the same actor will only ever be executed on one of them (unless you explicitly escape that barrier, e.g. by running a task inside an actor). A single actor by itself occupies less than 1 kB of memory and has no inherent requirements on operating system resources (like threads).
In general, having one actor per transfer acting as its coordinator is fine, and it's quite a common pattern in Akka.NET.
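To illustrate the point that many actors can share a couple of threads while each one still processes a single message at a time, here is a minimal mailbox scheduler in plain Java. It is not Akka.NET's actual dispatcher; MiniActor, TransferFsm and runDemo are hypothetical names. Each transfer gets its own small state machine, and 10,000 of them run on a fixed pool of 2 threads.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Minimal mailbox "actor": one message at a time, on whichever pool thread is free. */
final class MiniActor<M> {
    private final ConcurrentLinkedQueue<M> mailbox = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final Executor executor;
    private final Consumer<M> behavior;

    MiniActor(Executor executor, Consumer<M> behavior) {
        this.executor = executor;
        this.behavior = behavior;
    }

    void tell(M message) {
        mailbox.add(message);
        trySchedule();
    }

    private void trySchedule() {
        // At most one drain task per actor is in flight, so it never runs on two threads at once.
        if (scheduled.compareAndSet(false, true)) {
            executor.execute(this::drain);
        }
    }

    private void drain() {
        M message;
        while ((message = mailbox.poll()) != null) {
            behavior.accept(message);
        }
        scheduled.set(false);
        // A message may have arrived between the last poll and clearing the flag.
        if (!mailbox.isEmpty()) {
            trySchedule();
        }
    }
}

/** Hypothetical per-transfer state machine; its state lives in one actor per transfer. */
final class TransferFsm {
    enum State { AWAITING_DEBIT, AWAITING_CREDIT, COMPLETED }

    private State state = State.AWAITING_DEBIT;
    private final CountDownLatch done;

    TransferFsm(CountDownLatch done) {
        this.done = done;
    }

    void onEvent(String event) {
        if (state == State.AWAITING_DEBIT && event.equals("DEBITED")) {
            state = State.AWAITING_CREDIT;
        } else if (state == State.AWAITING_CREDIT && event.equals("CREDITED")) {
            state = State.COMPLETED;
            done.countDown();
        }
        // Any other event is ignored in this sketch.
    }

    /** Runs n transfer actors on just two threads and returns how many completed. */
    static long runDemo(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch done = new CountDownLatch(n);
        try {
            for (int i = 0; i < n; i++) {
                TransferFsm fsm = new TransferFsm(done);
                MiniActor<String> actor = new MiniActor<>(pool, fsm::onEvent);
                actor.tell("DEBITED");
                actor.tell("CREDITED");
            }
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return n - done.getCount();
    }
}
```

Real dispatchers usually also cap how many messages one actor handles per turn, so a busy actor cannot starve the others; this sketch drains the whole mailbox for brevity.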

Implementing a mutual exclusion system / distributed queue in Postgres

I want to implement a mutual exclusion system in PostgreSQL where multiple worker processes will temporarily lock resources (rows) from a table (queue) while they work on them. If the worker processes crash, I want the lock to be cleanly released and not have to rely on another process to clean up the leaked locks.
What I have come up with so far is to use a SELECT ... FOR UPDATE SKIP LOCKED query within a transaction, which locks the row it finds and skips any other locked row.
It works well, but one issue is that the worker might take a while to do its task, and I need to keep the transaction open for the entire duration of that task.
Another problem is that the workers work incrementally and persist their state to the database so that, if they're stopped or crash, they can quickly resume where they left off. The row being locked makes it impossible to persist their state in the same table (though I think I can get around that by using another table to persist the state).
I've searched the Web for how to implement a semaphore or a resource-borrowing system in SQL/PostgreSQL, but I haven't found anything that fits my needs. Is there a simple way of achieving this with PostgreSQL?
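A minimal JDBC sketch of the SKIP LOCKED approach described above, assuming hypothetical job_queue(id) and job_progress(job_id PRIMARY KEY, state) tables and a hypothetical Job callback. It keeps the trade-off of holding a transaction open for the whole task. Progress goes through a second connection in autocommit mode, because anything written inside the locking transaction would be rolled back together with it if the worker crashed. Row locks are released when their transaction ends, so a crashed worker's lock goes away once PostgreSQL notices the dropped connection and aborts the transaction.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Sketch of the SKIP LOCKED worker; job_queue, job_progress and their columns are hypothetical. */
final class SkipLockedWorker {
    /** Hypothetical unit of work that reports progress as it goes. */
    interface Job {
        void run(long jobId, ProgressSink progress) throws SQLException;
    }

    interface ProgressSink {
        void save(String state) throws SQLException;
    }

    static final String CLAIM_SQL =
        "SELECT id FROM job_queue ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED";
    // Assumes job_progress.job_id is the primary key, so ON CONFLICT can upsert.
    static final String SAVE_PROGRESS_SQL =
        "INSERT INTO job_progress (job_id, state) VALUES (?, ?) "
        + "ON CONFLICT (job_id) DO UPDATE SET state = EXCLUDED.state";
    static final String FINISH_SQL = "DELETE FROM job_queue WHERE id = ?";

    /** Returns the id of a row now locked by the current transaction, or null if none is free. */
    private static Long claimNext(Connection lockConn) throws SQLException {
        try (PreparedStatement claim = lockConn.prepareStatement(CLAIM_SQL);
             ResultSet rs = claim.executeQuery()) {
            if (!rs.next()) {
                return null;
            }
            return rs.getLong(1);
        }
    }

    static boolean processOne(Connection lockConn, Connection progressConn, Job job) throws SQLException {
        lockConn.setAutoCommit(false);    // the row lock lives as long as this transaction
        progressConn.setAutoCommit(true); // progress is committed immediately, independent of the lock
        try {
            Long claimed = claimNext(lockConn);
            if (claimed == null) {
                lockConn.commit();
                return false; // nothing unlocked left to do
            }
            final long jobId = claimed;
            job.run(jobId, state -> {
                try (PreparedStatement save = progressConn.prepareStatement(SAVE_PROGRESS_SQL)) {
                    save.setLong(1, jobId);
                    save.setString(2, state);
                    save.executeUpdate();
                }
            });
            try (PreparedStatement finish = lockConn.prepareStatement(FINISH_SQL)) {
                finish.setLong(1, jobId);
                finish.executeUpdate();
            }
            lockConn.commit(); // removes the job and releases the row lock together
            return true;
        } catch (SQLException | RuntimeException e) {
            lockConn.rollback(); // releases the lock; the row becomes claimable again
            throw e;
        }
    }
}
```

On restart, a worker that claims a job would first read that job's row in job_progress (not shown) to resume where the previous attempt stopped.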

Akka.net persistence delete messages from a certain sequence number

Is there a way to delete messages after a certain sequence number in Akka.NET? I know that DeleteMessages(seqNumber) deletes all messages up to (and including) a given sequence number; is there a way to delete the messages after a seqNumber? The main goal would be to revert to a previous state (perhaps those messages were created in error).
It's obviously possible to edit the database manually (or set is_deleted to true for those events), but I'm not sure that would be a great idea.
Thanks
DeleteMessages(seqNr) exists only to save space when you're using event sourcing with snapshots and your system can tolerate an incomplete history of events.
Deleting events goes against event sourcing as a concept. The purpose of an event is to describe a fact that has already happened. You cannot alter the past, as other consumers might already have read that event and updated some state or performed an action based on it.
Correcting the effects of events in event-sourced systems usually comes down to producing a compensating event that reverses the effects of the one you want to fix.
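The compensating-event idea can be sketched in plain Java (Akka.NET itself is C#). Account, Event and the event type names are hypothetical; the journal is append-only, and an erroneous deposit is reversed by persisting a new event with the opposite amount rather than by deleting the original.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical event-sourced account: the journal is append-only and state is derived from it. */
final class Account {
    static final class Event {
        final long seqNr;
        final String type;
        final long amount;   // signed change to the balance
        final Long corrects; // seqNr of the event being compensated, or null

        Event(long seqNr, String type, long amount, Long corrects) {
            this.seqNr = seqNr;
            this.type = type;
            this.amount = amount;
            this.corrects = corrects;
        }
    }

    private final List<Event> journal = new ArrayList<>();
    private long balance;

    private long persist(String type, long amount, Long corrects) {
        Event event = new Event(journal.size() + 1, type, amount, corrects);
        journal.add(event); // append only: nothing is ever deleted or rewritten
        balance += event.amount;
        return event.seqNr;
    }

    long deposit(long amount) {
        return persist("Deposited", amount, null);
    }

    /** Reverses an erroneous event by recording a new fact with the opposite effect. */
    long compensate(long seqNr) {
        Event wrong = journal.get((int) (seqNr - 1));
        return persist("DepositReversed", -wrong.amount, seqNr);
    }

    long balance() {
        return balance;
    }

    List<Event> history() {
        return Collections.unmodifiableList(journal);
    }

    /** Recovery: replaying the full history yields the corrected state. */
    static long replay(List<Event> events) {
        long total = 0;
        for (Event e : events) {
            total += e.amount;
        }
        return total;
    }
}
```

Replaying the full history (deposit 100, deposit 50, reversal of 50) still yields a balance of 100, and the log keeps a record of both the mistake and its correction.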

How to deal with race conditions among jobs with, e.g., beanstalkd

I want to set up a job queue with multiple workers. Right now I am looking at beanstalkd, but I believe this is more of a conceptual problem: how can you ensure that jobs related to a single entity get handled in order?
Let's say the workers manage an email platform for some UI. For a given mailbox, jobs need to be performed serially. For example, sometimes a user will want to re-push their password into the mail platform while troubleshooting. So, they change their password, then change it back right away. That's two password-change jobs submitted to beanstalkd.
Now, most of the time this will go fine, as beanstalkd will hand those jobs out to workers in order. However, some transient error like a DNS lookup delay could cause the second password change (back to the proper one) to go through before the first, leaving the mailbox with an incorrect password.
I have thought about introducing semaphores/mutexes and having a 1:1 worker-machine:beanstalkd-server ratio, but even that would only work if lock requests are granted in the order they were made, which doesn't seem fully reliable. Having a queue per entity opens up some other options, but this needs to support hundreds of thousands of entities.
Judging by how little discussion of this topic I've found, this must not be as common a scenario as I initially thought. Does anyone have experience dealing with this problem?
A couple of potential methods come to mind.
As you point out, unless you are changing priorities, Beanstalkd is a FIFO queue. This means that, if only one worker is dealing with changing the password, it would handle the jobs in order.
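The single-worker observation can be extended to several workers with a common technique that the answer does not mention: partition jobs by entity, so that all jobs for one mailbox go to the same single-threaded worker. EntityRouter is a hypothetical in-process sketch; with beanstalkd, the same idea could be approximated by a fixed number of tubes, each watched by exactly one worker, with the tube chosen from a hash of the mailbox id. It needs one queue per worker rather than one per entity.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical router: jobs for the same entity always go to the same single-threaded
 * worker, so they run in submission order, while different entities run in parallel.
 */
final class EntityRouter {
    private final ExecutorService[] workers;

    EntityRouter(int workerCount) {
        workers = new ExecutorService[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Stable mapping from entity id to worker index. */
    int workerFor(String entityId) {
        return Math.floorMod(entityId.hashCode(), workers.length);
    }

    void submit(String entityId, Runnable job) {
        workers[workerFor(entityId)].execute(job);
    }

    /** Stops accepting jobs and waits for queued ones; returns false on timeout or interrupt. */
    boolean shutdownAndWait(long seconds) {
        for (ExecutorService w : workers) {
            w.shutdown();
        }
        try {
            for (ExecutorService w : workers) {
                if (!w.awaitTermination(seconds, TimeUnit.SECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Ordering holds only as long as a failed job is retried in place by its worker; putting it back at the end of the queue would let later jobs for the same entity overtake it.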
If there are multiple workers, you could store metadata alongside the password: a last-modified time (more precisely, the time the password change request was made). That time would be set from the job, and if the time already stored in the database alongside the password is ever newer than the time of the incoming request, that request would be dropped as out of date.
Depending on the user data storage, you may need additional locking around the database (with an SQL database, this is quite easy, but a file-based store would need additional locking to avoid potential file corruption).
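The timestamp check from the second method could look like this in Java. MailboxStore and its methods are hypothetical; synchronized stands in for the database locking mentioned above, and the request time is assumed to come from a single clock (for example, set when the UI submits the change).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical password store that keeps the request time alongside each password. */
final class MailboxStore {
    private static final class Entry {
        final String password;
        final long requestedAtMillis;

        Entry(String password, long requestedAtMillis) {
            this.password = password;
            this.requestedAtMillis = requestedAtMillis;
        }
    }

    private final Map<String, Entry> byMailbox = new HashMap<>();

    /**
     * Applies a password-change job unless a newer request has already been stored.
     * synchronized stands in for database locking (e.g. a row lock in a transaction).
     */
    synchronized boolean applyPasswordChange(String mailbox, String newPassword, long requestedAtMillis) {
        Entry current = byMailbox.get(mailbox);
        if (current != null && current.requestedAtMillis > requestedAtMillis) {
            return false; // out of date: drop the job
        }
        byMailbox.put(mailbox, new Entry(newPassword, requestedAtMillis));
        return true;
    }

    synchronized String passwordOf(String mailbox) {
        Entry e = byMailbox.get(mailbox);
        return e == null ? null : e.password;
    }
}
```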

Trident or Storm topology that writes on Redis

I have a problem with a topology. I'll try to explain the workflow.
I have a source that emits ~500k tuples every 2 minutes. These tuples must be read by a spout and processed exactly once as a single object (I think that's a batch in Trident).
After that, a bolt/function/something else must append a timestamp and save the tuples into Redis.
I tried to implement a Trident topology with a Function that saves all the tuples into Redis using a Jedis object (a Redis library for Java) inside the Function class, but when I deploy it I get a NotSerializableException on this object.
My question is: how can I implement a Function that writes this batch of tuples to Redis? Searching the web, I couldn't find any example that writes from a function to Redis, or any example using a State object in Trident (I probably have to use one...).
My simple topology:
TridentTopology topology = new TridentTopology();
topology.newStream("myStream", new mySpout()).each(new Fields("field1", "field2"), new myFunction("redis_ip", "6379"));
Thanks in advance
(replying about state in general since the specific issue related to Redis seems solved in other comments)
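For readers hitting the same NotSerializableException: Storm serializes a topology's functions to ship them to worker nodes, and Jedis is not serializable. A common fix is to keep only serializable configuration (host, port) in fields, mark the client field transient, and create the client on the worker in the function's prepare method. The sketch below shows that pattern with plain Java serialization; FakeRedisClient stands in for Jedis, and RedisWriterFunction is hypothetical and does not extend Storm's classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a client like Jedis, which does not implement Serializable. */
final class FakeRedisClient {
    final String host;
    final int port;

    FakeRedisClient(String host, int port) {
        this.host = host;
        this.port = port;
    }
}

/** Shaped like a Trident function: built on the submitter, serialized, then run on workers. */
final class RedisWriterFunction implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String host; // plain configuration is serializable and travels with the topology
    private final int port;
    private transient FakeRedisClient client; // skipped by serialization, rebuilt on each worker

    RedisWriterFunction(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Opens the connection on the worker (in Trident, the function's prepare hook). */
    void prepare() {
        client = new FakeRedisClient(host, port);
    }

    boolean isConnected() {
        return client != null;
    }

    /** Mimics deployment: serialize on the submitter, deserialize on a worker. */
    static RedisWriterFunction roundTrip(RedisWriterFunction function) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(function);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (RedisWriterFunction) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Without the transient modifier, the writeObject call above would fail with the same NotSerializableException as in the question.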
The concepts of DB updates in Storm become clearer when we keep in mind that Storm reads from distributed (or "partitioned") data sources (through Storm "spouts"), processes streams of data on many nodes in parallel, optionally performs calculations on those streams (called "aggregations"), and saves the results to distributed data stores (called "states"). Aggregation is a very broad term that just means "computing stuff": for example, computing the minimum value over a stream is seen in Storm as an aggregation of the previously known minimum value with the new values currently being processed in some node of the cluster.
With the concepts of aggregation and partitioning in mind, we can look at the two main primitives in Storm for saving something to a state: partitionPersist and persistentAggregate. The first one runs at the level of each cluster node without coordination with the other partitions, and feels a bit like talking to the DB through a DAO. The second one involves "repartitioning" the tuples (i.e. re-distributing them across the cluster, typically along some groupBy logic) and doing some calculation (an "aggregate") before reading/saving something to the DB; it feels a bit like talking to a HashMap rather than a DB (Storm calls the DB a "MapState" in that case, or a "Snapshot" if there's only one key in the map).
One more thing to keep in mind is that Storm's exactly-once semantics are not achieved by processing each tuple exactly once. That would be too brittle: there are potentially several read/write operations per tuple defined in our topology, we want to avoid two-phase commits for scalability reasons, and at large scale network partitions become more likely. Rather, Storm will typically keep replaying the tuples until it is sure they have been completely and successfully processed at least once. What matters for state updates is that Storm gives us a primitive (OpaqueMap) that allows idempotent state updates, so that those replays do not corrupt previously stored data. For example, if we are summing up the numbers [1,2,3,4,5], the result saved in the DB will always be 15, even if they are replayed and processed in the "sum" operation several times due to some transient failure. OpaqueMap has a slight impact on the format used to save data in the DB. Note that this replay and opaque logic is only present if we tell Storm to act like that, but we usually do.
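The opaque-update logic can be sketched without Storm. OpaqueSumState is hypothetical, modeled on the (txid, prev, curr) triple that Storm's opaque states keep per key. It assumes batches are committed in transaction-id order and that a replayed batch keeps its txid: when the stored txid matches, the batch is a replay, so the update is recomputed from prev instead of being added on top of curr.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical opaque store, modeled on the (txid, prev, curr) values kept per key. */
final class OpaqueSumState {
    private static final class OpaqueValue {
        final long txid;
        final long prev; // value before the batch identified by txid
        final long curr; // value after applying that batch

        OpaqueValue(long txid, long prev, long curr) {
            this.txid = txid;
            this.prev = prev;
            this.curr = curr;
        }
    }

    private final Map<String, OpaqueValue> db = new HashMap<>();

    /** Assumes batches are committed in txid order and a replayed batch keeps its txid. */
    void applyBatch(String key, long txid, long batchSum) {
        OpaqueValue stored = db.get(key);
        long base;
        if (stored == null) {
            base = 0;
        } else if (stored.txid == txid) {
            base = stored.prev; // replay of the last batch: discard its earlier effect
        } else {
            base = stored.curr; // new batch: build on the latest value
        }
        db.put(key, new OpaqueValue(txid, base, base + batchSum));
    }

    long get(String key) {
        OpaqueValue v = db.get(key);
        return v == null ? 0 : v.curr;
    }
}
```

Keeping prev is what makes the update safe even if a replayed batch differs from its first attempt: applying txid 2 as 3+4 and then replaying it as 3+4+5 still yields 15.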
If you're interested in reading more, I posted 2 blog articles here on the subject.
http://svendvanderveken.wordpress.com/2013/07/30/scalable-real-time-state-update-with-storm/
http://svendvanderveken.wordpress.com/2014/02/05/error-handling-in-storm-trident-topologies/
One last thing: as hinted by the replay discussion above, Storm is a very asynchronous mechanism by nature: we typically have some data producer that posts events to a queueing system (e.g. Kafka or 0MQ), and Storm reads from there. As a result, assigning a timestamp from within Storm, as suggested in the question, may or may not have the desired effect: this timestamp will reflect the "latest successful processing time", not the data ingestion time, and of course it will not be identical for replayed tuples.
Have you tried trident-state for Redis? There is code on GitHub that already does it:
https://github.com/kstyrc/trident-redis.
Let me know if this answers your question or not.