Is it right to create an actor instance for each new process managed by FSM - akka.net

I'm trying to design an application that will manage multi-state processes, something like money transfers from one account to another. I have decided to use Akka.NET FSM. But then I got stuck when I found out that each new process (each new transfer) needs a new actor instance, because the FSM state is stored in the "running" actor. For me this means that if I have 1000 simultaneous transfer requests, I should create 1000 instances. Keeping in mind that, according to the documentation, each actor works in its own thread, how realistic is this approach? Or did I misunderstand something?

Actors don't work "in their own threads"; they work on one thread at a time, which is a different thing - you can have millions of actors working perfectly well on 2 OS threads, but at any given time the same actor will only ever be executing on one of them (unless you escape that barrier explicitly, e.g. by running a task inside an actor). A single actor by itself occupies less than 1 kB of memory and has no inherent requirements on operating system resources (like threads).
In general, having one actor work as the coordinator for each transfer is OK, and it's quite a common pattern in Akka.NET.
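A minimal sketch of such a per-transfer FSM actor (the TransferState/TransferData types and the DebitConfirmed/CreditConfirmed messages are made up for illustration; the point is only that each instance is a cheap, in-memory state machine):

```csharp
using Akka.Actor;

// Hypothetical messages confirming the two legs of a transfer.
public class DebitConfirmed { }
public class CreditConfirmed { }

public enum TransferState { Pending, Debited }
public class TransferData { public decimal Amount { get; set; } }

// One lightweight FSM actor instance per transfer.
public class TransferActor : FSM<TransferState, TransferData>
{
    public TransferActor(decimal amount)
    {
        StartWith(TransferState.Pending, new TransferData { Amount = amount });

        When(TransferState.Pending, evt =>
            evt.FsmEvent is DebitConfirmed ? GoTo(TransferState.Debited) : Stay());

        When(TransferState.Debited, evt =>
            evt.FsmEvent is CreditConfirmed ? Stop() : Stay()); // actor stops once the transfer completes

        Initialize();
    }
}

// Somewhere in a parent/coordinator actor, one instance per transfer:
// Context.ActorOf(Props.Create(() => new TransferActor(amount)), $"transfer-{transferId}");
```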

Related

Best practice for cleaning up EntityStoppedManifest journal entries for permanently terminated actors?

In our actor system, using sharding and persistence, the concrete instances of one of our ReceivePersistentActor implementations are not re-used once they are terminated (passivated), as they represent client sessions identified by a GUID that is generated for each new session.
When a session ends, the ReceivePersistentActor is responsible for cleaning up its own persistence data and calls DeleteSnapshots and DeleteMessages, which works fine. Once these calls have been processed, the actor calls Context.Parent.Tell(new Passivate(PoisonPill.Instance)); to terminate.
After that, the event journal will still contain an EntityStoppedManifest entry ("CD"), as this is generated through the Passivate message.
Over time this will lead to many "CD" entries remaining in the event journal.
Is there a recommended approach for cleaning up such residue entries?
Maybe a separate Janitor actor that cleans up these entries manually?
Or is this even a design flaw on our end?
Looks like I came here too hastily, as those events have been mostly cleaned up by now automagically.
What might have caused those events to accumulate in such high numbers in the first place is that they were generated during actor recovery instead of during normal operation. But this is just an assumption.
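For reference, the cleanup-then-passivate sequence described in the question might look roughly like this (the session actor, its persistence-id scheme, and the EndSession trigger message are hypothetical; for brevity only the journal-deletion confirmation is awaited before passivating):

```csharp
using Akka.Actor;
using Akka.Cluster.Sharding;
using Akka.Persistence;

public class EndSession { }

// Hypothetical session actor that deletes its own persistence data and then passivates.
public class SessionActor : ReceivePersistentActor
{
    public override string PersistenceId => $"session-{Self.Path.Name}";

    public SessionActor()
    {
        Command<EndSession>(_ =>
        {
            // Remove all snapshots and journal entries for this persistence id.
            DeleteSnapshots(new SnapshotSelectionCriteria(long.MaxValue));
            DeleteMessages(long.MaxValue);
        });

        Command<DeleteMessagesSuccess>(_ =>
        {
            // Cleanup confirmed: ask the shard region (our parent) to passivate us.
            Context.Parent.Tell(new Passivate(PoisonPill.Instance));
        });
    }
}
```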

configure parallel async event queue on replicated region in Gemfire

I'm trying to configure Gemfire/Geode in order to have an async event queue with parallel=true on a replicated region. However, I'm getting the following exception at startup:
com.gemstone.gemfire.internal.cache.wan.AsyncEventQueueConfigurationException: Parallel Async Event Queue myQueue can not be used with replicated region /myRegion
This (i.e. preventing parallel queues on replicated regions) seems to be a design decision, but I can't understand why that is the case.
I have read all the documentation I've been able to find (primarily http://gemfire.docs.pivotal.io/docs-gemfire/latest/reference/book_intro.html and related docs) and searched for any reference to this exception on the internet, but I didn't find any clear explanation of why I can't have an event listener on each member hosting a replicated region.
My conclusion is that I must be missing some fundamental concept about replicated regions and/or parallel queues, but since I can't find the appropriate documentation on my own, I'm asking for an explanation and/or pointers to the right resources to read.
Thanks in advance.
EDIT: Let me put the question into context.
I have an external system sending data to my application via REST services, which are load balanced between nodes in order to maximize performance. Each of the nodes hosts the same regions (let's say 3, named A, B and C). The data travels through all those regions (A to B to C) and is processed along the way. This means that region A holds data that has just been received, region B holds data that has been partially processed, and region C holds data whose processing is complete.
I am using event listeners to process data and move it from region to region, and in case of the listener for region C, to export it to another external system.
All the listeners must (and I repeat, must) be transactional.
I also need horizontal scalability (i.e. adding nodes on the fly to increase throughput) and the maximum amount of data replication that can possibly be achieved.
Moreover, I want to run all of the nodes with the same gemfire configuration.
I have already tried to use partitioned regions, but they don't fit my needs for a bunch of reasons that I won't explain here for the sake of brevity (just trust me, it is not currently possible).
So I thought that having all the nodes host the replicated regions could be the way, but I need all of them to be able to process events independently and perform region synchronization afterwards in an active/active scenario. It is my understanding that this requires event queues to be parallel, but it does not seem possible (by design).
So the (updated) question(s) are:
Is this scenario even possible? And if it is, how can I achieve it?
Any explanation and/or documentation, example, resource or anything else is more than welcome.
Again, thanks in advance.
An AsyncEventQueue is used to write data that arrives in GemFire to some other data store. You would ideally want to do this only once. Since the content of a replicated region is the same on all members of the system, you only need an async event listener on one member, hence parallel=true is not supported.
For partitioned regions, if you only had one member hosting the AsyncEventQueue, then every single put to the partitioned region would also be routed through that member. This introduces a single point of contention in the system. The solution to this problem was the introduction of parallel AsyncEventQueues, so that events on each member are only queued up locally in that member.
GemFire also supports CacheListeners, which are invoked on each member even for replicated regions, however, they are synchronous. You can introduce a thread pool in your CacheListener to get the same functionality.

How do I wait for all work to complete in Akka.Net?

I have successfully sent work to a pool of actors to perform my work, but now I want to do some aggregation on the results returned by all the workers. How do I know that everyone is done?
The best I have come up with is to maintain a set of request ids and wait for that set to become empty, but this seems inelegant.
Generally, you want to use what we call the "Commander" pattern for this. Essentially, you have one stateful actor (the Commander) that is responsible for starting and monitoring the task. You then farm out the actual work across the actor pool, and have them report back to the Commander as they finish. The commander can then track the progress of the job by calculating # completions / size of worker pool.
This way, the workers can be monitored and restarted independently as they do the work, but all the precious task-level state and information lives in the Commander (this is called the "Error Kernel" pattern).
You can see an example of this in the Akka.NET scalable webcrawler demo.
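A minimal sketch of the Commander pattern for aggregation (message types, pool size, and the trivial "work" are all made up for illustration):

```csharp
using System.Collections.Generic;
using Akka.Actor;
using Akka.Routing;

// Hypothetical messages.
public class StartJob { public string[] WorkItems { get; set; } }
public class WorkItem { public string Payload { get; set; } }
public class WorkCompleted { public string Result { get; set; } }

// The Commander owns all job-level state: how much work was sent out and how much came back.
public class Commander : ReceiveActor
{
    private int _expected;
    private int _completed;
    private readonly List<string> _results = new List<string>();
    private readonly IActorRef _workers;

    public Commander()
    {
        // Worker pool; workers can fail and be restarted by supervision without losing job state.
        _workers = Context.ActorOf(Props.Create<Worker>().WithRouter(new RoundRobinPool(5)), "workers");

        Receive<StartJob>(job =>
        {
            _expected = job.WorkItems.Length;
            foreach (var item in job.WorkItems)
                _workers.Tell(new WorkItem { Payload = item });
        });

        Receive<WorkCompleted>(done =>
        {
            _results.Add(done.Result);
            if (++_completed == _expected)
            {
                // Everyone reported back: aggregate _results and act on the combined result here.
            }
        });
    }
}

public class Worker : ReceiveActor
{
    public Worker()
    {
        Receive<WorkItem>(item => Sender.Tell(new WorkCompleted { Result = item.Payload.ToUpper() }));
    }
}
```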

How to create a distributed 'debounce' task to drain a Redis List?

I have the following use case: multiple clients push to a shared Redis list. A separate worker process should drain this list (process and delete). Wait/multi-exec is in place to make sure this goes smoothly.
For performance reasons I don't want to call the 'drain'-process right away, but after x milliseconds, starting from the moment the first client pushes to the (then empty) list.
This is akin to a distributed underscore/lodash debounce function, for which the timer starts to run the moment the first item comes in (i.e.: 'leading' instead of 'trailing')
I'm looking for the best way to do this reliably in a fault tolerant way.
Currently I'm leaning to the following method:
Use Redis SET with the NX and PX options. This allows:
setting a value (a mutex) on a dedicated key only if it doesn't yet exist - this is what the NX argument is used for;
expiring the key after x milliseconds - this is what the PX argument is used for.
This command returns 1 if the value could be set, meaning no value previously existed; it returns 0 otherwise. A 1 means the current client is the first client to kick off the process since the Redis list was last drained. Therefore,
this client puts a job on a distributed queue, scheduled to run in x milliseconds.
After x milliseconds, the worker that receives the job starts draining the list.
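A rough sketch of that flow using StackExchange.Redis (the key names and the ScheduleDrainIn helper are made up; the actual scheduling would go through whatever distributed queue is already in place):

```csharp
using System;
using StackExchange.Redis;

public static class DebouncedPush
{
    private static readonly ConnectionMultiplexer Redis = ConnectionMultiplexer.Connect("localhost");

    public static void Push(string value, TimeSpan debounce)
    {
        IDatabase db = Redis.GetDatabase();

        // SET myListLock 1 PX <ms> NX: succeeds only for the first pusher since the last drain.
        bool firstSinceDrain = db.StringSet("myListLock", "1", debounce, When.NotExists);

        db.ListLeftPush("myList", value);

        if (firstSinceDrain)
        {
            // Hypothetical: enqueue a delayed "drain myList" job on the existing distributed queue.
            ScheduleDrainIn(debounce);
        }
    }

    private static void ScheduleDrainIn(TimeSpan delay) { /* hand off to your queue of choice */ }
}
```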
This works on paper, but feels a bit complicated. Any other ways to make this work in a distributed fault-tolerant way?
Btw: Redis and a distributed queue are already in place, so I don't consider it an extra burden to use them for this issue.
Sorry for that, but a proper response would require a bunch of text/theory. Thanks to your good question, you've already written a good answer yourself :)
First of all we should define the terms. 'Debounce' in the underscore/lodash sense is best understood through David Corbacho's article explanation:
Debounce: Think of it as "grouping multiple events in one". Imagine that you go home, enter in the elevator, doors are closing... and suddenly your neighbor appears in the hall and tries to jump on the elevator. Be polite! and open the doors for him: you are debouncing the elevator departure. Consider that the same situation can happen again with a third person, and so on... probably delaying the departure several minutes.
Throttle: Think of it as a valve, it regulates the flow of the executions. We can determine the maximum number of times a function can be called in certain time. So in the elevator analogy you are polite enough to let people in for 10 secs, but once that delay passes, you must go!
You are asking about debouncing from the moment the first element is pushed to the list:
So, by analogy with the elevator: the elevator should go up 10 minutes after the first person gets in. It does not matter how many more people cram into the elevator.
In the case of a distributed fault-tolerant system, this should be viewed as a set of requirements:
Processing of the new list must begin within X time after the first element is inserted (i.e. after the creation of the list).
A worker crash should not break anything.
The system must be free of deadlocks.
The first requirement must be fulfilled regardless of the number of workers - be it 1 or N.
I.e. you should know (in a distributed way) whether the group of workers has to wait, or whether list processing can start. As soon as we utter the phrases "distributed" and "fault-tolerant", these concepts always bring their friends along:
Atomicity (e.g. via locking)
Reservation
In practice
In practice, I am afraid that your system needs to be a little bit more complicated (maybe you just haven't written it all down, and you already have it).
Your method:
Pessimistic locking with a mutex via SET NX PX. NX guarantees that only one process at a time is doing the work (atomicity). PX ensures that if something happens to this process, the lock is released by Redis (the fault-tolerance part that protects against dead locks).
All workers try to grab the single mutex (per list key), so only one succeeds and will process the list after X time. This process can update the TTL of the mutex (if it needs more time than originally planned). If the process crashes, the mutex is unlocked after the TTL and can be grabbed by another worker.
My suggestion
Fault-tolerant, reliable queue processing in Redis is built around RPOPLPUSH:
RPOPLPUSH an item from the list being processed to a special backup list (per worker, per list).
Process the item.
Remove the item from the special backup list.
Requirements
So, if a worker crashes, we can always return the in-flight message from the special list back to the main list. And Redis guarantees atomicity of RPOPLPUSH/RPOP. That is, the only remaining problem is making the group of workers wait for a while.
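A rough sketch of that reliable-processing loop with StackExchange.Redis (the list and backup key names are made up):

```csharp
using StackExchange.Redis;

public static class ReliableDrain
{
    // Drains "myList" one item at a time; "myList:processing:<workerId>" is the per-worker backup list.
    public static void Drain(IDatabase db, string workerId)
    {
        string backup = $"myList:processing:{workerId}";

        RedisValue item;
        while ((item = db.ListRightPopLeftPush("myList", backup)).HasValue)
        {
            ProcessItem(item);               // the actual work
            db.ListRemove(backup, item, 1);  // acknowledge: drop it from the backup list
        }
        // If this worker crashes mid-item, a recovery job can push items
        // from its backup list back onto "myList".
    }

    private static void ProcessItem(RedisValue item) { /* ... */ }
}
```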
And then there are two options. First, if you have many clients and fewer workers, use locking on the worker side: each worker tries to lock the mutex and, on success, starts processing.
And vice versa: use SET NX PX each time you execute LPUSH/RPUSH (to get a "wait N time before popping from me" solution if you have many workers and only a few push clients). So a push is:
SET myListLock 1 PX 10000 NX
LPUSH myList value
And each worker just checks whether myListLock exists; if it does, the worker should wait at least the key's remaining TTL before setting the processing mutex and starting to drain.
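Sketching the worker side of that variant (key names, the 30-second mutex TTL, and the reuse of the ReliableDrain helper from the previous sketch are all assumptions):

```csharp
using System;
using System.Threading;
using StackExchange.Redis;

public static class DebouncedWorker
{
    public static void TryDrain(IDatabase db)
    {
        // Respect the push-side "myListLock" debounce window before competing for the drain mutex.
        TimeSpan? remaining = db.KeyTimeToLive("myListLock");
        if (remaining.HasValue)
        {
            Thread.Sleep(remaining.Value); // come back once the debounce window has elapsed
        }

        // Only one worker wins the processing mutex (pessimistic lock with its own TTL).
        if (db.StringSet("myList:drainLock", Environment.MachineName,
                         TimeSpan.FromSeconds(30), When.NotExists))
        {
            ReliableDrain.Drain(db, Environment.MachineName); // from the sketch above
            db.KeyDelete("myList:drainLock");
        }
    }
}
```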

MSSQL multi-user access

I am experiencing problems with an MSSQL instance where deadlocks occur from time to time. I have a table A which holds temperature measurements. My application contains 1-10 worker threads, which collect measurements via TCP from remote locations and then want to store them in the database. Of course these workers use transactions to conduct their tasks. The IsolationLevel of the transactions is set to ReadCommitted. Still, deadlocks occur and the CPU load of the database server is at 100%. Can anyone tell me what I have to consider to get this working? I thought the database system would do the multi-user synchronization for me; at least that is what I learned at university.
My suggestion is to create another thread that handles your updates to the database. Add the information to a collection from the worker threads in a thread-safe manner, and let one writer thread do the updates/inserts into the table. You can even batch 10-30 of these statements and execute them together.
This is what we did on an SMS sender where we used up to 50 threads, each sending an SMS every 100 ms. It worked brilliantly for us.
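A minimal sketch of that single-writer, batched approach (the table and column names, the batch size of 30, and the value-tuple payload are all assumptions):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Threading.Tasks;

public class MeasurementWriter
{
    private readonly BlockingCollection<(string Sensor, double Value, DateTime Taken)> _queue
        = new BlockingCollection<(string Sensor, double Value, DateTime Taken)>();

    // Called by the 1-10 collector threads; thread-safe, never touches the database.
    public void Enqueue(string sensor, double value, DateTime taken)
        => _queue.Add((sensor, value, taken));

    // Single writer loop: one connection, batched inserts, no writer-vs-writer deadlocks.
    public Task Start(string connectionString) => Task.Run(() =>
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            foreach (var first in _queue.GetConsumingEnumerable())
            {
                var batch = new List<(string Sensor, double Value, DateTime Taken)> { first };
                while (batch.Count < 30 && _queue.TryTake(out var next))
                    batch.Add(next);

                using (var tx = conn.BeginTransaction())
                {
                    foreach (var (sensor, value, taken) in batch)
                    {
                        using (var cmd = new SqlCommand(
                            "INSERT INTO MeasurementA (Sensor, Value, Taken) VALUES (@s, @v, @t)", conn, tx))
                        {
                            cmd.Parameters.AddWithValue("@s", sensor);
                            cmd.Parameters.AddWithValue("@v", value);
                            cmd.Parameters.AddWithValue("@t", taken);
                            cmd.ExecuteNonQuery();
                        }
                    }
                    tx.Commit();
                }
            }
        }
    });
}
```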