Queue inside a microservice - RabbitMQ

Let's imagine we want to clean the sheep in a herd in such a way that each sheep is cleaned by only one task at a time - any other cleaning tasks for that sheep should be queued.
We also have a bunch of shepherds, each of which can clean multiple sheep at a time.
The number of sheep is very large and variable (they die and are born...).
Requests to clean a sheep arrive at random and quite often.
In this approach, a shepherd is a microservice instance.
What architecture should be used to make sure that:
a sheep is cleaned only once at any given time
one shepherd can clean multiple sheep at a time
the same sheep is not cleaned by multiple shepherds at the same time
a sheep is automatically cleaned again and again if multiple clean tasks were sent for the same sheep
Solutions so far:
An in-memory queue in each shepherd - doesn't work - multiple shepherds can clean the same sheep.
An external queue like RabbitMQ - would force me to create a separate queue for each sheep - with a variable number of sheep this would be impossible to maintain (a rough sketch of what this looks like is shown below).
Any other ideas?
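To make the second option concrete, here is a minimal sketch - assuming the pika client and a hypothetical clean.<sheep-id> queue naming scheme - of the queue-per-sheep approach:

```python
# Minimal sketch of the "one RabbitMQ queue per sheep" option above.
# The pika client and the queue naming scheme are assumptions.
import pika

SHEEP_ID = "sheep-42"  # hypothetical identifier
QUEUE = f"clean.{SHEEP_ID}"

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# One durable queue per sheep; tasks for the same sheep line up here in order.
channel.queue_declare(queue=QUEUE, durable=True)

# Enqueue a cleaning task for this sheep.
channel.basic_publish(
    exchange="",
    routing_key=QUEUE,
    body=b"clean",
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)
connection.close()
```

A shepherd would then consume each sheep's queue with prefetch_count=1, which guarantees the same sheep is never cleaned twice concurrently - but with a large, changing herd this means constantly creating, consuming and deleting thousands of queues, which is the maintenance problem described above.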

Related

How to explore the redis queue using redis-cli

I am new to Redis and I want to know if there is any way to explore the Redis queues using redis-cli.
I recently picked up Redis and was surprised to find many old entries cluttering the queue. The queue has a size of around 95000 (DBSIZE), and using KEYS * in the terminal I could only view entries 85000 to 95000, which are from almost 3 years ago (I could identify this because some of the keys look like '22-06-2019_status_440792_68587277').
I want to know if I can view all the keys at once (the terminal only displayed the last 10000 keys) and if there is a way to delete all the old keys in one go.
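On the command line, redis-cli --scan --pattern '<pattern>' iterates over matching keys incrementally instead of dumping everything the way KEYS * does. As a sketch of scripting the cleanup, assuming redis-py, a local instance and a pattern derived from the example key above:

```python
# Sketch: iterate old keys with SCAN and delete them. The connection
# details and the '*-2019_status_*' pattern are assumptions based on the
# example key in the question.
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

removed = 0
for key in r.scan_iter(match="*-2019_status_*", count=1000):
    r.unlink(key)  # non-blocking delete (Redis >= 4.0); r.delete(key) also works
    removed += 1

print(f"removed {removed} keys, {r.dbsize()} keys remain")
```

SCAN-based iteration (which both --scan and scan_iter use) does not block the server the way KEYS * can on a large keyspace.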

Best practice for cleaning up EntityStoppedManifest journal entries for permanently terminated actors?

In our actor system, using sharding and persistence, the concrete instances of one of our ReceivePersistentActor implementations are not re-used once they are terminated (passivated), as they represent client sessions identified by a GUID that is generated for each new session.
When a session ends, the ReceivePersistentActor is responsible for cleaning up its own persistence data and will call DeleteSnapshots and DeleteMessages, which works fine. Once these calls have been processed, the actor calls Context.Parent.Tell(new Passivate(PoisonPill.Instance)); to terminate.
After that, the event journal will still contain an EntityStoppedManifest entry ("CD"), as this is generated through the Passivate message.
Over time this will lead to many "CD" entries remaining in the event journal.
Is there a recommended approach for cleaning up such residue entries?
Maybe a separate Janitor actor that cleans up these entries manually?
Or is this even a design flaw on our end?
Looks like I came here too hastily, as those events have been mostly cleaned up by now automagically.
What might have caused those events to accumulate in such high numbers in the first place is that they were generated during actor recovery instead of during normal operation. But this is just an assumption.

Is it right to create an actor instance for each new process managed by FSM?

I'm trying to design an application which will manage multi-state processes - something like money transfer processes from one account to another. I have decided to use Akka.NET FSM. But then I got stuck when I found out that each new process (new transfer) needs a new actor instance, because the FSM state is stored in the "running" actor. For me it means that if I have 1000 simultaneous transfer requests, then I should create 1000 instances. Keeping in mind that, according to the documentation, each actor works in its own thread, how realistic is this approach? Or did I understand something wrong?
Actors don't work "in their own threads"; they work on one thread at a time, which is a different thing - you can have millions of actors working perfectly well on 2 OS threads, but at any given time the same actor will only ever be executed on one of them (unless you escape that barrier explicitly, e.g. by running a task inside an actor). A single actor by itself occupies less than 1 kB of memory and doesn't have any inherent requirements on operating system resources (like threads).
In general, having one actor working as a transfer coordinator is OK, and it's quite a common pattern in Akka.NET.
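As an analogy only - Python asyncio rather than Akka.NET, with a hypothetical Transfer class and message names - the same resource model of many lightweight, single-mailbox workers sharing one OS thread looks like this:

```python
# Analogy sketch, not Akka.NET: many lightweight "actors" share a single OS
# thread, each draining its own mailbox one message at a time, so per-transfer
# state is never touched concurrently. Names here are hypothetical.
import asyncio

class Transfer:
    def __init__(self, transfer_id):
        self.transfer_id = transfer_id
        self.state = "pending"
        self.mailbox = asyncio.Queue()

    async def run(self):
        while True:
            message = await self.mailbox.get()
            if message == "stop":
                break
            self.state = message  # state transitions happen strictly one at a time

async def main():
    # A thousand concurrent "transfers" coexist happily on one OS thread.
    transfers = [Transfer(i) for i in range(1000)]
    tasks = [asyncio.create_task(t.run()) for t in transfers]
    for t in transfers:
        await t.mailbox.put("completed")
        await t.mailbox.put("stop")
    await asyncio.gather(*tasks)
    print(sum(t.state == "completed" for t in transfers), "transfers completed")

asyncio.run(main())
```

The point of the analogy is only the resource model: one instance per transfer is cheap, because instances are not threads.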

Why does celery need a message broker?

As Celery is a job queue/task queue, the name suggests that it can maintain its tasks and process them. Then why does it need a message broker like RabbitMQ or Redis?
Celery is a distributed task queue, which means the system can reside across multiple computers (containers) in multiple locations, with a single centralised bus.
The basic architecture is as follows:
workers - processes that take jobs (data) from the bus (task queue) and process them
(a worker can also put its result back onto the bus for further processing by a different worker, creating a processing flow)
bus - the task queue; this is basically a DB that stores the jobs as messages so the workers can retrieve them.
It's important to use a concurrent, non-blocking DB, so that when one process takes a job from or puts a job on the bus, it doesn't block other workers from getting/putting their jobs.
RabbitMQ, Redis, ActiveMQ, Kafka and the like are good candidates for this sort of behaviour.
The bus has an API which lets you submit jobs for workers and retrieve them (among more complex features).
Most buses implement an ack/fail feature, so workers can ack when their job is done; if a job is not acked (or a failure is reported), the message can be served again to another worker and might get processed successfully this time, so no data is lost (this depends heavily on the failover logic and on the context of the data as input to a task).
Celery includes a scheduler (beat) that periodically puts specific jobs on the bus and thus creates periodic tasks.
Let's work with a scraping example: you want to scrape the world, but China only allows traffic from its own region, and so do Europe and the USA.
So you can build workers and place them all over the world.
You can use only one bus - let's say it's located in the USA. All the workers know this bus and can connect to it, so by placing a specific job (scrape China) on the bus located in the US, a process in China can work on it - hence distributed.
Of course, adding workers increases the throughput of the system purely through parallelism, unrelated to their geographic location, and this is the common case for using an event-driven architecture (i.e. a central bus, consumers and producers).
I suggest reading the official docs; it's pretty straightforward.
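As a minimal sketch of how those pieces fit together - the module name, the Redis broker URL and the scrape task are illustrative assumptions, not anything from the question:

```python
# tasks.py - minimal Celery sketch; broker URL, task and schedule are
# illustrative assumptions.
from celery import Celery

# The broker (Redis here; RabbitMQ works the same way) is the "bus":
# it holds task messages until a worker picks them up.
app = Celery("tasks", broker="redis://localhost:6379/0")

# acks_late: acknowledge only after the task finishes, so an unfinished
# job can be redelivered to another worker.
@app.task(acks_late=True)
def scrape(region):
    return f"scraped {region}"

# Periodic tasks via the beat scheduler: put a scrape job on the bus every 10 minutes.
app.conf.beat_schedule = {
    "scrape-china-every-10-minutes": {
        "task": "tasks.scrape",
        "schedule": 600.0,
        "args": ("china",),
    }
}

# Run a worker:            celery -A tasks worker
# Run the scheduler:       celery -A tasks beat
# Enqueue from any client: scrape.delay("europe")
```

The broker is needed precisely because the producer calling scrape.delay(), the beat scheduler and the workers may all be different processes on different machines; the only thing they share is the bus.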

Redis hmset, old data overwrites new?

I run two Redis commands:
A: HMSET key k1 v1 k2 v2 k3 v3 ... (hundreds of fields) at 11:03:05,450
B: HMSET key k1 v1.1 at 11:03:05,727
But the final data I get for k1 is v1.
I think there are several possible reasons:
Clocks on different machines are not accurate, so command B actually happened before A. But I have other logic to prevent B from running before A, and I'm 99 percent sure about that, so I don't want to trace this unless there is no other possible reason.
I'm not sure whether A is an atomic command, but I think it is, as Redis is single-threaded. So is it possible that A started before B but finished after B?
Maybe it is related to the slave sync, but I can't figure out how.
I want to know if there are other possible reasons, and any suggestions on how to check what actually happened.
I'm using Redis Cluster with several masters and slaves, and Jedis 2.9.0.
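For reference, every Redis command, HMSET included, executes atomically on the master that owns the key's hash slot, so on a single node the later write to k1 always wins. A minimal redis-py sketch of that expected behaviour (key and field names are placeholders):

```python
# Sketch of the expected single-node behaviour; key/field names are
# placeholders. Each HSET/HMSET call is one command and runs atomically.
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Command A: set many fields at once (HMSET is deprecated; HSET with a
# mapping is the modern equivalent).
r.hset("the_hash", mapping={"k1": "v1", "k2": "v2", "k3": "v3"})

# Command B: overwrite a single field.
r.hset("the_hash", "k1", "v1.1")

print(r.hget("the_hash", "k1"))  # b'v1.1' - B's value wins if B really
                                 # reached the master after A
```

If production still ends up with v1, running MONITOR on the master that owns the key's hash slot while reproducing the issue would show the real order in which A and B arrived, which separates a client or clock problem from anything on the Redis side.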