If I have a multiprocessing.Manager instance and I allocate a Queue through the manager, how can I deallocate that queue? I am going to create millions of queues, because I will be spawning millions of processes over a few days, so leaving the queues around without deallocating them is not an option.
Maybe I shouldn't create a new queue for each new process? The queues are used to send back the "return result" of each process, and I need to know which process returned what; more precisely, I need to know which input produced which output. Should I use a different data structure than a Queue?
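One alternative, sketched below under the assumption that the only real requirement is matching inputs to outputs: use a single shared queue and have every worker tag its result with an identifier for the input it was given, so there is one queue to manage instead of millions. The worker function and task IDs here are illustrative.

    import multiprocessing as mp

    def worker(task_id, payload, results):
        result = payload * 2                # stand-in for the real work
        results.put((task_id, result))      # tag the output with its input

    if __name__ == '__main__':
        with mp.Manager() as manager:
            results = manager.Queue()       # one queue shared by all processes
            procs = [mp.Process(target=worker, args=(i, i, results))
                     for i in range(10)]
            for p in procs:
                p.start()
            for p in procs:
                p.join()
            while not results.empty():
                print(results.get())        # (task_id, result) pairs

At the scale of millions of tasks, a multiprocessing.Pool with imap_unordered would likely be simpler still: it reuses a fixed set of worker processes and hands results back without any explicit queue.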
Reading https://github.com/rabbitmq/internals/blob/master/variable_queue.md, I see that the variable_queue keeps messages in four queue data structures, but I am confused about why it was designed this way. Can anyone give me a more intuitive explanation?
Thanks.
"q4. The need for these four queues becomes apparent once disk paging is taken into account." Per the authors from the link you provided.
Have you ever run into a situation where your queue had some 44 million messages waiting to be processed? The reason for this design is that those 44 million messages have to go somewhere, either disk or memory, and keeping them all in memory would be really expensive.
The variable queue design seems meant to keep messages moving while using the disk as a backing buffer, so you are never left waiting on any one of the internal queues.
Essentially you have a chain of queues that feeds messages read from the disk back toward the consumer, saving memory. Reading from and writing to disk is slow compared to memory access, so this design adds some pipelining so you can keep receiving messages.
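As a toy illustration of that idea (not RabbitMQ's actual implementation), here is a bounded in-memory queue that pages overflow to a stand-in "disk" and refills the memory window as consumers drain it; the memory_limit and the deque playing the role of disk storage are assumptions of the sketch.

    from collections import deque

    class PagedQueue:
        def __init__(self, memory_limit=1000):
            self.memory = deque()            # messages held in RAM
            self.disk = deque()              # stand-in for messages paged to disk
            self.memory_limit = memory_limit

        def publish(self, msg):
            if len(self.memory) < self.memory_limit and not self.disk:
                self.memory.append(msg)      # fast path: message stays in RAM
            else:
                self.disk.append(msg)        # overflow is paged out

        def consume(self):
            msg = self.memory.popleft()
            if self.disk:                    # refill the RAM window from disk
                self.memory.append(self.disk.popleft())
            return msg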
To keep it short, here is a simplified situation:
I need to implement a queue for background processing of imported data files. I want to dedicate a number of consumers to this specific task (let's say 10) so that multiple users can be processed in parallel. At the same time, to avoid problems with concurrent data writes, I need to make sure that no single user is being processed by multiple consumers at once; basically, all files of a single user should be processed sequentially.
Current solution (but it does not feel right):
Have 1 queue where all import tasks are published (file_queue_main)
Have 10 queues for file processing (file_processing_n)
Have 1 result queue (file_results_queue)
Have a manager process (in this case in node.js) which consumes messages from file_queue_main one by one and decides which file_processing queue to distribute each message to. Basically, it keeps track of which file_processing queues the current user is being processed in.
Here is a little animation of my current solution and expected behaviour:
Is RabbitMQ even the tool for the job? For some reason, it feels like some sort of an anti-pattern. Appreciate any help!
The part about this that doesn't "feel right" to me is the manager process. It has to know the current state of each consumer, and it also has to stop and wait if all processors are working on other users. Ideally, you'd prefer to keep each process ignorant of the others. You're also getting very little benefit out of your processing queues, which are only used when a processor is already working on a message from the same user.
Ultimately, the best solution here is going to depend on exactly what your expected usage is and how likely it is that the next message is from a user that is already being processed. If you're expecting most of your messages coming in at any one time to be from 10 users or fewer, what you have might be fine. If you're expecting to be processing messages from many different users with only the occasional duplicate, your processing queues are going to be empty much of the time and you've created a lot of unnecessary complexity.
Other things you could do here:
Have all consumers pull from the same queue and use some sort of distributed locking to prevent collisions. If a consumer gets a message from a user that's already being worked on, requeue it and move on (a sketch follows at the end of this answer).
Set up your queue routing so that messages from the same user will always go to the same consumer. The downside is that if you don't spread the traffic out evenly, you could have some consumers backed up while others sit idle.
Also, if you're getting a lot of messages in from the same user at once that must be processed sequentially, I would question if they should be separate messages at all. Why not send a single message with a list of things to be processed? Much of the benefit of event queues comes from being able to treat each event as a discrete item that can be processed individually.
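A rough sketch of the first option, assuming RabbitMQ via the pika client and Redis for the distributed lock; the queue name file_queue_main comes from the question, while the lock key scheme and process_file are hypothetical.

    import json
    import pika
    import redis

    locks = redis.Redis()

    def process_file(task):
        pass                                    # the actual import work goes here

    def on_message(channel, method, properties, body):
        task = json.loads(body)
        lock = locks.lock('user-lock:%s' % task['user_id'], timeout=300)
        if not lock.acquire(blocking=False):
            # Another consumer is working on this user: requeue and move on.
            channel.basic_nack(method.delivery_tag, requeue=True)
            return
        try:
            process_file(task)
            channel.basic_ack(method.delivery_tag)
        finally:
            lock.release()

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.basic_consume(queue='file_queue_main', on_message_callback=on_message)
    channel.start_consuming()

One caveat: basic_nack with requeue=True puts the message back near its original position rather than at the tail, so if duplicates are common you may prefer to republish the message to the end of the queue instead.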
If the user has a unique ID, or the file being worked on has a unique ID, then hash the ID to pick the processing queue. That way you will always have the same user/file queued on the same processing queue.
I am not sure how this will affect queue length for the processing queues.
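For what it's worth, a minimal sketch of that hashing, assuming the 10 processing queues from the question are named file_processing_0 through file_processing_9:

    import hashlib

    NUM_QUEUES = 10

    def processing_queue_for(user_id: str) -> str:
        # Use a stable hash (Python's built-in hash() is salted per process,
        # so it would route the same user differently across restarts).
        digest = hashlib.sha1(user_id.encode('utf-8')).digest()
        index = int.from_bytes(digest[:4], 'big') % NUM_QUEUES
        return 'file_processing_%d' % index

Queue lengths then depend entirely on how evenly the user IDs hash across the buckets.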
While working with message queues, I encountered the terms static queue and dynamic queue.
Can anyone tell me the difference?
A static queue is one that is defined ahead of time and the queue definition persists in the environment.
A dynamic queue is created on demand. Of these there are two varieties in IBM MQ. A temporary dynamic queue is created on demand and is deleted when the program that created it disconnects. A permanent dynamic queue is one that is created on demand but persists in the environment after the program which created it disconnects.
For example, a temporary dynamic queue is useful for catching replies in a request/reply scenario. The queue exists only so long as the application making requests is connected. When the program disconnects, the queue goes away so there is no need for the administrator to manually clean it up.
A permanent dynamic queue is useful for things like durable subscriptions. When a subscription is created, the queue needs to be unique and the overhead of having to define it ahead of time is excessive. So we let the application create it dynamically but also let the queue hang around when the program is offline in order to collect publications. Normally, the application deletes the queue when it is no longer needed so that the administrator doesn't need to.
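If it helps, here is a rough Python sketch of opening a temporary dynamic queue with the pymqi client; the queue manager name, channel, connection string, and DynamicQName pattern are all illustrative assumptions, and the exact calls should be checked against your pymqi version.

    import pymqi

    qmgr = pymqi.connect('QM1', 'DEV.APP.SVRCONN', 'localhost(1414)')

    # Opening a model queue with a DynamicQName pattern asks the queue
    # manager to create a dynamic queue; the resolved name comes back
    # in the object descriptor.
    od = pymqi.OD()
    od.ObjectName = b'SYSTEM.DEFAULT.MODEL.QUEUE'
    od.DynamicQName = b'MY.REPLY.*'

    queue = pymqi.Queue(qmgr, od, pymqi.CMQC.MQOO_INPUT_EXCLUSIVE)
    print('created', od.ObjectName.strip())   # the generated queue name

    queue.close()      # a temporary dynamic queue goes away on close/disconnect
    qmgr.disconnect()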
For my ongoing project, I am using Redis for message distribution across several processes. Now I need to make this reliable.
I am considering the reliable queue pattern built on the BRPOPLPUSH command. This pattern suggests that the processing thread remove the extra copy of the message from the "processing list" via LREM once the job has completed successfully.
As I am using multiple threads to pop, the extra copies of popped items land in a shared processing list from several threads. That is to say, the processing queue contains elements popped by several threads. As a consequence, when a thread completes its job, it cannot know which item to remove from the "processing queue".
To overcome this problem, I am thinking that I should maintain multiple processing queues (one per thread), keyed by thread ID. So my BRPOPLPUSH will be:
BRPOPLPUSH <primary-queue> <thread-specific-processing-queue>
Then, for cleaning up timed-out items, my monitoring thread will have to watch all of these thread-specific processing queues (sketched below).
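For reference, the per-thread variant described above might look like this with redis-py; the key names follow the question, and handle is a hypothetical job function.

    import redis

    r = redis.Redis()

    def handle(item):
        pass                                        # the real work goes here

    def worker(thread_id):
        processing = 'processing:%d' % thread_id    # thread-specific list
        while True:
            # Atomically move an item into this thread's processing list.
            item = r.brpoplpush('primary-queue', processing, timeout=0)
            handle(item)
            # Job done: remove our copy from the processing list.
            r.lrem(processing, 1, item)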
Are there any better approaches to this problem, than the one conceived above?
@user779159
To support a reliable queue mechanism, we take the following approach:
- Two data structures:
-- a Redis list (the original queue, from which items are popped regularly)
-- a Redis sorted set (z-set), which temporarily stores each popped item
Algorithm:
-- When an element is popped, we store it in the z-set.
-- If the task that picked up the item completes its job, it deletes the entry from the z-set.
-- If the task couldn't complete, the item stays in the z-set; that way we know whether a task was done within the expected time.
-- A background process periodically scans this z-set, picks up items that have timed out, and puts them back on the queue.
How it is done:
-- We use the z-set to store the popped item (typically via a Lua script).
-- We store a timeout value as the rank/score of the item.
-- Another scanner process periodically (say, every minute) runs the z-set command ZRANGEBYSCORE to select items whose scores fall between now and the last minute.
-- If that command finds items, it means the process that popped the item (via BRPOP) has not completed its task in time.
-- So this second process puts the item back on the queue (the Redis list) where it originally belonged.
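A minimal sketch of that scheme, assuming redis-py and hypothetical key names "jobs" (the list) and "jobs:pending" (the z-set); the pop is done in a Lua script so no item can be lost between the RPOP and the ZADD.

    import time
    import redis

    r = redis.Redis()

    POP_SCRIPT = r.register_script("""
    local item = redis.call('RPOP', KEYS[1])
    if item then
        redis.call('ZADD', KEYS[2], ARGV[1], item)
    end
    return item
    """)

    def pop_job(timeout_seconds=60):
        # The score is the deadline by which the job must be acknowledged.
        deadline = time.time() + timeout_seconds
        return POP_SCRIPT(keys=['jobs', 'jobs:pending'], args=[deadline])

    def ack_job(item):
        # Worker finished in time: drop the pending entry.
        r.zrem('jobs:pending', item)

    def requeue_timed_out():
        # Scanner: anything whose deadline has passed goes back on the queue.
        # (A production version would do this atomically, e.g. in another
        # Lua script, to avoid racing with a late acknowledgement.)
        for item in r.zrangebyscore('jobs:pending', 0, time.time()):
            pipe = r.pipeline()
            pipe.zrem('jobs:pending', item)
            pipe.lpush('jobs', item)
            pipe.execute()

Note that a Lua script cannot block, so this sketch uses a plain RPOP rather than BRPOP; callers would poll, or a BRPOPLPUSH into an intermediate list could restore the blocking behaviour.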
I'm looking to introduce SQL Server Service Broker (SSSB).
I have a remote orders database and a local processing database; all activity on the processing database has to happen in sequence, which seems a perfect job for Service Broker!
I've set up the infrastructure, I can send and receive messages, and now I'm looking at the design of the processing. As I said, all processes for one order need to be completed in sequence, so I'll put them in one conversation.
One of these processes is a request for external flat-file data; we then wait (it could be several days) and import and process this file when it returns. How can I process half the tasks, then wait for the flat file to return before processing the other half?
I've had some ideas, but I'm sure I'm missing a trick somewhere:
1) Write all queue items to a status table and use status values – seems to remove some of the flexibility of SSSB and add another layer of tasks
2) Keep the transaction open until we get the data back – not ideal
3) Have the flat file import task continually polling for the file to appear – this seems inefficient
What is the most efficient way of managing this workflow?
thanks in advance
In my opinion this is like a chain of responsibility. As far as I can understand, we have the following workflow:
1.) Process the message.
2.) Wait for the external file. This can be a busy wait, or, if the external source provides a notification, it can be done in a non-polling manner.
3.) Once the data is received, process it.
So my suggestion would be to use 3 different queues, one for each part; when one stage is done, it forwards or puts a new message into the next queue in the chain (see the sketch below).
I am assuming that processing one order will not disrupt the processing of another.
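As a language-agnostic illustration of the chained queues (sketched here with in-process Python queues rather than Service Broker objects), each stage consumes from its own queue, does its share of the work, and forwards the message to the next queue; the stage names and the print stand-in for the real work are invented for the sketch.

    import queue
    import threading
    import time

    orders = queue.Queue()         # 1) initial processing of the order
    awaiting_file = queue.Queue()  # 2) waiting for the external flat file
    final = queue.Queue()          # 3) processing the imported data

    def stage(name, in_q, out_q):
        while True:
            msg = in_q.get()
            print(name, 'handled', msg)   # this stage's share of the work
            if out_q is not None:
                out_q.put(msg)            # forward along the chain

    for name, in_q, out_q in [('stage1', orders, awaiting_file),
                              ('stage2', awaiting_file, final),
                              ('stage3', final, None)]:
        threading.Thread(target=stage, args=(name, in_q, out_q),
                         daemon=True).start()

    orders.put({'order_id': 1})
    time.sleep(0.5)                       # give the demo stages time to run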
I am thinking MSMQ with a Windows Workflow Foundation sequential workflow might also be a candidate for this task.