I have an algorithm (task in VxWorks) that is reading data from multiple queues to be able to manage priorities accordingly. Now , the msgQReceive( ) function, can be set to WAIT_FOREVER which would make it a blocking call until something is available to receive and process. Now how can I do this if I have multiple queues? Currently I check in a while(1) loop if any of the queues have any contents and receive them if so but if nothing is there, my algorithm just spins and spins and spins and eats CPU resources for nothing. How can I prevent this best?
You should be able to use VxWorks events coupled with a Message Queue.
See msgQEvStart function and Kernel Programmer's Guide, section 7.9.
This is akin to using a select() for I/O operation.
You do a blocking eventReceive which returns a bitmask indicating which queue has content and you then do a non-blocking msgQReceive to retrieve the data.
Or you can look at How can a task wait on multiple vxworks Queues? which I wrote a while ago,
As already mentioned, you could use events, alternatively if you can use a pipe instead of msgQ, you could potentially use select.
As another alternative, perhaps consider having multiple tasks, each servicing a single msgQ
Related
We want to develop an application based on micro services architecture.
To communicate between various services asynchronously, we plan to use message queues(like RabbitMQ,ActiveMQ,JMS etc.,) . Is there any approach other than message queue is available to achieve inter process communication?
Thanks.
You should use Queues to handle the tasks that needs not to be completed in real time.
Append the tasks in queue and when there is a room, processor will take tasks from queue and will handle & will remove from queue.
Example :
Assuming your application deals with images, users are uploading so many images. Upload the tasks in a queue to compress the images. And when processor is free it will compress the queued images.
When you want to write some kind of logs of your system, give it to the queue and one process will take logs from queue and write that to disk. So the main process will not waste its time for the I/O operations.
Suggestion :
If you want the real time responses, you should not use the queue. You need to ping the queue constantly to read the incomings, and that is bad practice. And there is no guarantee that queue will handle your tasks immediately.
So the solutions are :
Redis cache - You can put your messages into cache and other process will read that message. Redis is "In memory data-structure". It is very fast and easy to use. Too much libraries and good resources available on the Internet, as it is open source. Read more about Redis. But here you also need to keep check whether there is some kind of message available and if available read from it, process and give response. But to read from Redis, is not very much costlier. With redis, you do not need to worry about memory management, it is well managed by open source community.
Using Sockets. Socket is very much faster, you can make this lightweight(if you want) as it is event based. One process will ping on port and other process will listen and give response. But you need to manage memory. If the buffered memory gets full, you can not put more messages here. If there are so many users producing messages, you need to manage to whom to you want to respond.
So it depends upon your requirement, like do you want to read messages constantly?, do you want to make one to one communication or many to one communication?
We want to use delay feature from activeMQ to delay particural event. How does AMQ_SCHEDULED_DELAY work internaly? In documentation is information about scheduler but no information what mechanism it utilize to delay message. For that reason we are not sure how delaying is going to affect activeMQ. Does activeMQ utilize pooling or async to achive delay.
I ask this question because people from my organization want to pick diffrent technology. I do not have any proof delay from activeMQ is any better.
Here is link to source code. I was thinking of looking up code but I'm not good in java. Can anyone help?
Default implementation of ActiveMQ does utilize the polling.
Active MQ internally keep polling for the scheduled (or delayed) messages by a background scheduler thread. This thread read the list of scheduled events (or messages) and fires the jobs, reschedule repeating jobs as needed before firing the job event.
The list of scheduled events is stored in a sorted order in internal storage of activemq. So during poll, it just read event which are scheduled for earliest processing. Since the messages are persisted during enquing, scheduling many not have visible performance impact during processing.
However before adopting, you can setup your benchmark, without worries much internal implementation detail, to see that your performance/SLA requirement are getting met.
For more details, you may refer to Javadoc of job scheduler API. For default implementation can you refers to the code.
Hope this helps.
In looking at the source code mentioned by #skadya, the term "polling" is not what I interpret. It appears to use the Java Object class' wait(long timeout) method to determine when to "wake up" the thread that runs the jobs.
So, I wouldn't call it polling. I would call it an asynchronous mechanism in which the delay / timeout is set such that the thread will wake up (e.g. to run the next scheduled job at the appropriate time) via the timeout set to a value that is appropriate for the next scheduled job's commencement.
Javadoc for Object.wait(long timeout)
Note that the implementation for Object.wait is a native (i.e. non-java) implementation provided by the JDK / JRE / JVM for a given platform. For what that's worth.
It is possible to do performance test with activemq web console. There is an option to send message with configurable delay and number of messages to send. It doesn't answer my question but it seems like best option to compare two approaches.
We're currently using RabbitMQ, where a continuously super-fast producer is paired with a consumer limited by a limited resource (e.g. slow-ish MySQL inserts).
We don't like declaring a queue with x-max-length, since all messages will be dropped or dead-lettered once the limit is reached, and we don't want to loose messages.
Adding more consumers is easy, but they'll all be limited by the one shared resource, so that won't work. The problem still remains: How to slow down the producer?
Sure, we could put a flow control flag in Redis, memcached, MySQL or something else that the producer reads as pointed out in an answer to a similar question, or perhaps better, the producer could periodically test for queue length and throttle itself, but these seem like hacks to me.
I'm mostly questioning whether I have a fundamental misunderstanding. I had expected this to be a common scenario, and so I'm wondering:
What is best practice for throttling producers? How is this done with RabbitMQ? Or do you do this in a completely different way?
Background
Assume the producer actually knows how to slow himself down with the right input. E.g. a hardware sensor or hardware random number generator, that can generate as many events as needed.
In our particular real case, we have an API that users can use to add messages. Instead of devouring and discarding messages, we'd like to apply back-pressure by having our API return an error if the queue is "full", so the caller/user knows to back-off, or have the API block until the consumer catches up. We don't control our user, so regardless of how fast the consumer is, I can create a producer that is faster.
I was hoping for something like the API for a TCP socket, where a write() can block and where a select() can be used to determine if a handle is writable. So either having the RabbitMQ API block or have it return an error if the queue is full.
For the x-max-length property, you said you don't want messages to be dropped or dead-lettered. I see there was an update in adding some more capabilities for this. As I see it is specified in the documentation:
"Use the overflow setting to configure queue overflow behaviour. If overflow is set to reject-publish, the most recently published messages will be discarded. In addition, if publisher confirms are enabled, the publisher will be informed of the reject via a basic.nack message"
So as I understand it, you can use queue limit to reject the new messages from publishers thus pushing some backpressure to the upstream.
I don't think that this is in any way rabbitmq specific. Basically you have a scenario, where there are two systems of different processing capabilities, and this mismatch will either pose a risk of overflowing the queue (whatever it would be), or even in case of a constant mismatch between producer and consumer, simply create more and more time-distance between event creation and its handling.
I used to deal with this kind of scenarios, and unfortunately there is no magic bullet. You either have to speed up even handling (better hardware, more suited software?) or throttle the event creation (which has nothing to do with MQ really).
Now, I would ask you what's the goal and how the events are produced. Are the events are produced constantly, with either unlimitted or just very high rate (for example readings from sensors - the more, the better), or are they created in batches/spikes (for example: user requests in specific time periods, batch loads from CRM system). I assume that the goal is to process everything cause you mention you don't want to loose any queued message.
If the output is constant, then some limiter (either internal counter, if the producer is the only producer, or external queue length checks if queue can be filled with some other system) is definitely in place.
IF eventsInTimePeriod/timePeriod > estimatedConsumerBandwidth
THEN LowerRate()
ELSE RiseRate()
In real world scenarios we used to simply limit the output manually to the estimated values and there were some alerts set for queue length, time from queue entry to queue leaving etc. Where such limiters were omitted (by mistake mostly) we used to find later some tasks that were supposed to be handled in few hours, that were waiting for three months for their turn.
I'm afraid it's hard to answer to "How to slow down the producer?" if we know nothing about it, but some ideas are: aforementioned rate check or maybe a blocking AddMessage method:
AddMessage(message)
WHILE(getQueueLength() > maxAllowedQueueLength)
spin(1000); // or sleep or whatever
mqAdapter.AddMessage(message)
I'd say it all depends on specific of the producer application and in general your architecture.
Suppose I have a semaphore to control access to a dispatch_queue_t.
I wait for the semaphore (dispatch_semaphore_wait) before scheduling a block on the dispatch queue.
dispatch_semaphore_wait(semaphore,DISPATCH_TIME_FOREVER)
dispatch_async(queue){ //do work ; dispatch_semaphore_signal(semaphore); }
Suppose I have work waiting in several separate locations. Some "work" have higher priority than the other "work".
Is there a way to control which of the "work" will be scheduled next?
Additional information: using a serial queue without a semaphore is not an option for me because the "work" consist of its own queue with several blocks. All of the work queue has to run, or none of it. No work queues can run simultaneously. I have all of this working fine, except for the priority control.
Edit: (in response to Jeremy, moved from comments)
Ok, suppose you have a device/file/whatever like a printer. A print job consists of multiple function calls/blocks (print header, then print figure, then print text,...) grouped together in a transaction. Put these blocks on a serial queue. One queue per transaction.
However you can have multiple print jobs/transactions. Blocks from different print jobs/transactions can not be mixed. So how do you ensure that a transaction queue runs all of its jobs and that a transaction queue is not started before another queue has finished? (I am not printing, just using this as an example).
Semaphores are used to regulate the use of finite resources.
https://www.mikeash.com/pyblog/friday-qa-2009-09-25-gcd-practicum.html
Concurrency Programming Guide
The next step I am trying to figure out is how to run one transaction before another.
You are misusing the API here. You should not be using semaphores to control what gets scheduled to dispatch queues.
If you want to serialize execution of blocks on the queue, then use a serial queue rather than a concurrent queue.
If different blocks that you are enqueuing have different priority, then you should express that different priority using the QOS mechanisms added in OS X 10.10 and iOS 8.0. If you need to run on older systems, then you can use the different priority global concurrent queues for appropriate work. Beyond that, there isn't much control on older systems.
Furthermore, semaphores inherently work against priority inheritance since there is no way for the system to determine who will signal the semaphore and thus you can easily end up in a situation where a higher priority thread will be blocked for a long time waiting for a lower priority thread to signal the semaphore. This is called priority inversion.
I am using RabbitMQ to have worker processes encode video files. I would like to know when all of the files are complete - that is, when all of the worker processes have finished.
The only way I can think to do this is by using a database. When a video finishes encoding:
UPDATE videos SET status = 'complete' WHERE filename = 'foo.wmv'
-- etc etc etc as each worker finishes --
And then to check whether or not all of the videos have been encoded:
SELECT count(*) FROM videos WHERE status != 'complete'
But if I'm going to do this, then I feel like I am losing the benefit of RabbitMQ as a mechanism for multiple distributed worker processes, since I still have to manually maintain a database queue.
Is there a standard mechanism for RabbitMQ dependencies? That is, a way to say "wait for these 5 tasks to finish, and once they are done, then kick off a new task?"
I don't want to have a parent process add these tasks to a queue and then "wait" for each of them to return a "completed" status. Then I have to maintain a separate process for each group of videos, at which point I've lost the advantage of decoupled worker processes as compared to a single ThreadPool concept.
Am I asking for something which is impossible? Or, are there standard widely-adopted solutions to manage the overall state of tasks in a queue that I have missed?
Edit: after searching, I found this similar question: Getting result of a long running task with RabbitMQ
Are there any particular thoughts that people have about this?
Use a "response" queue. I don't know any specifics about RabbitMQ, so this is general:
Have your parent process send out requests and keep track of how many it sent
Make the parent process also wait on a specific response queue (that the children know about)
Whenever a child finishes something (or can't finish for some reason), send a message to the response queue
Whenever numSent == numResponded, you're done
Something to keep in mind is a timeout -- What happens if a child process dies? You have to do slightly more work, but basically:
With every sent message, include some sort of ID, and add that ID and the current time to a hash table.
For every response, remove that ID from the hash table
Periodically walk the hash table and remove anything that has timed out
This is called the Request Reply Pattern.
Based on Brendan's extremely helpful answer, which should be accepted, I knocked up this quick diagram which be helpful to some.
I have implemented a workflow where the workflow state machine is implemented as a series of queues. A worker receives a message on one queue, processes the work, and then publishes the same message onto another queue. Then another type of worker process picks up that message, etc.
In your case, it sounds like you need to implement one of the patterns from Enterprise Integration Patterns (that is a free online book) and have a simple worker that collects messages until a set of work is done, and then processes a single message to a queue representing the next step in the workflow.