Optimization techniques

In the above diagram, we have a producer and a consumer. The producer takes about 1 unit of time to produce an item, and the consumer takes about 9 units of time to process it (4 to read and compute the data and 5 to write it back to the database). From a design standpoint, what are my options to ensure that the consumer does not start lagging behind? What can I do (such as caching, or ensuring proper indexing in the DB) to make this better?

I don't know the hidden details of exactly what your system is like, but the suggestion that instantly popped into my mind is to create multiple threads for both the consumer and the producer, and to use a thread pool to reuse those threads. You must create more consumer threads than producer threads, since the consumer is slow and the control flow is synchronous. Tune the ratio of consumer to producer threads so that some consumer threads are always available to consume the events created by the producer threads immediately.
Again, I don't know your exact requirements. For example, using multiple threads will affect the order in which the event streams are processed, which can result in inconsistency. So if you don't require the events to be processed and persisted in exactly the order they arrive, you can certainly boost performance through parallelization (using a thread pool).
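A minimal sketch of that idea in Python (the 1:9 timing, queue bound, and thread count are placeholders taken from the question; tune them by measurement):

```python
import queue
import threading
import time

NUM_CONSUMERS = 9                     # roughly the 9:1 cost ratio; tune this
events = queue.Queue(maxsize=1000)    # bounded, so a slow consumer back-pressures the producer

def producer():
    for i in range(100):
        time.sleep(0.01)              # ~1 unit of time to produce an event
        events.put(i)
    for _ in range(NUM_CONSUMERS):
        events.put(None)              # one shutdown sentinel per consumer

def consumer():
    while True:
        item = events.get()
        if item is None:
            break
        time.sleep(0.09)              # ~9 units to read, compute, and write back

threads = [threading.Thread(target=producer)]
threads += [threading.Thread(target=consumer) for _ in range(NUM_CONSUMERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```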
Good luck!

Related

RabbitMQ with many small queues to enforce sequential execution (pattern or anti-pattern)?

Hypothetical (but simpler) scenario:
I have many orders in my system.
I have external triggers that affect those orders (e.g. webhooks). They may occur in parallel, and are handled by different instances in my cluster.
In the scope of a single order, I would like to make sure that those events are processed in sequential order to avoid race conditions, version conflicts etc.
Events for different orders can (and should) be processed in parallel
I'm currently toying with the idea of leveraging RabbitMQ with a setup similar to this:
use a queue for each order (create on the fly)
if an event occurs, put it in that queue
Those queues would be short-lived, so I wouldn't end up with millions of them, but it should scale anyway (let's say lower one-digit thousands if the project grows substantially). The question is whether that's an absolute anti-pattern as far as RabbitMQ (or similar) systems go, or whether there are better solutions to ensure sequential execution anyway.
Thanks!
In my opinion, creating ephemeral queues might not be a great idea, as there will be considerable overhead in creating and deleting queues. The focus should be on message consumption. I can think of the following solutions:
You can limit the number of queues with a publishing strategy that routes by orderId modulo the number of queues: orders with orderId % N == 0 go to queue-0, those with orderId % N == 1 go to queue-1, and so forth. That gives you parallel throughput as well as a finite number of queues, though there is some additional publisher logic you have to handle; see the sketch after this list.
The same logic can be moved to the consumer side by using a single pub-sub style queue; the onus then lies on the consumer to filter out unwanted orderIds.
If you are happy to explore other technologies, you can look into Kafka as well, where you can use the orderId as the partition key and use multiple partitions to gain parallel throughput.
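As a rough sketch of the first idea in Python with Pika (the queue names, queue count, and payload are made up for illustration):

```python
import pika

NUM_QUEUES = 4  # fixed, finite number of queues

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Declare the fixed set of queues up front instead of one queue per order.
for i in range(NUM_QUEUES):
    channel.queue_declare(queue=f"orders-{i}", durable=True)

def publish_event(order_id: int, body: bytes) -> None:
    # All events for one order land in the same queue, preserving their
    # relative order, while different orders spread across queues.
    queue_name = f"orders-{order_id % NUM_QUEUES}"
    channel.basic_publish(exchange="", routing_key=queue_name, body=body)

publish_event(42, b'{"event": "payment_received"}')
```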

What are the alternatives to a scheduler that uses a linked list?

I'm reading an article on real-time kernels, and the author explains how to implement a scheduler for tasks with a linked list. He also states that this is not the best way since tasks are inserted and removed based on priority; however, he doesn't explain what those other methods are.
What are the other methods for implementing a scheduler other than a linked list?
Take a good hard look at the queue data structure. If you have a queue for each priority level, then you can start at the highest-priority queue and process until that queue is empty, then step to the next-priority queue, until you have worked through all of the priorities.
Keeping tasks of the same priority level in a queue allows you to guarantee that each task gets at least one quantum of processing before it is thrown onto the tail of (possibly another) queue.
Of course, for real-time processing you want a quick response to an interrupt. Perhaps some sort of priority queue might be applicable.
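A minimal sketch of the one-queue-per-priority-level idea in Python (the task representation is invented for illustration):

```python
from collections import deque

class MultiLevelScheduler:
    """One FIFO queue per priority level; level 0 is the highest."""

    def __init__(self, levels):
        self.queues = [deque() for _ in range(levels)]

    def add(self, task, priority):
        self.queues[priority].append(task)   # O(1), unlike inserting into a sorted list

    def next_task(self):
        # Scan from the highest priority down; drain each level before the next.
        for q in self.queues:
            if q:
                return q.popleft()
        return None                          # nothing runnable

sched = MultiLevelScheduler(levels=3)
sched.add("service interrupt", priority=0)
sched.add("log statistics", priority=2)
assert sched.next_task() == "service interrupt"
```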
There are lots. For instance, it could have been a doubly linked list, so that inserting a low-priority task could search backwards from the tail.
You could implement the scheduler's list of tasks with anything from an array to a B-tree; which one you use depends on what you are scheduling.
A linked list, if it's fairly short, might be the optimal solution.

updating 2 800 000 records with 4 threads

I have a VB.net application with an Access database containing one table of about 2,800,000 records; each row is updated with new data daily. The machine has 64 GB of RAM and an i7-3960X overclocked to 4.9 GHz.
Note: data sources are local.
I wonder: if I use ~10 threads, will it finish updating the rows faster?
If it is possible, what would be the mechanism for dividing this big loop across multiple threads?
Update: Sometimes the loop has to repeat the calculation for some rows depending on the results. Also, the loop has exactly 63 conditions and is 242 lines of code.
Microsoft Access is not particularly good at handling many concurrent updates, compared to other database platforms.
The more your tasks need to do calculations, the more you will typically benefit from concurrency / threading. If you spin up 10 threads that do little more than send update commands to Access, it is unlikely to be much faster than it is with just one thread.
If you have to do any significant calculations between reading and writing data, threads may show a performance improvement.
I would suggest trying the following and measuring the result:
One thread to read data from Access
One thread to perform whatever calculations are needed on the data you read
One thread to update Access
You can implement this using a Producer / Consumer pattern, which is pretty easy to do with a BlockingCollection.
The nice thing about the Producer / Consumer pattern is that you can add more producer and/or consumer threads with minimal code changes to find the sweet spot.
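The BlockingCollection suggestion is .NET-specific; as a language-neutral illustration of the same three-stage pipeline, here is a rough Python sketch with bounded queues (the read, compute, and update helpers are stand-ins, not the asker's actual code):

```python
import queue
import threading

raw_rows = queue.Queue(maxsize=500)   # read -> calculate
results = queue.Queue(maxsize=500)    # calculate -> update
DONE = object()                       # sentinel to shut each stage down

def read_rows():
    yield from range(10)              # stand-in for reading rows from Access

def compute(row):
    return row * 2                    # stand-in for the 63-condition calculation

def update_row(result):
    pass                              # stand-in for the UPDATE back to Access

def reader():
    for row in read_rows():
        raw_rows.put(row)
    raw_rows.put(DONE)

def calculator():
    while (row := raw_rows.get()) is not DONE:
        results.put(compute(row))
    results.put(DONE)

def writer():
    while (result := results.get()) is not DONE:
        update_row(result)

threads = [threading.Thread(target=s) for s in (reader, calculator, writer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Adding more threads to the middle stage later is a small change, which is the nice property described above.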
Supplemental Thought
IO is probably the bottleneck of your application. Consider placing the Access file on faster storage if you can (SSD, RAID, or even a RAM disk).
Well if you're updating 2,800,000 records with 2,800,000 queries, it will definitely be slow.
Generally, it's good to avoid opening multiple connections to update your data.
You might want to show us some code of how you're currently doing it, so we could tell you what to change.
So I don't think (with the information you gave) that going multi-threaded would make this faster. Now, if you're thinking about going multi-threaded because the update freezes your GUI, that's another story.
If the processing is slow, I personally don't think it's due to your server's specs. I'd guess it's more about the logic you used to update the data.
Don't wonder; test. Write it so you can dispatch as many threads as you like to do the work, and test it with various numbers of threads. What does the loop you are talking about look like?
With questions like "if I add more threads, will it work faster?", it is always best to test, though there are rules of thumb. If the DB is local, chances are that Oded is right.

Pika/RabbitMQ: Correct usage of add_backpressure_callback

I am new to using RabbitMQ and Pika, so please excuse me if the answer is obvious...
We are processing some data and passing the results into our RabbitMQ message queue. The queue is consumed by a process that writes the data into Elasticsearch.
The data is being produced faster than it can be fed into Elasticsearch, so the queue grows and almost never shrinks.
We are using pika and getting the warning:
UserWarning: Pika: Write buffer exceeded warning threshold at X bytes and an estimated X frames behind.
This continues for some time until Pika simply crashes with a strange error message:
NameError: global name 'log' is not defined
We are using the Pika BlockingConnection object (http://pika.github.com/connecting.html#blockingconnection).
My plan to fix this is to use the add_backpressure_callback function to register a function that calls time.sleep(0.5) every time we need to apply back-pressure. However, this seems like too simple a solution, and there must be a more appropriate way of dealing with something like this.
I would guess it is a common situation that a queue is populated faster than it is consumed. I am looking for an example, or even some advice, as to the best way to slow down the queue.
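For reference, the plan described above would look roughly like this (a sketch against the Pika 0.9-era API the question is using; this API may not exist in newer Pika releases, and the exact callback signature can differ between versions, so check your version's documentation):

```python
import time
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))

def on_backpressure(*args):
    # Crude throttle: pause publishing briefly whenever Pika reports that
    # its write buffer has passed the warning threshold.
    time.sleep(0.5)

connection.add_backpressure_callback(on_backpressure)
# ... publish as usual via connection.channel() ...
```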
Thanks!
Interesting problem, and as you rightly point out, this is probably quite common. I saw a related question on Stack Overflow with some pointers:
Pika: Write buffer exceeded warning
Additionally, you may want to consider scaling up your Elasticsearch cluster, as that is perhaps the fundamental bottleneck you want to fix. A quick look at the elasticsearch.org website turned up:
"Distributed
One of the main features of Elastic Search is its distributed nature. Indices are broken down into shards, each shard with 0 or more replicas. Each data node within the cluster hosts one or more shards, and acts as a coordinator to delegate operations to the correct shard(s). Rebalancing and routing are done automatically and behind the scenes.
"
(...although not sure if insertion is also distributed and scalable)
After all, RabbitMQ is not supposed to grow queues infinitely. You may also want to look at scaling up RabbitMQ itself, for example by using things like per-queue processes in the RabbitMQ configuration.
Cheers!

About number of threads

I am reading the Concurrency Programming Guide on the iOS dev site.
When I moved to the section "Moving away from threads", Apple said:
Although threads have been around for many years and continue to have their uses, they do not solve the general problem of executing multiple tasks in a scalable way. With threads, the burden of creating a scalable solution rests squarely on the shoulders of you, the developer. You have to decide how many threads to create and adjust that number dynamically as system conditions change. Another problem is that your application assumes most of the costs associated with creating and maintaining any threads it uses.
From my previous learning, the OS takes care of process and thread management, and the programmer just creates and destroys threads as desired.
Is that wrong?
No, it is not wrong. What it is saying is that when you program with threads, most of the time you dynamically create threads based on certain conditions the programmer places in the code. For example, finding prime numbers can be split up across threads, but the creation and destruction of those threads is done by the programmer. You are completely correct; it is just saying what you are saying in a more descriptive and elaborate way.
Oh, and on thread management: sometimes, if the developer sees that a large number of threads will usually be needed, it is cheaper to spawn a pool of threads and reuse those.
Say you have 100 tasks to perform, all using data that is independent for the duration of each task. Every thread you start costs quite a bit of overhead. So if you have two cores, you only want to start two threads, because that's all that's going to run anyway. Then you have to feed tasks to each of those threads to keep them both running. If you have 100 cores, you'll launch 100 threads; it's worth the overhead to get the job done 50 times faster.
So in old-fashioned programming, you have to do two jobs. You have to find out how many cores you have, and you have to feed tasks to each of your threads so they keep running and don't waste cores. (This becomes only one job if you have >= 100 cores.)
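To make those two jobs concrete, here is a rough sketch in Python (ignoring the GIL; the structure, not the speedup, is the point here, and run_task is a stand-in):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def run_task(n):
    return n * n                      # stand-in for one independent task

tasks = range(100)

# Job 1: find out how many cores you have.
workers = os.cpu_count() or 2

# Job 2: keep exactly that many threads fed with tasks until none remain.
with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(run_task, tasks))
```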
I believe Apple is offering to take over these two awkward jobs for you.
If your jobs share data, that changes things. With two threads running, one can block the other, and even on a 2-core machine it pays to have three or more threads running. You are apt to find letting 100 threads loose at once makes sense because it improves the chances that at least two of them are not blocked. It prevents one blocked task from holding up the rest of the tasks in its thread. You pay a price in thread overhead, but get it back in high CPU usage.
So this feature is sometimes very useful and sometimes not. It helps with parallel programming, but it would hinder non-parallel concurrency (multithreading).