Can OptaPlanner solve multiple problems concurrently?

When OptaPlanner is used in a web service, which means an OptaPlanner app has to solve multiple problems in parallel threads, are there any limitations that would prevent OptaPlanner from doing this? Is synchronization required around any OptaPlanner functions? Thanks

OptaPlanner supports this: it's a common use case.
In a single JVM, look at SolverManager to solve multiple datasets of the same use case in parallel. This works even if the constraint weights differ per dataset (see ConstraintConfiguration), including when some datasets enable or disable certain constraints and others don't.
For different use cases in a single JVM, just create multiple SolverFactory or SolverManager instances. This is uncommon because each use case is usually a separate app (a microservice, for example).
Across multiple JVMs (= pods), there are several good techniques. Our ActiveMQ quickstart scales horizontally very well. Read Radovan's blog post about how ActiveMQ is used to load-balance the work across the solver pods.
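To illustrate why no extra synchronization is needed, here is a language-neutral sketch in Python. It is NOT OptaPlanner's Java API: the toy brute-force `solve`, the `Weights`/`Dataset` types and the two constraints are hypothetical. What it shows is the pattern: each dataset gets its own solver call with its own constraint weights (a weight of 0 disables a constraint, similar in spirit to ConstraintConfiguration), nothing mutable is shared between threads, and a thread pool runs them side by side, roughly the role SolverManager plays.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Weights:
    # Per-dataset constraint weights; a weight of 0 disables that constraint.
    balance: int
    prefer_machine_0: int


@dataclass(frozen=True)
class Dataset:
    name: str
    durations: tuple  # one duration per task
    weights: Weights


def score(durations, assignment, w):
    """Weighted penalty (lower is better) for putting each task on machine 0 or 1."""
    load = [0, 0]
    for duration, machine in zip(durations, assignment):
        load[machine] += duration
    return w.balance * abs(load[0] - load[1]) + w.prefer_machine_0 * sum(assignment)


def solve(ds):
    """Toy brute-force solver: all state is local to the call, so parallel calls share nothing mutable."""
    best = min(product((0, 1), repeat=len(ds.durations)),
               key=lambda a: score(ds.durations, a, ds.weights))
    return ds.name, best, score(ds.durations, best, ds.weights)


def solve_all(datasets, max_workers=4):
    """Solve every dataset on a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(solve, datasets))
```

For example, `solve_all([Dataset("a", (3, 1, 2), Weights(1, 0)), Dataset("b", (3, 1, 2), Weights(0, 1))])` returns `[("a", (0, 1, 1), 0), ("b", (0, 0, 0), 0)]`: the same tasks yield different solutions because each dataset has its own weights. In CPython the GIL limits CPU-bound parallelism, so this sketch is about isolation between solves, not speed.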

Related

Kubernetes + TF serving - how to serve hundreds of ML models without keeping hundreds of idle pods running?

I have hundreds of models, based on categories, projects, etc. Some of the models are heavily used, while others are not used very frequently.
How can I trigger a scale-up operation only when needed (for the models that are not used frequently), instead of running hundreds of pods serving hundreds of models while most of them sit idle, which is a huge waste of computing resources?
What you are trying to do is scale deployments to zero when they are not used.
K8s does not provide such functionality out of the box.
You can achieve it using the Knative Pod Autoscaler.
Knative is probably the most mature solution available at the time of writing.
There are also some more experimental solutions, like osiris or zero-pod-autoscaler, that you may find interesting and that may be a good fit for your use case.
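The decision a scale-to-zero autoscaler makes per model can be sketched as a small policy function. This is a hypothetical illustration, not Knative's API or configuration: the `target_concurrency`, `idle_grace_s` and `max_replicas` parameters are assumptions, loosely modeled on the ideas of a per-pod concurrency target and an idle grace period before scaling to zero.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelStats:
    in_flight: int          # requests currently being served for this model
    last_request_at: float  # timestamp (seconds) of the most recent request


def desired_replicas(stats, now, target_concurrency=10, idle_grace_s=300.0, max_replicas=5):
    """Hypothetical policy: 0 pods once a model is idle past the grace period, else enough pods for the load."""
    if stats.in_flight == 0 and now - stats.last_request_at > idle_grace_s:
        return 0
    # Keep at least one warm pod for a recently used model; add pods as concurrency grows.
    needed = max(1, math.ceil(stats.in_flight / target_concurrency))
    return min(needed, max_replicas)


def plan(all_stats, now, **policy):
    """Desired replica count for every model, keyed by model name."""
    return {name: desired_replicas(s, now, **policy) for name, s in all_stats.items()}
```

With `now=1000.0`, a model with 25 in-flight requests gets 3 pods, a model last used at t=100 gets 0, and one last used at t=950 keeps 1 warm pod. The hard part a real system adds is the cold path: when a request arrives for a model at 0 replicas, something must hold the request until a pod is up (in Knative this is the activator).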

Tensorflow Mirror Strategy and Horovod Distribution Strategy

I am trying to understand the basic differences between TensorFlow Mirror Strategy and Horovod Distribution Strategy.
From the documentation and the source code, I found that Horovod (https://github.com/horovod/horovod) uses the Message Passing Interface (MPI) to communicate between multiple nodes. Specifically, it uses MPI's all_reduce and all_gather.
From my observation (I may be wrong), Mirror Strategy also uses an all_reduce algorithm (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/distribute).
Both of them use a data-parallel, synchronous training approach.
So I am a bit confused about how they differ. Is the difference only in implementation, or are there other (theoretical) differences?
And how does the performance of Mirror Strategy compare to Horovod?
Mirror Strategy has its own all_reduce algorithm, which uses remote procedure calls (gRPC) under the hood.
As you mentioned, Horovod uses MPI/Gloo to communicate between multiple processes.
Regarding performance, one of my colleagues previously ran experiments on 4 Tesla V100 GPUs using the code from here. The results suggested that three settings work best: replicated with all_reduce_spec=nccl, collective_all_reduce with a properly tuned allreduce_merge_scope (e.g. 32), and horovod. I did not see significant differences among these three.
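Since both approaches come down to an all-reduce over gradients, it may help to see what a ring all-reduce does, the algorithm Horovod's allreduce is built around (NCCL also provides ring-based collectives). The sketch below simulates it in plain Python with lists standing in for per-worker gradient buffers; it is an illustration of the algorithm, not the code either library runs. Phase 1 (reduce-scatter) leaves each worker owning the full sum of one chunk; phase 2 (all-gather) circulates those finished chunks around the ring.

```python
def split_chunks(vec, n):
    """Split vec into n contiguous chunks whose sizes differ by at most 1."""
    k, r = divmod(len(vec), n)
    out, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        out.append(list(vec[start:end]))
        start = end
    return out


def ring_allreduce(vectors):
    """Simulate ring all-reduce: every worker ends up with the element-wise sum of all vectors."""
    n = len(vectors)
    chunks = [split_chunks(v, n) for v in vectors]  # chunks[worker][chunk_index]

    # Phase 1, reduce-scatter: in each step worker i sends one chunk to worker i+1,
    # which adds it to its own copy. After n-1 steps worker i owns the full sum of chunk (i+1) % n.
    for step in range(n - 1):
        msgs = [(i, (i - step) % n, list(chunks[i][(i - step) % n])) for i in range(n)]
        for sender, c, payload in msgs:  # all sends of a step happen "simultaneously"
            dst = (sender + 1) % n
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], payload)]

    # Phase 2, all-gather: each worker forwards the finished chunk it most recently obtained.
    for step in range(n - 1):
        msgs = [(i, (i + 1 - step) % n, list(chunks[i][(i + 1 - step) % n])) for i in range(n)]
        for sender, c, payload in msgs:
            chunks[(sender + 1) % n][c] = payload

    return [[x for chunk in worker for x in chunk] for worker in chunks]
```

For example, `ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])` gives every worker `[111, 222, 333, 444]`. Each worker sends 2(N-1) messages of roughly 1/N of the buffer, so the data each worker transmits stays close to twice the buffer size regardless of N, which is why this pattern scales well.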

Configuration/Flags for TF-Slim across multiple GPUs/machines

I am curious whether there are examples of how to run TF-Slim (models/slim) using deployment/model_deploy.py across multiple GPUs on multiple machines. The documentation is pretty good, but I am missing a couple of pieces. Specifically, what needs to be put in for worker_device and ps_device, and what additionally needs to be run on each machine?
An example like the one at the bottom of the distributed page would be awesome.
https://www.tensorflow.org/how_tos/distributed/
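For orientation, the device values in question are TensorFlow device-name strings built from job names and task indexes, and the cluster is described as a mapping from job name to `host:port` addresses (the dict form `tf.train.ClusterSpec` accepts). The pure-Python helpers below are hypothetical and do not call model_deploy or TensorFlow; the "ps"/"worker" job names follow the convention of the distributed how-to linked above, and the hosts are made up.

```python
def cluster_spec(ps_hosts, worker_hosts):
    """Job name -> list of "host:port" addresses, the dict shape tf.train.ClusterSpec accepts."""
    return {"ps": list(ps_hosts), "worker": list(worker_hosts)}


def ps_device(task_index):
    # Variables live on a parameter-server task, e.g. "/job:ps/task:0".
    return f"/job:ps/task:{task_index}"


def worker_device(task_index, gpu_index):
    # One model clone runs on one GPU of one worker task.
    return f"/job:worker/task:{task_index}/device:GPU:{gpu_index}"


def clone_devices(task_index, num_gpus):
    """Devices for all clones on a single worker machine, one per local GPU."""
    return [worker_device(task_index, g) for g in range(num_gpus)]


def launch_plan(spec):
    """One server process per (job, task): each host is started with its own job name and task index."""
    return [(job, i, addr) for job in ("ps", "worker") for i, addr in enumerate(spec[job])]
```

In the TF1-style setup from that how-to, the "additional" thing each machine runs is a server process (`tf.train.Server`) started with the shared cluster description plus its own job name and task index, which is what `launch_plan` enumerates.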

redis sharding, pipelining, and round-trips

Suppose that in your web application you need to make a number of Redis calls to render a page, for example getting a bunch of user hashes. To speed this up, you could wrap your Redis commands in a MULTI/EXEC section, thus using pipelining, so that you avoid doing many round-trips. But you also want to shard your data, because you have lots of it and/or you want to distribute writes. Then pipelining wouldn't work, because different keys could live on different nodes, unless you have a clear idea of your application's data layout and shard based on roles rather than using a hash function.

So, what are the best practices for sharding data across different servers without compromising performance too much because many servers must be contacted to complete a "conceptually unique" job? I believe the answer depends on the web application one is developing, and I'll eventually run some tests, but it would be helpful to know how others have coped with the trade-offs I mentioned.
MULTI/EXEC and pipelining are two different things. You can do MULTI/EXEC without any pipelining and vice versa.
If you want to shard and pipeline at the same time, you need to group the operations to pipeline per Redis instance, and then use pipelining for each instance.
Here is a simple example using Ruby: https://gist.github.com/2587593
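The grouping step can also be sketched in Python. `FakeNode` is a hypothetical in-memory stand-in for one Redis instance whose `execute_pipeline` counts one round-trip per batch; with a real client such as redis-py you would queue the commands on one pipeline per connection and execute it once. The `crc32(key) % n` routing is an assumption for illustration (Redis Cluster instead maps keys to 16384 hash slots with CRC16).

```python
import zlib
from collections import defaultdict


class FakeNode:
    """In-memory stand-in for one Redis instance; counts round-trips instead of using the network."""
    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def execute_pipeline(self, commands):
        self.round_trips += 1  # the whole batch costs a single round-trip
        results = []
        for op, key, *args in commands:
            if op == "SET":
                self.data[key] = args[0]
                results.append("OK")
            elif op == "GET":
                results.append(self.data.get(key))
            else:
                raise ValueError(f"unsupported op {op}")
        return results


def shard_for(key, n):
    # Assumption: simple client-side hashing over n instances.
    return zlib.crc32(key.encode()) % n


def run_sharded(nodes, commands):
    """Group commands per instance, send one pipeline per instance, return replies in the original order."""
    groups = defaultdict(list)  # node index -> [(original position, command)]
    for pos, cmd in enumerate(commands):
        groups[shard_for(cmd[1], len(nodes))].append((pos, cmd))
    results = [None] * len(commands)
    for idx, items in groups.items():
        replies = nodes[idx].execute_pipeline([cmd for _, cmd in items])
        for (pos, _), reply in zip(items, replies):
            results[pos] = reply
    return results
```

Ten GETs spread over three instances then cost at most three round-trips instead of ten, and callers still see replies in the order they issued the commands.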
One way to further improve performance is to parallelize the traffic on the Redis instances once the operations have been grouped (i.e. you group the operations, you send them to all instances in parallel, you wait for the answers from all instances).
This is a bit more complex, because an asynchronous, non-blocking client is required. For maximum performance, C/C++ should be used on the client side. This can easily be implemented with hiredis plus the event loop of your choice.
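The parallel fan-out step (send every per-instance batch at once, then wait for all replies) looks like this with Python's asyncio. It only illustrates the pattern described above; the answer recommends hiredis and an event loop in C/C++ for maximum performance. `FakeAsyncNode` is a hypothetical non-blocking connection with simulated latency, and `Tracker` exists only to show that the batches really are in flight at the same time.

```python
import asyncio


class Tracker:
    """Counts how many pipelines are in flight at once (only to observe the parallelism)."""
    def __init__(self):
        self.current = 0
        self.peak = 0


class FakeAsyncNode:
    """Hypothetical stand-in for a non-blocking connection to one Redis instance."""
    def __init__(self, data, tracker, latency_s=0.05):
        self.data = data
        self.tracker = tracker
        self.latency_s = latency_s

    async def execute_pipeline(self, keys):
        self.tracker.current += 1
        self.tracker.peak = max(self.tracker.peak, self.tracker.current)
        await asyncio.sleep(self.latency_s)  # one simulated round-trip for the whole batch
        self.tracker.current -= 1
        return [self.data.get(k) for k in keys]


async def fetch_all(grouped):
    """grouped maps node -> keys already routed to it; send every batch at once, then await all replies."""
    nodes = list(grouped)
    replies = await asyncio.gather(*(node.execute_pipeline(grouped[node]) for node in nodes))
    return {key: value
            for node, values in zip(nodes, replies)
            for key, value in zip(grouped[node], values)}
```

With three instances, the page waits roughly one round-trip (the slowest instance) instead of three sequential ones.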

ZooKeeper and RabbitMQ/Qpid together - overkill or a good combination?

Greetings,
I'm evaluating some components for a multi-data center distributed system. We're going to be using message queues (via either RabbitMQ or Qpid) so agents can make asynchronous requests to other agents without worrying about addressing, routing, load balancing or retransmission.
In many cases, the agents will be interacting with components that were not designed for highly concurrent access, so locking and cross-agent coordination will be needed to avoid race conditions. Also, we'd like the system to automatically respond to agent or data center failures.
With the above use cases in mind, ZooKeeper seemed like it might be a good fit. But I'm wondering if trying to use both ZK and message queuing is overkill. It seems like what ZooKeeper does could be accomplished by my own cluster manager using AMQP messaging, but that would be hard to get really right. On the other hand, I've seen some examples where ZooKeeper was used to implement message queuing, but I think RabbitMQ/Qpid are a more natural fit for that.
Has anyone out there used a combination like this?
Thanks in advance,
-Chris
Coming into this late, but maybe it will be of some use. The primary consideration should be the performance characteristics of your system. ZooKeeper, as you said, is more than capable of implementing a task distribution system using a distributed queue, but ZooKeeper is currently more optimized for reads than for writes (this only comes into play in the range of thousands of operations per second). If your throughput needs are below that, then using only ZooKeeper to implement your system would reduce the number of runtime components and make it simpler. Of course, you should always run your own performance tests before deciding.
Distributed coordination is really hard to get right, so I would definitely recommend using ZooKeeper for that rather than rolling your own.
I'm not quite sure what exactly ZooKeeper is, but I guess that using an Apache component (if it fits your needs well) is preferable to managing things like distributed synchronization and group services on your own. You could of course hire a team of developers specifically for that purpose, but that doesn't guarantee you a better implementation.
I guess it would be implemented as a separate component anyway, because doing it another way could add a lot of complexity and slow down the workflow; so preferring ZooKeeper or something similar is kind of obvious (to me).
And surely, unless you're in the global optimization phase of your project, I think it would be better to use RabbitMQ or similar (I would even stress that, because AMQP implementations, especially commercial ones, would be more reliable than anything you'd come up with).
So I would go for both, carefully choosing the appropriate third-party products, but using only as many of them as needed. And that's just my opinion; thanks for reading :)