We are currently trying to improve the performance of a planning problem we've implemented in OptaPlanner. Our model has ~45,000 chained variables and after profiling the application it seems like the main bottleneck is around the cloning. Approximately 90% of the CPU run-time is consumed by the FieldAccessingSolutionCloner method calls.
We've already tried to make our object model more lightweight by reducing the number of Maps and Sets within the PlanningEntities and changing fields to primitives where possible. From your own OptaPlanner experience, do you have any advice on how to speed up cloning performance?
Have you tried writing a custom cloner? See the docs, and the sketch below.
The default cloner has to rely on reflection, so it's slower.
Also, the structure of your domain model influences how much you need to clone (regardless of whether you go custom or not):
If you delete your Solution and Planning Entities classes, do your other domain classes still compile?
If yes, then the clone is minimal. If no, it's not.
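If the reflective cloner is still the bottleneck after that, a custom cloner lets you copy exactly the fields that change during planning and share everything else. Below is a minimal sketch against hypothetical domain classes (VehicleRoutingSolution, Customer, Standstill); the SolutionCloner interface and the solutionCloner attribute of @PlanningSolution are described in the docs, but the exact package and signature can vary between OptaPlanner versions.

    import java.util.ArrayList;
    import java.util.IdentityHashMap;
    import java.util.List;
    import java.util.Map;

    import org.optaplanner.core.api.domain.solution.cloner.SolutionCloner;

    // Registered on the solution class with
    // @PlanningSolution(solutionCloner = VehicleRoutingSolutionCloner.class).
    public class VehicleRoutingSolutionCloner implements SolutionCloner<VehicleRoutingSolution> {

        @Override
        public VehicleRoutingSolution cloneSolution(VehicleRoutingSolution original) {
            VehicleRoutingSolution clone = new VehicleRoutingSolution();
            // Problem facts never change during planning, so share the references instead of copying.
            clone.setLocationList(original.getLocationList());
            clone.setVehicleList(original.getVehicleList());
            // Planning entities must be deep cloned, because their planning variables change.
            Map<Customer, Customer> cloneMap = new IdentityHashMap<>(original.getCustomerList().size());
            List<Customer> customerClones = new ArrayList<>(original.getCustomerList().size());
            for (Customer customer : original.getCustomerList()) {
                Customer customerClone = new Customer(customer.getId(), customer.getLocation());
                cloneMap.put(customer, customerClone);
                customerClones.add(customerClone);
            }
            // Rewire the chained variable so clones point at clones, not at the working entities.
            for (Customer customer : original.getCustomerList()) {
                Standstill previous = customer.getPreviousStandstill();
                cloneMap.get(customer).setPreviousStandstill(
                        previous instanceof Customer ? cloneMap.get(previous) : previous);
            }
            clone.setCustomerList(customerClones);
            clone.setScore(original.getScore());
            return clone;
        }
    }

The second loop is the part that matters for ~45,000 chained entities: rewiring previousStandstill through an IdentityHashMap is a single linear pass with no reflection involved.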
In the context of the vehicle routing or TSP problem: let's say we want to externalise the travel time between two locations into a cost matrix problem fact.
We can rewrite the distanceTo method of the GeoLocation class to a simple lookup of the value in the matrix. But to do that, we need to store a reference to the matrix instance in each GeoLocation instance.
What impact does this have on the cloning of the solution and the associated planning entities? Is the matrix going to be deeply cloned, i.e. will different planning entities point to different matrix instances during planning? Of course this should be avoided, as the matrix does not change during planning and deep-cloning it could hurt performance. Instead, each GeoLocation's matrix reference should point to the same matrix object in memory.
Does the FieldAccessingSolutionCloner handle this appropriately, or do we need to provide our own SolutionCloner?
The SolutionCloner does a planning clone, which doesn't clone problem facts unless a problem fact references the planning solution or a planning entity.
Your class model should be designed such that there is no need to planning clone your distance matrix.
The VRP example in optaplanner-examples doesn't clone its distance matrix (the Location instances aren't planning cloned).
It's important to understand that anything that directly or indirectly references a planning entity or the planning solution must be planning cloned, or changes to the working solution will affect the best solution, corrupting it.
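To make that concrete, here is a small sketch with hypothetical class names: the matrix and the locations are plain problem facts that reference neither the solution nor any planning entity, so the FieldAccessingSolutionCloner leaves them shared between the working solution and the best solution.

    // Sketch (hypothetical classes): problem facts only, no references back to the
    // planning solution or to planning entities, so a planning clone never copies them.
    class DistanceMatrix {
        private final long[][] travelTimeMillis;    // indexed by location index

        DistanceMatrix(long[][] travelTimeMillis) {
            this.travelTimeMillis = travelTimeMillis;
        }

        long getTravelTimeMillis(int fromIndex, int toIndex) {
            return travelTimeMillis[fromIndex][toIndex];
        }
    }

    class GeoLocation {
        private final int index;
        private final DistanceMatrix matrix;        // shared reference, identical in every clone

        GeoLocation(int index, DistanceMatrix matrix) {
            this.index = index;
            this.matrix = matrix;
        }

        long getDistanceTo(GeoLocation other) {
            // Plain lookup instead of a geo calculation; the matrix itself is never cloned.
            return matrix.getTravelTimeMillis(index, other.index);
        }
    }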
I'm currently working on a large-scale timetabling problem from my university. I'm using CPLEX to create the model and solve it, but due to its size and processing time, I'm considering trying out a local search algorithm such as a genetic algorithm (GA), though I'm lost on how to do that properly. Is there a way of applying a local search to it without having to reformulate the whole model?
One possible way to tackle your problem is to use CPLEX callbacks.
You can implement a heuristic callback. In this callback, you can run your GA against the current CPLEX model and use it either to find a feasible solution (which I think is very difficult in many timetabling problems) or to improve your current solution.
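A bare-bones sketch of that with the CPLEX Java (Concert) API; runGa is a hypothetical placeholder for your genetic algorithm, and the callback signatures are worth double-checking against the CPLEX version you use.

    import ilog.concert.IloException;
    import ilog.concert.IloNumVar;
    import ilog.cplex.IloCplex;

    // Sketch: a heuristic callback that asks a user-supplied GA for an improved
    // assignment and hands it back to CPLEX as a candidate incumbent.
    public class GaHeuristicCallback extends IloCplex.HeuristicCallback {

        private final IloNumVar[] vars;

        public GaHeuristicCallback(IloNumVar[] vars) {
            this.vars = vars;
        }

        @Override
        protected void main() throws IloException {
            // Values of the current node relaxation, used to seed the GA population.
            double[] relaxation = getValues(vars);
            double[] candidate = runGa(relaxation);      // hypothetical: your GA goes here
            if (candidate != null) {
                // Inject the integer-feasible candidate as a new incumbent solution.
                setSolution(vars, candidate);
            }
        }

        private double[] runGa(double[] seed) {
            return null;                                 // placeholder for the actual GA
        }
    }

    // Usage sketch:
    //   IloCplex cplex = new IloCplex();
    //   ... build the timetabling model and collect its variables into vars ...
    //   cplex.use(new GaHeuristicCallback(vars));
    //   cplex.solve();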
It isn't clear to me when it's a good idea to use VK_IMAGE_LAYOUT_GENERAL as opposed to transitioning to the optimal layout for whatever action I'm about to perform. Currently, my policy is to always transition to the optimal layout.
But VK_IMAGE_LAYOUT_GENERAL exists. Maybe I should be using it when I'm only going to use a given layout for a short period of time.
For example, right now, I'm writing code to generate mipmaps using vkCmdBlitImage. As I loop through the sub-resources performing the vkCmdBlitImage commands, should I transition to VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL as I scale down into a mip, then transition to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL when I'll be the source for the next mip before finally transitioning to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL when I'm all done? It seems like a lot of transitioning, and maybe generating the mips in VK_IMAGE_LAYOUT_GENERAL is better.
I appreciate the answer might be to measure, but it's hard to measure on all my target GPUs (especially because I haven't got anything running on Android yet), so if anyone has a decent rule of thumb to apply, it would be much appreciated.
FWIW, I'm writing Vulkan code that will run on desktop GPUs and Android, but I'm mainly concerned about performance on the latter.
You would use it when:
1. You are lazy.
2. You need to map the memory to host (unless you can use PREINITIALIZED).
3. You use the image as multiple incompatible attachments and have no choice.
4. For storage images.
5. (Other cases where you would otherwise switch layouts too often, relative to the work done on the images, and you don't even need barriers. Measurement is needed to confirm GENERAL is better in that case; most likely a premature optimization even then.)
PS: You could transition all the mip levels together to TRANSFER_DST with a single barrier beforehand and then transition only the level you need to SRC (see the sketch below). With a decent HDD, it would be even better to already have the images stored with their mip maps, if that's an option (and perhaps even get better quality by using a more sophisticated downscaling algorithm).
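To keep the examples in one language, here is what that batched transition can look like with the LWJGL Vulkan bindings (the structure is identical in C); cmd, image and mipLevels are assumed to already exist, and the blit itself is elided.

    import org.lwjgl.system.MemoryStack;
    import org.lwjgl.vulkan.VkCommandBuffer;
    import org.lwjgl.vulkan.VkImageMemoryBarrier;

    import static org.lwjgl.vulkan.VK10.*;

    public class MipmapBarriers {

        // Sketch: one barrier moves every mip level to TRANSFER_DST_OPTIMAL up front, then
        // each level is moved to TRANSFER_SRC_OPTIMAL only when it becomes the blit source.
        public static void recordMipTransitions(VkCommandBuffer cmd, long image, int mipLevels) {
            try (MemoryStack stack = MemoryStack.stackPush()) {
                VkImageMemoryBarrier.Buffer barrier = VkImageMemoryBarrier.calloc(1, stack)
                        .sType(VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER)
                        .image(image)
                        .srcQueueFamilyIndex(VK_QUEUE_FAMILY_IGNORED)
                        .dstQueueFamilyIndex(VK_QUEUE_FAMILY_IGNORED)
                        .oldLayout(VK_IMAGE_LAYOUT_UNDEFINED)
                        .newLayout(VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL)
                        .srcAccessMask(0)
                        .dstAccessMask(VK_ACCESS_TRANSFER_WRITE_BIT);
                barrier.subresourceRange()
                        .aspectMask(VK_IMAGE_ASPECT_COLOR_BIT)
                        .baseMipLevel(0)
                        .levelCount(mipLevels)          // all mip levels in a single transition
                        .baseArrayLayer(0)
                        .layerCount(1);
                vkCmdPipelineBarrier(cmd, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                        VK_PIPELINE_STAGE_TRANSFER_BIT, 0, null, null, barrier);

                for (int level = 1; level < mipLevels; level++) {
                    // Level (level - 1) has been written; make it the source for the next blit.
                    barrier.oldLayout(VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL)
                            .newLayout(VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL)
                            .srcAccessMask(VK_ACCESS_TRANSFER_WRITE_BIT)
                            .dstAccessMask(VK_ACCESS_TRANSFER_READ_BIT);
                    barrier.subresourceRange().baseMipLevel(level - 1).levelCount(1);
                    vkCmdPipelineBarrier(cmd, VK_PIPELINE_STAGE_TRANSFER_BIT,
                            VK_PIPELINE_STAGE_TRANSFER_BIT, 0, null, null, barrier);
                    // vkCmdBlitImage(cmd, image, SRC_OPTIMAL, image, DST_OPTIMAL, ...) goes here.
                }
                // Afterwards, transition all levels to SHADER_READ_ONLY_OPTIMAL before sampling.
            }
        }
    }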
PS2: Too bad there's no dedicated mip-map creation command. vkCmdBlitImage most likely does it under the hood anyway for images smaller than half the source resolution...
If you read from the mipmap[n] image to create the mipmap[n+1] image, then you should use the transfer layouts if you want your code to run on all Vulkan implementations and get the most performance across them, as those layouts may be used by the GPU to optimize the image for reads or writes.
So if you want to go cross-vendor, only use VK_IMAGE_LAYOUT_GENERAL for setting up the descriptor that uses the final image, not for the image reads or writes.
If you don't want to use that many transitions you may copy from a buffer instead of an image, though you obviously wouldn't get the format conversion, scaling and filtering that vkCmdBlitImage does for you for free.
Also don't forget to check if the target format actually supports the BLIT_SRC or BLIT_DST bits. This is independent of whether you use the transfer or general layout for copies.
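For example, a quick feature check (again sketched with the LWJGL bindings; physicalDevice and format are assumed to exist):

    import org.lwjgl.system.MemoryStack;
    import org.lwjgl.vulkan.VkFormatProperties;
    import org.lwjgl.vulkan.VkPhysicalDevice;

    import static org.lwjgl.vulkan.VK10.*;

    public class BlitSupportCheck {

        // Sketch: query the format's optimal-tiling features and verify both blit bits.
        public static boolean supportsBlit(VkPhysicalDevice physicalDevice, int format) {
            try (MemoryStack stack = MemoryStack.stackPush()) {
                VkFormatProperties props = VkFormatProperties.calloc(stack);
                vkGetPhysicalDeviceFormatProperties(physicalDevice, format, props);
                int features = props.optimalTilingFeatures();
                return (features & VK_FORMAT_FEATURE_BLIT_SRC_BIT) != 0
                        && (features & VK_FORMAT_FEATURE_BLIT_DST_BIT) != 0;
            }
        }
    }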
Can anyone give me some tips to make a binary integer programming model faster?
I currently have a model that runs well with a very small number of variables, but as soon as I increase the number of variables in my model, SCIP keeps running without giving me an optimal solution. I'm currently using SCIP with SoPlex to find an optimal solution.
You should have a look at the statistics (type display statistics in the interactive shell). Watch out for time-consuming heuristics that don't find a solution and try disabling them. You should also play around with the parameters to find better-suited settings for your instances (a different branching rule or node selection, for example). Without further information, though, we won't be able to help you.
I have a CoreData-based application that retrieves data about past events from an SQLite persistence store. Once I have the past events my application does some statistical analysis to predict future events based on the data it has about past events. Once my application has made a prediction about future events I want to run another algorithm that does some evaluation of that prediction. I'm expecting to do a lot of these evaluations, so performance optimization for each evaluation is likely to be critical.
Now, all of the classes I need to represent my future event predictions exist in my data model, and I have NSManagedObject subclasses for most of the important entities. The easiest way for me to implement my algorithms is to "fill in" the results for future events based on the prediction, and then run my evaluation using NSManagedObject instances for both the past events and the predictions for future events. However, I typically don't want to save these future event predictions in my persistent store: Once I have performed my evaluation of the prediction I want to throw away the predictions and just keep the evaluation results. I can do this pretty easily, I think, by just sending the rollback: message to my managed object context once my evaluation is complete.
That will all work fine, and from a coding perspective it seems like it will be quite easy to implement. However, I am wondering if I should expect performance concerns making such heavy use of managed objects when I have no intention of ever saving the changes I'm making. Given that performance is likely to be a factor, does using NSManagedObject instances for this make sense? Surely all the things it's doing to keep track of changes and support things like undo and complex entity relationships come with some amount of overhead. Should I be concerned about this overhead?
I could of course create non-NSManagedObject classes that implement an optimized version of my model classes for use when making predictions and evaluating them. That would involve a lot of additional work, including the work necessary to copy data back and forth between the NSManagedObject instances for past events and the optimized class instances for future events: I'd rather not create that code if it is not needed.
Surely all the things it's doing to keep track of changes and support things like undo and complex entity relationships come with some amount of overhead.
Core Data doesn't have the overhead people expect, owing to its internal optimizations. In general, using managed objects in memory is as fast as or faster than any custom objects and management code you would write yourself.
Should I be concerned about this overhead?
Can't really say without implementation details but most likely not. You can hand tweak Core Data for specific circumstances to get better performance.
The best approach is always to start with the simplest solution and move to a more complex one only when testing reveals that the simple solution does not perform well.
Premature optimization is the root of all evil.