For the construction phase, why is selectionOrder=RANDOM so much faster than the default option?

Given the configuration below, why is solving so much faster when selectionOrder = RANDOM (7 s) than when the default is used, i.e. with selectionOrder commented out (20 min)? Also, when is the RANDOM order applied: once at the start, or at every step?
<constructionHeuristic>
    <changeMoveSelector>
        <selectionOrder>RANDOM</selectionOrder>
        <selectedCountLimit>300</selectedCountLimit>
        <valueSelector variableName="user"/>
    </changeMoveSelector>
    <changeMoveSelector>
        <valueSelector variableName="startDate"/>
    </changeMoveSelector>
</constructionHeuristic>

To answer your first question: RANDOM is so much faster because it doesn't need to generate any sort of collection of items to iterate over. It simply generates a random item every time it is called, without needing to know what comes before or what comes after. If instead you want a curated selection with guarantees on repetition, all of the options have to be enumerated first, and that can take a considerable amount of time. See selection order for details.
To answer your second and third questions, please check out selector cache types.
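To make the contrast concrete, here is an illustrative sketch (not your exact use case; element values as in the OptaPlanner selector docs). SHUFFLED gives random order without repetition, so the whole selection must be enumerated and cached up front, while RANDOM with the default JUST_IN_TIME cache draws a fresh random move on every call, at every step:

<!-- Enumerates and caches all moves once per phase, then iterates a shuffled copy -->
<changeMoveSelector>
    <cacheType>PHASE</cacheType>
    <selectionOrder>SHUFFLED</selectionOrder>
</changeMoveSelector>

<!-- Default JUST_IN_TIME cache: a random move is generated on demand at every call -->
<changeMoveSelector>
    <selectionOrder>RANDOM</selectionOrder>
</changeMoveSelector>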

Related

How to use min-conflict algorithm with optaplanner

Is there a min-conflict algorithm in OptaPlanner? Or how could one be implemented?
What about using it as part of the neighborhood selection, i.e. defining a custom swap factory that constructs the neighborhood as follows:
1. Get all violations per variable to optimize; this requires a call to scoreDirector.calculateScore and then parsing/processing the constraintMatches.
2. Order the variables by lowest score or highest violation count.
3. Construct the neighborhood by swapping those variables first.
If that's viable, is there a way to get the constraintMatches without having to re-call calculateScore, in order to speed up the process?
This algorithm isn't supported out of the box yet by OptaPlanner; I'd call it Guided Local Search. But it's not that hard to add yourself. In fact, it's not a matter of changing the algorithm, but of changing the entity selectors.
Something like this should work:
<swapMoveSelector>
    <entitySelector>
        <cacheType>STEP</cacheType>
        <probabilityWeightFactoryClass>...MyProbabilityWeightFactory</probabilityWeightFactoryClass>
    </entitySelector>
</swapMoveSelector>
Read about the advanced swapMoveSelector configuration, entity selector, sorted selection and probability selection.
The callback class you implement for the probabilistic selection or sorted selection should prioritize entities that are part of a conflict.
I would definitely use sorted or probabilistic selection on the entity selector, not on the entire swapMoveSelector, because the latter is overkill, CPU hungry and memory hungry.
I would prefer probabilistic selection over sorted selection. Even though sorted selection better reflects your pseudocode, I believe (but haven't proven) that probabilistic selection will do better, given the nature of metaheuristics. Try both, run some benchmarks with the Benchmarker, and let us know what works best ;)
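For illustration only, a minimal sketch of such a probability weight factory, assuming a hypothetical planning entity MyEntity and OptaPlanner 6-era signatures (package locations and generics differ between versions, so treat this as a sketch rather than the definitive API):

import org.optaplanner.core.api.score.constraint.ConstraintMatch;
import org.optaplanner.core.api.score.constraint.ConstraintMatchTotal;
import org.optaplanner.core.impl.heuristic.selector.common.decorator.SelectionProbabilityWeightFactory;
import org.optaplanner.core.impl.score.director.ScoreDirector;

// Entities involved in more constraint matches get a proportionally higher
// chance of being selected. Constraint matches must be enabled on the score
// director, and scanning every match per selection is itself CPU hungry.
public class MyProbabilityWeightFactory
        implements SelectionProbabilityWeightFactory<MyEntity> {

    @Override
    public double createProbabilityWeight(ScoreDirector scoreDirector, MyEntity entity) {
        int conflictCount = 0;
        for (ConstraintMatchTotal total : scoreDirector.getConstraintMatchTotals()) {
            for (ConstraintMatch match : total.getConstraintMatchSet()) {
                if (match.getJustificationList().contains(entity)) {
                    conflictCount++;
                }
            }
        }
        // Never return 0.0, or conflict-free entities become unselectable.
        return 1.0 + conflictCount;
    }
}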
Not sure how to solve your overall problem, but for your last point: you can create a PhaseLifecycleListener and attach it via ((DefaultSolver) solver).addPhaseLifecycleListener. In stepStarted or stepEnded (depending on your need) you can then call
stepScope.getScoreDirector().getConstraintMatchTotals()
to get the constraint match totals.
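A minimal sketch of what that could look like, assuming the OptaPlanner 6-era impl classes (DefaultSolver, PhaseLifecycleListenerAdapter and the step scopes are internal classes, so names and generics may differ in your version):

import org.optaplanner.core.impl.phase.event.PhaseLifecycleListenerAdapter;
import org.optaplanner.core.impl.phase.scope.AbstractStepScope;
import org.optaplanner.core.impl.solver.DefaultSolver;

// Prints the constraint match totals after every step. Constraint matches
// must be enabled on the score director for this to return any data.
public class ConstraintMatchLoggingListener extends PhaseLifecycleListenerAdapter {

    @Override
    public void stepEnded(AbstractStepScope stepScope) {
        System.out.println(stepScope.getScoreDirector().getConstraintMatchTotals());
    }
}

// Attaching it relies on casting to the internal DefaultSolver type:
// ((DefaultSolver) solver).addPhaseLifecycleListener(new ConstraintMatchLoggingListener());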
Hope this somewhat helps.

Hyperopt set timeouts and modify space during execution

Could someone help with the following?
1. How to set a timeout for each individual trial? And a timeout for the total experiment?
2. How to set up a progressive strategy that would eliminate/prune a percentage of the worst-scoring branches of the search space at different stages of the experiment (while using the current optimization algorithms)? E.g. at 30% of the total experiment budget it could remove the 50% worst-scoring classifiers and their entire branches of hyperparameters from upcoming trials, then repeat the same process at 60%, and so on.
Thanks a lot!
Following my exchange on hyperopt's GitHub:
There is no per-trial timeout, but hyperopt-sklearn implements its own solution by simply wrapping the objective function. Look for "fn_with_timeout" at https://github.com/hyperopt/hyperopt-sklearn/ .
From issue 210: "the optimizers are stateless, and fmin stores all state of the experiment in the trials object. So if you remove some experiments from the trials object, it's as if they never happened. use fmin's "max_evals" parameter to interrupt search as often as you need to make these sorts of modifications. It should be fine to use repeated calls with e.g. max_evals increasing by 1 every time if you want really fine grained control."
Thanks for looking into this, @doxav. I've written some code that addresses question 1, taking part of fn_with_timeout from hyperopt-sklearn and adapting it for standard Hyperopt cost functions.
You can find it here:
https://gist.github.com/hunse/247d91d14aaa8f32b24533767353e35d

GPUImage capturePhotoAsImageProcessedUpToFilter only working for the last filter

In my application I am using a stack of 3 filters and adding that to a stillCamera. I am trying to take the image from filter1; it's an empty filter, so it returns the actual image.
[stillCamera addTarget:filter1];
[filter1 addTarget:filter2];
[filter2 addTarget:filter3];
[filter3 addTarget:cameraView];
When I call capturePhotoAsImageProcessedUpToFilter, it only ever returns an image when I pass it filter3 like below.
[stillCamera capturePhotoAsImageProcessedUpToFilter:filter3 with...
The two examples below never return an image:
[stillCamera capturePhotoAsImageProcessedUpToFilter:filter1 with...
[stillCamera capturePhotoAsImageProcessedUpToFilter:filter2 with...
Am I doing something wrong? As a fix I am using:
[filter1 imageFromCurrentlyProcessedOutput]
Is there any difference between calling capturePhotoAsImageProcessedUpToFilter and imageFromCurrentlyProcessedOutput?
I think this is a side effect of a memory conservation optimization I tried to put in place last year. For very large images, like photos, what I try to do is destroy the framebuffer that backs each filter as the filtered image progresses through the filter chain. The idea is to try to minimize memory spikes by only having one or two copies of the large image in memory at any point in time.
Unfortunately, that doesn't seem to work as intended much of the time, and because the framebuffers are deleted as the image progresses, only the last filter in the chain ends up having a valid framebuffer to read from. I'm probably going to yank this optimization out at some point in the near future in favor of an internal framebuffer and texture cache, but I'm not sure what can be done in the meantime to read from these intermediary filters in a chain.

Methods for incremental score calculations

Can someone explain the purpose of the methods that need to be implemented for incremental score calculation? I understand all the after... methods, but why should I adjust the score before an entity is added or removed, or before a variable is changed (beforeEntityAdded, beforeVariableChanged, beforeEntityRemoved)?
See the incremental score calculation diagram from the 6.0.0.Final docs.
Also see the section "incremental score calculation" in those docs (which also explains why this is so much faster than SimpleScoreCalculator), and look at the example implementations. You'll see that beforeVariableChanged() is needed to retract the constraint matches that matched before the change but no longer match afterwards.
In that diagram, the ChangeMove needs to add +1 because A and B no longer match during the beforeVariableChanged() method, and -1 because A and C now match during the afterVariableChanged() method.
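For concreteness, a minimal sketch of that retract/insert pattern, assuming hypothetical MyEntity/MySolution classes with a single "row" planning variable and a "two entities on the same row" penalty (OptaPlanner 6-era signatures; the real example implementations in the docs follow the same shape):

import org.optaplanner.core.api.score.Score;
import org.optaplanner.core.api.score.buildin.simple.SimpleScore;
import org.optaplanner.core.impl.score.director.incremental.IncrementalScoreCalculator;

public class RowConflictIncrementalScoreCalculator
        implements IncrementalScoreCalculator<MySolution> {

    private MySolution solution;
    private int conflictCount; // number of conflicting pairs

    public void resetWorkingSolution(MySolution workingSolution) {
        solution = workingSolution;
        conflictCount = 0;
        for (MyEntity entity : solution.getEntityList()) {
            conflictCount += countConflictsFor(entity);
        }
        conflictCount /= 2; // each pair was counted once from each side
    }

    public void beforeVariableChanged(Object entity, String variableName) {
        // The entity still holds its OLD value: retract the matches it causes now.
        conflictCount -= countConflictsFor((MyEntity) entity);
    }

    public void afterVariableChanged(Object entity, String variableName) {
        // The entity now holds its NEW value: insert the matches it causes now.
        conflictCount += countConflictsFor((MyEntity) entity);
    }

    public void beforeEntityAdded(Object entity) {
        // Nothing to retract: the entity isn't part of the solution yet.
    }

    public void afterEntityAdded(Object entity) {
        conflictCount += countConflictsFor((MyEntity) entity);
    }

    public void beforeEntityRemoved(Object entity) {
        conflictCount -= countConflictsFor((MyEntity) entity);
    }

    public void afterEntityRemoved(Object entity) {
        // Already retracted in beforeEntityRemoved.
    }

    public Score calculateScore() {
        return SimpleScore.valueOf(-conflictCount);
    }

    private int countConflictsFor(MyEntity entity) {
        if (entity.getRow() == null) {
            return 0; // an uninitialized variable matches nothing
        }
        int count = 0;
        for (MyEntity other : solution.getEntityList()) {
            if (other != entity && entity.getRow().equals(other.getRow())) {
                count++;
            }
        }
        return count;
    }
}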

VB.NET, best practice for sorting concurrent results of threadpooling?

In short, I'm trying to "sort" the incoming results of threadpooling as they finish. I have a functional solution, but there's no way in the world it's the best way to do it (it's prone to huge pauses). So here I am! I'll try to hit the bullet points of what's going on and what needs to happen, and then post my current solution.
The intent of the code is to get information about files in a directory, and then write that to a text file.
I have a list (Counter.ListOfFiles) that is a list of the file paths sorted in a particular way. This is the guide that dictates the order I need to write to the text file.
I'm using a threadpool to collect the information about each file and build a StringBuilder with all of the text ready to write to the text file. I then call a procedure (SyncUpdateResource, included below), passing the StringBuilder (strBld) from that thread along with the path of the file that particular thread just described (Xref).
The procedure includes a SyncLock to hold all the other threads until it finds a thread passing the correct information, that being when the Xref passed by the thread matches the first item in my list (FirstListItem). When that happens, I write to the text file, delete the first item in the list, and do it again with the next thread.
The way I'm using the monitor is probably not great; in fact I have little doubt I'm using it in an offensively wanton manner. Basically, while the Xref (from the thread) <> the first item in my list, I'm doing a PulseAll on the monitor. I originally was using Monitor.Wait, but it would eventually just give up trying to sort through the list, even when using a pulse elsewhere. I may have just been coding something awkwardly; either way, I don't think it's going to change anything.
Basically the problem comes down to the fact that the monitor will pulse through all of the items it has in the queue, when there's a good chance the item I am looking for got passed to it earlier in the queue, so it now sorts through all of the items again before looping back around to find the one that matches. The result is that my code will hit one of these points and take a huge amount of time to complete.
I'm open to believing I'm just using the wrong tool for the job, or just not using the tool I have correctly. I would strongly prefer some sort of threaded solution (unsurprisingly, it's much faster!). I've been messing around a bit with the Parallel Task functionality today, and a lot of it looks promising, but I have even less experience with that than with the threadpool, and you can see how I'm abusing that! Maybe something with a queue? You get the idea. I am directionless. Anything someone could suggest would be much appreciated. Thanks! Let me know if you need any additional information.
Private Sub SyncUpdateResource(strBld As Object, Xref As String)
    SyncLock (CType(strBld, StringBuilder))
        Dim FirstListitem As String = Counter.ListOfFiles.First
        Do While Xref <> FirstListitem
            FirstListitem = Counter.ListOfFiles.First
            'This makes the code much faster for reasons I can only guess at.
            Thread.Sleep(5)
            Monitor.PulseAll(CType(strBld, StringBuilder))
        Loop
        Dim strVol As String = Form1.Volname
        Dim strLFPPath As String = Form1.txtPathDir
        My.Computer.FileSystem.WriteAllText(strLFPPath & "\" & strVol & ".txt", strBld.ToString, True)
        Counter.ListOfFiles.Remove(Xref)
    End SyncLock
End Sub
This is a pretty typical multiple producer, single consumer application. The only wrinkle is that you have to order the results before they're written to the output. That's not difficult to do. So let's let that requirement slide for a moment.
The easiest way in .NET to implement a producer/consumer relationship is with BlockingCollection, which is a thread-safe FIFO queue. Basically, producers put items onto the queue as they finish, and a consumer takes them off.
In your case, the producer threads get items, do whatever processing they need to, and then put the item onto the queue. There's no need for any explicit synchronization; the BlockingCollection class implementation does that for you.
Your consumer pulls things from the queue and outputs them. You can see a really simple example of this in my article Simple Multithreading, Part 2. (Scroll down to the third example if you're just interested in the code.) That example just uses one producer and one consumer, but you can have N producers if you want.
Your requirements have a little wrinkle in that the consumer can't just write items to the file as it gets them. It has to make sure that it's writing them in the proper order. As I said, that's not too difficult to do.
What you want is a priority queue of some sort onto which you can place an item if it comes in out of order. Your priority queue can be a sorted list or even just a sequential list if the number of items you expect to get out of order isn't very large. That is, if you typically have only a half dozen items at a time that could be out of order, then a sequential list could work just fine.
I'd use a heap because it performs well. The .NET Framework doesn't supply a heap, but I have a simple one that works well for jobs like this. See A Generic BinaryHeap Class.
So here's how I'd write the consumer (the code is in pseudo-C#, but you can probably convert it easily enough).
The assumption here is that you have a BlockingCollection called sharedQueue that contains the items. The producers put items on that queue. Consumers do this:
// The first sequence key we expect to output; assumes keys start at 0.
int expectedSequenceKey = 0;
var heap = new BinaryHeap<ItemType>();
foreach (var item in sharedQueue.GetConsumingEnumerable())
{
    if (item.SequenceKey == expectedSequenceKey)
    {
        // Output this item,
        // then check the heap to see if other items can now be output.
        expectedSequenceKey = expectedSequenceKey + 1;
        while (heap.Count > 0 && heap.Peek().SequenceKey == expectedSequenceKey)
        {
            var heapItem = heap.RemoveRoot();
            // output heapItem
            expectedSequenceKey = expectedSequenceKey + 1;
        }
    }
    else
    {
        // Item is out of order: hold it on the heap until its turn comes.
        heap.Insert(item);
    }
}
// If the heap still contains items after the queue is drained,
// then some error occurred.
One glaring problem with this approach as written is that the heap could grow without bound if one of your producers crashes or goes into an infinite loop: the expected sequence key never arrives, so everything after it piles up on the heap. But then, your other approach probably would suffer from that as well. If you think that's an issue, you'll have to add some way to skip an item that you think won't ever be forthcoming. Or kill the program. Or something.
If you don't have a binary heap or don't want to use one, you can do the same thing with a SortedList<ItemType>. SortedList will be faster than List, but slower than BinaryHeap if the number of items in the list is even moderately large (a couple dozen). Fewer than that and it's probably a wash.
I know that's a lot of info. I'm happy to answer any questions you might have.