Additional PlanningEntity in CloudBalancing - bounded-space situation - optaplanner

I successfully amended the nice CloudBalancing example to include the fact that I may only have a limited number of computers open at any given time (thanx optaplanner team - easy to do). I believe this is referred to as a bounded-space problem. It works dandy.
The processes come in groupwise, say 20 processes in a given order per group. I would like to amend the example to have optaplanner also change the order of these groups (not the processes within one group). I have therefore added a class ProcessGroup in the domain with a member List<Process>, the instances of ProcessGroup being stored in a List<ProcessGroup>. The desired optimisation would shuffle the members of this List, causing the instances of ProcessGroup to be placed at different indices of the List List<ProcessGroup>. The index of ProcessGroup should be ProcessGroup.index.
The documentation states that "if in doubt, the planning entity is the many side of the many-to-one relationsship." This would mean that ProcessGroup is the planning entity, the member index being a planning variable, getting assigned to (hopefully) different integers. After every new assignment of indices, I would have to resort the list List<ProcessGroup in ascending order of ProcessGroup.index. This seems very odd and cumbersome. Any better ideas?
Thank you in advance!
Philip.

The current design has a few disadvantages:
It requires 2 (genuine) entity classes (each with 1 planning variable): probably increases search space (= longer to solve, more difficult to find a good or even feasible solution) + it increases configuration complexity. Don't use multiple genuine entity classes if you can avoid it reasonably.
That Integer variable of GroupProcess need to be all different and somehow sequential. That smelled like a chained planning variable (see docs about chained variables and Vehicle Routing example), in which case the entire problem could be represented as a simple VRP with just 1 variable, but does that really apply here?
Train of thought: there's something off in this model:
ProcessGroup has in Integer variable: What does that Integer represent? Shouldn't that Integer variable be on Process instead? Are you ordering Processes or ProcessGroups? If it should be on Process instead, then both Process's variables can be replaced by a chained variable (like VRP) which will be far more efficient.
ProcessGroup has a list of Processes, but that a problem property: which means it doesn't change during planning. I suspect that's correct for your use case, but do assert it.
If none of the reasoning above applies (which would surprise me) than the original model might be valid nonetheless :)

Related

Cocoa Scripting: Deletion of elements in a loop getting out of sync

While adding scriptability to my Mac program, I am struggling with the common programming problem of deleting items from an indexed array where the item indexes shift due to removal of items.
Let's say my app maintains a data store in which objects of type "Person" are stored. In the sdef, I've define the Cocoa Key allPersons to access these elements. My app declares an NSArray *allPersons.
That far, it works well. E.g, this script works well:
repeat with p in every person
get name of p
end repeat
The problem starts when I want to support deletion of items, like this:
repeat with p in (get every person)
delete p
end repeat
(I realize that I could just write "delete every person", which works fine, but I want to show how "repeat" makes things more complicated).
This does not work because AppleScript keep using the original item numbers to reference the items even after deleting some of them, which naturally shifts the items and their numbering.
So, considering we have 3 Persons, "Adam", "Bonny" and "Clyde", this will happen:
get every person
--> {person 1, person 2, person 3}
delete person 1
delete person 2
delete person 3
--> error number -1719 from person 3
After deleting item 1 (Adam), the other items get renumbered to item 1 and 2. The second iteration deletes item 2 (which is now Clyde), and the third iteration attempts to delete item 3, which doesn't exist any more at that point.
How do I solve this?
Can I force the scripting engine to not address the items by their index number but instead by their unique ID so that this won't happen?
It's not your ObjC code, it's your misunderstanding of how repeat with VAR in EXPR loops work. (Not really your fault either: they're 1. counterintuitive, and 2. poorly explained.) When it first encounters your repeat statement, AppleScript sends your app a count event to get the number of items specified by EXPR, which in this case is an object specifier (query) that identifies all of the person elements in whatever. It then uses that information to generate its own sequence of by-index object specifiers, counting from 1 up to the result of the aforementioned count:
person 1 of whatever
person 2 of whatever
...
person N of whatever
What you need to realize is that an object specifier is a first-class query, not an object pointer (not that Apple tell you this either): it describes a request, not an object. Ignore the purloined jargon: Apple event IPC's nearest living relatives are RDBMSes, not Cocoa or SOAP or any of the OO messaging crud that modern developers so fixate on as The One True Way To Do... well, EVERYTHING.
It's only when that query is sent to your application in an Apple event that it's evaluated against the relational graph your Apple event IPC View-Controller – aka "Apple Event Object Model" – presents as an idealized, user-friendly representation of your Model's user date that it actually resolves to a specific Model object, or objects, with which the event handler should perform the requested operation.
Thus, when the delete command in your repeat loop tells your app to delete person 1 of whatever, all your remaining elements move down by one. But on the next iteration the repeat loop still generates the object specifier person 2 of whatever, which your script then sends off to your app, which resolves it to the second item in the collection – which was originally the third item, of course, until you shifted them all about.
Or, to borrow a phrase:
Nothing in AppleScript makes sense except in light of relational queries.
..
In fact, Apple events' query-based approach it actually makes a lot of sense considering it was originally designed to be efficient over very high-latency connections (i.e. System 7's abysmally inefficient process switcher), allowing a single Apple event carrying one or more complex queries to manipulate many objects at once. It's even quite elegant [when it works right], but is quite undone by idiots at Cupertino who think the best way to make programmers not hate the technology is to lie even harder about how it actually works.
So here, I suggest you go read this, which is not the best explanation either but still a damn sight better than anything you'll get from those muppets. And this, by its original designer that explains a lot of the rationale for creating a high-level coarse-grained query-based IPC system instead of the usual low-level fine-grained OO message passing crap.
Oh, and once you've done that, you might want to consider try running this instead:
delete every person whose name is "bob"
which is pretty much the whole point of creating a thick declarative-y abstraction that does all the work so the user doesn't have to.
And when nothing but an imperative client-side loop will do, you either want to get a list of by-ID object specifiers (which are the closest things to safe, persistent pointers that AEOMs can do) from the app first and then iterate over that, or at least use your own iterator loop that counts over elements in reverse:
repeat with i from (count every person) to 1 by -1
tell person i
..
end tell
end repeat
so that, assuming it's iterating over an ordered array on the server side, will delete from last to first, and so avoid the embarrassing off-by-N errors of your original script.
HTH
re: "If you want your scripable elements to be deletable, make sure you use NSUniqueIDSpecifiers to identify them."
Yes, Apple recommends using formUniqueId or formName for object specifiers, but you can't always do that. For instance, in the Text Suite, you really only have indexing to work with; e.g. character 1, word 3, paragraph 7, etc. You don't have unique IDs for text elements. In addition to deletion, ordering can be affected by other Standard Suite commands: open, close, duplicate, make, and move.
The app implementer is a programmer, but so is the scripter. So it is reasonable to expect the scripter to solve some problems themselves. For instance, if the app has 5 persons, and the scripter wants to delete persons 2 and 4, they can easily do so even with indexed deletion:
delete person 4
delete person 2
Deleting from the end of an ordered list forward solves the problem. AS also supports negative indexes, which can be used for the same purpose:
delete person -2
delete person -4
The key to solving this lies in implementing the objectSpecifier method correctly so that it does return an NSUniqueIDSpecifier.
My code did so far only return an index specifier and that was wrong for this purpose. I guess that had I posted my code (which is, unfortunately, too complex for that), someone may have noticed my mistake.
So, I guess the rule is: If you want your scripable elements to be deletable, make sure you use NSUniqueIDSpecifiers to identify them. For read-only element arrays, using an NSIndexSpecifier is (probably) safe, though, if your element array has persistent ordering behavior.
Update
As #foo points out, it's also important that the repeat command fetches the references to the items by using … in (get every person) and not just … in every person, because only the former leads to addressing the items by their id whereas the latter keeps indexing them as item N.

Fast, efficient method of assigning large array of data to array of clusters?

I'm looking for a faster, more efficient method of assigning data gathered from a DAQ to its proper location in a large cluster containing arrays of subclusters.
My current method 1 relies heavily on the OpenG cluster manipulation tools, but with a large data-set the performance is far too slow.
The array and cluster location of each element of data from the DAQ is determined during an initialization phase and doesn't change during acquisition.
Because the data element origin and end points are the same throughout acquisition, I would think an array of memory locations could be created and the data directly assigned to its proper place. I'm just not sure how to implement such a thing.
The following code does what you want:
For each of your cluster elements (AMC, ANLG_PM and PA) you should add a case in the string case structure, for the elements AMC and PA you will need to place a second case structure.
This is really more of a comment, but I do not have the reputation to leave those yet, so here it is:
Regarding adding cases for every possible value of Array name, is there any reason why you cannot use an enum here? Since you are placing it into a cluster anyway, I would suggest making a type-defined enum of your possible array names. That way, when you want to add or remove one, you only have to do it in one place.
You will still need to right-click on your case structures that use this enum and select Add item for every value if you are adding a value, or manually delete the obsolete value if you are removing one. I suppose some maintenance is required either way...

Does OptaPlanner support optimizations and constraints on continuous variables?

I'm reading contradictory things in the documentation.
On one hand, this passage seems to indicate that continuous planning variables are possible:
A planning value range is the set of possible planning values for a
planning variable. This set can be a discrete (for example row 1, 2, 3
or 4) or continuous (for example any double between 0.0 and 1.0).
On the other hand, when defining a Planning Variable, you must specify a ValueRangeProvider annotation on a field to use for the value set:
The Solution implementation has method which returns a Collection. Any
value from that Collection is a possible planning value for this
planning variable.
Both of these snippets are in the same section of the documentation (http://docs.jboss.org/drools/release/latest/optaplanner-docs/html_single/#d0e2518)
So, which is it? Can I use a full double as my planning variable, or do I need to restrict its range to the values in a specific Collection?
Looking at the actual algorithms are provided, I don't see any that are actually suitable for optimizing continuous variables, so I doubt it's possible, but it'd be nice to have that clarified and made explicit.
We're working towards fully supporting continuous variables. But currently (in 6.0.0.CR2) it's not decently supported yet.
Value ranges can indeed be continuous ranges, but the plumbing to actually use them isn't there yet. We have made good progress recently, see https://issues.jboss.org/browse/PLANNER-160.
Here's how it will work:
You 'll be able to use a #ValueRangeProvider annotation on a method that returns a ValueRange (instead of a Collection) too.
A ValueRange will be an interface supports selecting a random value, getting a size, ...
Out-of-the-box we will support IntValueRange, DoubleValueRange, BigDecimalValueRange, ...
(Implementation detail: we'll retro-fit those Collection-returning methods into a CollectionValueRange.)
Then the ValueSelector implementations will use that directly.
As for the suitability to optimize continuous variables:
JIT random selection will be blazing fast and be very memory-efficient.
If you have an NP-complete/NP-hard problem, then OptaPlanner will be a great match. If you have only continuous variables (and not a single discrete variable), then it's unlikely that your problem is NP-complete (unless your constraints counterprove that) and in that case you're better off with a custom, handmade, polynomial algorithm anyway (because it's not NP-complete, so there's an "easy" solution).

Cplex/OPL local search

I have a model implemented in OPL. I want to use this model to implement a local search in java. I want to initialize solutions with some heuristics and give these initial solutions to cplex find a better solution based on the model, but also I want to limit the search to a specific neighborhood. Any idea about how to do it?
Also, how can I limit the range of all variables? And what's the best: implement these heuristics and local search in own opl or in java or even C++?
Thanks in advance!
Just to add some related observations:
Re Ram's point 3: We have had a lot of success with approach b. In particular it is simple to add constraints to fix the some of the variables to values from a known solution, and then re-solve for the rest of the variables in the problem. More generally, you can add constraints to limit the values to be similar to a previous solution, like:
var >= previousValue - 1
var <= previousValue + 2
This is no use for binary variables of course, but for general integer or continuous variables can work well. This approach can be generalised for collections of variables:
sum(i in indexSet) var[i] >= (sum(i in indexSet) value[i])) - 2
sum(i in indexSet) var[i] <= (sum(i in indexSet) value[i])) + 2
This can work well for sets of binary variables. For an array of 100 binary variables of which maybe 10 had the value 1, we would be looking for a solution where at least 8 have the value 1, but not more than 12. Another variant is to limit something like the Hamming distance (assume that the vars are all binary here):
dvar int changed[indexSet] in 0..1;
forall(i in indexSet)
if (previousValue[i] <= 0.5)
changed[i] == (var[i] >= 0.5) // was zero before
else
changed[i] == (var[i] <= 0.5) // was one before
sum(i in indexSet) changed[i] <= 2;
Here we would be saying that out of an array of e.g. 100 binary variables, only a maximum of two would be allowed to have a different value from the previous solution.
Of course you can combine these ideas. For example, add simple constraints to fix a large part of the problem to previous values, while leaving some other variables to be re-solved, and then add constraints on some of the remaining free variables to limit the new solution to be near to the previous one. You will notice of course that these schemes get more complex to implement and maintain as we try to be more clever.
To make the local search work well you will need to think carefully about how you construct your local neighbourhoods - too small and there will be too little opportunity to make the improvements you seek, while if they are too large they take too long to solve, so you don't get to make so many improvement steps.
A related point is that each neighbourhood needs to be reasonably internally connected. We have done some experiments where we fixed the values of maybe 99% of the variables in a model and solved for the remaining 1%. When the 1% was clustered together in the model (e.g. all the allocation variables for a subset of resources) we got good results, while in comparison we got nowhere by just choosing 1% of the variables at random from anywhere in the model.
An often overlooked idea is to invert these same limits on the model, as a way of forcing some changes into the solution to achieve a degree of diversification. So you could add a constraint to force a specific value to be different from a previous solution, or ensure that at least two out of an array of 100 binary variables have a different value from the previous solution. We have used this approach to get a sort-of tabu search with a hybrid matheuristic model.
Finally, we have mainly done this in C++ and C#, but it would work perfectly well from Java. Not tried it much from OPL, but it should be fine too. The key for us was being able to traverse the problem structure and use problem knowledge to choose the sets of variables we freeze or relax - we just found that easier and faster to code in a language like C#, but then the modelling stuff is more difficult to write and maintain. We are maybe a bit "old-school" and like to have detailed fine-grained control of what we are doing, and find we need to create many more arrays and index sets in OPL to achieve what we want, while we can achieve the same effect with more intelligent loops etc without creating so many data structures in a language like C#.
Those are several questions. So here are some pointers and suggestions:
In Cplex, you give your model an initial solution with the use of IloOplCplexVectors()
Here's a good example in IBM's documentation of how to alter CPLEX's solution.
Within OPL, you can do the same. You basically set a series of values for your variables, and hand those over to CPLEX. (See this example.)
Limiting the search to a specific neighborhood: There is no easy way to respond without knowing the details. But there are two ways that people do this:
a. change the objective to favor that 'neighborhood' and make other areas unattractive.
b. Add constraints that weed out other neighborhoods from the search space.
Regarding limiting the range of variables in OPL, you can do it directly:
dvar int supply in minQty..maxQty;
Or for a whole array of decision variables, you can do something along the lines of:
range CreditsAllowed = 3..12;
dvar int credits[student] in CreditsAllowed;
Hope this helps you move forward.

Complex derived attributes in Django models

What I want to do is implement submission scoring for a site with users voting on the content, much like in e.g. reddit (see the 'hot' function in http://code.reddit.com/browser/sql/functions.sql). Edit: Ultimately I want to be able to retrieve an arbitrarily filtered list of arbitrary length of submissions ranked according to their score.
My submission model currently keeps track of up and down vote totals. Currently, when a user votes I create and save a related Vote object and then use F() expressions to update the Submission object's voting totals. The problem is that I want to update the score for the submission at the same time, but F() expressions are limited to only simple operations (it's missing support for log(), date_part(), sign() etc.)
From my limited experience with Django I can see 5 options here:
extend F() somehow (haven't looked at the code yet) to support the missing SQL functions; this is my preferred option and seems to fit within the Django framework the best
define a scoring function (much like reddit's 'hot' function) in my database, and have Django use the value of that function for the value of the score field; as far as I can tell, #2 is not possible
wrap my two step voting process in a suitably isolated transaction so that I can calculate the voting totals in Python and then update the Submission's voting totals without fear that another vote against the submission could be added/changed in the meantime; I'm hesitant to take this route because it seems overly complex - what is a "suitably isolated transaction" in this case anyway?
use raw SQL; I would prefer to avoid this entirely -- what's the point of an ORM if I have to revert to SQL for such a common use case as this! (Note that this coming from somebody who loves sprocs, but is using Django for ease of development.)
(edit: added this after further discussion) compute the score using an extra select parameter containing a call to my function; this would work but impose unnecessary load on the DB (would be forced to calculate the score for every submission ever made every time the query ran; caching could help here, but it still seems like a bit of lame workaround)
Before I embark on this mission to extend F() (which I'm not sure is even possible), am I about to reinvent the wheel? Is there a more standard way to do this? It seems like such a common use case and yet in an hour of searching I have yet to find a common solution...
EDIT: There is another option: set the default value of the field in the database script to be an expression containing my function. This is not as flexible as #1, but probably the quickest and cleanest approach to solving the problem (although my initial investigation into extending F() looks promising).
Why can't you just denormalize the score and reconstruct it with the Vote objects every once and a while?
If you can't do that, it is very easy to make a 'property' function that acts as an object attribute for scoring.
#property
def score(self):
... calculate score from Vote objects ...
return score
I've never used F() on a property like this, but it's Python, so I bet it works.
If you are using django-voting (which I recommend), you can put #3 in the manager's record_vote function since that's how all vote transactions take place.