Does a shadow variable help OptaPlanner make a better initial solution?

I'm using OptaPlanner to make a schedule and it works quite well.
After reading the documentation, I realised that I should use at least one shadow variable (or more), since my Drools file calls methods that do a lot of calculations based on the value of the planning variable.
I spent a couple of hours rewriting my code to use a shadow variable, but then I noticed that the initial solution was really bad (compared to not having shadow variables) and I had to wait several minutes just to get an OK result. Is this normal? It did not look like the initial solution used the shadow variable at all.

The question is very generic, and so my answer will be, too.
Sometimes you can simplify the problem by introducing shadow variables or other forms of caching. If you find the right balance, you can indeed speed up the Drools calculation and, as a result, get to the same solution in a shorter amount of time - and therefore reach better solutions in the same amount of time.
That said, introducing shadow variables shouldn't really change your scores - only how quickly they're calculated. If you're seeing different scores for the same @PlanningSolution, you have in fact changed your problem, and the relative performance is no longer comparable.
Also, you may want to check out environment modes to make sure you haven't inadvertently introduced score corruption into your problem.
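For reference, here is a minimal sketch of one of OptaPlanner's built-in shadow variables, assuming a hypothetical shift-assignment model (the Shift and Employee classes are invented for illustration; the annotations are the real OptaPlanner ones):

import java.util.ArrayList;
import java.util.List;
import org.optaplanner.core.api.domain.entity.PlanningEntity;
import org.optaplanner.core.api.domain.variable.InverseRelationShadowVariable;
import org.optaplanner.core.api.domain.variable.PlanningVariable;

@PlanningEntity
class Shift {
    // The genuine planning variable: the solver assigns this value.
    @PlanningVariable(valueRangeProviderRefs = "employeeRange")
    private Employee employee;
    // getters and setters omitted
}

@PlanningEntity
class Employee {
    // Shadow variable: OptaPlanner keeps this list in sync with every
    // Shift whose "employee" variable currently points at this employee.
    @InverseRelationShadowVariable(sourceVariableName = "employee")
    private List<Shift> shifts = new ArrayList<>();
    // getters and setters omitted
}

Score rules can then read the shifts list directly instead of recomputing the relation on every evaluation; done right, that changes how fast the score is calculated, not what the score is.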

Related

Is it better to use a boolean variable to replace an if condition for readability or not?

I am in the second year of my bachelor's degree in information technology. Last year, in one of my courses, they taught me to write clean code so that other programmers have an easier time working with it. I learned a lot about writing clean code from a video ("Clean Code") on Pluralsight (a paid learning website that my school uses). There was an example in it about assigning if-conditions to boolean variables and using them to enhance readability. In my course today, my teacher told me that this is very bad code because it decreases performance (in bigger programs) due to the extra tests being executed. I am now wondering whether I should continue using boolean variables for readability or avoid them for performance. I will illustrate with an example (using Python code):
Example with a boolean variable:
Let's say we need to check whether somebody is legally allowed to drink alcohol. We get the person's age, and we know the legal drinking age is 21.
is_old_enough = persons_age >= legal_drinking_age
if is_old_enough:
    ...  # do something
My teacher told me today that this would be very bad for performance, since two tests are performed: first persons_age >= legal_drinking_age is evaluated, and then the if tests whether the person is_old_enough.
My teacher told me that I should just put the condition in the if, but in the video they said that code should read like natural language to make it clear for other programmers. I am wondering which is the better coding practice.
Example with the condition in the if:
if persons_age >= legal_drinking_age:
    ...  # do something
In this example, only one test is performed: whether persons_age >= legal_drinking_age. According to my teacher, this is better code.
Thank you in advance!
Yours faithfully,
Jonas
I was wondering now which would be the better coding practice.
The really safe answer is: it depends.
I hate to give that answer, but you wouldn't be asking unless you had genuine doubt. (:
IMHO:
If the code will be used long-term, where maintainability is important, then clearly readable code is preferred.
If program speed is crucial, then any code that uses fewer resources (smaller data sizes/data types, fewer loops to achieve the same thing, optimized task sequencing, more CPU work per clock cycle, fewer data-reloading cycles) is better (example keyword: space-for-time trade-offs).
If minimizing memory usage is crucial, then any code that uses less storage and memory to complete its operation (even if it takes more CPU cycles or loops for the same task) is better (example: small devices with limited storage/RAM).
If you are in a race, you may want to write the shortest code possible, even if it takes slightly longer CPU time later (example: a hackathon).
If you are programming to teach a team of students or friends, then readable code with plenty of comments is definitely preferred.
If it were me, I'd stick to anything as close to assembly language as possible (maximum control over the bit manipulation) for backend development, and anything closest to Mathematica-like code (less code, maximum output, never mind how much CPU/memory it needs) for frontend development. ( :
So if it is you, you may have your own requirements and preferences. From the point of view of users, outsiders, and customers, it is just a working or not-working program. Your definition of a good program may differ from others', but that shouldn't stop us from being flexible in coding style and method.
Happy exploring. I hope this helps in some way.
Performance
Performance is one of the least interesting concerns for this question, and I say this as someone who works in very performance-critical areas like image processing and raytracing and who believes in effective micro-optimizations (but my idea of effective micro-optimization is things like improving memory access patterns and memory layouts for cache efficiency, not eliminating temporary variables out of fear that your compiler or interpreter might allocate additional registers and/or use additional instructions).
The reason it's not so interesting is that, as pointed out in the comments, any decent optimizing compiler will treat the two versions you wrote as equivalent by the time it finishes optimizing the intermediate representation and generates the final machine code through instruction selection and register allocation. And if you aren't using a decent optimizing compiler, then this sort of microscopic efficiency is probably the last thing you should be worrying about anyway.
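To make the comparison concrete, here is a minimal Java sketch of the two forms being compared (the names mirror the question; serveDrink() is a hypothetical placeholder). A JIT or optimizing compiler will typically turn both into the same machine code:

class DrinkingAgeCheck {
    static final int LEGAL_DRINKING_AGE = 21;

    // Form 1: named boolean for readability.
    static void withNamedBoolean(int personsAge) {
        boolean isOldEnough = personsAge >= LEGAL_DRINKING_AGE;
        if (isOldEnough) {
            serveDrink();
        }
    }

    // Form 2: the condition written inline in the if.
    static void withInlineCondition(int personsAge) {
        if (personsAge >= LEGAL_DRINKING_AGE) {
            serveDrink();
        }
    }

    static void serveDrink() { /* hypothetical placeholder */ }
}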
Variable Scopes
Performance aside, the only concern I'd have with this convention (and I think it's generally a good one to apply liberally) is in languages that don't have a concept of a named constant to distinguish it from a variable.
In those cases, the more variables you introduce into a meaty function, the more intellectual overhead it carries as the number of variables with a relatively wide scope increases, and that can translate into practical burdens in maintenance and debugging in extreme cases. If you imagine a case like this:
some_variable = ...
...
some_other_variable = ...
...
yet_another_variable = ...
(300 lines more code to the function)
... in some function, and you're trying to debug it, then those variables, combined with the monstrous size of the function, start to multiply the difficulty of figuring out what went wrong. That's a practical concern I've encountered when debugging codebases spanning millions of lines of code, written by all sorts of people (including some no longer on the team), where it's not so fun to look at the locals window in a debugger and see two pages' worth of variables in some monstrous function that appears to be doing something incorrectly (or in one of the functions it calls).
But that's only an issue when it's combined with questionable programming practices like writing functions that span hundreds or thousands of lines of code. In those cases, just focusing on writing reasonably sized functions that perform one clear logical operation and have no more than one side effect (or none, ideally, if the function can be written as a pure function) will often improve everything. If you design your functions reasonably, I wouldn't worry about this at all; favor whatever is readable and easiest to comprehend at a glance, and maybe even what is most writable and "pliable" (to make future changes to the function easier, if you anticipate the need).
A Pragmatic View on Variable Scopes
So I think a lot of programming concepts can be understood to some degree just by understanding the need to narrow variable scopes. People say to avoid global variables like the plague. We could go into how that shared state interferes with multithreading and makes programs difficult to change and debug, but you can understand a lot of the problems purely through the desire to narrow variable scopes. If you have a codebase spanning a hundred thousand lines of code, then a global variable has a scope of a hundred thousand lines for both access and modification - crudely speaking, a hundred thousand ways to go wrong.
At the same time, that pragmatic view will find it pointless for a one-shot program spanning only 100 lines of code, with no future need for extension, to avoid global variables like the plague, since a global there only has 100 lines' worth of scope, so to speak. Meanwhile, even someone who avoids globals like the plague in all contexts might still write a class with member variables (including some superfluous ones for "convenience") whose implementation spans 8,000 lines of code. At that point those variables have a much wider scope than even the global variable in the former example, and this realization could drive someone to design smaller classes and/or reduce the number of superfluous member variables included in the class's state (which can also translate into simplified multithreading and similar benefits to those of avoiding global variables in a non-trivial codebase).
And finally, it will tend to tempt you to write smaller functions as well, since a variable towards the top of a function spanning 500 lines of code also has a fairly wide scope. So anyway, my only concern when you do this is not to let the scope of those temporary, local variables get too wide. And if they do, the general answer is not necessarily to avoid those variables but to narrow their scope, as in the sketch below.
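As a hypothetical illustration (the Order type and sumAmounts function are invented for the example), extracting a block into its own small function is one way to narrow a variable's scope:

import java.util.List;

class Reporting {
    // Hypothetical record type for the example.
    record Order(double amount) {}

    // Before: if this loop sat at the top of a 500-line method, "total"
    // would remain in scope for everything below it.
    // After: the variable lives only inside this small, single-purpose
    // function and disappears the moment it returns.
    static double sumAmounts(List<Order> orders) {
        double total = 0;
        for (Order order : orders) {
            total += order.amount();
        }
        return total;
    }
}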

OptaPlanner: select only entities in conflict

In the change and swap move selectors, I would like to consider only moves that involve entities in conflict, as they are more likely to improve the heuristic score.
How should this be done? Which classes and interfaces do I have to reuse or extend? I have looked at ScoreDirector and PhaseLifecycleListener.
A move filter might do that (as long as it's not phase- or solver-cached, since the set of conflicts changes every step). See the course scheduling example and the docs for how to use a filter.
I wouldn't recommend it though, as you still want to move non-conflicting entities at times; you might just want to focus more on the conflicting lectures. So I would keep a vanilla move selector in the mix.
The move filter isn't perfect either - the Guided Local Search feature (not yet available) is a better way to deal with this.
However, given the other question about the model and similar cases I've seen, I'd say moves are not your problem. A better model will make all this kind of move tweaking obsolete.
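For reference, a minimal sketch of such a filter, assuming OptaPlanner 8 APIs and a hypothetical lecture-scheduling model (CourseSchedule, Lecture, and the isInConflict helper are invented; the SelectionFilter pattern is the one the course scheduling example demonstrates):

import org.optaplanner.core.api.score.director.ScoreDirector;
import org.optaplanner.core.impl.heuristic.selector.common.decorator.SelectionFilter;
import org.optaplanner.core.impl.heuristic.selector.move.generic.ChangeMove;

public class ConflictingLectureChangeMoveFilter
        implements SelectionFilter<CourseSchedule, ChangeMove<CourseSchedule>> {

    @Override
    public boolean accept(ScoreDirector<CourseSchedule> scoreDirector,
                          ChangeMove<CourseSchedule> move) {
        // Accept only moves whose entity currently participates in a conflict.
        Lecture lecture = (Lecture) move.getEntity();
        return isInConflict(lecture, scoreDirector.getWorkingSolution());
    }

    private boolean isInConflict(Lecture lecture, CourseSchedule schedule) {
        // Hypothetical, model-specific conflict check goes here.
        return false;
    }
}

The filter would then be referenced from the solver config via a filterClass element on the changeMoveSelector, as the docs describe.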

How bad is it to have tonnes of unused variables

I am using VBA, and I have downloaded a tool called MZ-Tools that helps me find all the unused variables in the code. I have almost 300 objects with roughly 500 lines in each.
Overall it has found almost 500 unused variables/procedures.
Would removing these speed up the program a lot, or would it just be a waste of time to clean up code that doesn't have much effect on the program?
Short answer: it is never a waste of time to clean up code. You, or someone else, will be very happy when it has to be revised a year or so later.
Longer answer: the application probably won't speed up a lot; at least, you probably won't feel a change. It depends on how heavy the application already is, and on the kinds of objects being created and how big and complex they are. If some of those objects run methods every couple of seconds, for example in a loop, that will affect the performance of the application considerably.
More: as a result of cleaning up your application, you will get better performance. Whether it is perceptible or not depends on a variety of things. The bigger problem is that you cannot know whether the unused code will cause errors in the future. Some of it may be discontinued at some point, or it could cause other kinds of unexpected exceptions. That, I think, is the biggest threat.
Have fun going through the code, sooner or later!
Based on your question and comments, my impression is that your focus is exclusively on execution speed. If that's all you and the team care about for this project, don't invest any time cleaning up those items, because I doubt you will notice any runtime performance improvement.
However, I suggest you look beyond only execution speed. How challenging is this project to debug/troubleshoot for the current maintainer(s)? How difficult to add new features, if needed? How about if someone new has to take over responsibility? How much easier would those tasks be without the distractions of unused variables and procedures?
A related consideration is just how much time are we talking about for that cleanup effort? I wonder whether someone has over-estimated the workload.
Make a copy of the db file. From the MZ-Tools code review panel, choose "export" and save the analysis report as a text file. Print the text file. Then work through that printed list, fixing each item and crossing it off. Even if you're really slow and only average two items per minute, 500 items means 250 minutes; realistically, the task should take less than 4 hours. Running the MZ-Tools code review again will show you whether you missed anything, and compiling will tell you whether you removed something by mistake.

Programming Principles: Assignment vs Conditions

I did some research but couldn't find the answer I was looking for, so I figured I'd ask here. It's easiest to demonstrate with examples, so consider the following snippets of code:
int delta = 0;
if (some_condition)
    delta = 42;

x1 = regular_value1 + delta;
x2 = regular_value2 + delta;
// ...
// delta is used many more times below
// basically: if (some_condition == true) => add delta to all variables;
// if FALSE => add 0 (thus effectively not changing anything)
versus
int delta = 42;
if (some_condition)
{
    x1 = regular_value1 + delta;
    x2 = regular_value2 + delta;
    // ...
}
else
{
    x1 = regular_value1;
    x2 = regular_value2;
    // ...
}
For example, a very simple real-world scenario: say I'm creating a Windows form that may or may not contain an image on the left. If there's no image, all the other form controls are created on the left; if there is an image, all the other controls are shifted to the right of the image (delta is added to every control's X location).
I'm programming a C# XNA game (so performance is somewhat relevant, but OOP principles shouldn't be abandoned either), so my question is: which code would run faster, given that some_condition is true 50% of the time? Also, which code block is easier to maintain and read?
I'm aware this isn't a huge issue, but I'm trying to get into the habit of writing the best code possible. Any input, and even personal experiences, would be appreciated.
Thanks.
The latter might be slightly faster, or they may both be optimized to the same thing. That doesn't matter; you can always change it later if you find it becomes an issue (which this particular case certainly won't - this is just general advice). But the former, I find, is easier to read and maintain: if you want to change regular_value1 or regular_value2, you only have to change it in one place, not two. Go for that.
This is certainly a trivial case, and both might indeed optimize to the same code. I'd suggest using whichever is easiest to understand - a decision that ought to be based on more of the program than you've shown here.
But the second solution is apt to be faster, particularly if delta is turned into a constant. If some_condition is false, no additions need to be done, and an optimizer might find some way to speed up the actual assignments, so I see a performance edge there. I think it's best to write the fastest code you can as long as you don't hurt maintainability. Even with something that has a much bigger performance difference than this, you are realistically never going to come back later looking for ways to speed it up. You may as well code for performance now and forget about it. Keep this up over time and your code will run faster at no cost to you.
Maybe someday profiling will point to this code as the slowest point in your program. But most likely you - and others - will just accept that the code runs at a certain speed and use it accordingly. The fact that it could be sped up and used in new places - say, between keystrokes - without annoying the user will never be thought of.
The difference in performance here is minor, and the code will probably never be used in speed-critical conditions anyway, but I do think coding for speed is a good idea. It doesn't really take any time once you get into the habit, and you know what your code can do if tweaked, because it's already tweaked.
(It's been my experience that fast code and readable code are much the same thing. If a human can understand the code easily, then so can the optimizer, and these days the optimizer is king. On those rare occasions where I have to choose, I go for clarity rather than speed, unless I know everything depends on the code being fast. Then I document like crazy to atone.)
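As a hypothetical Java sketch of the "delta as a constant" variant mentioned above (the names are carried over from the question):

class Layout {
    // A compile-time constant instead of a mutable local.
    static final int DELTA = 42;

    static void position(boolean someCondition, int regularValue1, int regularValue2) {
        int x1;
        int x2;
        if (someCondition) {
            x1 = regularValue1 + DELTA;
            x2 = regularValue2 + DELTA;
        } else {
            // This branch performs no additions at all.
            x1 = regularValue1;
            x2 = regularValue2;
        }
        System.out.println(x1 + ", " + x2);
    }
}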
This is a bit of a subjective question, but one of the things that affects readability (and, importantly, maintainability) is how much code is repeated. In this case, the first option results in less code to maintain and read through (less redundancy), and it would be my preferred method.
As pointed out by @minitech, the second approach can and will cause problems in maintainability (points to ponder: how long is the life cycle of the application? Is the code you are writing going to be reused or consumed by others?) and readability. Given that you have already considered these factors, I would recommend that you decide on and set some performance numbers that your application should conform to.
Once the performance requirements have been clearly established, you can decide which approach to follow. OOP principles and readability are every programmer's dream, but remember that once you ship the application to the customer, it does not matter which approach you followed; the only thing that matters is that your application runs as expected (in both functionality and performance).
You will need to gauge with the stakeholders the implications of writing the 'best' code versus the actual performance impact that the 'best' code might have.

How often are values duplicated on the stack?

When you have a function that accepts an array as an argument and calls another function with that array, which calls another function with it, and so forth, the stack will contain many copies of the pointer to that array. I just thought of an interesting way to alleviate this problem, but I'm wondering whether it is worth implementing.
Does anyone have any idea how often stacks contain duplicate pointers in practice?
EDIT
Just to clarify, I am not optimizing a given program; rather, I am considering writing a new kind of optimization pass for my VM. My benchmarks have indicated that my current solution causes up to 70% of the total running time to be spent in stack manipulations. The optimization pass I have in mind would generate code at compile time that performs the same actions, but with pointers (potentially) duplicated on the stack less often. I am interested in any prior studies that have measured the number of duplicates on the stack, because this would help me quantify my optimization's potential. For example, if it is known that real programs do not, in practice, push pointers that are already on the stack, then my optimization is worthless.
Moreover, these stack manipulations are due to the code generated by my VM making sure locally-held pointers are visible to the garbage collector, and not only to function parameters, as both answerers have so far assumed. And they are actually operations on a shadow stack rather than the main stack.
First of all, the answer will depend on your application.
Secondly, even with high duplication, I doubt there is much sense in implementing the mechanism you describe, or even that it is possible in the general case. If you call a method and pass it parameters, you must do it one way or another.
There may be advantages to doing it in some specific way - for example, there are several function calling conventions, and many C/C++ compilers (e.g. gcc) let you choose between passing parameters on the stack or via registers. In certain cases the latter may be faster - you can benchmark whether it helps your application.
But in the general case, the cost of detecting duplicated values on the stack and "reusing" them would probably far exceed any gains from having a smaller stack. The code for pushing and popping values is really simple (just a few CPU instructions in the optimized case); the code for finding and reusing duplicates, hardly so. You would also have to somehow store the information about which values are already on the stack and how to find them - a non-trivial data structure. Except in some really weird cases, I don't think this would be smaller than the copied data itself.
What you could do is rewrite your algorithm so that some function calls are eliminated. For example, if your function's result depends only on the input arguments, you could cache or memoize the results, thus avoiding repeated calls with the same values. This may indeed bring some gains, though it's usually a memory-versus-CPU-time trade-off; getting an advantage in both memory and CPU time is rarely possible. Also, rewriting your algorithm is not really "avoiding duplication of data on the stack".
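As a hypothetical illustration of that memoization idea (plain Java, unrelated to the asker's VM):

import java.util.HashMap;
import java.util.Map;

class MemoizedFibonacci {
    private final Map<Integer, Long> cache = new HashMap<>();

    // Computes fib(n), caching each result so that repeated calls with the
    // same argument return immediately instead of recursing again: a classic
    // memory-for-CPU-time trade-off.
    long fib(int n) {
        if (n < 2) {
            return n;
        }
        Long cached = cache.get(n);
        if (cached != null) {
            return cached;
        }
        long result = fib(n - 1) + fib(n - 2);
        cache.put(n, result);
        return result;
    }
}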
Anyway, for the original question, I think the idea is not viable, and you should look for optimizations elsewhere.
PS: Your use case may somewhat resemble tail-call optimization, so perhaps that's a direction worth looking at - but if you implement it yourself, I would also consider it to fall into the "change your algorithm" category. Maybe changing from a recursive algorithm to an iterative one could help too.
Can I suggest getting some exposure to actual performance tuning?
(Here's my canonical example.)
Between the time a program starts and the time it ends, it obviously spends 100% of its cycles on something.
If it goes in and out of functions and passes pointers to an array, but does nothing else, then it's no surprise that a high percentage of time goes into function entry and exit and argument passing.
If a program P is written to do task T, there are a multitude of other programs P' which could also do task T. Some of them take fewer cycles than all the others, and those are the optimal ones.
The way the optimal ones differ from the non-optimal ones is that the non-optimal ones are doing things that can be done without.
So, to optimize any program, find out what cycles are being spent that don't have to be, and get rid of those activities. That link shows in great detail how I do it.
Trying to pass fewer arguments to functions might or might not be necessary, depending on what your diagnostics tell you.