Default hyperparameter values in Neuraxle

While implementing pipeline components in Neuraxle, I wonder whether it is possible and/or advisable to have default values for hyperparameters. Looking at the code and documentation, my guess is that it is not supported, but I cannot find any mention of it in the docs. I notice here that hyperparameters are set before the setup phase, which makes me suspect that setting defaults in code is not "possible".
Default values would be nice, as they would allow many more hyperparameter options without having to define them explicitly when training. They would also allow adding hyperparameters without breaking existing training code. A downside of defaults is increased complexity, and perhaps reproducibility issues if the defaults change.
Any insight here would be appreciated.

If I understand your question correctly, it is entirely possible to have a default value for a hyperparameter. You can do so in your step class's constructor: your parameter simply needs to have a corresponding FixedHyperparameter entry in the hyperparameter space.
e.g.
from neuraxle.base import BaseStep
from neuraxle.hyperparams.distributions import FixedHyperparameter

class MyStep(BaseStep):
    def __init__(self, default_hyperparam_value):
        BaseStep.__init__(self,
            hyperparams={"my_hyperparam_name": default_hyperparam_value},
            hyperparams_space={"my_hyperparam_name": FixedHyperparameter(default_hyperparam_value)})
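As a hypothetical usage sketch (the step and hyperparameter names are the ones defined above; passing a plain dict to set_hyperparams is an assumption based on the Neuraxle examples):

step = MyStep(default_hyperparam_value=42)        # the default value is now a regular hyperparameter
step.set_hyperparams({"my_hyperparam_name": 10})  # and can still be overridden explicitly when training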
Alternatively, you could exclude it entirely from the hyperparameter dictionaries and simply set it as a step attribute. There are, of course, many other ways of achieving similar behaviour.
Let me know if I've misunderstood your question, I'll be glad to provide any further needed insight :)

Related

Are the optimization and parameter variation experiments in AnyLogic limited to 255 parameters?

In AnyLogic, "You may vary only the parameters of the top-level agent."
(https://anylogic.help/anylogic/experiments/optimization.html#:~:text=Optimization%20experiment&text=If%20you%20need%20to%20run,the%20optimization%20capability%20of%20AnyLogic.)
(https://anylogic.help/anylogic/experiments/parameter-variation.html)
The top-level agent cannot have more than 255 parameters, because the number of method parameters in Java is limited to 255 (https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.11).
The question here is not why 255 parameters are required for an optimization problem or if simulation-based optimization is the best way to handle a problem with more than 255 parameters (Decision Variables). The question is about ways to overcome this limitation.
I thought the best option would be to follow Java best practices and have a Java class (which has almost no limitations) (Maximum number of parameters in an Agent-Type). However,
"AnyLogic provides Agents, that are basically predefined classes with several built-in functionalities" (https://noorjax.com/2018/11/12/an-example-on-the-use-of-classes-in-anylogic/).
Therefore, it seems that using a Java class would not help. Am I missing a Java trick here? Would it be possible in any way to perform an optimization experiment in AnyLogic with more than 255 parameters?
Sorry if this question is not within the scope of Stack Overflow. I'm still trying to distinguish between what can be asked and what cannot.
There are several ways to avoid the limit:
Structure your parameters in 'raw' Java classes so, for example, your Main agent may have 3 parameters of type CoreParameters, ClimateVariables and EconomicVariables. (Just look at any Java tutorial for how to define simple classes which are effectively just data structures.) But now your experiments have to create appropriate instances of those classes as parameter values, and this makes things like a Parameter Variation experiment harder to define; you'd typically use a Custom Experiment instead, since then you have full control of how you set up the parameters for each run. For optimisation, you'd also have to use a Custom Experiment to define an optimisation where, for example, you might have 500 optimisation variables but your code to set up the model from them fills your 3 model parameter class instances with those 500 values. The Custom Experiment help page has examples of this.
Use external data (e.g., Excel loaded into the AnyLogic DB) to provide some/all of the 'parameters' for your model. The issue here is that AnyLogic's multi-run experiments (including optimisation-based ones) expect to be varying top-level agent parameters. But often
A load of 'parameters' will stay fixed so those can be supplied from this external data.
'Parameters' may be varied in related sets, so this can boil down to a single parameter which provides, say, the filename to load the relevant external data from (and you vary that across a pre-prepared set). But this requires writing some specific Java to allow you to import external data from a dynamically-defined filename into the AnyLogic DB, or the equivalent but reading directly from the Excel file. (But this is simple boilerplate code you can copy and reuse once you've 'learnt' it.)
P.S. I'd reiterate though that any optimisation involving 255+ parameters is probably pointless, with little likelihood of finding a near-optimum (and if you have a model with that many parameters --- given that you might genuinely want to vary all of them independently --- you have a model design problem).
P.P.S. Your two quoted bits of text don't contradict each other. You can write raw Java classes in AnyLogic or use, say, an Agent which just contains a set of parameters as a 'data structure class'. Agents (together with everything else) are Java classes, but that's not relevant to your question.

Setting Integrality tolerance in CPLEX and forcing decision variables to take rounded values

I am currently trying to solve a linear program in CPLEX that has three decision variables, one of which is binary while the other two are continuous.
The problem I have is that instead of giving results for the continuous variables like '10' or '0', it sets them to '9.99999' and '0.000001'.
So with a bit of googling I found out that there is a parameter in CPLEX called integrality tolerance that helps achieve this goal. The problem is that nowhere have I found how to actually set this parameter in OPL, only how to set it using the different APIs. The thing is, I'm only using CPLEX to solve my model.
Can anyone guide me on this?
Have you tried, in OPL,
execute
{
cplex.epint=0.0001;
}
?
And in the IDE you can use a run configuration settings (.ops) file, where the integrality tolerance can be set in the settings editor under the mixed integer programming tolerances.

Storing feasible solutions in terms of original variables

I want to store a feasible solution from an event handler that catches the SCIP_EVENTTYPE_BESTSOLFOUND event, and later I would like to give this solution as a heuristic solution to another SCIP instance that is optimizing the same problem but with different parameter settings (this could be in a subsequent optimization or in parallel).
My problem is that the solution I get from using SCIPgetBestSol() will be in terms of the transformed problem, which can be different from the transformed problem in the second SCIP instance.
Would turning presolving off (using SCIPsetPresolving()) be enough to ensure that SCIP always refers to the original variables within callback functions?
Is there a particular way that you would recommend for doing this?
Thanks!
Make sure that your event handler can access the array of original variables (SCIPget(N)OrigVars() does the trick). You can always query solution values of original variables, even from transformed solutions, using SCIPgetSolVal(), and store the values in a solution created via SCIPcreateOrigSol().
In order to feed this solution into a different SCIP instance, you have to get the mapping between variables of the primary and secondary SCIP instance right. Create a new solution for the secondary SCIP instance, and set the solution value of a variable to the value of its (pre-)image variable in the primary SCIP.
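The question is phrased in terms of the C API; as a rough sketch of the same idea using the PySCIPOpt wrappers (the class name and the name-based variable mapping are mine, and the wrapper signatures should be checked against your PySCIPOpt version):

from pyscipopt import Model, Eventhdlr, SCIP_EVENTTYPE

class BestSolRecorder(Eventhdlr):
    # Stores the incumbent in terms of *original* variables each time a new best solution is found.
    def eventinit(self):
        self.best_values = {}
        self.model.catchEvent(SCIP_EVENTTYPE.BESTSOLFOUND, self)

    def eventexit(self):
        self.model.dropEvent(SCIP_EVENTTYPE.BESTSOLFOUND, self)

    def eventexec(self, event):
        sol = self.model.getBestSol()
        # Original-variable values can be queried even from a transformed solution.
        self.best_values = {v.name: self.model.getSolVal(sol, v)
                            for v in self.model.getVars(transformed=False)}

m1 = Model()
# ... build the problem in m1 ...
recorder = BestSolRecorder()
m1.includeEventhdlr(recorder, "bestsolrecorder", "record best solutions in original space")
m1.optimize()

m2 = Model()
# ... build the same problem in m2 (same original variable names) and change its parameters ...
by_name = {v.name: v for v in m2.getVars(transformed=False)}
sol = m2.createSol()
for name, val in recorder.best_values.items():
    m2.setSolVal(sol, by_name[name], val)
m2.addSol(sol)  # pass the stored solution to the second instance as a primal start
m2.optimize()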

Using dfs and calculate_feature_matrix?

You could use ft.dfs to get back feature definitions as input to ft.calculate_feature_matrix or you could just use ft.dfs to compute the feature matrix. Is there a recommended way of using ft.dfs and ft.calculate_feature_matrix for best practice?
If you're in a situation where you might use either, the answer is to use ft.dfs to create both features and a feature matrix. If you're starting with a blank slate, you'll want to be able to examine and use a feature matrix for data analysis and feature selection. For that purpose, you're better off doing both at once with ft.dfs.
There are times when calculate_feature_matrix is the tool to use as well, though you'll often be able to tell if you're in that situation. The main cases (see the sketch after this list) are:
You've loaded in features that were previously saved
You want to rebuild the same features on new data
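As a minimal sketch of both workflows, assuming the demo EntitySet that ships with Featuretools and the pre-1.0 argument names (newer releases take target_dataframe_name instead of target_entity):

import featuretools as ft

es = ft.demo.load_mock_customer(return_entityset=True)

# One call builds both the feature definitions and the feature matrix.
feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity="customers", max_depth=2)

# Save the definitions, then reuse them later to rebuild the same features
# (typically on an updated EntitySet) with calculate_feature_matrix.
ft.save_features(feature_defs, "features.json")
saved_features = ft.load_features("features.json")
new_feature_matrix = ft.calculate_feature_matrix(features=saved_features, entityset=es)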

Metrics & Object-oriented programming

I would like to know whether anybody often uses metrics to validate their code/design.
As an example, I think I will use:
number of lines per method (< 20)
number of variables per method (< 7)
number of parameters per method (< 8)
number of methods per class (< 20)
number of fields per class (< 20)
inheritance tree depth (< 6)
Lack of Cohesion in Methods
Most of these metrics are very simple.
What is your policy about this kind of measure? Do you use a tool to check them (e.g. NDepend)?
Imposing numerical limits on those values (as you seem to imply with the numbers) is, in my opinion, not a very good idea. The number of lines in a method could be very large if there is a significant switch statement, and yet the method is still simple and proper. The number of fields in a class can be appropriately very large if the fields are simple. And five levels of inheritance could sometimes be way too many.
I think it is better to analyze the class cohesion (more is better) and coupling (less is better), but even then I am doubtful of the utility of such metrics. Experience is usually a better guide (though that is, admittedly, expensive).
A metric I didn't see in your list is McCabe's Cyclomatic Complexity. It measures the complexity of a given function, and has a correlation with bugginess. E.g. high complexity scores for a function indicate: 1) It is likely to be a buggy function and 2) It is likely to be hard to fix properly (e.g. fixes will introduce their own bugs).
Ultimately, metrics are best used at a gross level, like control charts. You look for points above and below the control limits to identify likely special cases, then you look at the details. For example, a function with a high cyclomatic complexity may cause you to look at it, only to discover that it is appropriate because it is a dispatcher method with a number of cases, as in the sketch below.
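As a rough illustration of that caveat, here is a sketch that scores such a dispatcher with the third-party radon package (an assumption; any cyclomatic-complexity tool would do): the many flat branches give a high score even though the logic is trivial.

from radon.complexity import cc_visit

SOURCE = '''
def dispatch(command):
    if command == "start":
        return do_start()
    elif command == "stop":
        return do_stop()
    elif command == "pause":
        return do_pause()
    elif command == "resume":
        return do_resume()
    elif command == "status":
        return do_status()
    else:
        raise ValueError(command)
'''

# cc_visit parses the source and returns one block per function with its score.
for block in cc_visit(SOURCE):
    print(block.name, block.complexity)  # dispatch scores about 6, one per branch, despite trivial logic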
Management by metrics does not work for people or for code; no metric or absolute value will always work. Please don't let a fascination with metrics distract you from truly evaluating the quality of the code. Metrics may appear to tell you important things about the code, but the best they can do is hint at areas to investigate.
That is not to say that metrics are not useful. Metrics are most useful when they are changing, to look for areas that may be changing in unexpected ways. For example, if you suddenly go from 3 levels of inheritance to 15, or 4 parms per method to 12, dig in and figure out why.
Example: a stored procedure to update a database table may have as many parameters as the table has columns; an object interface to this procedure may have the same, or it may have one if there is an object to represent the data entity. But the constructor for the data entity may have all of those parameters. So what would the metrics for this tell you? Not much! And if you have enough situations like this in the code base, the target averages will be blown out of the water.
So don't rely on metrics as absolute indicators of anything; there is no substitute for reading/reviewing the code.
Personally I think it's very difficult to adhere to these types of requirements (i.e. sometimes you just really need a method with more than 20 lines), but in the spirit of your question I'll mention some of the guidelines used in an essay called Object Calisthenics (part of the Thoughtworks Anthology if you're interested).
Levels of indentation per method (<2)
Number of 'dots' per line (<2)
Number of lines per class (<50)
Number of classes per package (<10)
Number of instance variables per class (<3)
He also advocates not using the 'else' keyword nor any getters or setters, but I think that's a bit overboard.
Hard numbers don't work for every solution. Some solutions are more complex than others. I would start with these as your guidelines and see where your project(s) end up.
But regarding these numbers specifically, they seem pretty high. I usually find in my particular coding style that I have:
no more than 3 parameters per method
about 5-10 lines per method
no more than 3 levels of inheritance
That isn't to say I never go over these generalities, but I usually think more about the code when I do because most of the time I can break things down.
As others have said, keeping to a strict standard is going to be tough. I think one of the most valuable uses of these metrics is to watch how they change as the application evolves. This helps to give you an idea how good a job you're doing on getting the necessary refactoring done as functionality is added, and helps prevent making a big mess :)
OO metrics are a bit of a pet project for me (it was the subject of my master's thesis). So yes, I'm using these, and I use a tool of my own.
For years the book "Object Oriented Software Metrics" by Mark Lorenz was the best resource for OO metrics. But recently I have seen more resources.
Unfortunately I have other deadlines so no time to work on the tool. But eventually I will be adding new metrics (and new language constructs).
Update
We are using the tool now to detect possible problems in the source. Several metrics we added (not all pure OO):
use of assert
use of magic constants
use of comments, in relation to the complexity of methods
statement nesting level
class dependency
number of public fields in a class
relative number of overridden methods
use of goto statements
There are still more. We keep the ones that give a good picture of the pain points in the code, so we get direct feedback when these are corrected.