What do the warnings in the SARIMAX summary mean? Do they heavily affect my forecasting performance, and how can I resolve them?

These are the warnings. I can provide more of the SARIMAX summary if needed.
Warnings:
Covariance matrix calculated using the outer product of gradients (complex-step).
Covariance matrix is singular or near-singular, with condition number 3.39e+17. Standard errors may be unstable.

The first one is actually more like a "note" than a "warning". It's just letting you know how the covariance matrix was computed.
The second one is letting you know that the estimated covariance matrix of the parameters is singular or nearly so, which means the reported standard errors (and anything derived from them) may be unstable. Sometimes this is an indication of overfitting, but it can also arise from other things. This may indicate that you should try a simpler model (which then might forecast better), but it does not mean that you have to do that.
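To make the "condition number" in the second warning concrete, here is a minimal sketch using only NumPy. The two covariance matrices are made-up illustrations, not output from any fitted model; in statsmodels, the fitted results object should expose the real matrix through its `cov_params()` method, which could be passed to the same function.

```python
import numpy as np

# Hypothetical standard deviations for three model parameters.
std = np.array([0.5, 0.5, 0.2])

# Case 1: the first two parameters are almost perfectly correlated,
# i.e. the data can barely tell them apart (nearly redundant terms).
corr_bad = np.array([[1.0, 1.0 - 1e-12, 0.1],
                     [1.0 - 1e-12, 1.0, 0.1],
                     [0.1, 0.1, 1.0]])

# Case 2: the same parameters with only moderate correlation.
corr_ok = np.array([[1.0, 0.3, 0.1],
                    [0.3, 1.0, 0.1],
                    [0.1, 0.1, 1.0]])

# Turn each correlation matrix into a covariance matrix.
cov_bad = corr_bad * np.outer(std, std)
cov_ok = corr_ok * np.outer(std, std)

# The condition number is the ratio of the largest to the smallest
# singular value; a huge value means the matrix is close to singular.
cond_bad = np.linalg.cond(cov_bad)
cond_ok = np.linalg.cond(cov_ok)

print(f"nearly redundant parameters: {cond_bad:.2e}")
print(f"well-separated parameters:   {cond_ok:.2e}")
```

A very large value, like the 3.39e+17 in the warning, suggests that some combination of parameters is close to unidentifiable, which is one reason a simpler model is sometimes worth trying.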


How do I handle variability of output in Anylogic?

I have been working on a simulation model for battery swapping in AnyLogic. So far I have developed the simulation model, an optimization experiment, and a parameter variation experiment.
There are no errors in the model, but the output values are unsatisfactory. Small changes, such as changing the step size of the decision variables, result in a drastic change in the best value obtained after every experiment. The objective does not change much, but I am concerned about the other variables that change with each run. Even with multiple optimization runs it is difficult to come to a conclusion.
For reference, I am posting the output of a parameter variation experiment here. I ran the experiment with an optimized value, but I was getting feasible results (percentile > 95%) far from the expected input values. The overall result is correct (the percentile decreases as charging time increases), but the variability is difficult to understand.
Can anyone help?
This is a common problem when you judge a model only by its high-level, overall outputs. You could have a model bug, but it is just as likely (if not more likely) that there is some dynamic in your system that was not clear in simple Excel spreadsheets or mental models. The discrete-event simulation (DES) may be telling us something truly interesting about the system behavior, but without additional outputs, there is no way to understand what that is.
A few suggestions:
Run this as a simple single scenario, where you manually update inputs. When you run this with the low range of input values and then the high range of input values, what do you see on the animation or additional outputs that is different than you expected or could explain the overall output trend? Try running several intermediate points.
Add additional output metrics. If you look at queue sizes, resource utilizations, turn-around times, etc., do you see anything at that level that is different than expected?
Add a "replication" log. When you run a set of inputs for multiple scenarios, does any single replication stand out as an outlier? If so, re-run the scenario with that set of inputs and that random seed.
There is no substitute for understanding underlying system behavior. Without understanding those dynamics, looking only at overall correlations from optimization or parameter variation experiments will often lead companies to make the wrong policy decisions.
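The replication log suggested above can be sketched in a few lines. This is a toy stand-in written in Python rather than AnyLogic: `run_replication`, the exponential waiting-time model, the 30-minute limit, and the z-score cutoff are all hypothetical choices made only to show the bookkeeping (one row per seed, then re-run any seed that stands out).

```python
import random
import statistics


def run_replication(charging_time, seed, customers=500, wait_limit=30.0):
    """Toy stand-in for one simulation run; returns a service level in percent."""
    rng = random.Random(seed)  # a private generator makes the run reproducible
    served = 0
    for _ in range(customers):
        # Hypothetical model: waiting time is exponential with mean charging_time.
        wait = rng.expovariate(1.0 / charging_time)
        if wait < wait_limit:
            served += 1
    return 100.0 * served / customers


def replication_log(charging_time, seeds):
    """Run one replication per seed and record inputs, seed, and output."""
    return [
        {
            "seed": seed,
            "charging_time": charging_time,
            "service_level": run_replication(charging_time, seed),
        }
        for seed in seeds
    ]


def flag_outliers(log, z_limit=2.0):
    """Return the rows whose output is more than z_limit std devs from the mean."""
    values = [row["service_level"] for row in log]
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    if spread == 0:
        return []
    return [row for row in log if abs(row["service_level"] - mean) / spread > z_limit]


log = replication_log(charging_time=10.0, seeds=range(20))
for row in flag_outliers(log):
    # Re-running with the same inputs and seed reproduces the outlier exactly,
    # so it can be inspected with animation or extra outputs switched on.
    print("inspect:", row)
```

Because each row stores its seed, any surprising replication can be replayed as a single scenario, which is the point of the suggestion.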

Algebraic/implicit loops handling by Gekko

I have a specific question regarding how Gekko handles algebraic / implicit loops.
I will give examples in the field of Chemical Engineering, as this is how I found the project and its other libraries.
For example, in multicomponent chemical equilibrium calculations, it is not possible to solve the equations explicitly, because the concentration of one species may appear in many different equations.
I have been using other paid software in the past and it would automatically propose a resolution procedure based on how the system is solvable (by analyzing dependency and creating automatic algebraic loops).
My question would be:
Does Gekko do that automatically?
It is a little bit tricky because sometimes one needs to add tear variables and iterate from a good starting value.
I know this message may be a little bit abstract, but I am trying to decide which software to use for my work, and this is a practical bottleneck that I have run into.
Thanks in advance for your valuable insight.
Python Gekko uses a simultaneous solution strategy so that all units are solved together instead of sequentially. Therefore, tear variables are not needed, but large flowsheet problems with recycle can be difficult to converge to a feasible solution. Below are three methods in Python Gekko that assist with initialization and efficient solution.
Method 1: Intermediate Variables
Intermediate variables are useful for decreasing the complexity of the model. In many models, the temporary variables outnumber the regular variables. This model reduction often aids the solver in finding a solution by reducing the problem size. Intermediate variables are declared with m.Intermediate() in Python Gekko. They may be defined in one section or in multiple declarations throughout the model. Intermediate variables are parsed sequentially, from top to bottom, so the order of declaration is critical: if an intermediate is used before its definition, an error reports that there is an uninitialized value. To avoid inadvertent overwrites, each intermediate variable can be defined only once. Here is additional information on Intermediates with an example problem.
Method 2: Lower Block Triangular Decomposition
For large problems that have trouble with initialization, there is a mode that is activated with the option m.options.COLDSTART=2. This mode performs a lower block triangular decomposition to automatically identify independent blocks that are then solved independently and sequentially.
This decomposition method for initialization is discussed in the PhD dissertation (chapter 2) of Mostafa Safdarnejad or also in Safdarnejad, S.M., Hedengren, J.D., Lewis, N.R., Haseltine, E., Initialization Strategies for Optimization of Dynamic Systems, Computers and Chemical Engineering, 2015, Vol. 78, pp. 39-50, DOI: 10.1016/j.compchemeng.2015.04.016.
Method 3: Automatic Model Reduction
Model reduction requires more pre-processing time but can help to significantly reduce the solver time. There is additional documentation on m.options.REDUCE.
Overall Strategy for Initialization
The overall strategy that we use for initializing hard problems, such as flowsheets with recycle, is shown in this flowchart.
Sometimes it does mean breaking recycles to get an initialized solution. Other times, the initialization strategies detailed above work well and no model rearrangement is necessary. An advantage of working with a simultaneous solution strategy is degree-of-freedom swapping: for example, a downstream variable can be fixed and an upstream variable calculated to meet that value.
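To illustrate what "simultaneous" means here, the following is a minimal sketch of Newton's method applied to two coupled implicit equations at once, with no tear variable and no sequential iteration. It is plain NumPy, not Gekko, and it does not show how Gekko's solvers work internally; the two equations, the starting guess, and all function names are invented for illustration.

```python
import numpy as np


def residuals(v):
    """Two coupled implicit equations; both must be zero at the solution."""
    x, y = v
    return np.array([
        x * y - 2.0,       # hypothetical equilibrium-type relation
        x + y**2 - 5.0,    # hypothetical balance equation
    ])


def jacobian(v):
    """Analytic derivatives of the residuals with respect to x and y."""
    x, y = v
    return np.array([
        [y, x],
        [1.0, 2.0 * y],
    ])


def newton_solve(v0, tol=1e-12, max_iter=50):
    """Update all unknowns together until every residual is below tol."""
    v = np.array(v0, dtype=float)
    for _ in range(max_iter):
        r = residuals(v)
        if np.max(np.abs(r)) < tol:
            break
        # One linear solve updates x and y simultaneously.
        v = v - np.linalg.solve(jacobian(v), r)
    return v


solution = newton_solve([1.5, 1.5])
print(solution)
```

By construction, x = 1 and y = 2 satisfy both equations. As with any Newton-type method, convergence depends on the starting guess, which is why the initialization strategies above matter for large problems.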

Higher precision eigenvalues with numpy

I'm currently computing several eigenvalues of different matrices and trying to find their closed-form solutions. The matrix is Hermitian/self-adjoint and tridiagonal. Additionally, every diagonal element is positive and every off-diagonal element is negative.
Due to what I suspect is an issue with trying to algebraically solve the quintic, sympy cannot solve for the eigenvalues of my 14x14 matrix.
Numpy has given me great results that I'm sometimes able to use via wolfram-alpha, but other times the precision is too low to determine which of several candidate closed forms is the right one. As a result, I'd like to increase the precision with which numpy.linalg.eigvalsh outputs eigenvalues. Any help would be greatly appreciated!
Eigenvalue problems of size 5 or larger have no general closed-form solution (for the reason you mention), and so all general eigensolvers are iterative. As a result, there are a few sources of error.
First, there are the errors with the convergence of the algorithm itself. I.e. even if all your computations were exact, you would need to run a certain number of iterations to get a certain accuracy.
Second, finite precision limits the overall accuracy.
Numerical analysts study how accurate a solution you can get for a given algorithm and precision and there are results on this.
As for your specific problem, if you are not getting enough accuracy there are a few things you can try to do.
First, make sure you are using the best solver for your matrix type. Since your matrix is symmetric and tridiagonal, use a solver specialized for that structure (as suggested by norok2).
If that still doesn't give you enough accuracy, you can try to increase the precision.
However, the main issue with doing this in numpy is that the LAPACK functions under the hood are compiled for float64.
Thus, even if the numpy function allows inputs of higher precision (float128), it will round them before calling the LAPACK functions.
It might be possible to recompile those functions for higher precision, but that may not be worth the effort for your particular problem.
(As a side note, I'm not very familiar with scipy, so it may be the case that they have eigensolvers written in python which support all different types, but you need to be careful that they are actually doing every step in the higher precision and not silently rounding to float64 somewhere.)
For your problem, I would suggest using the package mpmath, which supports arbitrary precision linear algebra.
It is a bit slower since everything is done in software, but for 14x14 matrices it should still be pretty quick.
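As a rough illustration of arbitrary-precision eigenvalues for a symmetric tridiagonal matrix, here is a sketch that uses only Python's standard-library `decimal` module with Sturm-sequence bisection. This is a swapped-in technique chosen so the example has no dependencies; it is not mpmath and not the method the answer recommends. The 4x4 test matrix (diagonal 2, off-diagonal -1), the 60-digit precision, and the function names are assumptions made for the example.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # working precision in significant digits


def count_below(diag, off, x):
    """Count eigenvalues strictly below x using the Sturm (LDL^T) recurrence."""
    count = 0
    q = Decimal(1)
    for i, a in enumerate(diag):
        b2 = off[i - 1] ** 2 if i > 0 else Decimal(0)
        q = (a - x) - b2 / q
        if q == 0:
            # Nudge an exactly zero pivot so the next division is defined.
            q = Decimal(10) ** -getcontext().prec
        if q < 0:
            count += 1
    return count


def eigenvalue(diag, off, k, tol=Decimal(10) ** -50):
    """Return the k-th smallest eigenvalue (k = 0, 1, ...) by bisection."""
    n = len(diag)
    # Gershgorin discs give an interval that contains every eigenvalue.
    radius = [
        (abs(off[i - 1]) if i > 0 else Decimal(0))
        + (abs(off[i]) if i < n - 1 else Decimal(0))
        for i in range(n)
    ]
    lo = min(a - r for a, r in zip(diag, radius))
    hi = max(a + r for a, r in zip(diag, radius))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if count_below(diag, off, mid) > k:
            hi = mid  # at least k + 1 eigenvalues lie below mid
        else:
            lo = mid
    return (lo + hi) / 2


# Example: positive diagonal, negative off-diagonal, as in the question.
diag = [Decimal(2)] * 4
off = [Decimal(-1)] * 3
eigs = [eigenvalue(diag, off, k) for k in range(4)]
for value in eigs:
    print(value)
```

For this particular matrix the smallest eigenvalue has the closed form (3 - sqrt(5))/2, so the extra digits can be used to confirm or reject a candidate expression, which is the use case described in the question.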

Encoding invariance for deep neural network

I have a dataset of 2D matrices (similar to greyscale images),
and I use a CNN as the classifier.
I would like to know if there is any study or experience on the accuracy impact
of changing the encoding from the traditional encoding.
I suppose there is an impact; the question is rather which transformations of the encoding leave the accuracy invariant and which ones degrade it.
To clarify, this concerns mainly the quantization process of the raw data into input data.
EDIT:
Quantizing the raw data into input data is already a pre-processing step that adds or removes some features (even minor ones). The impact of this quantization process on accuracy in real DNN computation does not seem very clear.
Maybe some research is available.
I'm not aware of any research specifically dealing with quantization of input data, but you may want to check out some related work on quantization of CNN parameters: http://arxiv.org/pdf/1512.06473v2.pdf. Depending on what your end goal is, the "Q-CNN" approach may be useful for you.
My own experience with using various quantizations of the input data for CNNs has been that there's a heavy dependency between the degree of quantization and the model itself. For example, I've played around with using various interpolation methods to reduce image sizes and reducing the color palette size, and in the end, I discovered that each variant required a different tuning of hyper-parameters to achieve optimal results. Generally, I found that minor quantization of data had a negligible impact, but there was a knee in the curve where throwing away additional information dramatically impacted the achievable accuracy. Unfortunately, I'm not aware of any way to determine what degree of quantization will be optimal without experimentation, and even deciding what's optimal involves a trade-off between efficiency and accuracy which doesn't necessarily have a one-size-fits-all answer.
On a theoretical note, keep in mind that CNNs need to be able to find useful, spatially-local features, so it's probably reasonable to assume that any encoding that disrupts the basic "structure" of the input would have a significantly detrimental effect on the accuracy achievable.
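For readers who want to run this kind of experiment themselves, here is a minimal sketch of uniform input quantization with NumPy. The random 28x28 "image", the chosen level counts, and the use of mean squared error as a proxy for information loss are all assumptions for illustration; the effect on a real CNN's accuracy would still have to be measured by retraining and retuning, as described above.

```python
import numpy as np


def quantize(image, levels):
    """Uniformly quantize values in [0, 1] to `levels` evenly spaced values."""
    image = np.clip(image, 0.0, 1.0)
    return np.round(image * (levels - 1)) / (levels - 1)


rng = np.random.default_rng(0)
image = rng.random((28, 28))  # synthetic stand-in for one greyscale input

# Mean squared error between the original and each quantized version.
mse = {
    levels: float(np.mean((image - quantize(image, levels)) ** 2))
    for levels in (256, 16, 4, 2)
}

for levels, err in mse.items():
    print(f"{levels:>3} levels: MSE = {err:.6f}")
```

The reconstruction error grows as levels are removed, but it says nothing by itself about where the accuracy "knee" lies for a given model; it is only a cheap first check before running the full training comparison.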
In usual practice -- a discrete classification task in classic implementation -- it will have no effect. However, the critical point is in the initial computations for back-propagation. The classic definition depends only on strict equality of the predicted and "base truth" classes: a simple right/wrong evaluation. Changing the class coding has no effect on whether or not a prediction is equal to the training class.
However, this function can be altered. If you change the code to have something other than a right/wrong scoring, something that depends on the encoding choice, then encoding changes can most definitely have an effect. For instance, if you're rating movies on a 1-5 scale, you likely want 1 vs 5 to contribute a higher loss than 4 vs 5.
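The difference between the two scoring rules can be shown with a small sketch. Both functions are illustrative stand-ins (the names and the choice of absolute distance as the encoding-dependent score are assumptions), not the loss of any particular framework.

```python
def zero_one_loss(predicted, actual):
    """Right/wrong scoring: depends only on equality of the class labels."""
    return 0 if predicted == actual else 1


def ordinal_loss(predicted, actual):
    """Distance-based scoring: depends on the numeric encoding of the labels."""
    return abs(predicted - actual)


# Movie ratings on a 1-5 scale, true rating 5.
print(zero_one_loss(1, 5), zero_one_loss(4, 5))  # both mistakes cost the same
print(ordinal_loss(1, 5), ordinal_loss(4, 5))    # 1 vs 5 costs more than 4 vs 5
```

Relabeling the classes (for example, swapping the codes for ratings 1 and 4) leaves every zero-one loss unchanged but changes the distance-based loss, which is why the encoding only matters once the scoring depends on it.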
Does this reasonably deal with your concerns?
I see now. My answer above is useful ... but not for what you're asking. I had my eye on the classification encoding; you're wondering about the input.
Please note that asking for off-site resources is a classic off-topic question category. I am unaware of any such research -- for what little that is worth.
Obviously, there should be some effect, as you're altering the input data. The effect would be dependent on the particular quantization transformation, as well as the individual application.
I do have some limited-scope observations from general big-data analytics.
In our typical environment, where the data were scattered with some inherent organization within their natural space (F dimensions, where F is the number of features), we often use two simple quantization steps: (1) Scale all feature values to a convenient integer range, such as 0-100; (2) Identify natural micro-clusters, and represent all clustered values (typically no more than 1% of the input) by the cluster's centroid.
This speeds up analytic processing somewhat. Given the fine-grained clustering, it has little effect on the classification output. In fact, it sometimes improves the accuracy minutely, as the clustering provides wider gaps among the data points.
Take with a grain of salt, as this is not the main thrust of our efforts.

Can variance be replaced by absolute value in this objective function?

Initially I modeled my objective function as follows:
argmin var(f(x),g(x))+var(c(x),d(x))
where f, g, c, d are linear functions.
In order to be able to use linear solvers, I modeled the problem as follows:
argmin abs(f(x),g(x))+abs(c(x),d(x))
Is it correct to replace variance with absolute value in this context? I'm pretty sure they mean the same thing: having the least difference between the two functions.
You haven't given enough context to answer the question. Even though your question doesn't seem to be about regression, in many ways it is similar to the question of choosing between least squares and least absolute deviations approaches to regression. If that term in your objective function is in any sense an error term then the most appropriate way to model the error depends on the nature of the error distribution. Least squares is better if there is normally distributed noise. Least absolute deviations is better in the nonparametric setting and is less sensitive to outliers. If the problem has nothing to do with probability at all then other criteria need to be brought in to decide between the two options.
Having said all this, the two ways of measuring distance are broadly similar. One will be fairly small if and only if the other is -- though they won't be equally small. If they are similar enough for your purposes then the fact that absolute values can be linearized could be a good motivation to use it. On the other hand -- if the variance-based one is really a better expression of what you are interested in then the fact that you can't use LP isn't sufficient justification to adopt absolute values. After all -- quadratic programming is not all that much harder than LP, at least below a certain scale.
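For reference, the linearization of the absolute-value objective can be written out explicitly. This sketch assumes that abs(f(x), g(x)) in the question means |f(x) - g(x)|, and it introduces two auxiliary variables t_1 and t_2 that are not in the original formulation.

```latex
\begin{aligned}
\min_{x,\;t_1,\;t_2} \quad & t_1 + t_2 \\
\text{subject to} \quad & -t_1 \le f(x) - g(x) \le t_1, \\
                        & -t_2 \le c(x) - d(x) \le t_2 .
\end{aligned}
```

Because f, g, c, and d are linear, every constraint is linear in (x, t_1, t_2), and at an optimum t_1 = |f(x) - g(x)| and t_2 = |c(x) - d(x)|, so this linear program has the same minimizers in x as the absolute-value objective.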
To sum up -- they don't imply the same meaning, but they do imply similar meanings; and, whether or not they are similar enough depends upon your purposes.