Using an ARX model in GAMS - gams-math

I have created an autoregressive model with exogenous input (ARX model) for my thermal model, which I want to use to calculate the room temperature for an optimized air conditioner power. The ARX model requires the previous 6 values of the input (and the output) variables.
However, I have not found a suitable way to write an expression in GAMS that includes both the current values and the previous values of the same variables.
I would appreciate it if anyone could point me to websites, materials, etc. that address this issue.
Best regards.
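To make the recursion in the question concrete, here is a minimal Python sketch of an ARX model that uses the previous 6 outputs and the previous 6 inputs. It is an illustration of the structure, not GAMS code; in GAMS, earlier periods are normally referenced with the lag operator on an ordered set (for example `T(t-1)`). The function names, the coefficient lists `a` and `b`, and all numbers are hypothetical.

```python
def arx_predict(y_hist, u_hist, a, b):
    """One-step ARX prediction from past values only.

    y[t] = sum_i a[i] * y[t-1-i] + sum_j b[j] * u[t-1-j]
    The histories hold the most recent value last.
    """
    if len(y_hist) < len(a) or len(u_hist) < len(b):
        raise ValueError("not enough history for the requested lags")
    ar_part = sum(a[i] * y_hist[-1 - i] for i in range(len(a)))
    ex_part = sum(b[j] * u_hist[-1 - j] for j in range(len(b)))
    return ar_part + ex_part


def arx_simulate(y_init, u, a, b):
    """Roll the recursion forward over the input sequence u."""
    y = list(y_init)  # measured outputs for the first len(y_init) steps
    for t in range(len(y_init), len(u)):
        y.append(arx_predict(y[:t], u[:t], a, b))
    return y


# Hypothetical coefficients: 6 output lags and 6 input lags.
a = [0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
b = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
temperatures = arx_simulate([0.0] * 6, [1.0] * 8, a, b)
```

The first 6 outputs have to be supplied as initial conditions, because the recursion cannot start until 6 past values exist. The same applies to any optimization formulation: the ARX equation is only defined from the 7th period onward.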

Related

Predict a nonlinear array based on 2 features with scalar values using XGBoost or equivalent

I have been looking at XGBoost as a place to start with this, but I am not sure of the best way to accomplish what I want.
My data is set up something like this
Where every value, whether it be input or output, is numerical. The issue I'm facing is that I only have 3 input data points per several output data points.
I have seen that XGBoost has a multi-output regression method, however I am only really seeing it used to predict around 2 outputs per 1 input, whereas my data may have upwards of 50 output points needing to be predicted with only a handful of scalar input features.
I'd appreciate any ideas you may have.
For reference, I've been looking at mainly these two demos (they are the same idea just one is scikit and the other xgboost)
https://machinelearningmastery.com/multi-output-regression-models-with-python/
https://xgboost.readthedocs.io/en/stable/python/examples/multioutput_regression.html
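One possible workaround, which the question does not mention, is to treat the position along the output array as an extra input feature. Each sample with 3 scalar inputs and 50 outputs then becomes 50 rows with 4 inputs and 1 output, so an ordinary single-output regressor can be fitted. The sketch below only shows the reshaping step; the function name and data are hypothetical.

```python
def wide_to_long(features, curves):
    """Reshape (features -> many outputs) into (features + position -> one output)."""
    X_long, y_long = [], []
    for feats, curve in zip(features, curves):
        for position, value in enumerate(curve):
            # Append the index along the output array as one more input column.
            X_long.append(list(feats) + [position])
            y_long.append(value)
    return X_long, y_long


# Two hypothetical samples: 3 scalar inputs each, output arrays of different length.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
curves = [[10.0, 20.0, 30.0], [40.0, 50.0]]
X_long, y_long = wide_to_long(features, curves)
```

A side effect of this layout is that output arrays of different lengths are handled without padding. Whether it performs better than a true multi-output model depends on the data.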

How to test a machine learning model?

I want to develop a framework (for QA testing purposes) that validates a machine learning model. I have had a lot of discussions with my peers and read articles found through Google.
Most of the discussions and articles say that a machine learning model will evolve with the test data that we provide. Correct me if I'm wrong.
What is the possibility of developing a framework that validates that the machine learning model gives accurate results?
A few ways to test the model, from the articles I read: split and multi-split techniques, metamorphic testing.
Please also suggest any other approaches.
QA testing of ML-based software requires additional, and rather unconventional, tests because oftentimes their outputs for a given set of inputs are not defined, deterministic, or known a priori and they produce approximations rather than exact results.
QA may be designed to test against:
naive but predictable benchmark methods: the average method in forecasting, the class-frequency-based classifier in classification, etc.
sanity checks (the outputs being feasible/rational): e.g., is the predicted age positive?
preset objective acceptance levels: e.g., is its AUCROC > 0.5?
extreme/boundary cases: e.g., thunderstorm conditions for a weather forecast model.
bias-variance tradeoff: what is its performance on in-sample and out-of-sample data? K-Fold cross-validation is useful here.
the model itself: is the coefficient of variation of its performance measure (e.g., AUCROC) from n runs on the same data for same/random train and test partitioning within a reasonable bound?
Some of these tests need performance measures. Here is a comprehensive library of them.
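Three of the checks listed above (a naive benchmark, a feasibility check, and the coefficient of variation across runs) can be sketched with the standard library alone. This is a minimal illustration; the function names, thresholds, and data are hypothetical, and mean absolute error is used only as an example metric.

```python
import statistics


def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return statistics.fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def beats_mean_baseline(y_train, y_test, y_pred):
    """Compare the model against the naive 'always predict the training mean' method."""
    baseline = statistics.fmean(y_train)
    baseline_error = mae(y_test, [baseline] * len(y_test))
    return mae(y_test, y_pred) < baseline_error


def all_feasible(predictions, lower, upper):
    """Sanity check: every prediction lies in a plausible range."""
    return all(lower <= p <= upper for p in predictions)


def coefficient_of_variation(scores):
    """Spread of a performance measure over repeated runs, relative to its mean."""
    return statistics.stdev(scores) / statistics.fmean(scores)


# Hypothetical age predictions and repeated-run scores.
ok_benchmark = beats_mean_baseline([10, 20, 30], [18, 22], [19, 21])
ok_range = all_feasible([19, 21], lower=0, upper=120)
cv = coefficient_of_variation([0.80, 0.82, 0.78])
```

In a QA suite each of these would become an assertion with an agreed acceptance level, for example a maximum allowed coefficient of variation.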
I think it is actually the data flow that needs to be tested here: raw input, manipulation, test output, and predictions. For example, if you have a simple linear model, you want to test the predictions produced by that model rather than its coefficients. So the high-level steps might be summarized as follows:
Raw Input: Does the raw input make sense? Before you start manipulating it, you need to be sure the raw data values are within the expected limits. For example, if you normally see a 5-10% NA rate in some data, a 95% NA rate in a new batch might indicate that something is wrong.
Train/Predict Ready Input: Whether you are training a new model or feeding new data into an already trained model for prediction, you probably want to be sure that the manipulated data makes sense, too. Some ML algorithms are sensitive to data anomalies. You don't want to predict a credit score in the thousands just because of some anomalies in the input.
Model Success: By this point, you should have some idea of how well your model performs, so you can measure its performance on new test data. You can also check that the train and test scores are not significantly different (a large gap suggests overfitting). If you are retraining, you can compare with the previous training scores, or you can set aside a test set and compare its score.
Predictions: Finally, you need to be sure your final output makes sense before delivering it to production or clients. For example, if you are forecasting revenue for a very small shop, the daily revenue predictions can't be millions of dollars or negative amounts.
Full disclosure: I wrote a small Python package for this. You can check it here or install it as below:
pip install mlqa
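The checks described in the steps above can also be sketched in plain Python. The functions below are hypothetical helpers written for illustration; they are not the API of the `mlqa` package, and the thresholds are made up.

```python
def na_rate(values):
    """Fraction of missing entries, with None standing in for NA."""
    if not values:
        raise ValueError("empty batch")
    return sum(v is None for v in values) / len(values)


def out_of_range(predictions, lower, upper):
    """Return (index, value) pairs for predictions outside the plausible range."""
    return [(i, p) for i, p in enumerate(predictions) if not lower <= p <= upper]


def overfit_gap(train_score, test_score, tolerance):
    """Flag a train score that exceeds the test score by more than the tolerance."""
    return (train_score - test_score) > tolerance


# Raw input: a batch with an unusually high share of missing values.
batch_rate = na_rate([1.0, None, 3.0, None])

# Predictions: daily revenue for a small shop should be non-negative and modest.
suspicious = out_of_range([100.0, -5.0, 2_000_000.0], lower=0.0, upper=10_000.0)

# Model success: compare the train and test scores.
overfitting = overfit_gap(train_score=0.99, test_score=0.70, tolerance=0.10)
```

The acceptable NA rate, value range, and score gap are domain decisions, so they are passed in as parameters rather than hard-coded.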

hmmlearn: Using GaussianHMM, how does one calculate the probability of an observation (as opposed to the probability of a state)

I'm new to the HMM universe. I've followed the tutorials using a GaussianHMM learner, and they work, but I was wondering how I can use the code to display the probability of an observation given the most likely state sequence, assuming I have multiple sequences of observations. Thanks.
So, for example, if the observations are:
seq1:[1,2,-1,4,2], seq2:[a,v,s,a,f], and the model has 2 states,
once the model predicts the states, how does one calculate the probability of an observed output [1],[a]?
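With Gaussian emissions, what is available for a single observation is a density under the emitting state rather than a probability. The sketch below assumes one-dimensional observations, a decoded state path (such as the output of the model's `predict`), and per-state means and variances read from the fitted model (hmmlearn exposes `means_` and `covars_`). The helper names and the numbers are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def emission_densities(observations, states, means, variances):
    """Density of each observation under the state decoded for that time step."""
    return [
        gaussian_pdf(x, means[s], variances[s])
        for x, s in zip(observations, states)
    ]


# Hypothetical two-state model and decoded path for seq1 from the question.
means = [1.5, -1.0]
variances = [2.0, 0.5]
seq1 = [1, 2, -1, 4, 2]
decoded = [0, 0, 1, 0, 0]
densities = emission_densities(seq1, decoded, means, variances)
```

For multivariate observations the same idea applies with a multivariate normal density per state.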

Bayesian Networks with multiple layers

I'm trying to solve a problem with a Bayesian network. I know the conditional probabilities of some event, say that it will rain. Suppose that I measure (boolean) values from each of four sensors (A1 - A4). I know the probability of rain, and I know the probability of rain given the measurements on each of the sensors.
Now I add in a new twist. A4 is no longer available, but B1 and B2 are (they are also boolean sensors). I know the conditional probabilities of both B1 and B2 given the measurement of A4. How do I incorporate those probabilities into my Bayesian network to replace the lost data from A4?
Your problem is a perfect fit for Multi-Entity Bayesian Networks (MEBN). This is an extension of standard BNs using First Order Logic (FOL). It basically allows nodes to be added and/or removed based on the specific situation at hand. You define a template for creating a BN on the fly, based on the current knowledge available.
There are several papers on it available on the Web. A classic reference to this work is "Multi-Entity Bayesian Networks Without Multi-Tears".
We have implemented MEBN inside UnBBayes. You can get a copy of it by following the instructions at http://sourceforge.net/p/unbbayes/discussion/156015/thread/cb2e0887/. An example can be seen in the paper "Probabilistic Ontology and Knowledge Fusion for Procurement Fraud Detection in Brazil" at http://link.springer.com/chapter/10.1007/978-3-642-35975-0_2.
If you are interested in it, I can give you more pointers later on.
Cheers,
Rommel
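Independently of MEBN, the basic Bayes-rule step for inferring the missing A4 reading from B1 and B2 can be sketched as follows. It assumes that B1 and B2 are conditionally independent given A4 and that a prior probability for A4 is available; neither assumption is stated in the question, and all numbers are hypothetical.

```python
def posterior_a4(prior_a4, p_b1_given_a4, p_b2_given_a4, b1, b2):
    """Return P(A4 = True | B1 = b1, B2 = b2).

    p_b1_given_a4[a] is P(B1 = True | A4 = a), and likewise for B2.
    Assumes B1 and B2 are conditionally independent given A4.
    """
    def likelihood(p_true, observed):
        return p_true if observed else 1.0 - p_true

    weights = {}
    for a4 in (True, False):
        p_a4 = prior_a4 if a4 else 1.0 - prior_a4
        weights[a4] = (
            p_a4
            * likelihood(p_b1_given_a4[a4], b1)
            * likelihood(p_b2_given_a4[a4], b2)
        )
    # Normalize so the two cases sum to one.
    return weights[True] / (weights[True] + weights[False])


# Hypothetical conditional probability tables.
p_b1 = {True: 0.9, False: 0.2}
p_b2 = {True: 0.8, False: 0.1}
belief_a4 = posterior_a4(0.5, p_b1, p_b2, b1=True, b2=True)
```

The resulting belief about A4 could then be used to weight the rain probabilities for A4 true and A4 false, in place of the lost measurement.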

Probability Density Function with Zero Standard Deviation

I am implementing an email filtering application using the Naive Bayes algorithm. My application uses the Spambase Data Set from the UCI Machine Learning Repository. Since the attributes are continuous, I calculate the probability using the probability density function (PDF). However, when I evaluate the data using k-fold cross validation, a training set may contain only 0 for one of its attributes. In that case the standard deviation is 0 and the PDF returns NaN, so a large number of spam messages are not classified correctly with that training set. What should I do to fix the problem?
You could use a discrete PDF, which will always be bounded.
Alternatively, simply ignore any attribute with zero variance. There is no point in including distributions with zero variance, because they won't actually do anything. For example, you want to know how old I am, and then I tell you that I live on planet Earth. That shouldn't change your estimate, because every single piece of data you have is for people on planet Earth.
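The second suggestion can be sketched as a small Gaussian Naive Bayes that skips zero-variance attributes. As an assumption of this sketch, an attribute is dropped if its variance is zero in any class, so that all classes are scored on the same set of attributes. The function names and the toy data are hypothetical.

```python
import math
import statistics


def fit_gaussian_nb(rows, labels):
    """Estimate per-class priors, means, and variances; list the usable attributes."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    stats = {}
    for label, members in by_class.items():
        columns = list(zip(*members))
        stats[label] = [(statistics.fmean(c), statistics.pvariance(c)) for c in columns]
    priors = {label: len(members) / len(rows) for label, members in by_class.items()}
    # Keep only attributes whose variance is positive in every class.
    usable = [
        j for j in range(len(rows[0]))
        if all(stats[label][j][1] > 0 for label in stats)
    ]
    return stats, priors, usable


def predict(row, stats, priors, usable):
    """Pick the class with the highest log posterior over the usable attributes."""
    def log_pdf(x, mean, var):
        return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

    scores = {
        label: math.log(priors[label])
        + sum(log_pdf(row[j], *stats[label][j]) for j in usable)
        for label in stats
    }
    return max(scores, key=scores.get)


# Toy data: the first attribute is 0 everywhere, so it is ignored.
rows = [[0.0, 1.0], [0.0, 1.2], [0.0, 3.0], [0.0, 3.4]]
labels = ["ham", "ham", "spam", "spam"]
stats, priors, usable = fit_gaussian_nb(rows, labels)
label = predict([0.0, 1.1], stats, priors, usable)
```

Working with log densities also avoids the underflow that multiplying many small PDF values can cause.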