Using H2O Flow XGBoost model

It gives a regression prediction as a continuous score that includes negative values, e.g. -1.27544 < x < 6.68112. How do I interpret the negative values?

If you are using an H2O algorithm to predict a binary target (0/1), H2O will assume the target column is numeric and solve a regression problem unless you convert that column to a factor (using .asfactor() in Python or as.factor() in R).
Please verify the data type of your target column (it will likely show integer) and make sure that it shows enum after the conversion.
More information about the distribution choices for your target can be found here: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/distribution.html

Related

How to perform dynamic optimization for a nonlinear discrete optimization problem with nonlinear constraints, using non-linear solvers like SNOPT?

I am new to the field of optimization and need help with the following optimization problem. I tried to solve it with ordinary code, without an optimizer, to make sure that I get the correct results. However, the results I got are different, and I am not sure whether my analysis is correct. This is a short description of the problem:
The objective function shown in the picture is used to find the optimal temperature of the insulating system that minimizes the total cost over a given horizon.
[This image provides the mathematical description of the objective function and the constraints](https://i.stack.imgur.com/yidrO.png)
The data of the problem are as follows:
1- Problem data:
A=1.07×10^8
h=1
T_ref=87.5
N=20
p1=0.001;
p2=0.0037;
This is the curve I want to obtain
2- Optimization variable:
u_t
3- Model type:
The model has a nonlinear cost function with nonlinear constraints, and it is solved using the nonlinear solver SNOPT.
4- The meaning of the symbols in the objective and constraint functions:
The optimization is performed over a prediction horizon of N years.
T_ref is the reference temperature.
X_DP represents the degree of polymerization in the kth year.
u_t represents the temperature of the insulating system in the kth year.
h is the time step (1 year) of the discrete-time model.
R is the ratio of the load loss at the rated load to the no-load loss.
E is the activation energy.
A is the pre-exponential constant.
beta is a linear coefficient representing the cost due to the decrement of the temperature.
I have developed the source code in MATLAB to check whether my analysis is correct.
I initialized Ut with increasing and with decreasing values so that I could obtain curves similar to the original one. [This is the curve I obtained](https://i.stack.imgur.com/KVv2q.png)
I simulated the problem with ordinary code, without optimization, and got the figure shown above.
close all; clear all;
h=1;
N=20;
a=250;
R=8.314;
A=1.07*10^8;
E=111000;
Tref=87.5;
p1=0.0019;
p2=0.0037;
p3=0.0037;
Utt=[80,80.7894736842105,81.5789473684211,82.3684210526316,83.1578947368421,... % Utt holds the temperature increasing over the prediction horizon.
83.9473684210526,84.7368421052632,85.5263157894737,86.3157894736842,...
87.1052631578947,87.8947368421053,88.6842105263158,89.4736842105263,...
90.2631578947369,91.0526315789474,91.8421052631579,92.6315789473684,...
93.4210526315790,94.2105263157895,95];
Utt1 = [95,94.2105263157895,93.4210526315790,92.6315789473684,91.8421052631579,... % Utt1 holds the temperature decreasing over the prediction horizon.
91.0526315789474,90.2631578947369,89.4736842105263,88.6842105263158,...
87.8947368421053,87.1052631578947,86.3157894736842,85.5263157894737,...
84.7368421052632,83.9473684210526,83.1578947368421,82.3684210526316,...
81.5789473684211,80.7894736842105,80];
Ut1=zeros(1,N);
Ut2=zeros(1,N);
Xdp =zeros(N,N);
Xdp(1,1)=1000;
Xdp1 =zeros(N,N);
Xdp1(1,1)=1000;
for L=1:N-1
for k=1:N-1
%vt(k+L)=Ut(k-L+1);
Xdq(k+1,L) =(1/Xdp(k,L))+A*exp((-1*E)/(R*(Utt(k)+273)))*24*365*h;
Xdp(k+1,L)=1/(Xdq(k+1,L));
Xdp(k,L+1)=1/(Xdq(k+1,L));
Xdq1(k+1,L) =(1/Xdp1(k,L))+A*exp((-1*E)/(R*(Utt1(k)+273)))*24*365*h;
Xdp1(k+1,L)=1/(Xdq1(k+1,L));
Xdp1(k,L+1)=1/(Xdq1(k+1,L));
end
end
% MATLAB code
for j =1:N-1
Ut1(j)= -p1*(Utt(j)-Tref);
Ut2(j)= -p2*(Utt1(j)-Tref);
end
sum00=sum(Ut1);
sum01=sum(Ut2);
X1=1./Xdp(:,1);
Xf=1./Xdp(:,20);
Total= table(X1,Xf);
Tdiff =a*(Total.Xf-Total.X1);
X22=1./Xdp1(:,1);
X2f=1./Xdp1(:,20);
Total22= table(X22,X2f);
Tdiff22 =a*(Total22.X2f-Total22.X22);
obj=(sum00+(Tdiff));
ob1 = min(obj);
obj2=sum01+Tdiff22;
ob2 = min(obj2);
plot(Utt,obj,'-o');
hold on
plot(Utt1,obj2)
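As a cross-check of the degradation recursion in the listing above, the update 1/X(k+1) = 1/X(k) + A*exp(-E/(R*(u(k)+273)))*24*365*h can be run on its own for a single temperature trajectory. The following is a minimal Python sketch, assuming the constants from the MATLAB listing, the starting value of 1000, and the same linearly spaced temperature profiles. The function name `simulate_dp` is hypothetical and not part of the original code, and the sketch covers only the recursion, not the cost terms or the optimization.

```python
import math

A = 1.07e8                 # pre-exponential constant (from the listing)
E = 111000.0               # activation energy (from the listing)
R = 8.314                  # value assigned to R in the listing
H = 1.0                    # time step in years
HOURS_PER_YEAR = 24 * 365  # same factor as 24*365 in the listing


def simulate_dp(temps_celsius, x0=1000.0):
    """Return the degree-of-polymerization trajectory for one temperature profile.

    Hypothetical helper: applies 1/X(k+1) = 1/X(k) + rate(u(k)) for the
    first len(temps_celsius) - 1 temperatures, like the loop k = 1:N-1.
    """
    dp = [x0]
    for u in temps_celsius[:-1]:
        rate = A * math.exp(-E / (R * (u + 273.0))) * HOURS_PER_YEAR * H
        dp.append(1.0 / (1.0 / dp[-1] + rate))
    return dp


N = 20
# Linearly spaced profiles between 80 and 95, matching Utt and Utt1.
increasing = [80.0 + 15.0 * k / (N - 1) for k in range(N)]
decreasing = increasing[::-1]

dp_up = simulate_dp(increasing)
dp_down = simulate_dp(decreasing)
print(dp_up[-1], dp_down[-1])
```

Comparing the two trajectories against the first column of Xdp and Xdp1 is one way to confirm that the double loop in the listing computes what is intended before any cost terms are added.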

Feature Selection for Text Classification with Information Gain in R

I'm trying to prepare my dataset optimally for binary document classification with an SVM algorithm in R.
The dataset consists of 150171 labelled variables and 2099 observations stored in a data frame. The variables are a combination of uni- and bigrams that were retrieved from a text dataset.
When I try to calculate the information gain as a feature selection method, the error "cannot allocate vector of size X Gb" occurs, although I have already extended my memory and I'm running a 64-bit operating system. I tried the following package:
install.packages("FSelector")
library(FSelector)
value <- information.gain(Usefulness ~., dat_SentimentAnalysis)
Does anybody know a solution/any trick for this problem?
Thank you very much in advance!

Tensorflow: Add small number before division for numerical stability

In order to prevent divisions by zero in TensorFlow, I want to add a tiny number to my divisor. A quick search did not yield any results. In particular, I am interested in using scientific notation, e.g.
a = b/(c+1e-05)
How can this be achieved?
Assuming a, b and c are tensors, the formula you have written will work as expected. 1e-5 will be broadcast and added to the tensor c. TensorFlow automatically converts the 1e-5 to a tensor, as if you had written tf.constant(1e-5).
TensorFlow does, however, have some limitations with non-scalar broadcasts. Take a look at my other answer.

Tensorflow - how to predict modular arithmetic/clock arithmetic (angles)?

I'm working on a machine learning project where I'm using TensorFlow (and DNNRegressor). I want to predict a modular arithmetic value (an angle) ranging between -pi and pi. When I try doing it "the normal way", the model isn't very good, as it doesn't understand that -pi and pi are actually the same value.
Does tensorflow have any functionality to make ML models with modular arithmetic?
You should output two values in this case: sin(angle) and cos(angle). Then you can reconstruct the real angle from them with atan2(sin, cos) (school trigonometry).
The loss function can be a sum of RMSEs for each output.
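The idea in this answer can be illustrated without any ML framework. The snippet below is a minimal sketch in plain Python, assuming nothing beyond the standard `math` module: it encodes an angle as (sin, cos), decodes it with `atan2`, and computes the loss as the sum of the two per-output RMSEs. The helper names `encode_angle`, `decode_angle`, `rmse` and `angle_loss` are hypothetical and are not TensorFlow functions.

```python
import math


def encode_angle(angle):
    """Map an angle in radians to the pair (sin, cos) used as regression targets."""
    return math.sin(angle), math.cos(angle)


def decode_angle(sin_value, cos_value):
    """Recover an angle in [-pi, pi] from a (sin, cos) pair."""
    return math.atan2(sin_value, cos_value)


def rmse(predictions, targets):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def angle_loss(pred_pairs, true_angles):
    """Sum of the RMSE on the sin output and the RMSE on the cos output."""
    true_pairs = [encode_angle(a) for a in true_angles]
    sin_loss = rmse([p[0] for p in pred_pairs], [t[0] for t in true_pairs])
    cos_loss = rmse([p[1] for p in pred_pairs], [t[1] for t in true_pairs])
    return sin_loss + cos_loss


# Angles just inside +pi and -pi are far apart as raw numbers,
# but their (sin, cos) encodings are close together.
near_pi = encode_angle(math.pi - 0.01)
near_minus_pi = encode_angle(-math.pi + 0.01)
gap = math.hypot(near_pi[0] - near_minus_pi[0], near_pi[1] - near_minus_pi[1])
print(gap)
```

In a real model the two outputs would come from the network's final layer, and the predicted pair need not lie exactly on the unit circle; `atan2` still returns a usable angle in that case because it depends only on the direction of the pair.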

XGBoost node splits on a value that is out of the feature range?

I have some features that range from 0 to 1.
But when I dump the model, I find that some nodes split on those features using "feature < 2.00001".
Does XGBoost scale the feature or add some value to it? Or why is 2.00001 chosen as the split?
Thanks~
XGBoost handles splits on the value of a feature separately from splits on whether or not the feature is missing.
A threshold outside the feature's range like this usually appears when XGBoost wants to split only on whether or not the feature is missing, and not on its value.