Predictive Maintenance - How to use Bayesian Optimization with a custom objective function and Logistic Regression with Gradient Descent together?

I'm trying to reproduce the approach described on arimo.com.
It is an example of how to build a predictive-maintenance machine learning model for hard-drive failures. The part I really don't understand is how to use Bayesian Optimization with a custom objective function and Logistic Regression with Gradient Descent together. What are the hyper-parameters to be optimized? What is the flow of the problem?
As described in our previous post, Bayesian Optimization [6] is used
to find the best hyperparameter values. The objective function to be
optimized in the hyperparameter tuning is the following score measured
on the validation set:
S = alpha * fnr + (1 - alpha) * fpr
where fpr and fnr are the False Positive and False Negative rates
obtained on the validation set. Our goal is to keep False Positive
rate low, therefore we use alpha = 0.2. Since the validation set is
highly unbalanced, we found out that standard scores like Precision,
F1-score, etc… do not work well. In fact, using this custom score is
crucial for the model to obtain a good performance generally.
Note that we only use the above score when running Bayesian
Optimization. To train logistic regression models, we use Gradient
Descent with the usual ridge loss function.
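The flow described in the quote is a loop: the Bayesian optimizer proposes a set of hyperparameter values, a logistic regression is trained by gradient descent with those values, the score S is computed on the validation set, and the optimizer uses that score to choose the next candidate. Below is a minimal sketch of that loop, assuming scikit-learn's SGDClassifier (logistic loss with an L2 penalty, i.e. gradient descent with a ridge-style regularizer) as the model and scikit-optimize's gp_minimize for the Bayesian optimization. The tuned hyperparameters here (the L2 strength alpha and the learning rate eta0), the search ranges, and the synthetic data are my assumptions, not something taken from the arimo post:
```
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from skopt import gp_minimize            # pip install scikit-optimize
from skopt.space import Real

ALPHA = 0.2  # weight on the false-negative rate, as in the quoted post

# synthetic stand-in for the (highly unbalanced) drive data
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.97],
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                            stratify=y, random_state=0)

def score(params):
    """Objective for Bayesian optimization: S = alpha*FNR + (1-alpha)*FPR."""
    reg_alpha, eta0 = params
    clf = SGDClassifier(loss="log_loss", penalty="l2",   # logistic regression + ridge
                        alpha=reg_alpha, learning_rate="constant", eta0=eta0,
                        max_iter=1000, random_state=0)
    clf.fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_val, clf.predict(X_val)).ravel()
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    return ALPHA * fnr + (1 - ALPHA) * fpr

# hyperparameters being tuned (my choice): L2 strength and SGD learning rate
space = [Real(1e-6, 1e-1, prior="log-uniform", name="alpha"),
         Real(1e-4, 1e-1, prior="log-uniform", name="eta0")]
result = gp_minimize(score, space, n_calls=30, random_state=0)
print("best S:", result.fun, "best hyperparameters:", result.x)
```
On older scikit-learn versions use loss="log" instead of "log_loss". The point is only the structure: gradient descent trains the model, the custom score S is what the Bayesian optimizer sees.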
My dataframe before feature selection:
index date serial_number model capacity_bytes failure Read Error Rate Reallocated Sectors Count Power-On Hours (POH) Temperature Current Pending Sector Count age yesterday_temperature yesterday_age yesterday_reallocated_sectors_count yesterday_read_error_rate yesterday_current_pending_sector_count yesterday_power_on_hours tomorrow_failure
0 77947 2013-04-11 MJ0331YNG69A0A Hitachi HDS5C3030ALA630 3000592982016 0 0 0 4909 29 0 36348284.0 29.0 20799895.0 0.0 0.0 0.0 4885.0 0.0
1 79327 2013-04-11 MJ1311YNG7EWXA Hitachi HDS5C3030ALA630 3000592982016 0 0 0 8831 24 0 36829839.0 24.0 21280074.0 0.0 0.0 0.0 8807.0 0.0
2 79592 2013-04-11 MJ1311YNG2ZD9A Hitachi HDS5C3030ALA630 3000592982016 0 0 0 13732 26 0 36924206.0 26.0 21374176.0 0.0 0.0 0.0 13708.0 0.0
3 80715 2013-04-11 MJ1311YNG2ZDBA Hitachi HDS5C3030ALA630 3000592982016 0 0 0 12745 27 0 37313742.0 27.0 21762591.0 0.0 0.0 0.0 12721.0 0.0
4 79958 2013-04-11 MJ1323YNG1EK0C Hitachi HDS5C3030ALA630 3000592982016 0 524289 0 13922 27 0 37050016.0 27.0 21499620.0 0.0 0.0 0.0 13898.0 0.0
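For reference, a sketch of how the features and the label could be pulled out of a frame like this before tuning; the CSV path is hypothetical and the choice of which columns to drop (identifiers, date, today's failure flag) and which to use as the label (tomorrow_failure) is my assumption based on the header above:
```
import pandas as pd

# hypothetical path: load the frame shown above
df = pd.read_csv("drive_stats_with_lags.csv")

label = "tomorrow_failure"                  # tomorrow's failure flag as the target
id_cols = ["index", "date", "serial_number", "model", "failure"]

y = df[label]
X = df.drop(columns=id_cols + [label])      # SMART attributes + "yesterday_*" lags
```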

Related

Negative binomial, Poisson-gamma mixture in WinBUGS

WinBUGS trap error:
model {
  for (i in 1:5323) {
    Y[i] ~ dpois(mu[i])                 # NB model as a Poisson-gamma mixture
    mu[i] ~ dgamma(b[i], a[i])          # gamma-distributed Poisson mean (shape b, rate a)
    a[i] <- b[i] / Emu[i]               # rate chosen so that E[mu[i]] = Emu[i]
    b[i] <- B * X[i]                    # shape
    Emu[i] <- beta0 * pow(X[i], beta1)  # model equation
  }
  # Priors
  beta0 ~ dunif(0, 10)
  beta1 ~ dunif(0, 10)
  B ~ dunif(0, 10)                      # over-dispersion parameter
}
X[] Y[]
1.5 0
2.9 0
1.49 0
0.39 0
3.89 0
2.03 0
0.91 0
0.89 0
0.97 0
2.16 0
0.04 0
1.12 1
2.26 0
3.6 1
1.94 0
0.41 1
2 0
0.9 0
0.9 0
0.9 0
0.1 0
0.88 1
0.91 0
6.84 2
3.14 3
END
This is just a sample of the data; the model comes from section 8.3.2 of Ezra Hauer's The Art of Regression Modeling in Road Safety, and WinBUGS stops with an "undefined real result" error.
The aim is a fully Bayesian, one-step model, not empirical Bayes.
The results should be similar to the MLE estimates: beta0 = 1.65, beta1 = 0.871, over-dispersion = 0.531.
X is the only covariate and Y is the observed collision count, so X cannot be zero or negative and Y cannot be negative. When the model is fitted as a Poisson-gamma mixture by maximum likelihood, it can be estimated without problems.
How can I make this model work, and how do I resolve the WinBUGS error?
The data is in Excel; the model worked fine when I selected only the largest 1,000 observations.
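Not a WinBUGS fix, but one way to sanity-check the quoted MLE figures (beta0 around 1.65, beta1 around 0.871, over-dispersion around 0.531) is to fit the same Poisson-gamma (negative binomial) regression by maximum likelihood in Python. This is a minimal SciPy sketch under the NB2 parameterization with a single dispersion theta, which is not exactly the b[i] = B * X[i] structure in the WinBUGS code; the tiny data sample quoted above is only a placeholder, and the full 5,323-row dataset would be needed to come close to the quoted estimates:
```
import numpy as np
from scipy.optimize import minimize
from scipy.stats import nbinom

# placeholder: the 25 sample rows quoted above (full data has 5,323 rows)
X = np.array([1.5, 2.9, 1.49, 0.39, 3.89, 2.03, 0.91, 0.89, 0.97, 2.16,
              0.04, 1.12, 2.26, 3.6, 1.94, 0.41, 2.0, 0.9, 0.9, 0.9,
              0.1, 0.88, 0.91, 6.84, 3.14])
Y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 1, 0, 1, 0, 1, 0, 0, 0, 0,
              0, 1, 0, 2, 3])

def negloglik(log_params):
    beta0, beta1, theta = np.exp(log_params)   # keep all parameters positive
    mu = beta0 * X ** beta1                    # model equation E[Y] = beta0 * X^beta1
    p = theta / (theta + mu)                   # NB2: Var = mu + mu^2 / theta
    return -nbinom.logpmf(Y, theta, p).sum()

fit = minimize(negloglik, x0=np.log([1.0, 1.0, 1.0]), method="Nelder-Mead")
beta0, beta1, theta = np.exp(fit.x)
print(beta0, beta1, theta)
```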

Sklearn only predicts one class while dataset is fairly balanced (±80/20 split)

I am trying to come up with a way to check which factors most influence whether a person pays back a loan or defaults. I have worked with the sklearn library quite intensively, but I feel like I am missing something quite trivial...
The dataframe looks like this:
0 7590-VHVEG Female Widowed Electronic check Outstanding loan 52000 20550 108 0.099 288.205374 31126.180361 0 No Employed No Dutch No 0
1 5575-GNVDE Male Married Bank transfer Other 42000 22370 48 0.083 549.272708 26365.089987 0 Yes Employed No Dutch No 0
2 3668-QPYBK Male Registered partnership Bank transfer Study 44000 24320 25 0.087 1067.134272 26678.356802 0 No Self-Employed No Dutch No 0
The distribution of the "DefaultInd" column (target variable) is this:
0 0.835408
1 0.164592
Name: DefaultInd, dtype: float64
I have label-encoded the data to make it look like this:
CustomerID Gender MaritalStatus PaymentMethod SpendingTarget EstimatedIncome CreditAmount TermLoanMonths YearlyInterestRate MonthlyCharges TotalAmountPayments CurrentLoans SustainabilityIndicator EmploymentStatus ExistingCustomer Nationality BKR_Registration DefaultInd
0 7590-VHVEG 0 4 2 2 52000 20550 108 0.099 288.205374 31126.180361 0 0 0 0 5 0 0
1 5575-GNVDE 1 1 0 1 42000 22370 48 0.083 549.272708 26365.089987 0 1 0 0 5 0 0
2 3668-QPYBK 1 2 0 4 44000 24320 25 0.087 1067.134272 26678.356802 0 0 2 0 5 0
After that I removed NaNs and cleaned the data up some more (removing capitalization, punctuation, etc.).
After that, I try to run this cell:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
y = df['DefaultInd']
X = df.drop(['CustomerID','DefaultInd'],axis=1)
X = X.astype(float)
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.20,random_state=42)
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
print(classification_report(y_test, y_pred))
Which results in this:
precision recall f1-score support
0 0.83 1.00 0.91 1073
1 0.00 0.00 0.00 213
accuracy 0.83 1286
macro avg 0.42 0.50 0.45 1286
weighted avg 0.70 0.83 0.76 1286
As you can see, the "1" class never gets predicted, and I am wondering whether this behaviour is to be expected (I think it is not). I tried class_weight='balanced', but that resulted in a weighted average F1 score of 0.59 (instead of 0.76).
I feel like I am missing something. Or is this kind of behaviour expected and should I rebalance the dataset before fitting? The split is not that skewed (±80/20), so I would not expect this big of a problem.
Any help would be more than appreciated :)
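One thing that might be worth checking before rebalancing: columns like EstimatedIncome, CreditAmount and TotalAmountPayments are on very different scales, and unscaled features together with the default regularization can push a plain LogisticRegression toward always predicting the majority class. A minimal sketch with scaling, class_weight='balanced', and a look at the predicted probabilities; the StandardScaler pipeline and the 0.3 threshold are only suggestions of mine, not the asker's pipeline, and df is reused from the snippet above:
```
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# df as in the snippet above
y = df["DefaultInd"]
X = df.drop(["CustomerID", "DefaultInd"], axis=1).astype(float)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# scale the features, then fit logistic regression with class weighting
clf = make_pipeline(StandardScaler(),
                    LogisticRegression(class_weight="balanced", max_iter=1000))
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))

# optionally inspect probabilities and move the decision threshold
proba = clf.predict_proba(X_test)[:, 1]
print(classification_report(y_test, (proba >= 0.3).astype(int)))  # 0.3 is an arbitrary example
```
A lower weighted F1 with class_weight='balanced' is not necessarily worse here; it usually trades some majority-class precision for actually detecting the minority class.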

Time-series prediction by separating dependent and independent variables

Suppose I have this kind of data:
date pollution dew temp press wnd_dir wnd_spd snow rain
2010-01-02 00:00:00 129.0 -16 -4.0 1020.0 SE 1.79 0 0
2010-01-02 01:00:00 148.0 -15 -4.0 1020.0 SE 2.68 0 0
2010-01-02 02:00:00 159.0 -11 -5.0 1021.0 SE 3.57 0 0
2010-01-02 03:00:00 181.0 -7 -5.0 1022.0 SE 5.36 1 0
2010-01-02 04:00:00 138.0 -7 -5.0 1022.0 SE 6.25 2 0
I want to apply neural network for the time-series prediction of pollution.
Note that the other variables (dew, temp, press, wnd_dir, wnd_spd, snow, rain) are the independent variables, while pollution is the dependent variable.
If I implement an LSTM as in here, the LSTM learns all the variables as if each were a target, and the model can predict all of them.
But it is not necessary to predict the independent variables; the only requirement is pollution, the dependent variable.
Is there any way to implement an LSTM, or another better-suited architecture, that learns and predicts only the dependent variable, treating the other variables purely as inputs, and gives a much better prediction of pollution?
It seems like the example is predicting only pollution already. If you look at the reframed dataset:
  var1(t-1) var2(t-1) var3(t-1) var4(t-1) var5(t-1) var6(t-1) var7(t-1) var8(t-1) var1(t)
1  0.129779  0.352941  0.245902  0.527273  0.666667  0.002290  0.000000       0.0 0.148893
2  0.148893  0.367647  0.245902  0.527273  0.666667  0.003811  0.000000       0.0 0.159960
3  0.159960  0.426471  0.229508  0.545454  0.666667  0.005332  0.000000       0.0 0.182093
4  0.182093  0.485294  0.229508  0.563637  0.666667  0.008391  0.037037       0.0 0.138833
5  0.138833  0.485294  0.229508  0.563637  0.666667  0.009912  0.074074       0.0 0.109658
var1 seems to be pollution. As you can see, you have the values from the previous step (t-1) for all variables, and the value at the current step t only for pollution (var1(t)).
This last column is what the example feeds as y, as you can see in these lines:
# split into input and outputs
train_X, train_y = train[:, :-1], train[:, -1]
test_X, test_y = test[:, :-1], test[:, -1]
So the network should already be predicting only pollution.
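If you want the single-output structure to be explicit, here is a minimal Keras sketch of a model that reads all lagged variables but has one output unit for pollution. TensorFlow/Keras is assumed (as in the linked tutorial); the layer size, the dummy data shapes and the training settings are placeholders of mine:
```
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# dummy stand-in for the reframed data: 8 lagged features at t-1 -> pollution at t
rng = np.random.default_rng(0)
train_X = rng.random((500, 1, 8))    # (samples, timesteps, features)
train_y = rng.random(500)            # var1(t): pollution only

model = Sequential([
    LSTM(50, input_shape=(1, 8)),    # reads all 8 lagged variables
    Dense(1),                        # single output unit: predicted pollution
])
model.compile(loss="mae", optimizer="adam")
model.fit(train_X, train_y, epochs=5, batch_size=72, verbose=0)
```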

CPLEX: lowest possible gap isn't necessarily 0.00%?

This is a follow-up to this question: Interpretation of GAP in CPLEX.
I used the following statement at the beginning of my (minimization) optimization problem:
execute gapTermination {
cplex.epgap = 0.00; // result at gap of 0%
}
This is a part of the engine log:
Nodes Cuts/
Node Left Objective IInf Best Integer Best Bound ItCnt Gap
0 0 560.7929 100 560.7929 115
0 0 742.1396 57 Cuts: 121 214
0 0 744.3119 61 Cuts: 10 226
0 0 747.2193 61 Cuts: 10 233
0 0 747.2797 61 MCF: 1 234
* 0+ 0 916.3811 747.2797 18.45%
0 2 747.2797 61 916.3811 747.2797 234 18.45%
Elapsed time = 0.13 sec. (49.77 ticks, tree = 0.00 MB, solutions = 1)
* 916 755 integral 0 778.9609 753.8931 7249 3.22%
* 4739 1918 integral 0 771.9166 759.5332 25884 1.60%
Cover cuts applied: 5
Implied bound cuts applied: 8
Flow cuts applied: 27
Mixed integer rounding cuts applied: 36
Multi commodity flow cuts applied: 1
Gomory fractional cuts applied: 22
Root node processing (before b&c):
Real time = 0.11 sec. (49.41 ticks)
Parallel b&c, 16 threads:
Real time = 0.38 sec. (202.30 ticks)
Sync time (average) = 0.07 sec.
Wait time (average) = 0.07 sec.
------------
Total (root+branch&cut) = 0.49 sec. (251.71 ticks)
As you can see, the "optimal" solution seems to have been found, but it still has a gap of 1.60%.
How should I interpret this? My thought would be that I found the optimal integer solution (there is no integer solution left that is better), but the relaxed (non-integer) optimum achieves an even better result, being 1.60% lower (it is a minimization problem).
If my thought is correct, it would mean that a 0.00% gap can only be achieved if the optimum of the relaxed problem (usually non-integer) happens to be attained at an integer solution.
I'd really appreciate if someone could help me out here. Thanks in advance.
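For reference, the gap CPLEX prints in the node log is relative, roughly |best integer - best bound| / |best integer|. Plugging in the last node line above: (771.9166 - 759.5332) / 771.9166 = 0.0160, i.e. the 1.60% shown. That value is only the state of the search at the moment that line was printed; with epgap = 0 the branch-and-cut keeps tightening the best bound until it meets the incumbent (within tolerances), so the final gap reported at the end of the solve is not the 1.60% visible here unless the run was stopped early.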

Can anyone help me evaluate testing set data in Weka?

I have one training dataset and one testing dataset. I am using the Weka Explorer and trying to create a model with the Random Forest algorithm. After creating the model, when I apply it to my testing set (via the supplied test set / re-evaluate model on current test set options), I get the output shown below.
What am I doing wrong?
Training Model:
=== Evaluation on training set ===
Time taken to test model on training data: 0.24 seconds
=== Summary ===
Correctly Classified Instances 5243 98.9245 %
Incorrectly Classified Instances 57 1.0755 %
Kappa statistic 0.9439
Mean absolute error 0.0453
Root mean squared error 0.1137
Relative absolute error 23.2184 %
Root relative squared error 36.4074 %
Coverage of cases (0.95 level) 100 %
Mean rel. region size (0.95 level) 59.3019 %
Total Number of Instances 5300
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.996 0.067 0.992 0.996 0.994 0.944 0.999 1.000 0
0.933 0.004 0.968 0.933 0.950 0.944 0.999 0.990 1
Weighted Avg. 0.989 0.060 0.989 0.989 0.989 0.944 0.999 0.999
=== Confusion Matrix ===
a b <-- classified as
4702 18 | a = 0
39 541 | b = 1
Model Implement on my testing dataset:
=== Evaluation on test set ===
Time taken to test model on supplied test set: 0.22 seconds
=== Summary ===
Total Number of Instances 0
Ignored Class Unknown Instances 4000
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.000 0.000 0.000 0.000 0.000 0.000 ? ? 0
0.000 0.000 0.000 0.000 0.000 0.000 ? ? 1
Weighted Avg. NaN NaN NaN NaN NaN NaN NaN NaN
=== Confusion Matrix ===
a b <-- classified as
0 0 | a = 0
0 0 | b = 1
Your test data set does not appear to have labels: the line "Ignored Class Unknown Instances 4000" means the class value is missing ('?') for every instance in the supplied test set, so there is nothing to compare the predictions against, and the summary statistics come out as 0, NaN, or '?'.
You can only evaluate prediction quality using labeled data. If you only want the predictions themselves, output them (e.g. via "More options... > Output predictions") instead of evaluating.