Printing the multi layer Perceptron ANN prediction value in the data frame - tensorflow

I am a beginner with TensorFlow. For a banking campaign, I need to get the predicted value (whether or not a customer will subscribe to the term deposit) as output in data frame format from a multi-layer perceptron ANN model, given a data frame as input.
We are following this sample:
https://github.com/ManikandanJeyabal/Workplace/blob/master/ANN/TensorFlow/BankMarketing.py
We ran it in notebooks on an Azure virtual machine with Python 3.6.
In that sample, we need to modify the source code below to get the predictions as a data frame, so that they can be displayed as a report:
plt.plot(mse_his, 'r')
plt.show()
plt.plot(accu_his)
plt.show()
# print the final accuracy
correct_pred = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
print("Test Accuracy--> ", (sess.run(accuracy, feed_dict={x: x_test, y_: y_test})))
# print final mean square error
pred_y = sess.run(y, feed_dict={x: x_test})
mse = tf.reduce_mean(tf.square(pred_y - y_test))
print("MSE: %.4f" % sess.run(mse))
print(correct_pred)
print(y_test)
We need the output as a pandas DataFrame that includes the predicted columns. How can we do that?
Please guide me here.
----------------------------------------------
Updates:
Thank you for the response, McAngus.
After the changes suggested in the comments below, I could render the data frame output. But from this output, how can I derive a True or False predicted value?
[Dataframe Output][1]
[1]: https://i.stack.imgur.com/f8iJ9.png

If I understand correctly, you're looking to put the results in a data frame. From here you can use pd.DataFrame.from_dict like so:
pd.DataFrame.from_dict({"target": y_test.tolist(), "prediction": pred_y.tolist()})
This will give the column headers target and prediction.
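To get a True/False value per row rather than the raw network outputs, one option is to take the argmax of each row before building the frame. Below is a minimal sketch, assuming (as the tf.argmax(y, 1) calls in the question suggest) that y_test and pred_y are two-column arrays with one column per class, and assuming that column 1 means "subscribes"; flip the mapping if your encoding is the other way round. The function name predictions_to_frame and the demo arrays are made up for illustration.

```python
import numpy as np
import pandas as pd


def predictions_to_frame(y_true, y_score):
    """Collapse one-hot targets and per-class scores into label columns."""
    # argmax picks the index of the largest value in each row.
    target = np.argmax(np.asarray(y_true), axis=1)
    predicted = np.argmax(np.asarray(y_score), axis=1)
    return pd.DataFrame({
        # Assumption: class index 1 means "will subscribe".
        "target": target.astype(bool),
        "prediction": predicted.astype(bool),
        "correct": target == predicted,
    })


# Stand-in arrays; in the question these would be y_test and pred_y.
demo_y_test = np.array([[1, 0], [0, 1], [0, 1]])
demo_pred_y = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
report = predictions_to_frame(demo_y_test, demo_pred_y)
print(report)
```

With the real data the call would be predictions_to_frame(y_test, pred_y), and the original feature columns can be joined onto the result if the report needs them.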

How can I implement iterative incremental training for xgboost? Is it worth doing?

I am currently using XGBoost version 1.3.1. Training runs on SageMaker from a custom Docker image built from the training scripts, and the training data is stored in S3. Recently the input data (a data frame) has grown larger than the instance's memory can hold, and no larger instance type is available, so training fails with an out-of-memory (OOM) error.
Is there a way to work around this data size problem? Is it possible to load the data iteratively and train using the xgb_model option? If so, how?
Thanks in advance.
I don't know about SageMaker, but to train XGBoost incrementally in batches of rows, I do the following. You will have to see whether it helps your data fit in memory.
First, I split the data frame into X and y, then convert them to NumPy arrays, e.g.:
X = pd.read_csv(final_ds)
y = X.pop('target')
X = X.values # convert to numpy array
y = y.values # convert to numpy array
Then split into the usual X_train, X_valid, y_train, y_valid.
Determine the desired batch size. For example, this prints every size that divides the training set evenly:
size = len(X_train)
print()
print(f'Size of X_train: {size}')
print()
for i in range(1, size):
    if (size % i) == 0:
        print(f'{i}', end=' ') # choose from these
Note: the smaller the batch size, the slower the training.
Split the data into batches:
batch_size = <your selected batch size>
col_size = <the number of columns of X_train>
X_train_batched, y_train_batched = X_train.reshape(-1,batch_size,col_size), y_train.reshape(-1,batch_size)
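Note that reshape only works when batch_size divides len(X_train) exactly, which is why the list of divisors is needed. If that is too restrictive, plain slicing is an alternative. A rough sketch, where the last batch is simply smaller (iter_batches and the demo arrays are hypothetical names, not part of the code above):

```python
import numpy as np


def iter_batches(X, y, batch_size):
    """Yield consecutive (X, y) row slices; the final slice may be shorter."""
    for start in range(0, len(X), batch_size):
        stop = start + batch_size
        yield X[start:stop], y[start:stop]


# Demo data: 10 rows do not divide evenly into batches of 4.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10) % 2
batch_shapes = [(xb.shape, yb.shape) for xb, yb in iter_batches(X_demo, y_demo, 4)]
print(batch_shapes)
```

The generator can be used in place of the reshaped arrays in a loop of the form: for X_batch, y_batch in iter_batches(X_train, y_train, batch_size).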
Then pass the xgb_model parameter to fit(). This tells fit() to continue training from the given model instead of starting from scratch, e.g.:
param = <your xgb parameters>
model_xgbc = XGBClassifier(**param, use_label_encoder=False)
'''
For incremental training, use the xgb_model parameter in fit().
Run the first fit() without xgb_model, and the following fits with
xgb_model set to the model object from the previous training step.
'''
# Fit model
for i, (X_batch, y_batch) in enumerate(zip(X_train_batched, y_train_batched)):
    print(f'Step: {i}', end=' ')
    if i == 0:
        model_xgbc.fit(X_batch, y_batch, eval_set=[(X_valid, y_valid)],
                       verbose=False, eval_metric=['logloss'],
                       early_stopping_rounds=400)
    else:
        model_xgbc.fit(X_batch, y_batch, eval_set=[(X_valid, y_valid)],
                       verbose=False, eval_metric=['logloss'],
                       early_stopping_rounds=400,
                       xgb_model=model_xgbc)
    # Report the validation error after each batch
    preds = model_xgbc.predict(X_valid)
    rmse = metrics.mean_squared_error(y_valid, preds, squared=False)
    print(rmse)
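The loop above still needs the full array in memory before it is split into batches. If reading the file is already the problem, the batches could instead be produced straight from the CSV with the chunksize option of pd.read_csv. A sketch, assuming a CSV with a 'target' column as in the first snippet; iter_csv_batches and the small in-memory demo file are hypothetical:

```python
import io

import pandas as pd


def iter_csv_batches(source, target_col, chunksize):
    """Yield (X, y) NumPy arrays for one chunk of CSV rows at a time."""
    for chunk in pd.read_csv(source, chunksize=chunksize):
        y = chunk.pop(target_col).values
        yield chunk.values, y


# Stand-in for the real file; a path or an open file object works the same way.
demo_csv = io.StringIO("a,b,target\n1,2,0\n3,4,1\n5,6,0\n7,8,1\n9,10,0\n")
chunk_shapes = []
for X_batch, y_batch in iter_csv_batches(demo_csv, "target", chunksize=2):
    # This is where a fit(X_batch, y_batch, ...) call like the one above would go.
    chunk_shapes.append((X_batch.shape, y_batch.shape))
print(chunk_shapes)
```

Only one chunk is held in memory at a time. A validation set would have to be set aside separately, for example in a file of its own.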

Identify misclassified images with Tensorflow

I have been working on an image classifier and I would like to have a look at the images that the model misclassified in the validation set. My idea was to compare the true and predicted values and use the indices of the values that didn't match to get the images.
However, when I compute the accuracy this way, I don't get the same result as with the evaluate method.
This is what I have done:
I import the data using this function:
def create_dataset(folder_path, name, split, seed, shuffle=True):
    return tf.keras.preprocessing.image_dataset_from_directory(
        folder_path, labels='inferred', label_mode='categorical', color_mode='rgb',
        batch_size=32, image_size=(320, 320), shuffle=shuffle, interpolation='bilinear',
        validation_split=split, subset=name, seed=seed)
train_set = create_dataset(dir_path, 'training', 0.1, 42)
valid_set = create_dataset(dir_path, 'validation', 0.1, 42)
# output:
# Found 16718 files belonging to 38 classes.
# Using 15047 files for training.
# Found 16718 files belonging to 38 classes.
# Using 1671 files for validation.
Then to evaluate the accuracy on the validation set I use this line:
model.evaluate(valid_set)
# output:
# 53/53 [==============================] - 22s 376ms/step - loss: 1.1322 - accuracy: 0.7349
# [1.1321837902069092, 0.7348892688751221]
This is fine, since the values are exactly the ones I got in the last epoch of training.
To extract the true labels from the validation set I use this line of code based on this answer. Note that I need to create the validation set again because every time I call the variable that refers to the validation set, the validation set gets shuffled.
I thought this was what caused the inconsistent accuracy, but recreating the dataset did not solve the problem.
y_val_true = np.concatenate([y for x, y in create_dataset(dir_path, 'validation', 0.1, 42)], axis=0)
y_val_true = np.argmax(y_val_true, axis=1)
I make the prediction:
y_val_pred = model.predict(create_dataset(dir_path, 'validation', 0.1, 42))
y_val_pred = np.argmax(y_val_pred, axis=1)
And finally I compute once again the accuracy to verify that everything is ok:
m = tf.keras.metrics.Accuracy()
m.update_state(y_val_true, y_val_pred)
m.result().numpy()
# output:
# 0.082585275
As you can see, instead of getting the same value I got from the evaluate method, I now get only 8%.
I would be truly grateful if you could point out where my approach is flawed.
Since this is the first question I have posted, I apologize in advance for any mistakes I made.
This method can give you some insight if you want to display or analyse the results batch by batch:
m = tf.keras.metrics.Accuracy()
# Iterate over individual batches to keep track of the images
# being fed to the model.
for valid_images, valid_labels in valid_set.as_numpy_iterator():
    y_val_true = np.argmax(valid_labels, axis=1)
    # The model can take inputs other than a dataset as well. Hence, once the
    # images are collected, you can pass them in directly.
    y_val_pred = model.predict(valid_images)
    y_val_pred = np.argmax(y_val_pred, axis=1)
    # Update the state of the accuracy metric after every batch
    m.update_state(y_val_true, y_val_pred)
m.result().numpy()
If you want to feed the whole dataset at once (note shuffle=False, which keeps labels and predictions in the same order):
valid_ds = create_dataset(dir_path, 'validation', 0.1, 42, shuffle=False)
y_val_true = np.concatenate([y for x, y in valid_ds], axis=0)
y_val_true = np.argmax(y_val_true, axis=1)
y_val_pred = model.predict(valid_ds)
y_val_pred = np.argmax(y_val_pred, axis=1)
m = tf.keras.metrics.Accuracy()
m.update_state(y_val_true, y_val_pred)
m.result().numpy()
I couldn't find the bug in your code though.
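Once the true and predicted class indices are in the same order, the misclassified samples are just the positions where the two arrays differ. A minimal sketch with made-up arrays standing in for y_val_true and y_val_pred (misclassified_indices is a hypothetical helper name):

```python
import numpy as np


def misclassified_indices(y_true, y_pred):
    """Return the positions where the predicted class differs from the true one."""
    return np.flatnonzero(np.asarray(y_true) != np.asarray(y_pred))


# Stand-ins for y_val_true and y_val_pred after the argmax step.
demo_true = np.array([0, 2, 1, 1, 3])
demo_pred = np.array([0, 1, 1, 3, 3])
wrong = misclassified_indices(demo_true, demo_pred)
print(wrong)
```

With an unshuffled dataset, these indices refer to the order in which the dataset yields the images, so they can be used to look up the corresponding images or file names.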

Low score in Linear Regression with discrete attributes

I'm trying to run a linear regression on my data frame. The data frame describes Apple applications, and I want to predict the applications' ratings (the 'nota' column). The ratings have the following format:
1.0
1.5
2.0
2.5
...
5.0
My code is:
atributos = ['size_bytes','price','rating_count_tot','cont_rating','sup_devices_num','num_screenshots','num_lang','vpp_lic']
atrib_prev = ['nota']
X = np.array(data_regress.drop(['nota'],1))
y = np.array(data_regress['nota'])
X = preprocessing.scale(X)
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.2)
clf = LinearRegression()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(accuracy)
But my accuracy is 0.046295306696438665. I think this happens because the linear model predicts continuous real values, while my 'nota' is real-valued but only takes values at fixed intervals. I don't know how to round the predicted values before calling clf.score.
First, for regression models, clf.score() calculates the R-squared value, not accuracy. So you need to decide whether to treat this as a classification problem (with a fixed number of target labels) or a regression problem (with a real-valued target).
Secondly, if you insist on using regression models and not classification, you can call clf.predict() to first get the predicted values, round them as you want, and then call r2_score() on the actual and rounded predicted values. Something like:
# Get actual predictions
y_pred = clf.predict(X_test)
# You will need to implement the round function yourself
y_pred_rounded = round(y_pred)
# Call the appropriate scorer
score = r2_score(y_test, y_pred_rounded)
You can look at the sklearn documentation here for available metrics in sklearn.
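The rounding step that is left open above could look something like the following. This is only a sketch: it assumes the ratings go from 1.0 to 5.0 in steps of 0.5, as listed in the question, and round_to_half is a made-up helper name.

```python
import numpy as np


def round_to_half(values, low=1.0, high=5.0):
    """Round to the nearest 0.5 and clip to the valid rating range."""
    values = np.asarray(values, dtype=float)
    # Doubling, rounding and halving snaps each value to a multiple of 0.5.
    return np.clip(np.round(values * 2) / 2, low, high)


# Stand-in for clf.predict(X_test)
demo_pred = np.array([0.6, 1.74, 2.26, 4.9, 5.4])
rounded = round_to_half(demo_pred)
print(rounded)
```

The result can then be passed to r2_score(y_test, y_pred_rounded) in place of the raw predictions.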

Tensorflow - training Adam

I am trying to build my first simple neural network with TensorFlow; you can see my code below. The code can calculate the loss, but when I try to add the train_step I get the error message InvalidArgumentError (see above for traceback): Matrix size-incompatible: In[0]: [2,2], In[1]: [1024,1], which says that the dimensions of the matrices aren't compatible. I don't understand the dimensions, though. In my opinion they should be [1] and [1]...
input=[[1,2,3,4,5],[6,7,8,9,10]]
labels=[1,1]
x = tf.placeholder(tf.float32, [None, 5])
y = tf.placeholder(tf.float32)
hidden = tf.layers.dense(inputs=x, units=1024, activation=tf.nn.relu)
output = tf.layers.dense(inputs=hidden, units=1)
loss = tf.losses.softmax_cross_entropy(y, output)
train_step = tf.train.AdamOptimizer(1).minimize(loss)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for i in range(1):
        result = sess.run(train_step, feed_dict={x: input,y: labels})
        print(result)
The cause is that your inputs and labels are inconsistent. For your inputs, you have 2 input vectors with dimensions (1, 5). In your output layer, you have one output. And in your labels, you have only one example of dimension (1,2).
There are two fixes, depending on what you wanted to do. If you meant to use two training examples (which is what it looks like you're doing):
input=[[1,2,3,4,5],[6,7,8,9,10]]
labels=[[1],[1]]
and keep the rest the same. This way, you have 2 input vectors and 2 label examples.
The second possible interpretation is that you are feeding in 2 input vectors, both with the label [1, 1]. In that case, keep everything the same, but change the output layer to:
output = tf.layers.dense(inputs=hidden, units=2)
I'm pretty sure the first fix is what you're looking for. Also your code will never update your neural network because you did not sess.run(train_step) anywhere. If you want it to actually train, you'll need that step as well.
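A quick way to see the mismatch without running TensorFlow is to compare the array shapes with NumPy. A small sketch using the values from the question (the variable names are mine):

```python
import numpy as np

features = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], dtype=np.float32)
flat_labels = np.array([1, 1], dtype=np.float32)   # labels as written in the question
column_labels = flat_labels.reshape(-1, 1)         # same values as [[1], [1]]

print(features.shape)       # two examples with five features each
print(flat_labels.shape)    # a single vector of length 2
print(column_labels.shape)  # one label per example
```

A dense layer with units=1 produces one value per example, so for a batch of two the labels need the (2, 1) shape of column_labels to line up with the output.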

How to output tensor flow prediction results in .csv?

I am training a model using CNN.
Here is my prediction part in the model.
predictions = {
    "classes": tf.argmax(input=logit2, axis=1),
    "probabilities": tf.nn.softmax(logit2, name="softmax_tensor")
}
Here is the code in main that does the evaluation.
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": images_test},
    y=test_labels,
    num_epochs=1,
    shuffle=False)
eval_results = model.evaluate(input_fn=eval_input_fn)
I have trained my model. Now I have a list of test image names (in the first column of a CSV file), and I want to make predictions and write the corresponding results (a probability between 0 and 1) to the second column. How can I achieve this, and where should I add the code?
Thanks in advance.
The Estimator class has a predict function that returns the predictions as an iterable object (for an example, scroll to the very bottom of this page).
So you could do:
predictions = model.predict(input_fn=predict_input_fn)
for p in predictions:
    # write p['classes'] to the csv
    pass
As for writing to the second column of the CSV, take a look at Python's csv module.
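To make the CSV part concrete, here is a small sketch using only the standard library. It assumes each prediction is a dict with "classes" and "probabilities" keys, as in the question's predictions dict, and that the image names are in the same order as the predictions (shuffle=False). write_predictions, the column names and the demo values are made up for illustration.

```python
import csv
import io


def write_predictions(out_file, image_names, predictions):
    """Write one row per image: name, predicted class, probability of that class."""
    writer = csv.writer(out_file)
    writer.writerow(["image", "class", "probability"])
    for name, p in zip(image_names, predictions):
        probs = list(p["probabilities"])
        cls = int(p["classes"])
        writer.writerow([name, cls, f"{probs[cls]:.4f}"])


# Stand-ins for the image names and for the items yielded by model.predict().
demo_names = ["img_001.png", "img_002.png"]
demo_predictions = [
    {"classes": 1, "probabilities": [0.2, 0.8]},
    {"classes": 0, "probabilities": [0.9, 0.1]},
]
buffer = io.StringIO()
write_predictions(buffer, demo_names, demo_predictions)
csv_text = buffer.getvalue()
print(csv_text)
```

For a real file, open it with open("results.csv", "w", newline="") and pass that in place of the StringIO buffer.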