Scaler Transform help sklearn - pandas

I'm working on a logistic regression assignment and my professor has this code example.
What is the new_x variable and why are we transforming it as a matrix?
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

data = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                     'Label': ['green', 'green', 'green', 'green', 'red', 'red', 'red', 'red'],
                     'Height': [5, 5.5, 5.33, 5.75, 6.00, 5.92, 5.58, 5.92],
                     'Weight': [100, 150, 130, 150, 180, 190, 170, 165],
                     'Foot': [6, 8, 7, 9, 13, 11, 12, 10]},
                    columns=['id', 'Height', 'Weight', 'Foot', 'Label'])
X = data[['Height', 'Weight']].values
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)
Y = data['Label'].values
log_reg_classifier = LogisticRegression()
log_reg_classifier.fit(X, Y)
new_x = scaler.transform(np.asmatrix([6, 160]))
predicted = log_reg_classifier.predict(new_x)
accuracy = log_reg_classifier.score(X, Y)

Let's take it step by step.
data = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                     'Label': ['green', 'green', 'green', 'green', 'red', 'red', 'red', 'red'],
                     'Height': [5, 5.5, 5.33, 5.75, 6.00, 5.92, 5.58, 5.92],
                     'Weight': [100, 150, 130, 150, 180, 190, 170, 165],
                     'Foot': [6, 8, 7, 9, 13, 11, 12, 10]},
                    columns=['id', 'Height', 'Weight', 'Foot', 'Label'])
You first create a DataFrame that contains the columns ['id', 'Height', 'Weight', 'Foot', 'Label']; this is not yet the feature matrix.
X = data[['Height', 'Weight']].values
You then obtain a np.array that contains only height and weight using data[['Height', 'Weight']].values; see the pandas docs on indexing and slicing for more info. You can check the size of this feature matrix with X.shape, i.e. (n, 2).
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)
Y = data['Label'].values
log_reg_classifier = LogisticRegression()
log_reg_classifier.fit(X, Y)
You then use those two standardized features to train the logistic regression.
That is, your classifier is learned on two features only (i.e., height and weight), but on multiple samples. Every classifier in sklearn implements the fit() method to fit the classifier to the training data.
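If you want to see what the standardization actually did, here is a minimal hedged sketch reusing scaler and X from the snippet above:
# After scaler.fit(X), the scaler stores the per-column statistics that
# transform(X) uses: it subtracts the mean and divides by the standard deviation.
print(scaler.mean_)    # per-feature means of Height and Weight
print(scaler.scale_)   # per-feature standard deviations
print(X.shape)         # (8, 2): 8 samples, 2 features
print(X.mean(axis=0))  # approximately [0, 0] after standardization
print(X.std(axis=0))   # approximately [1, 1] after standardization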
As your model is trained on a feature matrix with two features, the sample you want to predict (new_x) also needs those same two features. Thus you first create np.asmatrix([6, 160]), a matrix of shape (1, 2) with the elements [height=6, weight=160]; it is made 2-D because predict() expects an input of shape (n_samples, n_features), not a flat vector. You then scale it with the scaler that was fitted on the training data and pass it to your trained model. log_reg_classifier.predict(new_x) returns the prediction. Finally, log_reg_classifier.score(X, Y) assesses the classifier by comparing its predictions on the training data X with the true labels Y and computing the mean accuracy. Et voilà.
new_x = scaler.transform(np.asmatrix([6, 160]))
predicted = log_reg_classifier.predict(new_x)
accuracy = log_reg_classifier.score(X, Y)
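For what it's worth, np.asmatrix is only used here to get a 2-D object of shape (1, 2); a plain 2-D NumPy array behaves the same way. A minimal hedged sketch, reusing the fitted scaler and classifier from above:
import numpy as np

# np.asmatrix([6, 160]) and np.array([[6, 160]]) both have shape (1, 2):
# one sample (row) with two features (height and weight).
new_x = scaler.transform(np.array([[6, 160]]))
predicted = log_reg_classifier.predict(new_x)
print(predicted)                        # an array with one predicted label, 'green' or 'red'
print(log_reg_classifier.score(X, Y))   # mean accuracy on the training data X, Y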

Related

How can I easily convert a numpy tuple to tuple tensor in Tensorflow

I have a numpy tuple (with len 4, 5, 6, or more). How can I convert a numpy tuple to a TensorFlow tuple with input like this:
import tensorflow as tf
import numpy as np
a = np.array([[20, 20], [40, 40]], dtype=np.int32)
b = np.array([[20, 20, 20], [40, 40, 40], [60, 60, 60]], dtype=np.int32)
c = np.array([[20, 20], [40, 40]], dtype=np.int32)
d = np.array([[20, 20, 20], [40, 40, 40], [60, 60, 60]], dtype=np.int32)
e = (a, b, c, d) # e is the numpy tuple I want to convert to tensors
tf_shapes = ((None, 2), (None, 3), (2, 2), (3, 3))
tf_types = (tf.int64, tf.float32, tf.int64, tf.float32)
Right now I must write a generator to convert this to a TensorFlow tuple:
def data_generator():
    for i in range(16):
        yield a, b, c, d

dataset = tf.data.Dataset.from_generator(data_generator, tf_types, tf_shapes).batch(batch_size=4, drop_remainder=True)

for sample in dataset:
    res = model(sample, training=False)
How can I get a sample directly, without using tf.data.Dataset.from_generator?
I'm not sure if I understood your question correctly, but it appears that you just want to have a, b, c, and d converted to tensorflow tensors without having to use the tf.data.Dataset.from_generator function.
In that case, you can simply use tf.convert_to_tensor:
import tensorflow as tf
import numpy as np
a_tensor = tf.convert_to_tensor(a, np.int32)
b_tensor = tf.convert_to_tensor(b, np.int32)
c_tensor = tf.convert_to_tensor(c, np.int32)
d_tensor = tf.convert_to_tensor(d, np.int32)
# use the tensors however you want
Additionally, if you want to combine several of them into a single tensor, note that tf.stack requires all inputs to have the same shape; here a and c are (2, 2) while b and d are (3, 3), so you can only stack the like-shaped ones:
ac_tensor = tf.stack([a_tensor, c_tensor], axis=0)
# ac_tensor[0] == a_tensor, ac_tensor[1] == c_tensor
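If instead you want the whole tuple e converted in one go, a hedged sketch is to convert element by element and keep the result as a Python tuple of tensors (the shapes differ, so they cannot be combined into one stacked tensor):
# Convert each array in e to a tensor; the result is a tuple of tensors.
e_tensors = tuple(tf.convert_to_tensor(arr, np.int32) for arr in e)
# e_tensors[0] corresponds to a, e_tensors[1] to b, and so on.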

Passing x_train as a list of numpy arrays to tf.data.Dataset is not working

My problem is that x_train in tf.data.Dataset.from_tensor_slices((x_train, y_train)) needs to be a list. When I use the following line to pass [x1_train, x2_train] to tensorflow.data.Dataset.from_tensor_slices, I get an error (x1_train, x2_train and y_train are numpy arrays):
Train=tensorflow.data.Dataset.from_tensor_slices(([x1_train,x2_train], y_train)).batch(batch_size)
Error:
Train=tensorflow.data.Dataset.from_tensor_slices(([x1_train,x2_train], y_train)).batch(batch_size)
return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Can't convert non-rectangular Python sequence to Tensor.
What should I do?
If the main goal is to feed data to a model having multiple input layers then the following might be helpful:
import tensorflow as tf
from tensorflow import keras
import numpy as np
def _input_fn(n):
    x1_train = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int64)
    x2_train = np.array([15, 25, 35, 45, 55, 65, 75, 85], dtype=np.int64)
    labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.int64)
    dataset = tf.data.Dataset.from_tensor_slices(({"input_1": x1_train, "input_2": x2_train}, labels))
    dataset = dataset.batch(2, drop_remainder=True)
    dataset = dataset.repeat(n)
    return dataset

input1 = keras.layers.Input(shape=(1,), name='input_1')
input2 = keras.layers.Input(shape=(1,), name='input_2')
# The output layer is not shown in the original snippet; a single Dense head is assumed here.
merged = keras.layers.concatenate([input1, input2])
output = keras.layers.Dense(1)(merged)
model = keras.models.Model(inputs=[input1, input2], outputs=output)
Basically, instead of passing a Python list, pass a dictionary whose keys are the names of the input layers the corresponding arrays should be fed to.
In the code above, x1_train is fed to the tensor input1, whose name is 'input_1'. Referenced from here.
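To see the structure the model receives, here is a hedged sketch that pulls one batch from the dataset above (TF 2.x eager execution assumed):
# Inspect one batch: features is a dict keyed by the input-layer names.
for features, labels in _input_fn(1).take(1):
    print(features['input_1'])  # batched x1_train values, fed to the layer named 'input_1'
    print(features['input_2'])  # batched x2_train values, fed to the layer named 'input_2'
    print(labels)               # batched labels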
If you have a DataFrame with mixed types (float32, int and str), you have to build the dictionary manually.
Following Pratik's syntax:
tf.data.Dataset.from_tensor_slices(({"input_1": np.asarray(var_float).astype(np.float32), "input_2": np.asarray(var_int).astype(np.int64), ...}, labels))
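For example, a hedged sketch with a hypothetical DataFrame df (the column names 'price', 'count' and 'label' are made up for illustration):
import numpy as np
import pandas as pd
import tensorflow as tf

# Hypothetical mixed-type DataFrame used only for illustration.
df = pd.DataFrame({'price': [1.5, 2.5, 3.5],    # float column
                   'count': [1, 2, 3],          # int column
                   'label': ['a', 'b', 'a']})   # str column

dataset = tf.data.Dataset.from_tensor_slices((
    {'input_1': df['price'].to_numpy().astype(np.float32),
     'input_2': df['count'].to_numpy().astype(np.int64)},
    df['label'].to_numpy().astype(str)))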

Tensorflow tf.metrics.accuracy multi-label always zero

My label looks like this:
label = [0, 1, 0, 0, 1, 1, 0]
In other words, classes 1, 4, 5 are present at the corresponding sample. I believe this is called a soft class.
I'm calculating my loss with:
logits = tf.layers.dense(encoding, 7, activation=None)
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=labels,
    logits=logits
)
loss = tf.reduce_mean(cross_entropy)
According to Tensorboard, the loss is decreasing over time, as expected. However, the accuracy is flat at zero:
eval_metric_ops = {
    'accuracy': tf.metrics.accuracy(labels=labels, predictions=logits),
}
tf.summary.scalar('accuracy', eval_metric_ops['accuracy'][1])
How do I calculate the accuracy of my model when using soft classes?
Did you solve this? I think the comment about softmax_cross_entropy_with_logits is incorrect, because you have a multi-label problem (each label is a binary class).
Partial solution:
labels = tf.constant([1, 1, 1, 0, 0, 0])       # example ground truth
predictions = tf.constant([0, 1, 0, 0, 1, 0])  # example predictions
is_equal = tf.equal(labels, predictions)
accuracy = tf.reduce_mean(tf.cast(is_equal, tf.float32))
This gives a number, but it still needs to be converted into a tf metric.
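One way to turn that into a tf metric (a hedged sketch in the same TF1-style API, assuming logits and labels as in the question) is to threshold the sigmoid outputs and feed the hard 0/1 predictions to tf.metrics.accuracy instead of the raw logits; note this measures per-label (element-wise) accuracy, not exact-match accuracy:
# Probabilities -> hard 0/1 predictions per label, then element-wise accuracy.
predictions = tf.round(tf.sigmoid(logits))
eval_metric_ops = {
    'accuracy': tf.metrics.accuracy(labels=labels, predictions=predictions),
}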

Alternative to np.insert in tensorflow

There is a function in numpy that inserts given values to the array:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.insert.html
Is there something similar in tensorflow?
Alternatively, is there a function in tensorflow that can do tensor upsampling using zeros in between values of a tensor?
tf.nn.conv2d_transpose can do this upsampling (with careful design of output_shape and strides). A sample code:
import tensorflow as tf
import numpy as np
input = tf.convert_to_tensor(np.ones((1, 20, 20, 1)))
input = tf.cast(input, tf.float32)
b = np.zeros((3, 3, 1, 1))
b[1, 1, 0, 0] = 1
weight = tf.convert_to_tensor(b)
weight = tf.cast(weight, tf.float32)
output = tf.nn.conv2d_transpose(input, weight, output_shape=(1, 40, 40, 1), strides=[1, 2, 2, 1])
sess = tf.Session()
print(sess.run(output[0, :, :, 0]))
I believe checking its api will help you more.

TensorFlow: questions regarding tf.argmax() and tf.equal()

I am learning the TensorFlow, building a multilayer_perceptron model. I am looking into some examples like the one at: https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/multilayer_perceptron.ipynb
I then have some questions in the code below:
def multilayer_perceptron(x, weights, biases):
    :
    :
pred = multilayer_perceptron(x, weights, biases)
:
:
with tf.Session() as sess:
    sess.run(init)
    :
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: X_test, y: y_test_onehot}))
I am wondering what tf.argmax(pred, 1) and tf.argmax(y, 1) mean and return exactly (type and value)? And is correct_prediction a variable instead of real values?
Finally, how do we get the y_test_prediction array (the prediction result when the input data is X_test) from the tf session? Thanks a lot!
tf.argmax(input, axis=None, name=None, dimension=None)
Returns the index with the largest value across axis of a tensor.
input is a Tensor and axis describes which axis of the input Tensor to reduce across. For vectors, use axis = 0.
For your specific case let's use two arrays and demonstrate this
pred = np.array([[31, 23, 4, 24, 27, 34],
                 [18, 3, 25, 0, 6, 35],
                 [28, 14, 33, 22, 20, 8],
                 [13, 30, 21, 19, 7, 9],
                 [16, 1, 26, 32, 2, 29],
                 [17, 12, 5, 11, 10, 15]])
y = np.array([[31, 23, 4, 24, 27, 34],
              [18, 3, 25, 0, 6, 35],
              [28, 14, 33, 22, 20, 8],
              [13, 30, 21, 19, 7, 9],
              [16, 1, 26, 32, 2, 29],
              [17, 12, 5, 11, 10, 15]])
Evaluating tf.argmax(pred, 1) gives a tensor whose evaluation will give array([5, 5, 2, 1, 3, 0])
Evaluating tf.argmax(y, 1) gives a tensor whose evaluation will give array([5, 5, 2, 1, 3, 0])
tf.equal(x, y, name=None) takes two tensors (x and y) as inputs and returns the truth value of (x == y) element-wise.
Following our example, tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1)) returns a tensor whose evaluation will give array([True, True, True, True, True, True]).
correct_prediction is a tensor whose evaluation gives a 1-D array of True/False values (which become 1's and 0's after the tf.cast).
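To verify these evaluations yourself, a small hedged sketch (TF1-style session; pred and y as defined just above):
import tensorflow as tf

with tf.Session() as sess:
    print(sess.run(tf.argmax(pred, 1)))   # [5 5 2 1 3 0]
    print(sess.run(tf.argmax(y, 1)))      # [5 5 2 1 3 0]
    # Element-wise comparison of the two index vectors:
    print(sess.run(tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))))
    # [ True  True  True  True  True  True]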
y_test_prediction can be obtained by evaluating tf.argmax(pred, 1) with X_test fed in.
The documentation for tf.argmax and tf.equal can be accessed by following the links below.
tf.argmax() https://www.tensorflow.org/api_docs/python/math_ops/sequence_comparison_and_indexing#argmax
tf.equal() https://www.tensorflow.org/versions/master/api_docs/python/control_flow_ops/comparison_operators#equal
Reading the documentation:
tf.argmax
Returns the index with the largest value across axes of a tensor.
tf.equal
Returns the truth value of (x == y) element-wise.
tf.cast
Casts a tensor to a new type.
tf.reduce_mean
Computes the mean of elements across dimensions of a tensor.
Now you can easily explain what it does. Your y is one-hot encoded, so it has a single 1 and all other entries are zero. Your pred represents the predicted class probabilities. So argmax finds the position of the best prediction and the position of the correct value, and then you check whether they are the same.
So your correct_prediction is a vector of True/False values with a size equal to the number of instances you want to predict. You convert it to floats and take the average.
This part is nicely explained in the TF tutorial, in the Evaluate the Model section.
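As a concrete hedged example of that pipeline (made-up one-hot labels and predicted probabilities, TF1-style evaluation):
import numpy as np
import tensorflow as tf

# Made-up example: 3 samples, 4 classes.
y_onehot = np.array([[0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [1, 0, 0, 0]], dtype=np.float32)
probs = np.array([[0.1, 0.7, 0.1, 0.1],   # predicts class 1, true class 1 -> correct
                  [0.2, 0.5, 0.2, 0.1],   # predicts class 1, true class 3 -> wrong
                  [0.6, 0.2, 0.1, 0.1]],  # predicts class 0, true class 0 -> correct
                 dtype=np.float32)

correct_prediction = tf.equal(tf.argmax(probs, 1), tf.argmax(y_onehot, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
with tf.Session() as sess:
    print(sess.run(accuracy))  # 0.6666667: 2 of 3 predictions are correct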
tf.argmax(input, axis=None, name=None, dimension=None)
Returns the index with the largest value across axis of a tensor.
In this specific case, it receives pred as its input and 1 as the axis. The axis describes which axis of the input Tensor to reduce across. For vectors, use axis = 0.
Example: Given the list [2.11,1.0021,3.99,4.32] argmax will return 3 which is the index of the highest value.
correct_prediction is a tensor that will be evaluated later. It is not a regular python variable. It contains the necessary information to compute the value later.
For this specific case, it will be part of another tensor accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")) and will be evaluated by eval on accuracy.eval({x: X_test, y: y_test_onehot}).
y_test_prediction itself is the evaluated tf.argmax(pred, 1); your correct_prediction tensor only tells you which of those predictions match the true labels.
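A hedged sketch of how both could be evaluated inside the same session (x, y, pred, X_test and y_test_onehot as in the question's code):
# Predicted class index for each test sample (this is y_test_prediction).
y_test_prediction = tf.argmax(pred, 1).eval({x: X_test})

# Boolean mask of which of those predictions match the one-hot ground truth.
correct_mask = correct_prediction.eval({x: X_test, y: y_test_onehot})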
For those who do not have much time to understand tf.argmax:
x = np.array([[1, 9, 3],[4, 5, 6]])
tf.argmax(x, axis = 0)
output:
[array([1, 0, 1], dtype=int64)]
tf.argmax(x, axis = 1)
Output:
[array([1, 2], dtype=int64)]