I have a training model like
Y = w * X + b
where Y and X are the output and input placeholders, and w and b are the variable vectors.
I already know that the values of w can only be 0 or 1, while b is still an ordinary tf.float32.
How could I quantize the range of variable w when I define it?
or
Can I have two different learning rates, so that each update to w is a step of 1 or -1, while the rate for b is 0.0001 as usual?
There is no way to constrain a variable's values directly when you define it. What you can do, however, is limit it after each iteration. Here is one way to do this with tf.where():
import tensorflow as tf
a = tf.random_uniform(shape=(3, 3))
b = tf.where(
    tf.less(a, tf.zeros_like(a) + 0.5),
    tf.zeros_like(a),
    tf.ones_like(a)
)
with tf.Session() as sess:
    A, B = sess.run([a, b])
    print(A, '\n')
    print(B)
This will convert everything at or above 0.5 to 1 and everything else to 0:
[[ 0.2068541 0.12682056 0.73839438]
[ 0.00512838 0.43465161 0.98486936]
[ 0.32126224 0.29998791 0.31065524]]
[[ 0. 0. 1.]
[ 0. 0. 1.]
[ 0. 0. 0.]]
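To apply this "limit it after each iteration" idea to a trainable variable such as w, one possible sketch (assuming a standard TF 1.x training loop in which a train_op has already been defined; the names here are illustrative) is an extra assign op that snaps the variable back to {0, 1} after every optimizer step:
import tensorflow as tf

w = tf.Variable(tf.random_uniform(shape=(3,)))
# ... build your loss and train_op on w here ...

# Projection op: round w back to {0, 1} after each update.
project_w = tf.assign(w, tf.round(tf.clip_by_value(w, 0, 1)))

# Inside the training loop:
#   sess.run(train_op, feed_dict=...)
#   sess.run(project_w)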
One method I have used to limit variables to a particular range is to add a constraint to my loss equation. If the variable goes outside of the desired range, then the loss will get bigger and the optimizer will push it back within the desired range.
For example:
#initialize variable to be between 0 and 1
variable = tf.Variable(tf.random_uniform([self.numOutputs], 0, 1))
#Clip the variable to force the result to be between 0 and 1 during training
variableClipped = tf.clip_by_value(variable, 0, 1)
#Set the loss to be the difference between the clipped variable and actual variable.
#Anytime it goes outside the variable range the loss will increase,
#and the optimizer will push it back within the desired range.
loss = originalLossEquation + tf.reduce_sum((variable - variableClipped)**2)
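For context, a minimal sketch of how this penalty could be wired into an optimizer; the stand-in loss and the penaltyWeight factor below are illustrative assumptions, not part of the answer above:
import tensorflow as tf

variable = tf.Variable(tf.random_uniform([4], 0, 1))
variableClipped = tf.clip_by_value(variable, 0, 1)

originalLossEquation = tf.reduce_sum(tf.square(variable - 0.5))  # stand-in for the real loss
penaltyWeight = 10.0  # assumed hyperparameter controlling how strongly the constraint is enforced
loss = originalLossEquation + penaltyWeight * tf.reduce_sum((variable - variableClipped) ** 2)

train = tf.train.GradientDescentOptimizer(0.0001).minimize(loss)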
I ran into a problem when using tf.gradients to compute a gradient.
My x is a tf.constant() holding a vector v of shape (4, 1),
and my y is the sigmoid of v, also of shape (4, 1), so the gradient of y with respect to x should be a diagonal matrix of shape (4, 4).
My code:
c = tf.constant(sigmoid(x_0 @ w_0))
d = tf.constant(x_0 @ w_0)
Omega = tf.gradients(c, d)
_Omega = sess.run(Omega)
The error is:
Fetch argument None has invalid type.
In addition, I suspect using tf.gradients might be the wrong approach; there may be some other function that can compute this.
My question:
Point out where I am wrong and how to fix it, using tf.gradients
or using another function.
Edit:
I want to compute the derivative as in the vector-by-vector section of https://en.wikipedia.org/wiki/Matrix_calculus#Vector-by-vector,
and the result Omega would look like the following:
[[s1(1-s1) 0        0        0       ]
 [0        s2(1-s2) 0        0       ]
 [0        0        s3(1-s3) 0       ]
 [0        0        0        s4(1-s4)]]
where s_i = sigmoid(x_0i @ w_0) and x_0i is the ith row of x_0.
In general, the derivative of one vector with respect to another vector should be a matrix.
First of all, you can't calculate gradients with respect to constants; tf.gradients returns None for them, which is the reason for your error. One way to calculate the gradients is to use a variable in the TF graph (see the code below); another would be to use tf.GradientTape in eager execution mode:
import tensorflow as tf
import numpy as np
arr = np.random.rand(4, 1)
ip = tf.Variable(initial_value=arr)
sess = tf.Session()
c_var = tf.math.sigmoid(ip)
Omega = tf.gradients(c_var, ip)
sess.run(tf.global_variables_initializer())
_Omega = sess.run(Omega)
print(_Omega)
Now you can pass a vector of any size. Note that tf.gradients sums over the output elements, so what you get back here has shape (4, 1) rather than being the full (4, 4) diagonal Jacobian.
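Since the sigmoid is applied elementwise, one way to recover the (4, 4) diagonal Jacobian from that (4, 1) gradient, as a sketch (assuming a recent TF 1.x where tf.linalg.diag is available), is to place the gradient entries on the diagonal:
import tensorflow as tf
import numpy as np

arr = np.random.rand(4, 1)
ip = tf.Variable(initial_value=arr)
c_var = tf.math.sigmoid(ip)

# tf.gradients returns d(sum(c_var))/d(ip) with shape (4, 1); because sigmoid is
# elementwise, these entries are exactly the diagonal terms s_i * (1 - s_i).
grad = tf.gradients(c_var, ip)[0]
jacobian = tf.linalg.diag(tf.squeeze(grad))  # shape (4, 4)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(jacobian))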
I am working on a TensorFlow Coursera course and I don't understand why I am getting a type mismatch.
This is the function I am defining:
def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the i-th class number and the j-th column
    corresponds to the j-th training example. So if example j has label i, then entry (i, j)
    will be 1.
    Arguments:
    labels -- vector containing the labels
    C -- number of classes, the depth of the one hot dimension
    Returns:
    one_hot -- one hot matrix
    """
    ### START CODE HERE ###
    # Create a tf.constant equal to C (depth), name it 'C'. (approx. 1 line)
    C = tf.constant(C, name="C")
    # labels = tf.placeholder(labels, name="labels")
    # Use tf.one_hot, be careful with the axis (approx. 1 line)
    one_hot_matrix = tf.one_hot(indices=labels, depth=C, axis=0)
    # Create the session (approx. 1 line)
    sess = tf.Session()
    # Run the session (approx. 1 line)
    one_hot = sess.run(one_hot_matrix, feed_dict={labels: labels, C: C})
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    ### END CODE HERE ###
    return one_hot
And when running this:
labels = np.array([1,2,3,0,2,1])
one_hot = one_hot_matrix(labels, C = 4)
print ("one_hot = " + str(one_hot))
I get this type error:
TypeError Traceback (most recent call last)
<ipython-input-113-2b9d0290645f> in <module>()
1 labels = np.array([1,2,3,0,2,1])
----> 2 one_hot = one_hot_matrix(labels, C = 4)
3 print ("one_hot = " + str(one_hot))
<ipython-input-112-f9f17c86d0ba> in one_hot_matrix(labels, C)
28
29 # Run the session (approx. 1 line)
---> 30 one_hot = sess.run(one_hot_matrix, feed_dict={labels:labels, C:C})
31
32 # Close the session (approx. 1 line). See method 1 above.
TypeError: unhashable type: 'numpy.ndarray'
I checked the TensorFlow documentation for tf.one_hot, and there shouldn't be a problem with np.arrays.
https://www.tensorflow.org/api_docs/python/tf/one_hot
The labels and C were baked into the graph as constants during graph definition, so there is nothing to feed when calling sess.run(). In addition, feed_dict keys must be graph tensors, and using a NumPy array as a key is what raises the 'unhashable type' error. I just slightly changed the line to one_hot = sess.run(one_hot_matrix1) and it should work now.
def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the i-th class number and the j-th column
    corresponds to the j-th training example. So if example j has label i, then entry (i, j)
    will be 1.
    Arguments:
    labels -- vector containing the labels
    C -- number of classes, the depth of the one hot dimension
    Returns:
    one_hot -- one hot matrix
    """
    ### START CODE HERE ###
    # Create a tf.constant equal to C (depth), name it 'C'. (approx. 1 line)
    C = tf.constant(C, name="C")
    # labels = tf.placeholder(labels, name="labels")
    # Use tf.one_hot, be careful with the axis (approx. 1 line)
    one_hot_matrix1 = tf.one_hot(indices=labels, depth=C, axis=0)
    # Create the session (approx. 1 line)
    sess = tf.Session()
    # Run the session (approx. 1 line)
    one_hot = sess.run(one_hot_matrix1)  # , feed_dict={labels: labels, C: C}
    # Close the session (approx. 1 line). See method 1 above.
    sess.close()
    ### END CODE HERE ###
    return one_hot
Run:
labels = np.array([1,2,3,0,2,1])
one_hot = one_hot_matrix(labels, C = 4)
print ("one_hot = " + str(one_hot))
Output:
one_hot = [[ 0. 0. 0. 1. 0. 0.]
[ 1. 0. 0. 0. 0. 1.]
[ 0. 1. 0. 0. 1. 0.]
[ 0. 0. 1. 0. 0. 0.]]
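As a side note, if you did want to feed the labels at run time, feed_dict keys must be graph tensors such as placeholders, never plain NumPy arrays; a small illustrative sketch (assuming TF 1.x):
import tensorflow as tf
import numpy as np

labels = np.array([1, 2, 3, 0, 2, 1])

labels_ph = tf.placeholder(tf.int32, shape=[None], name="labels")  # a graph tensor, valid as a feed_dict key
one_hot_op = tf.one_hot(indices=labels_ph, depth=4, axis=0)

with tf.Session() as sess:
    print(sess.run(one_hot_op, feed_dict={labels_ph: labels}))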
I'm trying to create a convolutional neural network that predicts whether or not to sell for a hydropower dam, and the issue I am having is with the output. I feed in two inputs: price (a normalized float) and water inflow (either 1 or 0 at this point).
My issue is that running this and trying to get the answer as a set of 0/1 actions gives me floats, which only make sense if the output is read as a single class number rather than as a set of actions. This is fine when the number of actions is small, but it will become horrible later on when the number of actions grows.
Does anyone know how I can make it output the actions as either 0 or 1, instead of floats that seem to represent the certainty of the prediction?
Meaning, if there are 4 actions and the correct answer is 0, 1, 0, 1, then the predictions should be in the same form (4 actions, each either 0 or 1).
Any help would be much appreciated
Binary output from Normalized Probability
What you are looking for is a method of converting your normalized probability output to a binary one.
This is very straightforward in TensorFlow and involves adding a tf.round operation. The trick is to make sure you do not use the output of tf.round in training. This is best demonstrated with a working code example.
Working code example
This code calculates the XOR function using a neural net. The outputs are y_out (the probability output) and y_binary (the probability output cast to binary).
### imports
import tensorflow as tf
import numpy as np
### constant data
x = [[0.,0.],[1.,1.],[1.,0.],[0.,1.]]
y_ = [[1.,0.],[1.,0.],[0.,1.],[0.,1.]]
### induction
# 1x2 input -> 2x3 hidden sigmoid -> 3x2 softmax output
# Layer 0 = the 2 inputs
x0 = tf.placeholder( dtype=tf.float32 , shape=[None,2] )
y0 = tf.placeholder( dtype=tf.float32 , shape=[None,2] )
# Layer 1 = the 2x3 hidden sigmoid
m1 = tf.Variable( tf.random_uniform( [2,3] , minval=0.1 , maxval=0.9 , dtype=tf.float32 ))
b1 = tf.Variable( tf.random_uniform( [3] , minval=0.1 , maxval=0.9 , dtype=tf.float32 ))
h1 = tf.sigmoid( tf.matmul( x0,m1 ) + b1 )
# Layer 2 = the 3x2 softmax output
m2 = tf.Variable( tf.random_uniform( [3,2] , minval=0.1 , maxval=0.9 , dtype=tf.float32 ))
b2 = tf.Variable( tf.random_uniform( [2] , minval=0.1 , maxval=0.9 , dtype=tf.float32 ))
y_logit = tf.matmul( h1,m2 ) + b2
y_out = tf.nn.softmax( y_logit )
y_binary = tf.round( y_out )
### loss
# loss : a loss function that uses y_logit or y_out , but NOT y_binary
loss = tf.reduce_sum( tf.square( y0 - y_out ) )
# training step
train = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
### training
# run 500 times using all the X and Y
# print out the loss and any other interesting info
with tf.Session() as sess:
    sess.run( tf.global_variables_initializer() )
    print("\nloss")
    for step in range(500):
        sess.run(train, feed_dict={x0: x, y0: y_})
        if (step + 1) % 100 == 0:
            print(sess.run(loss, feed_dict={x0: x, y0: y_}))

    y_out_value, y_binary_value = sess.run([y_out, y_binary], feed_dict={x0: x, y0: y_})
    print("\nThe expected output is :")
    print(np.array(y_))
    print("\nThe softmax output is :")
    print(np.array(y_out_value))
    print("\nThe binary output is :")
    print(np.array(y_binary_value))
    print("")
Output
The expected output is :
[[ 1. 0.]
[ 1. 0.]
[ 0. 1.]
[ 0. 1.]]
The softmax output is :
[[ 0.96538627 0.03461381]
[ 0.81609273 0.18390732]
[ 0.11534476 0.88465524]
[ 0.0978259 0.90217412]]
The binary output is :
[[ 1. 0.]
[ 1. 0.]
[ 0. 1.]
[ 0. 1.]]
As you can see, you can retrieve the probability outputs OR the probabilities cast as binary and still have all the benefits of classic logits.
Cheers.
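As a follow-up note (not part of the answer above): if exactly one action can be active at a time, an alternative to tf.round is tf.argmax plus tf.one_hot, reusing the y_out tensor defined in the code above:
# Pick the single most likely action instead of rounding each probability.
y_class = tf.argmax(y_out, axis=1)        # index of the most likely action per example
y_one_hot = tf.one_hot(y_class, depth=2)  # back to a 0/1 action vector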
I guess it is important to note that, for a typical classification problem, the outputs of a neural net are actually posterior probabilities computed for each of the classes present.
The figures returned tell you how likely the output is to be of class A, B, or C given the input x, so you cannot expect to always get 0 or 1.
# An example: suppose for input x I get
#   Output = [0.5, 0.2, 0.3]
# I predict the class should be A because it has the highest posterior (0.5)
# of the three values returned:
#   Class = A (0.5)
# Or I might as well round it up; TensorFlow can do this for you.
So I guess you should take the output and apply the probabilistic rule that fits your model, for example that the highest value among the returned predictions gives the class the input belongs to.
You should not expect an absolute one-or-zero prediction.
Be careful about the fact I wrote above; it's a common mistake. And please do read the paper below. Once you have posteriors, you can build further models on top of them; there is no limit to what you can achieve!
For example, you can apply Gaussian mixture models, Markov models, decision trees, or combine expert systems on the output; those are elegant and scientific approaches.
Read this paper for more info.
http://www.ee.iisc.ac.in/people/faculty/prasantg/downloads/NeuralNetworksPosteriors_Lippmann1991.pdf
Hope it helps!
When I get the output of the network, it is a tensor with a size like [batch_size, height, width]. The content is a probability. What I want to do is apply a threshold to the tensor and binarize it. So what should I do to the tensor?
You can use tf.clip_by_value:
import tensorflow as tf
a = tf.random_uniform(shape=(2, 3, 3), minval=-1, maxval=3)
b = tf.clip_by_value(a, 0, 1)
with tf.Session() as sess:
    A, B = sess.run([a, b])
    print(A, '\n')
    print(B)
Here, everything above 1 will become 1, everything below 0 will become 0, and everything else will stay the same.
A similar thing can also be done to convert everything to 0 or 1 instead; see the tf.where-based answer earlier in this thread.
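For the thresholded binarization the question asks about, a minimal sketch (assuming TF 1.x; the 0.5 threshold is illustrative) using a comparison plus a cast:
import tensorflow as tf

probs = tf.random_uniform(shape=(2, 3, 3))       # stand-in for the network's probability output
threshold = 0.5
binary = tf.cast(probs > threshold, tf.float32)  # 1.0 where prob > threshold, 0.0 elsewhere

with tf.Session() as sess:
    print(sess.run(binary))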
Here's my problem. I have a tensor X and I want to set all negative values to zero. In NumPy, I would do the following: np.maximum(0, X). Is there any way to achieve the same effect in TensorFlow? I tried tf.maximum(tf.fill(X.get_shape(), 0.0), X), but this throws ValueError: Cannot convert a partially known TensorShape to a Tensor: (?,).
PS. X is a 1-D tensor of shape (?,).
As it happens, your problem is exactly the same as computing the rectifier activation function, and TensorFlow has a built-in operator, tf.nn.relu(), that does exactly what you need:
X_with_negatives_set_to_zero = tf.nn.relu(X)
You can use tf.clip_by_value function as follows:
t = tf.clip_by_value(t, min_val, max_val)
It will clip tensor t to the range [min_val, max_val]. Here you can set min_val to 0 so that all negative values are clipped to 0 (choose max_val at least as large as your largest value so positive entries stay unchanged). See the documentation for clip_by_value for more details.
A simple solution is to use the cast function (see the Keras documentation), as suggested by @ldavid:
X = tf.cast(X > 0, X.dtype) * X
Moreover this can be adapted to any threshold level with :
X = tf.cast(X > threshold, X.dtype) * X
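A quick usage sketch of this cast trick (an illustrative example, assuming a 1-D float tensor as in the question):
import tensorflow as tf

X = tf.constant([-2.0, -1.0, 0.0, 1.0, 2.0])
X_nonneg = tf.cast(X > 0, X.dtype) * X  # -> [0., 0., 0., 1., 2.]

with tf.Session() as sess:
    print(sess.run(X_nonneg))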
One possible solution could be this (although it's not the best):
class TensorClass(object):
    def __init__(self, tensor_values):
        self.test_tensor = tf.Variable(tensor_values, name="test_tensor")

test_session = tf.Session()
with test_session.as_default():
    tc = TensorClass([1, -1, 2, -2, 3])
    test_session.run(tf.initialize_all_variables())

    test_tensor_value = test_session.run(tc.test_tensor)
    print(test_tensor_value)  # Will print [1, -1, 2, -2, 3]

    new_test_tensor_value = [element * int(element > 0) for element in test_tensor_value]

    test_tensor_value_assign_op = tf.assign(tc.test_tensor, new_test_tensor_value)
    test_session.run(test_tensor_value_assign_op)

    test_tensor_value = test_session.run(tc.test_tensor)
    print(test_tensor_value)  # Will print [1 0 2 0 3]
While this does what you need, it is not done inside TensorFlow: we pull a TensorFlow variable out, change it in Python, and put it back again.
Don't use this for performance-critical work, because it is not very efficient.