None Value while optimizing data with gradient descent - tensorflow

I'm trying to build a small neural network in TensorFlow, and I'm fairly new to it. I followed a tutorial (http://de.slideshare.net/tw_dsconf/tensorflow-tutorial), and everything works fine until I try to optimize the weights with gradient descent, at which point I get a None value.
import numpy as np
import tensorflow as tf
with tf.Session() as sess:
    x = tf.placeholder("float", [1, 3], name="x")
    w = tf.Variable(tf.random_uniform([3, 3]), name="w")
    y = tf.matmul(x, w)
    labels = tf.placeholder("float", [1, 3], name="labels")
    relu_out = tf.nn.relu(y)
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(relu_out, labels, name="loss")
    optimizer = tf.train.GradientDescentOptimizer(0.5)
    train_op = optimizer.minimize(cross_entropy)
    e_labels = np.array([[1.0, 1.0, 0.0]])
    sess.run(tf.initialize_all_variables())
    for step in range(10):
        [out, loss] = sess.run([train_op, cross_entropy], feed_dict={x: np.array([[1.0, 2.0, 3.0]]), labels: e_labels})
        print("the result is:", out)
        print("The loss of the function is:", loss)
So far I have tried changing the label values (e_labels) and the input values (x), but I always get a None result. Is that None value normal? I don't think so, but if someone could tell me what I can do to fix it, I would be grateful.

I assume you mean that the value of out (i.e., the first return value from sess.run([train_op, cross_entropy], ...)) is None.
This is perfectly normal: train_op is a tf.Operation, and when you pass a tf.Operation to tf.Session.run() (quoting the docs) "The corresponding fetched value will be None."
You can think of a tf.Operation like a function with a void return type (in a language like C or Java). It's something that you run() to cause a side effect (i.e., updating the variables) but it doesn't have a meaningful return value itself.
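The void-function analogy can be sketched in plain Python. This is only a toy model, not TensorFlow code: `variables`, `train_step`, `loss` and `run` are hypothetical names invented for the illustration, and the toy evaluates its fetches strictly in order, which a real session does not promise.

```python
# Toy model of "fetching an operation yields None" (not TensorFlow's API).
variables = {"w": 1.0}

def train_step():
    """Plays the role of train_op: updates a variable, returns nothing."""
    variables["w"] -= 0.5 * 0.2  # side effect: one fixed "gradient" step

def loss():
    """Plays the role of a tensor: computes and returns a value."""
    return (variables["w"] - 0.5) ** 2

def run(fetches):
    """Evaluate each fetch in order and collect whatever it returns."""
    return [fetch() for fetch in fetches]

out, loss_value = run([train_step, loss])
print(out)         # None: the update happened, but there is nothing to return
print(loss_value)  # a number, computed from the updated variable in this toy
```

In the question's loop this means only the loss is worth printing; the first fetched value can simply be discarded, for example by unpacking it into `_`.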

How to create a custom layer in Keras with 'stateful' variables/tensors?

I would like some help creating a custom layer.
What I am trying to do is actually quite simple: generate an output layer with 'stateful' variables, i.e. tensors whose value is updated at each batch.
To make this clearer, here is a snippet of what I would like to do:
def call(self, inputs):
    c = self.constant
    m = self.extra_constant
    update = inputs*m + c
    X_new = self.X_old + update
    outputs = X_new
    self.X_old = X_new
    return outputs
The idea here is quite simple:
X_old is initialized to 0 in __init__(self, ...)
update is computed as a function of the inputs to the layer
the output of the layer is computed (i.e. X_new)
the value of X_old is set equal to X_new so that, at the next batch, X_old is no longer equal to zero but equal to X_new from the previous batch.
I have found out that K.update does the job, as shown in the example:
X_new = K.update(self.X_old, self.X_old + update)
The problem here is that, if I then try to define the outputs of the layer as:
outputs = X_new
return outputs
I receive the following error when I call model.fit():
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have
gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
I keep getting this error even though I set layer.trainable = False and did not define any bias or weights for the layer. On the other hand, if I just do self.X_old = X_new, the value of X_old does not get updated.
Do you have a solution for implementing this? I believe it should not be that hard, since stateful RNNs work in a similar way.
Thanks in advance for your help!
Defining a custom layer can be confusing at times. Some of the methods you override are called only once, even though, as in many other OO libraries and frameworks, it looks as if they are called many times.
Here is what I mean: when you define a layer and use it in a model, the Python code you write in the overridden call method is not executed directly on every forward or backward pass. Instead, it is typically executed once, while the computational graph is being built. That code is turned into a graph, and the graph, through which the tensors flow, is what performs the computation during training and prediction.
That's why putting a print statement in call won't help you debug the model; you need to use tf.print to add a print operation to the graph.
The same applies to the state variable you want. Instead of simply assigning old + update to new, you need to call a Keras function that adds that assignment to the graph.
Note also that tensors are immutable, so you need to define the state as a tf.Variable in the __init__ method.
So I believe this code is closer to what you're looking for:
import tensorflow as tf
class CustomLayer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(CustomLayer, self).__init__(**kwargs)
        # The state lives in a tf.Variable so that it can be updated in the graph.
        self.state = tf.Variable(tf.zeros((3,3), 'float32'))
        self.constant = tf.constant([[1,1,1],[1,0,-1],[-1,0,1]], 'float32')
        self.extra_constant = tf.constant([[1,1,1],[1,0,-1],[-1,0,1]], 'float32')
        self.trainable = False
    def call(self, X):
        m = self.constant
        c = self.extra_constant
        outputs = self.state + tf.matmul(X, m) + c
        # Adds the state update to the graph instead of rebinding a Python attribute.
        tf.keras.backend.update(self.state, tf.reduce_sum(outputs, axis=0))
        return outputs
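To make the intended semantics concrete, here is a minimal plain-Python sketch of a value that persists across batches. It uses scalars rather than tensors, `StatefulAccumulator` is a hypothetical name, and keeping the last output as the new state is an arbitrary choice made for the sketch; it illustrates the update rule from the question, not the Keras mechanics.

```python
class StatefulAccumulator:
    """Keeps X_old between calls, as the question's layer is meant to."""

    def __init__(self, m, c):
        self.m = m          # multiplicative constant
        self.c = c          # additive constant
        self.x_old = 0.0    # state, initialised to zero

    def __call__(self, inputs):
        outputs = []
        for x in inputs:                 # one entry per batch element
            update = x * self.m + self.c
            outputs.append(self.x_old + update)
        # Store a single value for the next batch; taking the last output
        # is an arbitrary choice made for this sketch.
        self.x_old = outputs[-1]
        return outputs

layer = StatefulAccumulator(m=2.0, c=1.0)
first = layer([1.0, 2.0])   # the state is 0.0 during this batch
second = layer([1.0, 2.0])  # the state is now the last output of the first batch
print(first, second)
```

The same inputs give different outputs on the second call, which is the behaviour the question asks for; in a graph-based framework the assignment to the state has to be an operation in the graph rather than a Python attribute rebinding.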

Error in Value function approximator = ValueError: No gradients provided for any variable

I am trying to implement vanilla policy gradient, which is basically the REINFORCE algorithm with an advantage function. To estimate the advantage function, the value function V(s) has to be computed. REINFORCE with just the return works, but after replacing the return with the advantage function I get an error: ValueError: No gradients provided for any variable
Thank you for your help. If it helps, I can send you the entire code.
# make action selection op (outputs int actions, sampled from policy)
actions = tf.squeeze(tf.multinomial(logits=logits,num_samples=1), axis=1)
#computing value function
value_app = tf.squeeze(funct.critic_nn(obs_ph), axis=1)
# make loss function whose gradient, for the right data, is policy gradient
weights_ph = tf.placeholder(shape=(None,), dtype=tf.float32)
adv_ph = tf.placeholder(shape=(None,), dtype=tf.float32)
v_ph = tf.placeholder(shape=(None,), dtype=tf.float32)
act_ph = tf.placeholder(shape=(None,), dtype=tf.int32)
#Loss for actor
action_masks = tf.one_hot(act_ph, n_acts)
log_probs = tf.reduce_sum(action_masks * tf.nn.log_softmax(logits), axis=1)
loss = -tf.reduce_mean(adv_ph * log_probs)
#Loss for critic
critic_loss = tf.reduce_mean((v_ph - weights_ph)**2)
#optimizers
train_actor = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)
train_critic = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(critic_loss)
I've figured it out. The problem was that I built the critic loss from a placeholder (v_ph), which I fed with values recorded from the value-function network. A placeholder is not connected to the network's variables, so that loss has no gradient with respect to them. Instead of the placeholder, the loss has to use the actual output of the network: record the states from the environment, feed those states through the value function approximator during the training phase, and build the loss to be minimized from its output:
critic_loss = tf.reduce_mean((value_app - weights_ph)**2)
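Why the placeholder version fails can be illustrated numerically without TensorFlow. In the sketch below, `value_fn`, `theta0` and the finite-difference helper are hypothetical stand-ins: a loss built from fed-in numbers does not change when the parameter changes, so its gradient with respect to that parameter is zero, which is the situation TensorFlow reports as "No gradients provided".

```python
def value_fn(theta, state):
    """Hypothetical one-parameter value function V(s) = theta * s."""
    return theta * state

def finite_diff(loss_fn, theta, eps=1e-6):
    """Central finite-difference estimate of d loss / d theta."""
    return (loss_fn(theta + eps) - loss_fn(theta - eps)) / (2 * eps)

states = [1.0, 2.0, 3.0]
returns = [2.0, 4.0, 6.0]
theta0 = 0.5

# Values recorded earlier and fed back in as plain numbers (like v_ph).
recorded = [value_fn(theta0, s) for s in states]

def loss_from_recorded(theta):
    # theta is unused: the loss only sees the recorded numbers.
    return sum((v - r) ** 2 for v, r in zip(recorded, returns)) / len(returns)

def loss_from_model(theta):
    # The loss is recomputed from the model output (like value_app).
    return sum((value_fn(theta, s) - r) ** 2
               for s, r in zip(states, returns)) / len(returns)

grad_recorded = finite_diff(loss_from_recorded, theta0)
grad_model = finite_diff(loss_from_model, theta0)
print(grad_recorded)  # zero: there is nothing to optimise
print(grad_model)     # non-zero: the optimiser has something to follow
```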

How can I see values in tensor object? How can we see what's going on inside tensor object?

Why can't I see values in the tensorflow object? I don't know what values are going in object and how to see them. Seeing values in objects will solve my problem. I am finding tensorflow difficult because you can't see what's going on inside objects.
I have tried tf.Print() but it is not working
How can I see "predict_op" value? I don't know what is inside it. It is really important for me to see the values.
predict_op = tf.argmax(Z3, 1) #Will return max value column index.
correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))
# Calculate accuracy on the test set
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
test_accuracy = accuracy.eval({X: X_test, Y: Y_test})
print("Train Accuracy:", train_accuracy)
print("Test Accuracy:", test_accuracy)
Also, if I run the code below it gives an error, because I don't know what tf.argmax(Y, 1) is giving me.
con = tf.confusion_matrix(labels=tf.argmax(Y, 1),
                          predictions=tf.argmax(Z3, 1))
sess = tf.Session()
with sess.as_default():
    print(sess.run(con))
In TensorFlow, a tensor is, roughly, something that has a shape and, in some circumstances, a numerical value. For example, a variable is a tensor, tf.matmul produces a tensor, and a tf.placeholder is a tensor. All of them have a shape, but they behave very differently when it comes to the question "what is the value of this tensor?".
A variable, once initialized, always has a value; that is the behaviour we are all familiar with. A tensor like the result of tf.matmul is produced by an operation. Operations only describe what should be done with their inputs, and they only have a value once you provide an input (or an input of an input, if the op depends on another op). They are like functions that describe what to do, but you can never tell what the output is without providing an input. Placeholders, while still tensors, never have a value of their own; a value has to be fed in.
That said, if you want to debug, for example, a line tf.matmul(a, b), you have to run code like the following:
a_mul_b_op = tf.matmul(a, b)
a, b, a_mul_b = sess.run([a, b, a_mul_b_op], {x: input_x, y: input_y, etc: etc})
print(a, b, a_mul_b)
If you would like to read the value of a variable (variables persist in memory between calls to sess.run, unlike the tensors produced by operations), you can use either of the following two ways, which are essentially equivalent:
print(var_conv42.eval())
print(sess.run([var_conv42]))
You probably need to go through the Introduction to TensorFlow article to understand how TensorFlow works. But here's a brief summary.
Define-by-run vs define-then-run
A TensorFlow program doesn't execute like a normal Python script. Python scripts are define-by-run programs, meaning that once something is defined you can inspect or change its value. TensorFlow programs, however, are define-then-run: TensorFlow first builds a computational graph and then executes part or all of that graph using a Session object. More information is in the article mentioned above.
Solving the problem with your code
If you want to see the value of predict_op, you need to feed in the inputs/placeholders required to compute that particular tensor. For example, say (I don't know how you are computing Z3, so I am assuming a simple computation):
X1 = tf.placeholder(…)
X2 = tf.placeholder(…)
Z3 = X1 + X2
predict_op = tf.argmax(Z3, 1)
Then you need to do the following to get the value of predict_op,
sess.run(predict_op, feed_dict={X1:<value>, X2:<value>})
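The define-then-run idea can be mimicked in a few lines of plain Python. The sketch below is a toy, not TensorFlow: `Node`, `argmax` and `run` are hypothetical names, and the lists stand in for tensors. It shows why printing the object gives no value and why a feed dictionary is needed.

```python
class Node:
    """A deferred computation: evaluating it needs a feed dictionary."""

    def __init__(self, fn=None, inputs=(), name=None):
        self.fn = fn            # None marks a placeholder
        self.inputs = inputs
        self.name = name

    def __add__(self, other):
        return Node(lambda a, b: [x + y for x, y in zip(a, b)], (self, other))

def argmax(node):
    """Index of the largest entry, as a new deferred node."""
    return Node(lambda a: max(range(len(a)), key=a.__getitem__), (node,))

def run(node, feed_dict):
    """Toy stand-in for sess.run: recursively evaluate a node."""
    if node.fn is None:
        if node not in feed_dict:
            raise ValueError("no value fed for placeholder %r" % node.name)
        return feed_dict[node]
    return node.fn(*(run(i, feed_dict) for i in node.inputs))

X1 = Node(name="X1")
X2 = Node(name="X2")
Z3 = X1 + X2
predict_op = argmax(Z3)

print(predict_op)  # only an object description, no value yet
result = run(predict_op, {X1: [1, 5, 2], X2: [0, 1, 9]})
print(result)      # 2, because Z3 evaluates to [1, 6, 11]
```

Leaving a placeholder out of the feed dictionary raises an error in this toy as well, which mirrors what happens when a required placeholder is not fed to a real session.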

`get_variable()` doesn't recognize existing variables for tf.estimator

This question has been asked here before; the difference is that my problem is specific to Estimator.
Some context: we trained a model using Estimator, with some variables defined inside the Estimator's input_fn, a function that preprocesses the data into batches. Now we are moving to prediction. During prediction, we use the same input_fn to read and process the data, but we get an error saying that the variable (word_embeddings) does not exist, even though the variables exist in the checkpoint graph. Here is the relevant part of input_fn:
with tf.variable_scope('vocabulary', reuse=tf.AUTO_REUSE):
    if mode == tf.estimator.ModeKeys.TRAIN:
        word_to_index, word_to_vec = load_embedding(graph_params["word_to_vec"])
        word_embeddings = tf.get_variable(initializer=tf.constant(word_to_vec, dtype=tf.float32),
                                          trainable=False,
                                          name="word_to_vec",
                                          dtype=tf.float32)
    else:
        word_embeddings = tf.get_variable("word_to_vec", dtype=tf.float32)
Basically, in prediction mode the else branch is taken to load the variable from the checkpoint. Failing to find this variable indicates either a) inappropriate use of scope, or b) that the graph has not been restored. I don't think scope matters much here as long as reuse is set properly.
I suspect the cause is that the graph has not yet been restored at the input_fn stage. Usually, the graph is restored by calling saver.restore(sess, "/tmp/model.ckpt"). Looking through the Estimator source code, I could not find anything related to restore; the closest thing is MonitoredSession, a wrapper used for training. This has already strayed far from the original problem and I am not confident I'm on the right path, so I'm looking for help if anyone has any insights.
One-line summary of my question: how does the graph get restored within tf.estimator, via input_fn or model_fn?
Hi, I think your error comes simply from not specifying the shape in tf.get_variable (at predict time); it seems that you need to specify the shape even if the variable is going to be restored.
I ran the following test with a simple linear regressor Estimator that only needs to predict x + 5:
import numpy as np
import tensorflow as tf
def input_fn(mode):
    def _input_fn():
        with tf.variable_scope('all_input_fn', reuse=tf.AUTO_REUSE):
            if mode == tf.estimator.ModeKeys.TRAIN:
                var_to_follow = tf.get_variable('var_to_follow', initializer=tf.constant(20))
                x_data = np.random.randn(1000)
                labels = x_data + 5
                return {'x':x_data}, labels
            elif mode == tf.estimator.ModeKeys.PREDICT:
                # The shape is given explicitly even though the value is restored.
                var_to_follow = tf.get_variable("var_to_follow", dtype=tf.int32, shape=[])
                return {'x':[0,10,100,var_to_follow]}
    return _input_fn
featcols = [tf.feature_column.numeric_column('x')]
model = tf.estimator.LinearRegressor(featcols, './outdir')
This code works perfectly fine: the value of the variable is 20, and, for fun, I also use it in my test set to confirm :p
However, if you remove shape=[], it breaks. You can also give a different initializer, such as tf.constant(500), and everything will still work and 20 will be used.
By running
model.train(input_fn(tf.estimator.ModeKeys.TRAIN), max_steps=10000)
and
preds = model.predict(input_fn(tf.estimator.ModeKeys.PREDICT))
print(next(preds))
You can visualize the graph and you'll see that a) the scoping is normal and b) the graph is restored.
Hope this will help you.

Variable rnn/gru_cell/gates/weights already exists, disallowed

I want to implement the code from the book TensorFlow for Machine Intelligence. The code runs fine the first time, but when I run it again, the error
"Variable rnn/gru_cell/gates/weights already exists, disallowed" occurs. When I restart the console the error disappears, and it comes back after the first run or debug session. The code is below:
def prediction(self):
    output, _ = tf.nn.dynamic_rnn(tf.contrib.rnn.GRUCell(300),
                                  self.data,
                                  dtype=tf.float32,
                                  sequence_length=self.length)
    last = self._last_relevant(output, self.length)
    # softmax layer
    num_classes = int(self.target.get_shape()[1])
    weight = tf.Variable(tf.truncated_normal([self.params.rnn_hidden, num_classes], stddev=0.01))
    bias = tf.Variable(tf.constant(0.1, shape=[num_classes]))
    prediction = tf.nn.softmax(tf.matmul(last, weight) + bias)
    return prediction
Can anyone help me with this problem?
Code that adds things to your graph (which includes pretty much everything in the function you posted) should usually only be run once. When you want to train your model or have it make a prediction, you would use something like sess.run with a feed_dict and the ops you want output from.
If you actually want to delete your graph without restarting the console, you can use tf.reset_default_graph().
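The failure mode can be imitated with a toy registry in plain Python. `ToyGraph`, `create_variable` and `build_model` are hypothetical names, and the registry only loosely resembles how a default graph tracks variable names; the point is that running graph-building code twice against the same graph collides, while resetting the graph first does not.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph that owns named variables."""

    def __init__(self):
        self.variables = {}

    def create_variable(self, name, value):
        if name in self.variables:
            raise ValueError("Variable %s already exists, disallowed" % name)
        self.variables[name] = value
        return value

    def reset(self):
        """Forget every variable, loosely like tf.reset_default_graph()."""
        self.variables.clear()

graph = ToyGraph()

def build_model():
    # Graph-building code: safe to run once per graph.
    return graph.create_variable("rnn/gru_cell/gates/weights", [0.0] * 4)

build_model()
try:
    build_model()          # second run in the same console session
    second_run_failed = False
except ValueError as err:
    second_run_failed = True
    print(err)

graph.reset()
build_model()              # works again after the reset
```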