Conditional execution in TensorFlow - tensorflow

How can I choose to execute a portion of the graph based on a condition?
I have a part of my network which is to be executed only if a placeholder value is provided in feed_dict. An alternate path is taken if the value is not provided. How do I go about implementing this using tensorflow?
Here are the relevant portions of my code:
sess.run(accuracy, feed_dict={inputs: mnist.test.images, outputs: mnist.test.labels})
N = tf.shape(outputs)
cost = 0
if N > 0:
    y_N = tf.slice(h_c, [0, 0], N)
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(y_N, outputs, name='xentropy')
    cost = tf.reduce_mean(cross_entropy, name='xentropy_mean')
In the above code, I'm looking for something to use in the place of if N > 0:

Hrm. It's possible that what you want is tf.control_flow_ops.cond()
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/control_flow_ops.py#L597
But that's not exported into the tf namespace, and I'm answering without checking how guaranteed-stable this interface is, but it's used in released models, so go for it. :)
However: Because you actually know in advance what path you want when you construct the feed_dict, you could also take a different approach of invoking a separate path through your model. The standard way to do this is to, e.g., set up code like:
def model(input, n_greater_than):
    # ... cleverness ...
    if n_greater_than:
        # ... other cleverness ...
        pass
    return tf.reduce_mean(input)
out1 = model(input, True)
out2 = model(input, False)
And then pull the out1 or out2 node depending on what you know when you're about to run your computation and set the feed_dict. Remember that if both calls reference the same variables (create them outside the model() function), you'll basically have two separate paths through the same model.
You can see an example of this in the convolutional mnist example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/mnist/convolutional.py#L165
I'm a fan of doing it this way without introducing control flow dependencies if you can.

Here is a simple example, that can get you started. It executes different parts of the graph based on the shape of the tensor:
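import tensorflow as tf
a = tf.Variable([[3.0, 3.0], [3.0, 3.0]])
b = tf.Variable([[1.0, 1.0], [2.0, 2.0]])
l = tf.shape(a)
# Build both ops up front; only the one passed to sess.run is executed.
# (tf.sub was renamed tf.subtract in TensorFlow 1.0.)
add_op, sub_op = tf.add(a, b), tf.sub(a, b)
sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)
t = sess.run(l)  # evaluate the shape so that Python can inspect it
print(sess.run(sub_op if t[0] == 3 else add_op))
sess.close()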
Change 3 to 2 to see the tensors being subtracted instead. As you can see, I created the nodes for add, sub and shape first; then I evaluate the shape, check it in ordinary Python, and run only the chosen part of the graph.

Related

I can't understand LSTM implementation in tensorflow 1

I have been looking at an implementation of LSTM layers in a neural network architecture. An LSTM layer has been defined in it as given below. I am having trouble understanding this code. I have listed my doubts after the code snippet.
Code source: https://gist.github.com/awjuliani/66e8f477fc1ad000b1314809d8523455#file-a3c-py
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(RNN_SIZE,state_is_tuple=True)
c_init = np.zeros((1, lstm_cell.state_size.c), np.float32)
h_init = np.zeros((1, lstm_cell.state_size.h), np.float32)
state_init = [c_init, h_init]
c_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.c])
h_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.h])
state_in = (c_in, h_in)
rnn_in = tf.expand_dims(self.h3, [0])
step_size = tf.shape(inputs)[:1]
state_in = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)
lstm_outputs, lstm_state = tf.nn.dynamic_rnn(
    lstm_cell, rnn_in, initial_state=state_in, sequence_length=step_size,
    time_major=False)
lstm_c, lstm_h = lstm_state
state_out = (lstm_c[:1, :], lstm_h[:1, :])
self.rnn_out = tf.reshape(lstm_outputs, [-1, RNN_SIZE])
Here are my doubts:
I understand we need to initialize random context and hidden vectors to pass to our first LSTM cell. But why do we initialize both c_init, h_init and then c_in, h_in? What purpose do they serve? How are they different from each other? (Same for state_in and state_init?)
Why do we use LSTMStateTuple?
def work(self, max_episode_length, gamma, sess, coord, saver, dep):
    ........
    rnn_state = self.local_AC.state_init
def train(self, rollout, sess, gamma, bootstrap_value):
    ......
    rnn_state = self.local_AC.state_init
    feed_dict = {self.local_AC.target_v: discounted_rewards,
                 self.local_AC.inputs: np.vstack(observations),
                 self.local_AC.actions: actions,
                 self.local_AC.advantages: advantages,
                 self.local_AC.state_in[0]: rnn_state[0],
                 self.local_AC.state_in[1]: rnn_state[1]}
At the beginning of work, and then before training a new batch, the network state is filled with zeros.
I understand we need to initialize random context and hidden vectors to pass to our first LSTM cell. But why do we initialize both c_init, h_init, and then c_in, h_in? What purpose do they serve? How are they different from each other? (Same for state_in and state_init?)
To start using an LSTM, one should initialise its cell state and hidden state, named c and h respectively. For every new input, these states are considered 'empty' and should be initialised with zeros. So we have here
c_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.c])
h_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.h])
state_in = (c_in, h_in)
state_in = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)
Why are there two variables, state_in and state_init? The first is just a pair of placeholders that will be filled with the second at the evaluation stage (i.e., session.run), because state_in doesn't contain any actual values. In other words, NumPy arrays are used during the training phase, and tf.placeholders during the phase when one defines the architecture of the network.
TL;DR
Why so? Well, tf1.x (was?) is quite a low-level system. It has the following entities:
tf.Session, aka the computational session: the thing that contains a computational graph (or graphs) and allows the user to provide inputs to the graph via session.run.
tf.Graph, which is a representation of a computational graph. Usually an engineer defines a graph using tf.placeholders and tf.Variables. One can connect them 'just like' math operations:
with tf.Session() as sess:
    a = tf.placeholder(tf.float32, (1,))
    b = tf.Variable(1.0, dtype=tf.float32)
    tf.global_variables_initializer()
    c = a * b
    # ...and so on
tf.placeholders are placeholders, not actual values; they are intended to be filled with actual values at the session.run stage. tf.Variables, in turn, hold the actual weights of the neural network to be optimized. Why not plain NumPy arrays? Because TensorFlow automatically adds each tensor and placeholder as an edge to the default computational graph (which is impossible to do with NumPy arrays); it also lets you define an architecture once and then initialize/train it with different inputs, which is good.
So, to do a computation (forward/backward propagation, etc.), one has to set placeholders and variables to some values. To do so, in a simple example, we could do the following:
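import tensorflow as tf
# Needed under TF 2.x: placeholders only work with eager execution turned off.
tf.compat.v1.disable_eager_execution()
with tf.compat.v1.Session() as sess:
    a = tf.compat.v1.placeholder(tf.float32, shape=())
    b = tf.compat.v1.Variable(1.0, dtype=tf.float32)
    init = tf.compat.v1.global_variables_initializer()
    c = a + b
    sess.run(init)  # actually assigns the initial value 1.0 to b
    a_value = 2.0
    result = sess.run([c], feed_dict={a: a_value})
    print("value of [c]:", result)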
(I use tf.compat.v1 instead of just tf here because I work in a tf2 environment; under tf1.x you can omit it.)
Note two things. First, I create an init operation, because in tf1.x it is not enough to initialize a variable like tf.Variable(1.0); the user has to kinda 'notify' the framework by creating and running an init operation.
Then I do the computation: I initialize an a_value variable and map it to the placeholder a in the sess.run method. Session.run takes the tensor (or list of tensors) to be calculated as its first argument, plus a mapping from the placeholders needed to compute the target tensors to their actual values.
Back to your example: state_in is a placeholder and state_init contains values to be fed into this placeholder somewhere in the code.
It would look like this: sess.run(..., feed_dict={state_in: state_init, ...}).
Why do we use LSTMStateTuple?
Addressing the second part of the question: it looks like TensorFlow developers implemented it for some performance optimization. From the source code:
logging.warning(
    "%s: Using a concatenated state is slower and will soon be "
    "deprecated. Use state_is_tuple=True.", self)
and if state_is_tuple=True, the state should be an LSTMStateTuple. But I'm not 100% sure about it - I don't remember how I used it. After all, LSTMStateTuple is just a collections.namedtuple with two named attributes, c and h.
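To make the namedtuple remark concrete, here is a minimal stand-in built with collections.namedtuple. The name StateTupleSketch and the string values are made up for this example; it is not TensorFlow's class. It only shows why a tuple with fields c and h can be read both by name and by position, which is what lets the question's code write state_in[0] and state_in[1] in feed_dict.

```python
from collections import namedtuple

# Hypothetical stand-in for an LSTM state tuple; NOT the TensorFlow class.
StateTupleSketch = namedtuple("StateTupleSketch", ["c", "h"])

state = StateTupleSketch(c="cell-state", h="hidden-state")

# Fields can be read by name or by position.
by_name = (state.c, state.h)
by_index = (state[0], state[1])

# It still unpacks like an ordinary 2-tuple.
c_value, h_value = state

print(by_name == by_index)  # True
print(c_value, h_value)     # cell-state hidden-state
```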

Tensorflow Graph - check if a node depends on a placeholder

In a Tensorflow graph, is there a way to find out if a node depends on a placeholder, like node.depends(placeholder) -> Bool
import tensorflow as tf
x = tf.placeholder(name='X', dtype=tf.int64, shape=[])
y = tf.placeholder(name='Y', dtype=tf.int64, shape=[])
p = tf.add(x, y)
q = tf.add(x, x)
sess = tf.Session()
result = sess.run([p,q], feed_dict={x: 1, y: 2})
print(result)
result = sess.run([p,q], feed_dict={x: 1, y: 3})
print(result)
In the code example above, q does not depend on y. In the second call of session.run, we modify only y. Thus q does not need to be evaluated again. Does session automatically reuse existing values in these cases? If so, is there any way to find out which are the nodes that were evaluated during .run?
Otherwise, if I can quickly find out which nodes are dependent on the placeholders I modify, I can send only those as input to run, and reuse existing values (in a dictionary as cache).
The idea is to avoid costly evaluations and more importantly, in my application, to minimize costly operations (outside tensorflow) that need to be triggered whenever the output nodes change - a necessity in my application.
Checking dependency between tensors in a graph can be done with a function like this:
import tensorflow as tf
# Checks if tensor a depends on tensor b
def tensor_depends(a, b):
    if a.graph is not b.graph:
        return False
    gd = a.graph.as_graph_def()
    gd_sub = tf.graph_util.extract_sub_graph(gd, [a.op.name])
    return b.op.name in {n.name for n in gd_sub.node}
For example:
import tensorflow as tf
x = tf.placeholder(name='X', dtype=tf.int64, shape=[])
y = tf.placeholder(name='Y', dtype=tf.int64, shape=[])
p = tf.add(x, y)
q = tf.add(x, x)
print(tensor_depends(q, x))
# True
print(tensor_depends(q, y))
# False
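Conceptually, this check is a reachability test: walk backwards from a node through its inputs and see whether the placeholder turns up. The sketch below does that on a plain dictionary instead of a real GraphDef. The INPUTS table and the depends_on helper are invented for illustration and simply mirror the X, Y, p, q example.

```python
# Toy graph: each node name maps to the names of its direct inputs.
# This dictionary is an illustrative stand-in for a TensorFlow GraphDef.
INPUTS = {
    "X": [],
    "Y": [],
    "p": ["X", "Y"],  # p = X + Y
    "q": ["X", "X"],  # q = X + X
}

def depends_on(node, target, inputs=INPUTS):
    """Return True if `target` is `node` or one of its transitive inputs."""
    stack, seen = [node], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(inputs[current])
    return False

print(depends_on("q", "X"))  # True
print(depends_on("q", "Y"))  # False
print(depends_on("p", "Y"))  # True
```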
About your questions, generally TensorFlow does recompute everything on each run, even if the inputs do not change. If you want to cache results, you can do it yourself, at a higher level - for TensorFlow, it would not be clear what results it should keep (only the latest ones, a few recent ones, ...). And in any case, even if the inputs do not change, the output may change, as is the case with recurrent models, or more generally due to any change in stateful objects such as variables or datasets. There are some optimization opportunities that are lost, but it would probably not be reasonable to expect TensorFlow to address them (analyze the model to determine whether it can cache results or not, what results can be cached, how much, how to configure it, etc.). If you know that some intermediate values in the graph will not change, you can also compute that part and then feed it as input (you can actually feed values for any tensor in the feed_dict). Otherwise, as you suggest, you can just check what depends on what and recompute only as necessary.
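The do-it-yourself caching mentioned here could look roughly like the sketch below. It rests on several assumptions: DEPENDS_ON is a hand-written table of which placeholders each output uses (in practice it could be filled in with a dependency check), compute is a stand-in for a sess.run call, and the graph is assumed to be stateless, so this would not be safe with variables or recurrent state.

```python
# Which placeholders each output depends on (hand-written for this sketch).
DEPENDS_ON = {"p": ("x", "y"), "q": ("x",)}

calls = []  # records every real evaluation, to show when the cache is hit

def compute(name, feed):
    """Stand-in for sess.run on one output node."""
    calls.append(name)
    if name == "p":
        return feed["x"] + feed["y"]
    return feed["x"] + feed["x"]

_cache = {}

def cached_run(name, feed):
    """Re-evaluate `name` only when a placeholder it depends on changed."""
    key = (name, tuple(feed[ph] for ph in DEPENDS_ON[name]))
    if key not in _cache:
        _cache[key] = compute(name, feed)
    return _cache[key]

first = [cached_run(n, {"x": 1, "y": 2}) for n in ("p", "q")]
second = [cached_run(n, {"x": 1, "y": 3}) for n in ("p", "q")]
print(first, second)  # [3, 2] [4, 2]
print(calls)          # ['p', 'q', 'p']
```

The second round recomputes only p, because q's cache key ignores y. Fed values must be hashable for this to work, so NumPy arrays would first need converting to something like bytes or a tuple.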

How can I see values in tensor object? How can we see what's going on inside tensor object?

Why can't I see values in the tensorflow object? I don't know what values are going in object and how to see them. Seeing values in objects will solve my problem. I am finding tensorflow difficult because you can't see what's going on inside objects.
I have tried tf.Print() but it is not working
How can I see "predict_op" value? I don't know what is inside it. It is really important for me to see the values.
predict_op = tf.argmax(Z3, 1) #Will return max value column index.
correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))
# Calculate accuracy on the test set
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
test_accuracy = accuracy.eval({X: X_test, Y: Y_test})
print("Train Accuracy:", train_accuracy)
print("Test Accuracy:", test_accuracy)
Also if I run below code it gives error because I don't know what "tf.argmax(Y, 1)" is giving me.
con = tf.confusion_matrix(labels=tf.argmax(Y, 1),
                          predictions=tf.argmax(Z3, 1))
sess = tf.Session()
with sess.as_default():
    print(sess.run(con))
In tensorflow, a tensor is, roughly, something that has a shape and, in some circumstances, a numerical representation. Namely, a variable is a tensor, tf.matmul produces a tensor, and a tf.placeholder is a tensor. All of them have a shape, but they act drastically differently when it comes to the question "what is the value of this tensor?".
A variable, once initialized, always has a value; that is what we are all familiar with. A tensor like the result of tf.matmul comes from an operation. Operations only describe what should be done with their inputs. They only have a value once you provide an input (or an input of an input, if the op depends on another op). They are like functions that describe what to do: you can never tell what the output is without providing an input. Placeholders, while still being tensors, never have a value of their own.
That said, if you want to debug, for example, the line tf.matmul(a, b), you can run the following code:
a_mul_b_op = tf.matmul(a, b)
# Fetch into new names so the tensors a and b are not overwritten by their values.
a_val, b_val, a_mul_b = sess.run([a, b, a_mul_b_op], {x: input_x, y: input_y, etc: etc})
print(a_val, b_val, a_mul_b)
If you would like to read the value of a variable (variables persist in memory between calls to sess.run, unlike the outputs of operations), either of the following works; eval() needs a default session to be active:
print(var_conv42.eval())
print(sess.run(var_conv42))
You probably need to go through the Introduction to TensorFlow article to understand how TensorFlow works. But here's a brief summary.
Define-by-run vs define-then-run
A TensorFlow program doesn't execute like a normal Python script. Python scripts are define-by-run programs, meaning that once something is defined you can immediately see and change its value. TensorFlow programs, however, are define-then-run: TensorFlow first builds a computational graph and then executes part or all of that graph using a Session object. More info in the article linked above.
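A toy illustration of define-then-run may help. The Placeholder and Add classes and the run function below are invented for this sketch and are far simpler than TensorFlow's real machinery; the point is only that building z computes nothing, and a value appears once inputs are fed.

```python
class Placeholder:
    """A named input with no value until one is fed at run time."""
    def __init__(self, name):
        self.name = name

class Add:
    """Describes an addition; holds references to inputs, not a result."""
    def __init__(self, left, right):
        self.left, self.right = left, right

def run(node, feed):
    """Evaluate `node`, looking placeholder values up in `feed`."""
    if isinstance(node, Placeholder):
        return feed[node]
    return run(node.left, feed) + run(node.right, feed)

# Define: nothing is computed here, z is only a description.
x1, x2 = Placeholder("X1"), Placeholder("X2")
z = Add(x1, x2)

# Run: values appear only once the inputs are supplied.
print(run(z, {x1: 2, x2: 5}))    # 7
print(run(z, {x1: 10, x2: -1}))  # 9
```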
Solving the problem with your code
If you want to see the value of predict_op you need to feed in the inputs/placeholders required to compute that particular tensor. For example say (I don't know how you are computing Z3 so I am assuming a simple computation),
X1 = tf.placeholder(…)
X2 = tf.placeholder(…)
Z3 = X1 + X2
predict_op = tf.argmax(Z3, 1)
Then you need to do the following to get the value of predict_op,
sess.run(predict_op, feed_dict={X1:<value>, X2:<value>})

Confused about tensorflow variable shapes

I am learning tensorflow using the tensorflow machine learning cookbook (https://github.com/nfmcclure/tensorflow_cookbook). I am currently on the NLP chapter (07). I am very confused about how one decides on the dimensions of the tensorflow variables. For example, in the bag of words example they use:
# Create variables for logistic regression
A = tf.Variable(tf.random_normal(shape=[embedding_size,1]))
b = tf.Variable(tf.random_normal(shape=[1,1]))
# Initialize placeholders
x_data = tf.placeholder(shape=[sentence_size], dtype=tf.int32)
y_target = tf.placeholder(shape=[1, 1], dtype=tf.float32)
and in the tf-idf example they use:
# Create variables for logistic regression
A = tf.Variable(tf.random_normal(shape=[max_features,1]))
b = tf.Variable(tf.random_normal(shape=[1,1]))
x_data = tf.placeholder(shape=[None, max_features], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)
How does one decide on when to use None vs. 1 in the placeholder shapes? Thank you!
Using None as part of a shape means that dimension will be determined when you run the session.
This is useful for what is called batch training, where you feed each iteration of the training process a fixed-size subset of the data.
So if you keep it at None, you can switch between batch sizes without a problem. (You will usually stick to one batch size within a session, but each sess.run call may feed a different size.)
When you state a specific shape, that is what it will be, and that is the only shape that can be fed to it during the session (using the feed_dict param).
In your specific example, in the first piece of code the shape of y_target will always be [1, 1], whereas in the second piece of code y_target could be [10, 1] / [200, 1] / [<whatever>, 1].
'None' should be used when the number of elements along that dimension of the placeholder is unknown in advance. But if, for example, the number of data elements in the x_data placeholder is known in advance to be 1, then you can replace 'None' with 1.
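The rule can be written down as a small shape check. This is a simplified model of the idea, not TensorFlow's actual implementation, and the shape_accepts helper is a name invented for this sketch: a None entry matches any size, while a number must match exactly.

```python
def shape_accepts(declared, actual):
    """Return True if a fed array of shape `actual` fits `declared`.

    A None entry in `declared` matches any size in that position.
    """
    if len(declared) != len(actual):
        return False
    return all(d is None or d == a for d, a in zip(declared, actual))

print(shape_accepts([None, 1], (10, 1)))   # True
print(shape_accepts([None, 1], (200, 1)))  # True
print(shape_accepts([1, 1], (10, 1)))      # False
print(shape_accepts([None, 1], (10, 2)))   # False
```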

What's the difference between tf.cond and if-else?

What is the difference between tf.cond and if-else?
Scenario 1
import tensorflow as tf
x = 'x'
y = tf.cond(tf.equal(x, 'x'), lambda: 1, lambda: 0)
with tf.Session() as sess:
    print(sess.run(y))
x = 'y'
with tf.Session() as sess:
    print(sess.run(y))
Scenario 2
import tensorflow as tf
x = tf.Variable('x')
y = tf.cond(tf.equal(x, 'x'), lambda: 1, lambda: 0)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    print(sess.run(y))
tf.assign(x, 'y')
with tf.Session() as sess:
    init.run()
    print(sess.run(y))
The outputs are both 1.
Does it mean only tf.placeholder can work, and not every tensor, such as tf.Variable? When should I choose an if-else condition and when should I use tf.cond? What are the differences between them?
tf.cond is evaluated at runtime, whereas if-else is evaluated at graph construction time.
If you want to evaluate your condition depending on the value of a tensor at runtime, tf.cond is the best option.
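The timing difference can be imitated with plain Python closures standing in for graphs. The helpers build_with_if and build_with_cond and the key "x_in" are made-up names for this sketch, not TensorFlow API. The first picks its branch once, while building, which is why changing x afterwards in Scenario 1 has no effect; the second keeps both branches and picks one on every call.

```python
def build_with_if(flag):
    """Python `if`: the branch is picked once, while building."""
    if flag == "x":
        return lambda feed: 1
    return lambda feed: 0

def build_with_cond(key):
    """cond-style: both branches are kept, the pick happens per call."""
    return lambda feed: 1 if feed[key] == "x" else 0

flag = "x"
fixed = build_with_if(flag)
flag = "y"  # changing the Python variable later has no effect on `fixed`
print(fixed({}))  # 1

dynamic = build_with_cond("x_in")
print(dynamic({"x_in": "x"}))  # 1
print(dynamic({"x_in": "y"}))  # 0
```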
Did you mean if ... else in Python vs. tf.cond?
You can use if ... else to create different graphs for different external conditions. For example, you can write one Python script that builds graphs with 1, 2 or 3 hidden layers, and use command-line parameters to select which one to use.
tf.cond is for adding a conditional block to the graph. For example, you can define the Huber function with code like this:
import tensorflow as tf
delta = tf.constant(1.)
x = tf.placeholder(tf.float32, shape=())
def left(x):
    return tf.multiply(x, x) / 2.
def right(x):
    return tf.multiply(delta, tf.abs(x) - delta / 2.)
hubber = tf.cond(tf.abs(x) <= delta, lambda: left(x), lambda: right(x))
and the calculation in the graph will take a different branch for different input data.
sess = tf.Session()
with sess.as_default():
    sess.run(tf.global_variables_initializer())
    print(sess.run(hubber, feed_dict = {x: 0.5}))
    print(sess.run(hubber, feed_dict = {x: 1.0}))
    print(sess.run(hubber, feed_dict = {x: 2.0}))
> 0.125
> 0.5
> 1.5
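As a sanity check of the three printed values, the same piecewise formula can be written for plain Python floats. Here an ordinary if is enough, because the value of x is known the moment the function is called. The function name huber is chosen for this sketch; the two branches correspond to left() and right() with delta = 1.

```python
def huber(x, delta=1.0):
    """Huber function for a scalar x, same two branches as left() and right()."""
    if abs(x) <= delta:
        return x * x / 2.0
    return delta * (abs(x) - delta / 2.0)

for value in (0.5, 1.0, 2.0):
    print(huber(value))
# 0.125
# 0.5
# 1.5
```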
Since the graph in TensorFlow is static, you cannot modify it once built. Thus you can use if-else outside of the graph at any time, for example while preparing batches, but you can also employ it while constructing the graph, provided the condition doesn't depend on the value of any tensor; for example, it may depend on a tensor's dimension or shape that has already been set. In such scenarios the condition does not change the graph while it is executing: the graph is fixed once you have finished constructing it, and the if-else no longer affects it at run time.
But if the condition depends on the value of a tensor in the graph, that condition should be included in the graph, and hence tf.cond should be applied.
Simply put: if else is how you do switch in Python, while tf.cond is how you do switch in Tensorflow. During running, if else is fixed in the compiled Python program, while tf.cond is fixed in the constructed Tensorflow graph.
You can think of tf.cond as the Tensorflow's internal way of doing if else.