What's the difference between tf.cond and if-else? - tensorflow

What difference between tf.cond and if-else?
Scenario 1
import tensorflow as tf
x = 'x'
y = tf.cond(tf.equal(x, 'x'), lambda: 1, lambda: 0)
with tf.Session() as sess:
print(sess.run(y))
x = 'y'
with tf.Session() as sess:
print(sess.run(y))
Scenario 2
import tensorflow as tf
x = tf.Variable('x')
y = tf.cond(tf.equal(x, 'x'), lambda: 1, lambda: 0)
init = tf.global_variables_initializer()
with tf.Session() as sess:
init.run()
print(sess.run(y))
tf.assign(x, 'y')
with tf.Session() as sess:
init.run()
print(sess.run(y))
The outputs are both 1.
Does it mean only tf.placeholder can work, and not all the tensor, such as tf.variable? When should I choose if-else condition and when to use tf.cond? What are the diffences between them?

tf.cond is evaluated at the runtime, whereas if-else is evaluated at the graph construction time.
If you want to evaluate your condition depending on the value of the tensor at the runtime, tf.cond is the best option.

Did you mean if ... else in Python vs. tf.cond?
You can use if ... else for creating different graph for different external conditions. For example you can make one python script for graphs with 1, 2, 3 hidden layers, and use command line parameters for select which one use.
tf.cond is for add condition block to the graph. For example, you can define Huber function by code like this:
import tensorflow as tf
delta = tf.constant(1.)
x = tf.placeholder(tf.float32, shape=())
def left(x):
return tf.multiply(x, x) / 2.
def right(x):
return tf.multiply(delta, tf.abs(x) - delta / 2.)
hubber = tf.cond(tf.abs(x) <= delta, lambda: left(x), lambda: right(x))
and calculation in Graph will go by different branch for different input data.
sess = tf.Session()
with sess.as_default():
sess.run(tf.global_variables_initializer())
print(sess.run(hubber, feed_dict = {x: 0.5}))
print(sess.run(hubber, feed_dict = {x: 1.0}))
print(sess.run(hubber, feed_dict = {x: 2.0}))
> 0.125
> 0.5
> 1.5

Since the graph in TensorFlow is static, you cannot modify it once built. Thus you can use if-else outside of the graph at anytime for example while preparing batches and etc., but you can also employ it while constructing the graph. That is, if the condition doesn't depend on the value of any tensor, for example the dimention(having been set) of the tensor or the shape of any tensor. In such scenarios the graph will not be changed due to the condition while excuting the graph. The graph has been fixed after you finished drawing the graph and the if-else condition would not affect the graph while excuting the graph.
But if the condition depends on the value of the tensor in it that condition should be included in the graph and hence tf.cond should be applied.

Simply put: if else is how you do switch in Python, while tf.cond is how you do switch in Tensorflow. During running, if else is fixed in the compiled Python program, while tf.cond is fixed in the constructed Tensorflow graph.
You can think of tf.cond as the Tensorflow's internal way of doing if else.

Related

I cant understand LSTM implementation in tensorflow 1

I have been looking at an implementation of LSTM layers in a neural network architecture. An LSTM layer has been defined in it as given below. I am having trouble understanding this code. I have listed my doubts after the code snippet.
code source:https://gist.github.com/awjuliani/66e8f477fc1ad000b1314809d8523455#file-a3c-py
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(RNN_SIZE,state_is_tuple=True)
c_init = np.zeros((1, lstm_cell.state_size.c), np.float32)
h_init = np.zeros((1, lstm_cell.state_size.h), np.float32)
state_init = [c_init, h_init]
c_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.c])
h_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.h])
state_in = (c_in, h_in)
rnn_in = tf.expand_dims(self.h3, [0])
step_size = tf.shape(inputs)[:1]
state_in = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)
lstm_outputs, lstm_state = tf.nn.dynamic_rnn(
lstm_cell, rnn_in, initial_state=state_in, sequence_length=step_size,
time_major=False)
lstm_c, lstm_h = lstm_state
state_out = (lstm_c[:1, :], lstm_h[:1, :])
self.rnn_out = tf.reshape(lstm_outputs, [-1, RNN_SIZE])
Here are my doubts:
I understand we need to initialize a random Context and hidden
vectors to pass to our first LSTM cell. But why do initialize both c_init, h_init and then c_in, h_in. What purpose do they serve?
How are they different from each other? (same for state_in and state_init?)
Why do we use LSTMStateTuple?
def work(self, max_episode_length, gamma, sess, coord, saver, dep):
........
rnn_state = self.local_AC.state_init
def train(self, rollout, sess, gamma, bootstrap_value):
......
rnn_state = self.local_AC.state_init
feed_dict = {self.local_AC.target_v: discounted_rewards,
self.local_AC.inputs: np.vstack(observations),
self.local_AC.actions: actions,
self.local_AC.advantages: advantages,
self.local_AC.state_in[0]: rnn_state[0],
self.local_AC.state_in[1]: rnn_state[1]}
At the beginning of work, and then
before training a new batch, the network state is filled with zeros
I understand we need to initialize a random Context and hidden vectors to pass to our first LSTM cell. But why do initialize both c_init, h_init, and then c_in, h_in. What purpose do they serve? How are they different from each other? (same for state_in and state_init?)
To start using LSTM, one should initialise its cell and state state - named c and h respectively. For every input, these states are considered 'empty' and should be initialised with zeros. So that, we have here
c_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.c])
h_in = tf.placeholder(tf.float32, [1, lstm_cell.state_size.h])
state_in = (c_in, h_in)
state_in = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)
Why are there are two variables, state_in and state_init? The first is just placeholders that will be initialised with the second at the evaluation state (i.e., session.run). Because state_in doesn't contain any actual values, in other words, numpy arrays are used during the training phase and tf.placeholders during the phase when one defines an architecture of the network.
TL;DR
Why so? Well, tf1.x (was?) is quite a low-level system. It has the following entities:
tf.Session aka computational session - thing that contain a computational graph(s) and allows user to provide inputs to the graph(s) via session.run.
tf.Graph, that is a representation of a computational graph. Usually engineer defines graph using tf.placeholders and tf.Variabless. One could connect them 'just like' math operations:
with tf.Session() as sess:
a = tf.placeholder(tf.float32, (1,))
b = tf.Variable(1.0, dtype=tf.float32)
tf.global_variables_initializer()
c = a * b
# ...and so on
tf. placeholder's are placeholers, but not actual values, intended to be filled with actual values at the session.run stage. And tf.Variables, well, for the actual weights of the neural network to be optimized. Why not plain NumPy arrays, but something else? It's because TensorFlow automatically adds each tensor and placeholder as an edge to the default computational graph (it's impossible to do the same with NumPy arrays); also, it allows to define an architecture and then initialize/train it with different inputs, which is good.
So, to do a computation (forward/backward propagation, etc.), one has to set placeholders and variables to some values. To do so, in a simple example, we could do the following:
import tensorflow as tf
with tf.compat.v1.Session() as sess:
a = tf.compat.v1.placeholder(tf.float32, shape=())
b = tf.compat.v1.Variable(1.0, dtype=tf.float32)
init = tf.compat.v1.global_variables_initializer()
c = a + b
sess.run(init)
a_value = 2.0
result = sess.run([c], feed_dict={a: a_value})
print("value of [c]:", result)
(I use tf.compat.v1 instead of just tf here because I work in tf2 environment; you could omit it)
Note two things: first, I create init operation. Because in tf1.x it is not enough to initialize a variable like tf.Variable(1.0), but the user has to kinda 'notify' the framework about creating and running init operation.
Then I do a computation: I initialize an a_value variable and map it to the placeholder a' in the sess.runmethod.Session.run` requires a list of tensors to be calculated as a first argument and a mapping from placeholders necessary to compute target tensors to their actual values.
Back to your example: state_in is a placeholder and state_init contains values to be fed into this placeholder somewhere in the code.
It would look like this: less.run(..., feed_dict={state_in: state_init, ...}).
Why do we use LSTMStateTuple?
Addressing the second part of the question: it looks like TensorFlow developers implemented it for some performance optimization. From the source code:
logging.warning(
"%s: Using a concatenated state is slower and will soon be"
"deprecated. Use state_is_tuple=True.", self)
and if state_is_tuple=True, state should be a StateTuple. But I'm not 100% sure about it - I don't remember how I used it. After all, StateTuple is just a collections.namedtuple with two named attributes, c and h.

TensorFlow get updates computed by optimizer

In tf, the optimizer class only has two function:
compute_gradients
apply_gradients
where apply_gradients returns an op that performs the update w <- w + Δw via a tf.assign_add function.
However I need direct access to the Δw itself. (or w' = w+Δw). I know that the optimizer adds nodes to the computational graph which compute this Δw for each variable. How can I access them? Or do I have to re-implement the optimizer myself?
The reason is that I need to compute gradients dw'/dw, as I am working on something related to gradient based hyperparameter optimization (cf. https://arxiv.org/abs/1703.01785)
The "delta" applied to each variable is not accessible through any common method or name. In fact, looking a bit into the source it seems rather difficult extract, as it varies from one optimizer to the other.
What you can do, at least, is to compute the differences between variable values and their updates. For example it could work like this:
import tensorflow as tf
with tf.Graph().as_default():
# Setup example model
x = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 2])
w = tf.Variable([[1., 2.]], tf.float32)
pred = x # w
loss = (tf.reduce_sum(tf.squared_difference(pred, y))
/ tf.cast(tf.shape(x)[0], tf.float32))
# Record variable values before training step
# (tf.identity should work here but it does not so we use
# a trivial add operation to enforce the control dependency)
w_old = w + 0
# Train after having recorded variable values
with tf.control_dependencies([w_old]):
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
# Compute deltas after training
with tf.control_dependencies([train_op]):
w_delta = w.read_value() - w_old
init_op = tf.global_variables_initializer()
# Test
with tf.Session() as sess:
sess.run(init_op)
print(sess.run(w))
# [[1. 2.]]
_, w_delta_val = sess.run(
[train_op, w_delta],
feed_dict={x: [[1.], [2.]], y: [[3., 4], [5., 6.]]})
print(w_delta_val)
# [[0.79999995 0.5999999 ]]
print(sess.run(w))
# [[1.8 2.6]]
To get the updated w', you get just print the w directly after you have executed optimizer.apply_gradients(). The w at present is the w'.
Now, if you want to acquire the gradient of w, just do the operation of w'-w.

How to force Tensorflow to show a simple linear regression prediction result?

I have a simple linear regression question as below:
My codes are as below:
import tensorflow as tf
import numpy as np
batch_xs=np.array([[0,0,1],[1,1,1],[1,0,1],[0,1,1]])
batch_ys=np.array([[0],[1],[1],[0]])
x = tf.placeholder(tf.float32, [None, 3])
W = tf.Variable(tf.zeros([3, 1]))
b = tf.Variable(tf.zeros([1]))
y = tf.nn.sigmoid(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 1])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
learning_rate = 0.05
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
sess = tf.Session()
tf.global_variables_initializer().run(session=sess)
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
Prediction:
x0=np.array([[1.,0.,0.]])
x0=np.float32(x0)
y0=tf.nn.softmax(tf.matmul(x0,W) + b)
print(y0)
However, print(y0) shows Tensor("Softmax_2:0", shape=(1, 1), dtype=float32) instead of a figure. I expect y0 would be around 0.99.
I tried y0.eval(), but I got ValueError: Cannot evaluate tensor using 'eval()': No default session is registered..
How can I make a change to obtain the result? Thanks!
There are a couple of ways to get things to print out while writing TensorFlow code. Of course, there’s the classic Python built-in, print (Or the function print(), if we’re being Python 3 about it). And then there’s TensorFlow’s print function, tf.Print (notice the capital P).
When working with TensorFlow, it’s important to remember that everything is ultimately a graph computation. This means that if you print a TensorFlow operation using Python’s print, it will simply show a description of what that operation is, since no values have been passed through it yet. It will also often show the dimensions that are expected to be in that node, if they’re known.
If you want to print the values that are ‘flowing’ through a particular part of the graph as it’s being executed, then we need to turn to using tf.Print.

Tensorflow: optimize over input with gradient descent

I have a TensorFlow model (a convolutional neural network) which I successfully trained using gradient descent (GD) on some input data.
Now, in a second step, I would like to provide an input image as initialization then and optimize over this input image with fixed network parameters using GD. The loss function will be a different one, but this a detail.
So, my main question is how to tell the gradient descent algorithm to
stop optimizing the network parameters
to optimize over the input image
The first can probably done with this
Holding variables constant during optimizer
Do you guys have ideas about the second point?
I guess I can recode the gradient descent algorithm myself using the TF gradient function, but my gut feeling tells me that there should be an easier way, which also allows me to benefit from more complex GD variants (Adam etc.).
No need for your SDG own implementation. TensorFlow provides all functions:
import tensorflow as tf
import numpy as np
# some input
data_pldhr = tf.placeholder(tf.float32)
img_op = tf.get_variable('input_image', [1, 4, 4, 1], dtype=tf.float32, trainable=True)
img_assign = img_op.assign(data_pldhr)
# your starting image
start_value = (np.ones((4, 4), dtype=np.float32) + np.eye(4))[None, :, :, None]
# override variable_getter
def nontrainable_getter(getter, *args, **kwargs):
kwargs['trainable'] = False
return getter(*args, **kwargs)
# all variables in this scope are not trainable
with tf.variable_scope('myscope', custom_getter=nontrainable_getter):
x = tf.layers.dense(img_op, 10)
y = tf.layers.dense(x, 10)
# the usual stuff
cost_op = tf.losses.mean_squared_error(x, y)
train_op = tf.train.AdamOptimizer(0.1).minimize(cost_op)
# fire up the training process
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
sess.run(img_assign, {data_pldhr: start_value})
print(sess.run(img_op))
for i in range(10):
_, c = sess.run([train_op, cost_op])
print(c)
print(sess.run(img_op))
represent an image as tf.Variable with trainable=True
initialise this variable with the starting image (initial guess)
recreate the NN graph using TF variables with trainable=False and copy the weights from the trained NN graph using tf.assign
calculate the loss function
plug the loss into any TF optimiser algorithm you want
Another alternative is to use ScipyOptimizerInterface, which allows to use scipy's minimizer. This supports constrained minimization.
I'm looking for a solution to the same problem, but my model is not an easy one as I have an LSTM network with cells created with MultiRNNCell, I don't think it is possible to get the weight and clone the network. Is there any workaround so that I can compute the gradient wrt the input?

Conditional execution in TensorFlow

How can I choose to execute a portion of the graph based on a condition?
I have a part of my network which is to be executed only if a placeholder value is provided in feed_dict. An alternate path is taken if the value is not provided. How do I go about implementing this using tensorflow?
Here are the relevant portions of my code:
sess.run(accuracy, feed_dict={inputs: mnist.test.images, outputs: mnist.test.labels})
N = tf.shape(outputs)
cost = 0
if N > 0:
y_N = tf.slice(h_c, [0, 0], N)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(y_N, outputs, name='xentropy')
cost = tf.reduce_mean(cross_entropy, name='xentropy_mean')
In the above code, I'm looking for something to use in the place of if N > 0:
Hrm. It's possible that what you want is tf.control_flow_ops.cond()
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/control_flow_ops.py#L597
But that's not exported into the tf namespace, and I'm answering without checking how guaranteed-stable this interface is, but it's used in released models, so go for it. :)
However: Because you actually know in advance what path you want when you construct the feed_dict, you could also take a different approach of invoking a separate path through your model. The standard way to do this is to, e.g., set up code like:
def model(input, n_greater_than):
... cleverness ...
if n_greater_than:
... other cleverness...
return tf.reduce_mean(input)
out1 = model(input, True)
out2 = model(input, False)
And then pull the out1 or out2 nodes depending upon what you know when you're about to run your computation and set the feed_dict. Remember that by default, if the model references the same variables (create them outside the model() func), then you'll basically have two separate paths through.
You can see an example of this in the convolutional mnist example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/mnist/convolutional.py#L165
I'm a fan of doing it this way without introducing control flow dependencies if you can.
Here is a simple example, that can get you started. It executes different parts of the graph based on the shape of the tensor:
import tensorflow as tf
a = tf.Variable([[3.0, 3.0], [3.0, 3.0]])
b = tf.Variable([[1.0, 1.0], [2.0, 2.0]])
l = tf.shape(a)
add_op, sub_op = tf.add(a, b), tf.sub(a, b)
sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)
t = sess.run(l)
print sess.run(sub_op if t[0] == 3 else add_op)
sess.close()
Change 3 to 2 to see how tensor will be subtracted. As you see I initiated the nodes for add and sub and shape, then in the graph I check for the shape and go run the specific part.