What makes dynamic_rnn faster to compile? - tensorflow

I have a complicated network with many repeated RNN steps. Compiling takes a long time (30+ minutes, mostly stuck at the gradient step). I found an issue that might be related; it mentions dynamic_rnn as a much faster way to compile.
After looking over dynamic_rnn, I restructured my network around a while_loop, like so:
# inputs: tensor with 1000 time steps (assumed time-major: [1000, batch_size, input_size])
def body(i, prev_state):
    # take time step i and drop the leading time dimension
    inp = tf.squeeze(tf.slice(inputs, [i, 0, 0], [1, -1, -1]), axis=0)
    new_state = cell(inp, prev_state)  # includes scope reuse
    return [tf.add(i, 1), new_state]

def cond(i, prev_state):  # cond receives every loop variable
    return some_cond(i)

tf.while_loop(cond, body, [tf.constant(0), initial_state])
But this didn't seem to help. What, besides simply putting the cell call in a loop, makes dynamic_rnn so much faster to compile?

Related

TensorFlow error: The graph couldn't be sorted in topological order

When I run my loss function, this error (or warning) occurs.
I really cannot figure out what causes it.
My guess is that it happens because I didn't use the original input, for example:
def loss(predict, label):
    # for some reason, I need to extract some values from predict
    predictProcessed = process(predict)
    # predictProcessed is a subset of predict
    loss = tf.square(predict - label)
    return loss
Is my guess right?
I also use a double for-loop in this code. Should the code use fewer for-loops? Thanks.

Why does AdamOptimizer fail to find the optimal value when minimizing x*x?

I am trying to minimize x*x with the Adam optimizer. I expect to get x=0 as the result, but I get a value of x close to the initial value.
import tensorflow as tf

x = tf.Variable(-2.)
sq = x * x
o = tf.train.AdamOptimizer(1e-1).minimize(sq)
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run([init])
    sess.run([o])
    r = sess.run([x])
    print("done", r)
I get -1.9 as the result instead of the expected 0.
Do I understand correctly that -2 is the initial value here, or is it something else? Does AdamOptimizer perform just one step, or is it possible to run it for continuous optimization? How do I get x=0 as the result?
sess.run([o]) runs only a single step. To perform a full optimization, you need to run many steps, which can be done by repeating the single step in a loop.
Thus, you can replace sess.run([o]) with:
for i in range(1000):
    sess.run([o])
This yields 3.4735016e-23, very close to the expected 0.
In my experience, people usually run many optimization steps just as I demonstrated, with a for loop. If you are interested in implementing the loop as a TensorFlow operation and then running this operation only once, this can be done, but it is not recommended. The reasons are: (a) I don't think you will gain any "elegance" in your code by doing this. (b) If you want to run 1000 steps, you will need to add 1000 sets of operations to your graph and group them as one. Contrast this with needing only one set of operations.
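The following pure-Python sketch shows why a single step lands at -1.9. It implements the textbook Adam update (Kingma & Ba) with the defaults tf.train.AdamOptimizer uses: beta1=0.9, beta2=0.999, epsilon=1e-8. TensorFlow folds the bias correction into the learning rate slightly differently, so its numbers may differ in the last digits.

On the first step, the bias-corrected ratio m_hat/sqrt(v_hat) equals g/|g| = ±1. So Adam moves x by roughly the learning rate (0.1), whatever the size of the gradient.

```python
import math

def adam_minimize_square(x0, lr=0.1, steps=1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize f(x) = x*x with the textbook (bias-corrected) Adam update."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = 2.0 * x                       # gradient of x*x
        m = beta1 * m + (1 - beta1) * g   # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)      # bias corrections
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

print(adam_minimize_square(-2.0, steps=1))     # about -1.9: one step of size ~lr
print(adam_minimize_square(-2.0, steps=1000))  # very close to 0
```

Each call to the optimizer's minimize op in the TensorFlow snippet corresponds to one iteration of the loop body here. That is why a single sess.run([o]) only moves x from -2 to about -1.9.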
You can see more relevant information in this question.

Canonical Tensorflow "for loop"

What is the canonical way of running a Tensorflow "for loop"?
Specifically, suppose we have some body function which does NOT depend on the loop iteration, but must be run n times.
One might think that a good method is to run this inside a tf.while_loop, like this:
def body(x):
    return ...

def while_body(i, x):
    return i + 1, body(x)

# the condition receives every loop variable, so it must accept both i and x
i, x = tf.while_loop(lambda i, x: tf.less(i, n), while_body, [tf.constant(0), x])
In fact, that is precisely what the highest rated answer in this question suggests:
How can I run a loop with a tensor as its range? (in tensorflow)
However, the tf.while_loop docs specifically say:
For correct programs, while_loop should return the same result for any parallel_iterations > 0.
If you put a counter in the body, then it seems that that condition is violated. So it seems that there must be a different way of setting up a "for loop".
Furthermore, even if there is no explicit error, doing so seems to create a dependency between iterations, which means I do not think they will run in parallel.
After some investigation, it seems that the tf.while_loop idiom used above is quite common. Alternatively, one can use tf.scan:
def body(x):
    return ...

def scan_body(previous_output, iteration):
    return body(...)

# initializer must have the same structure as scan_body's return value
x = tf.scan(scan_body, tf.range(n), initializer=x)
although I have no idea whether one is preferable from a performance point of view. Note that in the above, we have to wrap the body function to accept the previous output.
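The two idioms are easier to compare in a pure-Python model of their semantics. The functions below are hypothetical stand-ins written for illustration, not TensorFlow's implementation.

Both compute the same final value. The practical difference is that tf.scan returns the stacked output of every iteration, so it keeps all n intermediate results, while tf.while_loop returns only the final loop variables. In both, each step consumes the previous x, so the iterations are sequential by construction. The counter i adds no dependency beyond that one.

```python
def while_loop(cond, body, loop_vars):
    """Sequential model of tf.while_loop: returns only the final loop variables."""
    while cond(*loop_vars):
        loop_vars = body(*loop_vars)
    return loop_vars

def scan(fn, elems, initializer):
    """Sequential model of tf.scan: returns the output of every iteration."""
    acc, outputs = initializer, []
    for elem in elems:
        acc = fn(acc, elem)
        outputs.append(acc)
    return outputs

def body(x):
    return 2 * x + 1  # hypothetical body that ignores the iteration index

n = 5
# "for loop" via while_loop: carry a counter alongside x
_, final = while_loop(lambda i, x: i < n, lambda i, x: (i + 1, body(x)), (0, 0))
# "for loop" via scan: iterate over range(n) and ignore the element
stacked = scan(lambda prev, _step: body(prev), range(n), initializer=0)

print(final)    # 31
print(stacked)  # [1, 3, 7, 15, 31]
```

If only the final value is needed, the while_loop form avoids materializing the stacked intermediate results that scan produces.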

Cache intermediate tensor and update periodically

I have a large tensor that is expensive to calculate, but realistically I only need to recalculate it every 10 iterations or so (during gradient descent). What's the best way to do this?
More specifically:
Suppose I have an intermediate_tensor that is used in the calculation of final_tensor each time a tf.Session is run. final_tensor is, in my case, a set of modified gradients to use in optimization. It is possible to define a graph that contains both intermediate_tensor and final_tensor. However, running this graph will be inefficient when intermediate_tensor changes slowly. In pseudocode, this is what I'd like to do:
intermediate_tensor = tf.some_operation(earlier_variable)
final_tensor = tf.matmul(intermediate_tensor, other_earlier_variable)
with tf.Session() as sess:
    # pretending `partial_run` works like I want it to:
    sess.partial_run(intermediate_tensor, feed_dict={})
    for i in range(5):
        ft = sess.partial_run(final_tensor, feed_dict={})
        print(ft)
The experimental partial_run feature is almost what I'm looking for. However, partial_run can only be used if I want to evaluate final_tensor just once each time I evaluate intermediate_tensor. It won't work inside a for loop.
My workaround for the moment is to use tf.placeholder. I evaluate intermediate_tensor in one call to sess.run, then feed the result to a new call of sess.run as a placeholder. However, this is very inflexible. It requires that I hardcode the variable shape at compile time, for example. It's also not very good when the number of intermediate variables I'd like to use is very large.
Is there a better way? This would be very helpful if, say, one were using a curvature matrix that doesn't need to be evaluated every iteration.
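One possible pattern is to compute the intermediate value only every k steps and reuse a cached copy in between. This is an assumption, not an answer from the thread.

In TF1 graph mode, it would typically mean storing intermediate_tensor in a non-trainable tf.Variable and running that variable's assign op only on every tenth sess.run call. The plain-Python sketch below models just that schedule. PeriodicCache and the stand-in computations are hypothetical.

```python
class PeriodicCache:
    """Hypothetical helper: recompute a value only on every `period`-th call."""

    def __init__(self, compute, period):
        self.compute = compute
        self.period = period
        self.calls = 0
        self.value = None

    def get(self):
        if self.calls % self.period == 0:  # calls 0, period, 2*period, ...
            self.value = self.compute()
        self.calls += 1
        return self.value

expensive_calls = 0

def expensive_intermediate():
    # stand-in for evaluating intermediate_tensor
    global expensive_calls
    expensive_calls += 1
    return [[1.0, 2.0], [3.0, 4.0]]

cache = PeriodicCache(expensive_intermediate, period=10)
for step in range(25):
    intermediate = cache.get()
    final = [sum(row) for row in intermediate]  # stand-in for final_tensor

print(expensive_calls)  # 3 (steps 0, 10 and 20)
```

Keeping the cached value in a variable, rather than feeding it back through a placeholder, avoids hardcoding its shape in a second feed.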

Why shuffling data gives significantly higher accuracy?

In TensorFlow, I've written a big model for a two-class image classification problem. My question concerns the following code snippet:
X, y, X_val, y_val = prepare_data()
probs = calc_probs(model, session, X)
accuracy = float(np.equal(np.argmax(probs, 1), np.argmax(y, 1)).sum()) / probs.shape[0]
loss = log_loss(y, probs)
X is an np.array of shape (25000, 244, 244, 3). That code results in accuracy=0.5834 (close to random accuracy) and loss=2.7106. But when I shuffle the data by adding these 3 lines after the first line:
sample_idx = random.sample(range(0, X.shape[0]), 25000)
X = X[sample_idx]
y = y[sample_idx]
the results become much better: accuracy=0.9933 and loss=0.0208.
Why can shuffling the data give significantly higher accuracy, or what could be the reason for that?
The function calc_probs is mainly a run call:
probs = session.run(model.probs, feed_dict={model.X: X})
Update:
After hours of debugging, I figured out that evaluating a single image gives different results. For example, if you run the following line of code multiple times, you get a different result each time:
session.run(model.probs, feed_dict={model.X: [X[20]]})
My data is sorted by class: X contains class 1 samples first, then class 2. In the calc_probs function, I run the data batch by batch, sequentially. So, without shuffling, each run contains data from a single class.
I've also noticed that, with shuffling, if the batch size is very small, I get random-level accuracy.
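One mechanism consistent with these symptoms is a layer that normalizes activations using the statistics of the current batch at evaluation time, as batch normalization does in training mode. This is an assumption, since the post does not show the model. Under it, a single-class batch is normalized to zero mean, which erases the signal that separates the classes. A batch of size 1 is normalized to exactly zero. The NumPy sketch below uses a hypothetical one-feature "model" to show the effect.

```python
import numpy as np

def normalize_with_batch_stats(features, eps=1e-5):
    # Uses the mean/variance of *this* batch, as batch norm does in training mode.
    mean = features.mean(axis=0)
    var = features.var(axis=0)
    return (features - mean) / np.sqrt(var + eps)

def predict(batch):
    # Hypothetical one-feature "model": normalize, then threshold at zero.
    return (normalize_with_batch_stats(batch) > 0).astype(int).ravel()

def batched_accuracy(x, y, batch_size):
    # Mirrors calc_probs: run the data batch by batch, in order.
    preds = [predict(x[s:s + batch_size]) for s in range(0, len(x), batch_size)]
    return float((np.concatenate(preds) == y).mean())

rng = np.random.default_rng(0)
n = 1000
y = np.repeat([0, 1], n // 2)  # sorted: class 0 first, then class 1
x = (np.where(y == 1, 2.0, -2.0) + rng.normal(size=n)).reshape(-1, 1)
perm = rng.permutation(n)

sorted_acc = batched_accuracy(x, y, batch_size=100)                # each batch holds one class
shuffled_acc = batched_accuracy(x[perm], y[perm], batch_size=100)  # mixed batches
tiny_batch_acc = batched_accuracy(x[perm], y[perm], batch_size=1)  # every input becomes 0
print(sorted_acc, shuffled_acc, tiny_batch_acc)
```

If this is the cause, the usual remedy is to evaluate with the layer in inference mode, so that it uses stored moving averages instead of per-batch statistics. Shuffling alone does not fix the underlying problem.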
There is some mathematical justification for this in the context of the randomized Kaczmarz algorithm. The regular Kaczmarz algorithm is an old algorithm that can be seen as non-shuffling SGD on a least-squares problem, and there are guaranteed faster convergence rates if you use randomization; follow the references in http://www.cs.ubc.ca/~nickhar/W15/Lecture21Notes.pdf
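The randomized Kaczmarz algorithm mentioned above can be sketched in a few lines of NumPy. Each step projects the current estimate onto the hyperplane {x : a_i · x = b_i} for one row i of a consistent system Ax = b. The cyclic variant visits the rows in a fixed order. The randomized variant (Strohmer and Vershynin) samples row i with probability proportional to ||a_i||². The random Gaussian system below is an illustrative assumption.

```python
import numpy as np

def kaczmarz(A, b, iterations, randomized, seed=0):
    """Solve a consistent system A x = b by successive row projections."""
    m, n = A.shape
    row_norms_sq = np.sum(A * A, axis=1)
    if randomized:
        # sample row i with probability ||a_i||^2 / ||A||_F^2
        rng = np.random.default_rng(seed)
        rows = rng.choice(m, size=iterations, p=row_norms_sq / row_norms_sq.sum())
    else:
        rows = np.arange(iterations) % m  # cyclic: 0, 1, ..., m-1, 0, 1, ...
    x = np.zeros(n)
    for i in rows:
        a_i = A[i]
        # project x onto the hyperplane {z : a_i . z = b_i}
        x += (b[i] - a_i @ x) / row_norms_sq[i] * a_i
    return x

rng = np.random.default_rng(42)
A = rng.normal(size=(50, 10))
x_true = rng.normal(size=10)
b = A @ x_true  # consistent system

x_cyclic = kaczmarz(A, b, iterations=5000, randomized=False)
x_random = kaczmarz(A, b, iterations=5000, randomized=True)
print(np.linalg.norm(A @ x_cyclic - b), np.linalg.norm(A @ x_random - b))
```

The guarantee for the randomized variant is an expected rate that depends on A's conditioning rather than on the row order. This is the sense in which randomization, like shuffling in SGD, protects against an unlucky fixed ordering.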