What is the canonical way of running a TensorFlow "for loop"?
Specifically, suppose we have some body function which does NOT depend on the loop iteration, but must be run n times.
One might think that a good approach would be to run the body inside a tf.while_loop, like this:
def body(x):
    return ...

def while_body(i, x):
    return i + 1, body(x)

# note: the condition must accept every loop variable, even unused ones
i, x = tf.while_loop(lambda i, x: tf.less(i, n), while_body, [tf.constant(0), x])
In fact, that is precisely what the highest rated answer in this question suggests:
How can I run a loop with a tensor as its range? (in tensorflow)
However, the tf.while_loop docs specifically say
For correct programs, while_loop should return the same result for any parallel_iterations > 0.
If you put a counter in the body, then it seems that condition is violated. So it seems there must be a different way of setting up a "for loop".
Furthermore, even if there is no explicit error, doing so seems like it will create a dependency between iterations, meaning that I do not think they will run in parallel.
After some investigation, it seems that the tf.while_loop idiom used above is quite common. Alternatively, one can use tf.scan:
def body(x):
    return ...

def scan_body(previous_output, iteration):
    # the iteration index is ignored; only the previous output is passed on
    return body(previous_output)

x = tf.scan(scan_body, tf.range(n), initializer=x)
although I have no idea whether either is preferable from a performance point of view. Note that in the above we have to wrap the body function to accept the previous output.
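For concreteness, here is a minimal runnable sketch of both idioms side by side, using a hypothetical body that just doubles its input as a stand-in for the real computation; both produce the same result:

import tensorflow as tf

n = 5
x = tf.constant(1.0)

def body(x):
    return 2.0 * x  # hypothetical stand-in for the real body

# tf.while_loop idiom: the counter i is threaded through explicitly
def while_body(i, x):
    return i + 1, body(x)

_, x_loop = tf.while_loop(lambda i, x: tf.less(i, n),
                          while_body, [tf.constant(0), x])

# tf.scan idiom: the element drawn from tf.range(n) is ignored
x_scan = tf.scan(lambda prev, _: body(prev), tf.range(n), initializer=x)[-1]

with tf.Session() as sess:
    print(sess.run([x_loop, x_scan]))  # [32.0, 32.0]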
Related
I am stuck on a problem where I need to compute an integral and run the loop thousands of times.
The prototype of this function would be:
func_integrated = lambda t: scipy.integrate.quad(func, t, 1)
but then if I want to feed func_integrated into an array, I'll have to run this integration thousands of times, which is not very efficient. After all, the increment from t to t - dt is approximately just func(t) * dt.
My guess would be to write something like this:
def function_integration(func, arr):
    arr = np.flip(arr, 0)  # flip to count from back to front
    out_arr = np.array([0])
    for i in range(1, len(arr)):
        increment = arr[i-1] - arr[i]
        out_arr = np.hstack((out_arr, increment * func(arr[i-1])))
    return np.cumsum(out_arr)
However, this is not generic and still needs an explicit loop. I was wondering if there is any scipy or numpy way to handle this more efficiently.
P.S. I am aware of the sympy approach. However, it relies heavily on built-in functions, so it can hardly be adapted to arbitrary functions.
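For what it's worth, the loop in function_integration above can be vectorized with np.diff and np.cumsum. A minimal sketch, assuming func accepts array arguments:

import numpy as np

def function_integration_vec(func, arr):
    arr = np.flip(arr, 0)  # count from back to front, as above
    # computes (arr[i-1] - arr[i]) * func(arr[i-1]) for every i in one shot
    increments = -np.diff(arr) * func(arr[:-1])
    return np.cumsum(np.concatenate(([0.0], increments)))

# example usage: function_integration_vec(np.cos, np.linspace(0.0, 1.0, 1000))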
I am trying to minimize x*x with the Adam optimizer. I expect to get x=0 as the result, but I get a value of x close to the initial value.
import tensorflow as tf

x = tf.Variable(-2.)
sq = x * x
o = tf.train.AdamOptimizer(1e-1).minimize(sq)

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run([init])
    sess.run([o])
    r = sess.run([x])
    print("done", r)
I get -1.9 as the result instead of the expected 0.
Do I understand correctly that -2 is the initial value here, or is it something else? Does AdamOptimizer perform just one step, or is it possible to run it for continuous optimization? How do I get x=0 as the result?
sess.run([o]) runs only a single step. To perform a full optimization, you need to run many steps, which can be done by repeating the single step in a loop.
Thus, you can replace sess.run([o]) with:
for i in range(1000):
    sess.run([o])
This yields the result 3.4735016e-23, very close to the expected 0.
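Putting it together, the full corrected script (same TF1-style API as in the question) looks like this:

import tensorflow as tf

x = tf.Variable(-2.)
sq = x * x
o = tf.train.AdamOptimizer(1e-1).minimize(sq)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        sess.run([o])  # each call performs exactly one optimization step
    print("done", sess.run(x))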
In my experience, people usually run many optimization steps just as I demonstrated, with a for loop. If you are interested in implementing the loop as a TensorFlow operation and then running that operation only once, this can be done, but it is not recommended, for two reasons: (a) I don't think you will gain any "elegance" in your code by doing it, and (b) to run 1000 steps you would need to add 1000 sets of operations to your graph and group them as one, compared to needing only a single set of operations.
You can see more relevant information in this question.
I intend to write a queue runner thread which does some processing. Basically, I can formulate my action as some TF op which depends on another queue.dequeue.
So, maybe the canonical way to write this would be:
op = make_op()
with coord.stop_on_exception():
    while True:
        session.run(op)
At the very end it would stop because of an OutOfRangeError exception.
I could however also implement the loop as part of the graph and write it like this:
def body(last):
    with tf.control_dependencies([last]):
        op = make_op()
    # carry a dummy tensor through the loop: loop_vars must be tensors,
    # and tf.no_op() is an Operation, not a tensor
    with tf.control_dependencies([op]):
        return tf.identity(last)

op = tf.while_loop(
    cond=lambda last: tf.constant(True),  # cond receives the loop vars
    body=body,
    loop_vars=[tf.constant(0)],
    parallel_iterations=1, back_prop=False)
with coord.stop_on_exception():
    session.run(op)
I would expect this to be faster because no Python code is involved while executing the loop. Are there any downsides to this approach, or any reason why it would not work?
I have a complicated network with many repeated RNN steps. Compiling takes a long time (30+ minutes, mostly stuck at the gradient step), and I found that this issue might be related; it mentions dynamic_rnn as a much faster way to compile:
Looking over dynamic_rnn, I then reformatted my network to include a while_loop, like so:
# inputs: a tensor with 1000 time steps, assumed shape [1000, input_dim]
def body(i, prev_state):
    inp = tf.slice(inputs, [i, 0], [1, -1])
    output, new_state = cell(tf.squeeze(inp), prev_state)  # includes scope reuse
    return [tf.add(i, tf.constant(1)), new_state]

def cond(i, prev_state):  # must accept all loop variables
    return some_cond(i)

tf.while_loop(cond, body, [tf.constant(0), initial_state])
But this didn't seem to help. What besides simply putting the cell call in a loop makes dynamic_rnn so much faster to compile?
I've tried to find a function comparing two PyArrayObjects - something like numpy's array_equal - but I haven't found anything. Do you know of such a function?
If not, how can I use numpy's array_equal from my C code?
Here's the code for array_equal:
def array_equal(a1, a2):
    try:
        a1, a2 = asarray(a1), asarray(a2)
    except:
        return False
    if a1.shape != a2.shape:
        return False
    return bool(asarray(a1 == a2).all())
As you can see, it is not a C-API-level function. After making sure both inputs are arrays and that their shapes match, it performs an element-wise == test, followed by all().
This does not work reliably with floats; it's fine with ints and booleans.
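For example, exact == comparison falls over under floating-point round-off; np.allclose is the usual tolerance-based alternative:

import numpy as np

a = np.array([0.1 + 0.2])
b = np.array([0.3])
print(np.array_equal(a, b))  # False: exact equality fails under round-off
print(np.allclose(a, b))     # True: compares within a tolerance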
There probably is some sort of equality function in the C API, but a clone of this probably isn't what you need.
PyArray_CountNonzero(PyArrayObject* self)
might be a good function. I remember from digging into the code earlier that PyArray_Nonzero uses it to determine how big an array to allocate and return. You could give it an object that compares the elements of your two arrays (in whatever way is appropriate given the dtype), and then test for a nonzero count.
Or you could construct your own iterator that bails out as soon as it gets a not-equal pair of elements. Use nditer to get the full array broadcasting power.
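A minimal Python-level sketch of that idea (a C implementation would use the NpyIter API analogously; the function name here is made up):

import numpy as np

def arrays_equal_early_exit(a1, a2):
    a1, a2 = np.asarray(a1), np.asarray(a2)
    if a1.shape != a2.shape:
        return False
    # nditer walks the two arrays in lockstep; bail out at the first mismatch
    for x, y in np.nditer([a1, a2]):
        if x != y:
            return False
    return True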