Force copy of tensor when enqueuing - tensorflow

First, I'm not sure the title is very good, but it was the best I could come up with given my understanding of the situation.
The background is that I'm trying to understand how queues work in tensorflow and ran into the following issue which puzzled me.
I have a variable n, which I enqueue to a tf.FIFOQueue, and then I increment the variable. This is repeated several times, and one would expect a result similar to 0, 1, 2, ... However, when emptying the queue all values are the same.
More precisely, the code is as follows:
from __future__ import print_function
import tensorflow as tf
q = tf.FIFOQueue(10, tf.float32)
n = tf.Variable(0, trainable=False, dtype=tf.float32)
inc = n.assign(n+1)
enqueue = q.enqueue(n)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
sess.run(enqueue)
sess.run(inc)
sess.run(enqueue)
sess.run(inc)
sess.run(enqueue)
sess.run(inc)
print(sess.run(q.dequeue()))
print(sess.run(q.dequeue()))
print(sess.run(q.dequeue()))
Which I expect would print:
0.0
1.0
2.0
Instead I get the following result:
3.0
3.0
3.0
It seems like I'm pushing some pointer to n to the queue, instead of the actual value, which is what I want. However, I don't really have any actual understanding of tensorflow internals, so maybe something else is going on?
I tried changing
enqueue = q.enqueue(n)
to
enqueue = q.enqueue(tf.identity(n))
since the answers to How can I copy a variable in tensorflow and In TensorFlow, what is tf.identity used for? give me the impression that it might help, but it does not change the result. I also tried adding a tf.control_dependencies(), but again, all values are the same when dequeueing.
Edit: The output above is from running the code on a machine with only a CPU. While trying to see whether there was some difference between versions of tensorflow, I noticed that if I run the code on a machine with both a CPU and a GPU, I get the "expected" result. Indeed, if I run with CUDA_VISIBLE_DEVICES="" I get the result above, and with CUDA_VISIBLE_DEVICES="0" I get the "expected" result.

To force a non-caching read, you can do
q.enqueue(tf.add(n, 0))
This is what's currently done by the batch-normalization layer to force a copy.
Semantics of how variables get read vs. referenced are in the process of getting revamped, so they are temporarily non-intuitive. In particular, I expected q.enqueue(n.read_value()) to force a non-caching read, but it doesn't fix your example on TF 0.12rc1.
Using a GPU machine puts the variable on the GPU, while the queue is CPU-only, so the enqueue op forces a GPU->CPU copy.
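For completeness, here is a sketch of the question's script with that workaround applied (TF 1.x graph-mode API, as in the question); on a CPU-only machine this should print 0.0, 1.0, 2.0:
from __future__ import print_function
import tensorflow as tf

q = tf.FIFOQueue(10, tf.float32)
n = tf.Variable(0, trainable=False, dtype=tf.float32)
inc = n.assign(n + 1)
# tf.add(n, 0) forces a fresh read of n, so the current value
# (rather than a cached reference to the variable) is enqueued.
enqueue = q.enqueue(tf.add(n, 0))

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for _ in range(3):
    sess.run(enqueue)
    sess.run(inc)

for _ in range(3):
    print(sess.run(q.dequeue()))  # 0.0, 1.0, 2.0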

In case it helps, I've found that the other answers, while correct, do not work for all dtypes.
For example, this works fine with floats or ints but fails when n is a string tensor:
q.enqueue(tf.add(n, 0))
This one fails when the queue uses tuples with heterogeneous types (e.g., ints and floats):
q.enqueue_many([[n]])
So, if you find yourself in any of these situations, try this instead:
q.enqueue(tf.add(n, tf.zeros_like(n)))
Or, to enqueue a tuple t:
q.enqueue([tf.add(n, tf.zeros_like(n)) for n in t])
That works even for string tensors and heterogeneous tuple types.
Hope it helps!
--
Update: it looks like tf.bool types do not work with tf.zeros_like(). For those, an explicit cast to an integer type might be needed.
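To illustrate, here is a sketch (with made-up variable names) that applies the zeros_like trick to a queue holding a heterogeneous (string, float, bool) tuple, using an explicit integer cast for the bool element as described in the update:
import tensorflow as tf

# Hypothetical queue holding (string, float32, bool) tuples.
q = tf.FIFOQueue(10, (tf.string, tf.float32, tf.bool))
s = tf.Variable("hello", dtype=tf.string)
x = tf.Variable(1.0, dtype=tf.float32)
b = tf.Variable(True, dtype=tf.bool)

# tf.zeros_like gives empty strings for string tensors and zeros for
# numeric tensors, so the add forces a copy without changing the value.
copy_s = tf.add(s, tf.zeros_like(s))
copy_x = tf.add(x, tf.zeros_like(x))
# tf.zeros_like does not help for tf.bool; cast to an integer type and back.
copy_b = tf.cast(tf.add(tf.cast(b, tf.int32), 0), tf.bool)

enqueue = q.enqueue([copy_s, copy_x, copy_b])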

Related

TensorFlow: What is the effect of calling tf.random.set_seed() twice, where the second function call is passed a hard-coded value?

I'm using someone else's code base and in one spot (early on in execution), the tensorflow seed is set via tf.random.set_seed(seed), where seed is provided via command line argument. But then a bit later in execution, they set it again with tf.random.set_seed(0).
What is the effect of setting the seed a second time with a hard-coded constant?
Does it mean that everything which happens after the second call will be identical, even for different seeds?
I realized checking myself yields a faster answer than waiting. For anyone else wondering, the answer is yes.
import tensorflow as tf

for seed in range(3):
    print(seed)
    tf.random.set_seed(seed)
    print(tf.random.uniform(shape=(3, 2)))
    tf.random.set_seed(0)
    print(tf.random.uniform(shape=(3, 2)))
The second tensor will always be the same.
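A minimal sketch of how one might check this programmatically (TF 2.x, as in the question):
import tensorflow as tf

previous = None
for seed in range(3):
    tf.random.set_seed(seed)
    _ = tf.random.uniform(shape=(3, 2))  # differs from seed to seed
    tf.random.set_seed(0)
    t = tf.random.uniform(shape=(3, 2))  # drawn right after the hard-coded seed
    if previous is not None:
        # The tensor drawn after set_seed(0) is identical on every iteration.
        print(bool(tf.reduce_all(tf.equal(t, previous))))  # True
    previous = t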

Inexplicable behaviour when using numpy.T as init for pyTorch weights

I use numpy to init the weights of my PyTorch MLP. It's a really small network, 2 layers, 21 neurons per layer. The network's output is BRDF values that are then rendered by Mitsuba 0.6.0.
The very peculiar and strange issue I am experiencing is when transposing the np-arrays during the initialization phase. Doing version A gives me a network that renders perfectly in Mitsuba (what I would expect). Doing version B, which should be equivalent, gives me a network that scores the same loss in PyTorch, but renders different values in Mitsuba.
# Version A:
w = np.random.uniform(low=-0.05, high=0.05, size=(6, 21)).astype(np.float32)
model.fc1.weight = torch.nn.Parameter(torch.from_numpy(w.T), requires_grad=True)
# Version B:
w = np.random.uniform(low=-0.05, high=0.05, size=(21, 6)).astype(np.float32)
model.fc1.weight = torch.nn.Parameter(torch.from_numpy(w), requires_grad=True)
Note how in Version B, the only changes are the swapped dimensions and the removed call to transpose. Therefore, the shape is equivalent to Version A's, and the contents should be equivalent as well, since both are sampled from the same distribution.
I cannot share an MWE, as this is proprietary research, but I assure you that the ONLY thing I changed between these two runs is the two lines in the above code snippets. I do not think Mitsuba is at fault either, because the first network (Version A) renders fine, and the second network is equivalent to it, except for the init. I tried mimicking the numpy inits with the respective PyTorch equivalents, and the issue persists.
Any help is greatly appreciated!!
[Images: rendered output for Version A and Version B]
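For what it's worth, a small diagnostic sketch (using a fixed seed purely for illustration, not the original code) that compares the two init paths: the shapes match, but the memory layout and the element-wise values differ even when both draws come from the same seed:
import numpy as np
import torch

np.random.seed(0)
w_a = np.random.uniform(low=-0.05, high=0.05, size=(6, 21)).astype(np.float32)
np.random.seed(0)
w_b = np.random.uniform(low=-0.05, high=0.05, size=(21, 6)).astype(np.float32)

t_a = torch.from_numpy(w_a.T)  # transposed view of a (6, 21) draw
t_b = torch.from_numpy(w_b)    # direct (21, 6) draw

print(t_a.shape == t_b.shape)                    # True: same shape
print(t_a.is_contiguous(), t_b.is_contiguous())  # False True: different layout
print(torch.equal(t_a, t_b))                     # False: same distribution, different values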

Why AdamOptimizer fails to find optimal value to minimize x*x?

I am trying to minimize x*x with the Adam optimizer. I expect to get x=0 as the result, but I get a value of x close to the initial value.
import tensorflow as tf

x = tf.Variable(-2.)
sq = x * x
o = tf.train.AdamOptimizer(1e-1).minimize(sq)
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run([init])
    sess.run([o])
    r = sess.run([x])
    print("done", r)
I get -1.9 as a result, instead of the expected 0.
Do I understand correctly that -2 is the initial value here, or is it something else? Does AdamOptimizer perform just one step, or is it possible to run it for continuous optimization? How do I get x=0 as the result?
sess.run([o]) runs only a single step. To perform a full optimization, you need to run many steps, which can be done by repeating the single step in a loop.
Thus, you can replace sess.run([o]) with:
for i in range(1000):
    sess.run([o])
This yields the result 3.4735016e-23, very close to the expected 0.
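Putting it together, a sketch of the full script with the loop in place (TF 1.x API, as in the question):
import tensorflow as tf

x = tf.Variable(-2.)
sq = x * x
o = tf.train.AdamOptimizer(1e-1).minimize(sq)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        sess.run([o])           # each call performs one Adam update
    print("done", sess.run(x))  # very close to 0 after 1000 steps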
In my experience, people usually run many optimization steps just as I demonstrated, with a for loop. If you are interested in implementing the loop as a TensorFlow operation, and then running this operation only once, this can be done, but it is not recommended. The reasons are: (a) I don't think you will gain any "elegance" in your code by doing this. (b) If you want to run 1000 steps, you will need to add 1000 sets of operations to your graph, and group them as one. Contrast this to needing only one set of operations.
You can see more relevant information in this question.

in TensorFlow runtime, how tensors are copied?

I am reading through the whole TensorFlow source code, and have been puzzled by one thing. In an op, we can get the underlying data buffer of an input tensor and change its value, but this change will not be reflected outside the op (the input is not a Ref type).
For example,
y = op1(x)
z = op2(x)
In op1, suppose we get the underlying buffer of x and change its value; but when I run y_val, z_val = sess.run([y, z]), it seems that this does not affect the value of z (if x really changed, z should change too).
Here, since the x tensor is consumed by two ops, I initially thought maybe TensorFlow splits x into two tensors, one as the input of op1 and the other as the input of op2. However, I checked the code, and it seems not.
Another possibility is that the tensor is copy-on-write, but after checking the code, that does not seem to be the case either.
Does anyone know what really happens here? Thanks a lot.

does tensorflow 0.10.0rc version support float16?

In order to reduce the size of the tensors, I defined all the variables with dtype=tf.float16 in my model, and then defined the optimizer:
optimizer = tf.train.AdamOptimizer(self.learning_rate)
self.compute_gradients = optimizer.compute_gradients(self.mean_loss_reg)
train_adam_op = optimizer.apply_gradients(self.compute_gradients, global_step=self.global_step)
Everything works OK, but after I run the train_adam_op, the gradients and variables are nan in Python. I wonder if the apply_gradients() API supports the tf.float16 type? Why do I get nan after apply_gradients() is called by session.run()?
The dynamic range of fp16 is fairly limited compared to that of 32-bit floats. As a result, it's pretty easy to overflow or underflow them, which often results in the NaN that you've encountered.
You can insert a few check_numerics operations in your model to help pinpoint the specific operation(s) that becomes unstable when performed on fp16.
For example, you can wrap an L2 loss operation as follows to check that its result fits in an fp16:
A = tf.nn.l2_loss(some_tensor)
becomes
A = tf.check_numerics(tf.nn.l2_loss(some_tensor), "found the root cause")
The most common sources of overflows and underflows are exp() and log(), as well as the various classification primitives, so I would start looking there.
Once you've figured out which sequence of operations is problematic, you can update your model to perform that sequence using 32-bit floats: use tf.cast() to convert the inputs of the sequence to 32-bit floats, and cast the result back to fp16.
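As an illustration (hypothetical tensor names, not from the original model), the cast-up/cast-down pattern around a numerically sensitive op might look like this:
import tensorflow as tf

x_fp16 = tf.placeholder(tf.float16, shape=[None])

# exp() overflows fp16 for inputs above roughly 11, so perform it in fp32...
y_fp32 = tf.exp(tf.cast(x_fp16, tf.float32))

# ...then cast the result back to fp16 for the rest of the fp16 graph.
y_fp16 = tf.cast(y_fp32, tf.float16)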