Are tensorflow random-variables only created once per sess.run?

If I have something like this:
a = tf.random_uniform((1,), dtype=tf.float32)
b = 1 + a
c = 2 + a
Will a be the same or different when calculating b and c?

Every time sess.run() is executed, the random op is re-evaluated and generates new values, as noted in the official TensorFlow documentation.
For example, given the following code:
import tensorflow as tf
a = tf.random_uniform((1,), dtype=tf.float32)
b = 1 + a
c = 2 + a
init = tf.global_variables_initializer()
sess = tf.Session()
print(sess.run(a))
print(sess.run(b))
print(sess.run(c))
print(sess.run(a))
Each run draws a new value of a, so b evaluates to 1 + a(newly generated) and c to 2 + a(newly generated), where each newly drawn a differs from the a that was printed first.
Output:
[ 0.13900638] # value of a
[ 1.87361598] # value of b = 1 + 0.87361598(!= a)
[ 2.81547356] # value of c = 2 + 0.81547356(!= a)
[ 0.00705874] # value of a(!= previous value of a)

As answered by heena bawa:
For every sess.run(), the values will be re-initialised.
To solve this, we create the session and call run() only once; if multiple results are needed, they are passed in a list, like so:
import tensorflow as tf
a = tf.random_uniform((1,), dtype=tf.float32)
b = 1 + a
c = 2 + a
init = tf.global_variables_initializer()
with tf.Session() as sess:
    print(sess.run([c, b, a]))
Output:
[array([2.0236197], dtype=float32), array([1.0236198], dtype=float32), array([0.02361977], dtype=float32)]
# c is 2.023..
# b is 1.023..
# a is 0.023..
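For completeness (this example is mine, not from the answers above): if you want a single random draw that stays fixed across multiple sess.run() calls, wrap it in a tf.Variable, which is only initialised once:
import tensorflow as tf

a_var = tf.Variable(tf.random_uniform((1,), dtype=tf.float32))
b = 1 + a_var
c = 2 + a_var

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(a_var))  # e.g. [0.42], drawn once when the initializer runs
    print(sess.run(b))      # 1 + the same draw
    print(sess.run(c))      # 2 + the same draw
    print(sess.run(a_var))  # still the same value as the first print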

Related

tensorflow giving nans when calculating gradient with sparse tensors

The following snippet is from a fairly large piece of code but hopefully I can give all the information necessary:
y2 = tf.matmul(y1,ymask)
dist = tf.norm(ystar-y2,axis=0)
y1 and y2 are 128x30 and ymask is 30x30. ystar is 128x30. dist is 1x30. When ymask is the identity matrix, everything works fine. But when I set it to be all zeros, apart from a single 1 along the diagonal (so as to set all columns but one in y2 to be zero), I get nans for the gradient of dist with respect to y2, using tf.gradients(dist, [y2]). The specific value of dist is [0,0,7.9,0,...], with all the ystar-y2 values being around the range (-1,1) in the third column and zero elsewhere.
I'm pretty confused as to why a numerical issue would occur here, given there are no logs or divisions. Is this underflow? Am I missing something in the maths?
For context, I'm doing this to try to train individual dimensions of y, one at a time, using the whole network.
Longer version to reproduce:
import tensorflow as tf
import numpy as np
import pandas as pd
batchSize = 128
eta = 0.8
tasks = 30
imageSize = 32**2
groups = 3
tasksPerGroup = 10
trainDatapoints = 10000
w = np.zeros([imageSize, groups * tasksPerGroup])
toyIndex = 0
for toyLoop in range(groups):
    m = np.ones([imageSize]) * np.random.randn(imageSize)
    for taskLoop in range(tasksPerGroup):
        w[:, toyIndex] = m * 0.1 * np.random.randn(1)
        toyIndex += 1
xRand = np.random.normal(0, 0.5, (trainDatapoints, imageSize))
taskLabels = np.matmul(xRand, w) + np.random.normal(0,0.5,(trainDatapoints, groups * tasksPerGroup))
DF = np.concatenate((xRand, taskLabels), axis=1)
trainDF = pd.DataFrame(DF[:trainDatapoints, ])
# define graph variables
x = tf.placeholder(tf.float32, [None, imageSize])
W = tf.Variable(tf.zeros([imageSize, tasks]))
b = tf.Variable(tf.zeros([tasks]))
ystar = tf.placeholder(tf.float32, [None, tasks])
ymask = tf.placeholder(tf.float32, [tasks, tasks])
dataLength = tf.cast(tf.shape(ystar)[0],dtype=tf.float32)
y1 = tf.matmul(x, W) + b
y2 = tf.matmul(y1,ymask)
dist = tf.norm(ystar-y2,axis=0)
mse = tf.reciprocal(dataLength) * tf.reduce_mean(tf.square(dist))
grads = tf.gradients(dist, [y2])
trainStep = tf.train.GradientDescentOptimizer(eta).minimize(mse)
# build graph
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
randTask = np.random.randint(0, 9)
ymaskIn = np.zeros([tasks, tasks])
ymaskIn[randTask, randTask] = 1
batch = trainDF.sample(batchSize)
batch_xs = batch.iloc[:, :imageSize]
batch_ys = np.zeros([batchSize, tasks])
batch_ys[:, randTask] = batch.iloc[:, imageSize + randTask]
gradOut = sess.run(grads, feed_dict={x: batch_xs, ystar: batch_ys, ymask: ymaskIn})
sess.run(trainStep, feed_dict={x: batch_xs, ystar: batch_ys, ymask:ymaskIn})
Here's a very simple reproduction:
import tensorflow as tf
with tf.Graph().as_default():
    y = tf.zeros(shape=[1], dtype=tf.float32)
    dist = tf.norm(y, axis=0)
    (grad,) = tf.gradients(dist, [y])
    with tf.Session():
        print(grad.eval())
Prints:
[ nan]
The issue is that tf.norm computes sum(x**2)**0.5. The gradient is x / sum(x**2) ** 0.5 (see e.g. https://math.stackexchange.com/a/84333), so when sum(x**2) is zero we're dividing by zero.
There's not much to be done in terms of a special case: the gradient as x approaches all zeros depends on which direction it's approaching from. For example if x is a single-element vector, the limit as x approaches 0 could either be 1 or -1 depending on which side of zero it's approaching from.
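As a quick sanity check (my own addition, not from the original answer), the formula is easy to verify on a non-zero input, where the gradient of tf.norm is just x / norm(x):
import tensorflow as tf

with tf.Graph().as_default():
    x = tf.constant([3., 4.])
    n = tf.norm(x)                 # 5.0
    (g,) = tf.gradients(n, [x])    # expect x / norm(x) = [0.6, 0.8]
    with tf.Session():
        print(g.eval())            # [0.6 0.8], finite because norm(x) != 0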
So in terms of solutions, you could just add a small epsilon:
import tensorflow as tf

def safe_norm(x, epsilon=1e-12, axis=None):
    return tf.sqrt(tf.reduce_sum(x ** 2, axis=axis) + epsilon)

with tf.Graph().as_default():
    y = tf.constant([0.])
    dist = safe_norm(y, axis=0)
    (grad,) = tf.gradients(dist, [y])
    with tf.Session():
        print(grad.eval())
Prints:
[ 0.]
Note that this is not actually the Euclidean norm. It's a good approximation as long as the input is much larger than epsilon.
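Applied to the code in the question (my suggestion, assuming the small epsilon is acceptable there), the fix is a one-line substitution:
# replace: dist = tf.norm(ystar - y2, axis=0)
dist = safe_norm(ystar - y2, axis=0)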

Compute value of variable for multiple input values

I have a tensorflow graph which is trained. After training, I want to sample one variable for multiple intermediate values. Simplified:
a = tf.placeholder(tf.float32, [1])
b = a + 10
c = b * 10
Now I want to query c for values of b. Currently, I am using an outer loop:
b_values = [0, 1, 2, 3, 4, 5]
samples = []
for b_value in b_values:
    samples += [sess.run(c, feed_dict={b: [b_value]})]
This loop takes quite a bit of time; I think it is because b_values contains 5000 values in my case. Is there a way of running sess.run only once, passing all b_values at once? I cannot really modify the graph a->b->c, but I could add something to it if that helps.
You could do it as follows. Note that the placeholder a is declared with shape [None, 1] so that b carries a batch dimension and all 5000 values can be fed to it in a single run:
import tensorflow as tf
import numpy as np
import time

a = tf.placeholder(tf.float32, [None, 1])
b = a + 10
c = b * 10

sess = tf.Session()
b_values = np.random.randint(500, size=(5000, 1))

# looping: one sess.run per value
samples = []
t = time.time()
for b_value in b_values:
    samples += [sess.run(c, feed_dict={b: [b_value]})]
print(time.time() - t)
#print(samples)

# single run: feed all 5000 values at once
t = time.time()
samples = sess.run(c, feed_dict={b: b_values})
print(time.time() - t)
#print(samples)
Output: (time in seconds)
0.874449968338
0.000532150268555
Hope this helps!
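One extra point worth noting (my own illustration, not from the answer above): feed_dict keys do not have to be placeholders; any tensor in the graph can be overridden, which is exactly what feeding b does here:
import tensorflow as tf

a = tf.placeholder(tf.float32, [None, 1])
b = a + 10
c = b * 10

with tf.Session() as sess:
    print(sess.run(c, feed_dict={a: [[1.]]}))   # [[110.]], computed through a -> b -> c
    print(sess.run(c, feed_dict={b: [[11.]]}))  # [[110.]], a is bypassed entirely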

How to duplicate operations & placeholders in Tensorflow

Suppose I have two neural network models defined, each with one input placeholder and one output tensor. From these two outputs I need three separate values.
inputs: i1, i2, outputs: o1, o2
a = 1
b = 2
v1 = session.run(o1, feed_dict={i1: a})
v2 = session.run(o1, feed_dict={i1: b})
v3 = session.run(o2, feed_dict={i2: a})
The problem is that I need to feed these three values into a loss function, so I can't do the above. I would need to do something like:
loss = session.run(L, feed_dict={i1: a, i1: b, i2:a })
I don't think I can do that, but even if I could, there would still be ambiguity in later operations, since o1 computed from one input is used differently than o1 computed from the other.
I think it could be solved by having two input placeholders and two outputs in the first neural network. So, given that I already have a model, is there a way to restructure its inputs and outputs to accommodate this?
Visually I want to turn
i1 ---- (model) ----- o1
into
i1a o1a
\ /
\ /
x ----- (model) ----- x
/ \
/ \
i1b o1b
Your intuition is right: you have to create two different placeholders i1a and i1b for your network 1, with two outputs o1a and o1b. Your visuals look great, so here is my proposition:
i1a ----- (model) ----- o1a
|
shared weights
|
i1b ----- (model) ----- o1b
The proper way to do that is to duplicate your network by using tf.get_variable() for every variable with reuse=True.
def create_variables():
    with tf.variable_scope('model'):
        w1 = tf.get_variable('w1', [1, 2])
        b1 = tf.get_variable('b1', [2])

def inference(input):
    with tf.variable_scope('model', reuse=True):
        w1 = tf.get_variable('w1')
        b1 = tf.get_variable('b1')
        output = tf.matmul(input, w1) + b1
    return output

create_variables()

i1a = tf.placeholder(tf.float32, [3, 1])
o1a = inference(i1a)

i1b = tf.placeholder(tf.float32, [3, 1])
o1b = inference(i1b)

loss = tf.reduce_mean(o1a - o1b)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    sess.run(loss, feed_dict={i1a: [[0.], [1.], [2.]], i1b: [[0.5], [1.5], [2.5]]})
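To connect this back to the question's second model (a sketch of mine; the names i2, o2 and the loss are placeholders, since the second model is not shown), the same pattern, continued from the snippet above, gives a second variable scope, and all three values can then be combined in a single run:
def create_variables2():
    with tf.variable_scope('model2'):
        tf.get_variable('w2', [1, 2])
        tf.get_variable('b2', [2])

def inference2(input):
    with tf.variable_scope('model2', reuse=True):
        w2 = tf.get_variable('w2')
        b2 = tf.get_variable('b2')
    return tf.matmul(input, w2) + b2

create_variables2()
i2 = tf.placeholder(tf.float32, [3, 1])
o2 = inference2(i2)

# purely illustrative loss combining o1(a), o1(b) and o2(a)
L = tf.reduce_mean(o1a + o1b - o2)
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    sess.run(L, feed_dict={i1a: [[0.], [1.], [2.]],
                           i1b: [[0.5], [1.5], [2.5]],
                           i2: [[0.], [1.], [2.]]})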

Interpreting Tensorflow/Tensorboard "subtraction" operation

The following is code adapted from a simple learning example that I have bent out of shape to understand the TensorBoard graph visualizations:
import tensorflow as tf
import numpy as np
sess = tf.InteractiveSession()
# Create 10 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(10).astype("float32")
y_data = x_data * 0.1 + 0.3
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0, name = "internal_W"), name = "external_W")
b = tf.Variable(2*tf.zeros([1], name = "internal_b"), name = "doubled_b")
y = (W * x_data + b)
l1 = (y - y_data)
l2 = (y_data - y )
writer = tf.train.SummaryWriter("/tmp/test1", sess.graph_def)
init = tf.initialize_all_variables()
# Launch the graph.
sess = tf.Session()
sess.run(init)
print(sess.run(y))
print('---')
print((y_data))
print('---')
print(sess.run(l1))
print('---')
print(sess.run(l2))
A sample output of the print statements is:
[ 0.84253538 0.31011301 0.11627766 0.35491142 0.65550905 0.1798114
0.13632762 0.02010157 0.42960873 0.04218956]
---
[ 0.39195824 0.33384719 0.31269109 0.33873668 0.37154531 0.31962547
0.31487945 0.302194 0.3468895 0.30460477]
---
[ 0.45057714 -0.02373418 -0.19641343 0.01617473 0.28396374 -0.13981406
-0.17855182 -0.28209242 0.08271924 -0.2624152 ]
---
[-0.45057714 0.02373418 0.19641343 -0.01617473 -0.28396374 0.13981406
0.17855182 0.28209242 -0.08271924 0.2624152 ]
Clearly, the subtractions are working properly: the inputs to the subtraction are given in different order and yield different outputs. However, in the TensorBoard graph visualization, the two "Sub" operators appear not to reverse the order of their operands the way the code does. (Highlighting either operator yields no additional insight.) Am I missing something obvious, or do the node visualizations completely obscure the order of operands?
After futzing around with this, my considered answer to my own question is, "Yes, this is working as intended." The inputs to the nodes show only what the inputs are, not any particular relationships to the operation or the node or themselves; indeed, if one added a variable to itself in an operation node, the input variable would show up only once.
This is not a design choice I would have made, but that does seem to be the intent.
I still encourage others who may have more insight to comment or fully answer.
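One workaround I have found helpful (an addition of mine, not part of the discussion above) is to give each subtraction its own name scope, so the two Sub nodes are at least distinguishable in the graph even though operand order is not displayed:
with tf.name_scope("y_minus_ydata"):
    l1 = y - y_data
with tf.name_scope("ydata_minus_y"):
    l2 = y_data - y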

How to keep calculated values in a Tensorflow graph (on the GPU)?

How can we make sure that a calculated value will not be copied back to CPU/python memory, but is still available for calculations in the next step?
The following code obviously doesn't do it:
import tensorflow as tf
a = tf.Variable(tf.constant(1.),name="a")
b = tf.Variable(tf.constant(2.),name="b")
result = a + b
stored = result
with tf.Session() as s:
    val = s.run([result, stored], {a: 1., b: 2.})
    print(val)  # 3
    val = s.run([result], {a: 4., b: 5.})
    print(val)  # 9
    print(stored.eval())  # 3 NOPE:
Error : Attempting to use uninitialized value _recv_b_0
The answer is to keep the value in a tf.Variable, writing it there with an assign op.
Working code:
import tensorflow as tf
with tf.Session() as s:
    a = tf.Variable(tf.constant(1.), name="a")
    b = tf.Variable(tf.constant(2.), name="b")
    result = a + b
    stored = tf.Variable(tf.constant(0.), name="stored_sum")
    assign_op = stored.assign(result)
    val, _ = s.run([result, assign_op], {a: 1., b: 2.})
    print(val)  # 3
    val = s.run(result, {a: 4., b: 5.})
    print(val)  # 9
    print(stored.eval())  # ok, still 3
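A small follow-up of my own (not part of the original answer): because stored is an ordinary variable in the graph, later ops can read it directly and the value never has to leave the device. Continuing inside the same with-block:
    next_step = stored * 10.
    print(s.run(next_step))  # 30.0, read straight from the stored variable, no feed needed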