How to duplicate operations & placeholders in TensorFlow

Suppose I have two neural network models defined, each with 1 input placeholder and 1 output tensor. From these 2 outputs I need 3 separate values.
Inputs: i1, i2. Outputs: o1, o2.
a = 1
b = 2
v1 = session.run(o1, feed_dict={i1: a})
v2 = session.run(o1, feed_dict={i1: b})
v3 = session.run(o2, feed_dict={i2: a})
The problem is that I need to feed all 3 values into a single loss function, so I can't run them separately as above. I would need something like
loss = session.run(L, feed_dict={i1: a, i1: b, i2:a })
I don't think I can do that (a dict cannot hold the key i1 twice), and even if I could, later operations would still be ambiguous, since o1 with input i1 is used differently than o1 with input i2.
I think it could be solved by having 2 input placeholders and 2 outputs in the first neural network. Given that I already have a model, is there a way to restructure the inputs and outputs to accommodate this?
Visually I want to turn
i1 ---- (model) ----- o1
into
i1a                         o1a
   \                       /
    \                     /
     x ----- (model) ----- x
    /                     \
   /                       \
i1b                         o1b

Your intuition is right, you have to create 2 different placeholders i1a and i1b for your network 1, with two outputs o1a and o1b. Your visuals look great so here is my proposition:
i1a ----- (model) ----- o1a
             |
       shared weights
             |
i1b ----- (model) ----- o1b
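The proper way to do that is to build the network twice on top of the same variables: create every variable with tf.get_variable(), then build each copy of the network inside the same variable scope with reuse=True, so that both copies share their weights.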
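import tensorflow as tf
def create_variables():
    # Create the shared weights once, under the 'model' scope.
    with tf.variable_scope('model'):
        w1 = tf.get_variable('w1', [1, 2])
        b1 = tf.get_variable('b1', [2])
def inference(input):
    # reuse=True makes tf.get_variable return the existing variables
    # instead of creating new ones.
    with tf.variable_scope('model', reuse=True):
        w1 = tf.get_variable('w1')
        b1 = tf.get_variable('b1')
        output = tf.matmul(input, w1) + b1
    return output
create_variables()
# Two placeholders, two outputs, one set of weights.
i1a = tf.placeholder(tf.float32, [3, 1])
o1a = inference(i1a)
i1b = tf.placeholder(tf.float32, [3, 1])
o1b = inference(i1b)
loss = tf.reduce_mean(o1a - o1b)
with tf.Session() as sess:
    # tf.global_variables_initializer() replaces the deprecated tf.initialize_all_variables().
    sess.run(tf.global_variables_initializer())
    print(sess.run(loss, feed_dict={i1a: [[0.], [1.], [2.]], i1b: [[0.5], [1.5], [2.5]]}))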
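To see what weight sharing means numerically, here is a minimal NumPy sketch of the same two-input setup. It is only a stand-in for the TensorFlow graph: the random seed and the parameter values are arbitrary assumptions, and only the shapes of w1 and b1 are taken from the answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen for this illustration only

# One set of parameters, shared by both forward passes
# (same shapes as w1 and b1 in the answer).
w1 = rng.normal(size=(1, 2))
b1 = rng.normal(size=(2,))

def inference(x):
    """Apply the shared affine model to a batch of shape (n, 1)."""
    return x @ w1 + b1

i1a = np.array([[0.0], [1.0], [2.0]])
i1b = np.array([[0.5], [1.5], [2.5]])

o1a = inference(i1a)  # shape (3, 2)
o1b = inference(i1b)  # shape (3, 2)

# Same loss as in the answer. Because the bias is shared, it cancels in o1a - o1b.
loss = np.mean(o1a - o1b)
print(loss)
```

Since both passes use the same w1 and b1, the difference o1a - o1b depends only on the inputs and on w1, which is exactly what distinguishes a shared model from two independently initialised copies.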

Related

Is anyone familiar with Andrew Ng's planar data classification with one hidden layer?

How does X_assess end up with shape (2, 3) in the code below? How did they get that? I am trying the same thing with my own problem and got stuck there.
def forward_propagation_test_case():
    np.random.seed(1)
    X_assess = np.random.randn(2, 3)
    b1 = np.random.randn(4,1)
    b2 = np.array([[ -1.3]])
    parameters = {'W1': np.array([[-0.00416758, -0.00056267],
                                  [-0.02136196, 0.01640271],
                                  [-0.01793436, -0.00841747],
                                  [ 0.00502881, -0.01245288]]),
                  'W2': np.array([[-0.01057952, -0.00909008, 0.00551454, 0.02292208]]),
                  'b1': b1,
                  'b2': b2}
    return X_assess, parameters
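A short sketch of where the shape comes from: np.random.randn(2, 3) returns an array whose shape is exactly its arguments. Reading the 2 as the number of input features and the 3 as the number of examples is an assumption based on W1 having shape (4, 2) in the test case; the W1 values below are random placeholders, not the ones from the test case.

```python
import numpy as np

np.random.seed(1)
X_assess = np.random.randn(2, 3)   # shape (2, 3): assumed 2 features, 3 examples (columns)
W1 = np.random.randn(4, 2)         # placeholder values; only the (4, 2) shape matters here
b1 = np.random.randn(4, 1)

# (4, 2) @ (2, 3) -> (4, 3); b1 of shape (4, 1) broadcasts across the 3 columns.
Z1 = W1 @ X_assess + b1
print(X_assess.shape, Z1.shape)
```

For your own problem, the first dimension of X would have to match the second dimension of W1 for the matrix product to be defined.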

Are tensorflow random-variables only created once per sess.run?

If I have something like this:
a = tf.random_uniform((1,), dtype=tf.float32)
b = 1 + a
c = 2 + a
Will a be the same or different when calculating b and c?
Every time a sess.run() is executed, different results are generated, as can be seen in the official documentation of tensorflow.
For example, given the following code:
import tensorflow as tf
a = tf.random_uniform((1,), dtype=tf.float32)
b = 1 + a
c = 2 + a
init = tf.global_variables_initializer()
sess = tf.Session()
print(sess.run(a))
print(sess.run(b))
print(sess.run(c))
print(sess.run(a))
Each sess.run() call draws a new value of a, so the printed value of b is 1 plus a freshly generated a, which differs from the value of a printed on the previous line.
Output:
[ 0.13900638] # value of a
[ 1.87361598] # value of b = 1 + 0.87361598(!= a)
[ 2.81547356] # value of c = 2 + 0.81547356(!= a)
[ 0.00705874] # value of a(!= previous value of a)
As answered by @heena bawa, the random values are re-sampled on every sess.run() call.
To get values that are consistent with each other, call run only once and pass all the tensors you need as a list:
import tensorflow as tf
a = tf.random_uniform((1,), dtype=tf.float32)
b = 1 + a
c = 2 + a
init = tf.global_variables_initializer()
with tf.Session() as sess:
    print(sess.run([c, b, a]))
output:
[array([2.0236197], dtype=float32), array([1.0236198], dtype=float32), array([0.02361977], dtype=float32)]
# c is 2.023..
# b is 1.023..
# a is 0.023..
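The behaviour can be mimicked with a toy evaluator in plain Python. This is only an analogy under stated assumptions, not how TensorFlow is implemented: the RandomUniform, AddConst and run names are hypothetical, and the per-call cache stands in for the fact that one sess.run evaluates each op at most once.

```python
import random

class RandomUniform:
    """Toy stand-in for a random op: one fresh draw per run() call."""
    def evaluate(self, cache):
        if self not in cache:
            cache[self] = random.random()
        return cache[self]

class AddConst:
    """Toy stand-in for `const + node`."""
    def __init__(self, const, node):
        self.const = const
        self.node = node

    def evaluate(self, cache):
        return self.const + self.node.evaluate(cache)

def run(fetches):
    """Evaluate all fetches together; the cache lives for this call only."""
    cache = {}
    return [f.evaluate(cache) for f in fetches]

random.seed(0)  # arbitrary seed so the sketch is repeatable
a = RandomUniform()
b = AddConst(1, a)
c = AddConst(2, a)

c_val, b_val, a_val = run([c, b, a])  # one run: a is drawn once and shared
separate_b = run([b])[0]              # separate runs: a is drawn again each time
separate_c = run([c])[0]
print(c_val - a_val, b_val - a_val)
print(separate_c - separate_b)
```

Within the single run, c - a and b - a come out as 2 and 1 (up to rounding), while the two separate runs no longer differ by exactly 1.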

What are the practical differences between the variable_scope and name_scope?

I have gone through the pages on the same question ("What is the difference between variable_scope and name_scope?" and "What is the difference between variable_ops_scope and variable_scope?"), but I still cannot fully understand their differences.
I tried using tf.variable_scope and tf.name_scope in the same code, and TensorBoard showed the same graph for both.
Other people have discussed how the two differ in the names generated in the graph, but why are those names so important? I also saw that a variable with the same name would be reused. When is such reuse needed?
The key is to understand the difference between variables and other tensors in the graph. Any newly created tensor gains a prefix from the enclosing name scope. tf.get_variable, however, looks up existing variables without the name scope modifier. Variables newly created with tf.get_variable still get their names prefixed, but by the enclosing variable scope.
The script below highlights these differences. The intention is to reproduce the simple function by refactoring the tf.matmul(x, A) + b line and variable creation into a separate function add_layer.
import tensorflow as tf
def get_x():
    return tf.constant([[1., 2., 3.]], dtype=tf.float32)
def report(out1, out2):
    print(out1.name)
    print(out2.name)
    variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
    print([v.name for v in variables])
def simple():
    A = tf.get_variable(shape=(3, 3), dtype=tf.float32, name='A')
    b = tf.get_variable(shape=(3,), dtype=tf.float32, name='b')
    x = get_x()
    out1 = tf.matmul(x, A) + b
    out2 = tf.matmul(out1, A) + b
    return out1, out2
def add_layer(x):
    A = tf.get_variable(shape=(3, 3), dtype=tf.float32, name='A')
    b = tf.get_variable(shape=(3,), dtype=tf.float32, name='b')
    return tf.matmul(x, A) + b
def no_scoping():
    x = get_x()
    out1 = add_layer(x)
    out2 = add_layer(out1)
    return out1, out2
def different_name_scopes():
    x = get_x()
    with tf.name_scope('first_layer'):
        out1 = add_layer(x)
    with tf.name_scope('second_layer'):
        out2 = add_layer(out1)
    return out1, out2
def same_name_scope():
    x = get_x()
    with tf.name_scope('first_layer'):
        out1 = add_layer(x)
    with tf.name_scope('first_layer'):
        out2 = add_layer(out1)
    return out1, out2
def different_variable_scopes():
    x = get_x()
    with tf.variable_scope('first_layer'):
        out1 = add_layer(x)
    with tf.variable_scope('second_layer'):
        out2 = add_layer(out1)
    return out1, out2
def same_variable_scope():
    x = get_x()
    with tf.variable_scope('first_layer'):
        out1 = add_layer(x)
    with tf.variable_scope('first_layer'):
        out2 = add_layer(out1)
    return out1, out2
def same_variable_scope_reuse():
    x = get_x()
    with tf.variable_scope('first_layer'):
        out1 = add_layer(x)
    with tf.variable_scope('first_layer', reuse=True):
        out2 = add_layer(out1)
    return out1, out2
def test_fn(fn, name):
    graph = tf.Graph()
    with graph.as_default():
        try:
            print('****************')
            print(name)
            print('****************')
            out1, out2 = fn()
            report(out1, out2)
            print('----------------')
            print('SUCCESS')
            print('----------------')
        except Exception:
            print('----------------')
            print('FAILED')
            print('----------------')
for fn, name in [
        [simple, 'simple'],
        [no_scoping, 'no_scoping'],
        [different_name_scopes, 'different_name_scopes'],
        [same_name_scope, 'same_name_scope'],
        [different_variable_scopes, 'different_variable_scopes'],
        [same_variable_scope, 'same_variable_scope'],
        [same_variable_scope_reuse, 'same_variable_scope_reuse']
]:
    test_fn(fn, name)
Results:
****************
simple
****************
add:0
add_1:0
[u'A:0', u'b:0']
----------------
SUCCESS
----------------
****************
no_scoping
****************
----------------
FAILED
----------------
****************
different_name_scopes
****************
----------------
FAILED
----------------
****************
same_name_scope
****************
----------------
FAILED
----------------
****************
different_variable_scopes
****************
first_layer/add:0
second_layer/add:0
[u'first_layer/A:0', u'first_layer/b:0', u'second_layer/A:0', u'second_layer/b:0']
----------------
SUCCESS
----------------
****************
same_variable_scope
****************
----------------
FAILED
----------------
****************
same_variable_scope_reuse
****************
first_layer/add:0
first_layer_1/add:0
[u'first_layer/A:0', u'first_layer/b:0']
----------------
SUCCESS
----------------
Note that using different variable_scopes without reuse doesn't raise an error, but creates multiple copies of A and b, which may not be intended.
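The lookup rules behind these results can be summarised with a toy registry in plain Python. This is a simplified sketch, not TensorFlow code: VariableStore and its get_variable method are hypothetical names, and the only behaviour modelled is the one visible in the results above (variable scopes prefix the name, a clash without reuse fails, and reuse returns the existing variable).

```python
class VariableStore:
    """Toy registry mimicking how variable names are resolved (not TensorFlow itself)."""

    def __init__(self):
        self.variables = {}

    def get_variable(self, name, variable_scope='', reuse=False):
        # Only the variable scope contributes to the full name;
        # a name scope is deliberately not a parameter here.
        full_name = variable_scope + '/' + name if variable_scope else name
        if reuse:
            if full_name not in self.variables:
                raise ValueError('Variable %s does not exist' % full_name)
            return self.variables[full_name]
        if full_name in self.variables:
            raise ValueError('Variable %s already exists' % full_name)
        self.variables[full_name] = object()
        return self.variables[full_name]

store = VariableStore()
a1 = store.get_variable('A', variable_scope='first_layer')
a2 = store.get_variable('A', variable_scope='second_layer')             # independent copy
a3 = store.get_variable('A', variable_scope='first_layer', reuse=True)  # same object as a1
try:
    store.get_variable('A', variable_scope='first_layer')               # no reuse: name clash
    clashed = False
except ValueError:
    clashed = True
print(sorted(store.variables), a3 is a1, clashed)
```

In this model, the three name-scope cases fail for the same reason as no_scoping: the name scope never reaches the lookup, so the second call asks for a plain 'A' that already exists.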

tensorflow giving nans when calculating gradient with sparse tensors

The following snippet is from a fairly large piece of code but hopefully I can give all the information necessary:
y2 = tf.matmul(y1,ymask)
dist = tf.norm(ystar-y2,axis=0)
y1 and y2 are 128x30 and ymask is 30x30. ystar is 128x30. dist is 1x30. When ymask is the identity matrix, everything works fine. But when I set it to be all zeros, apart from a single 1 along the diagonal (so as to set all columns but one in y2 to be zero), I get nans for the gradient of dist with respect to y2, using tf.gradients(dist, [y2]). The specific value of dist is [0,0,7.9,0,...], with all the ystar-y2 values being around the range (-1,1) in the third column and zero elsewhere.
I'm pretty confused as to why a numerical issue would occur here, given that there are no logs or divisions. Is this underflow? Am I missing something in the maths?
For context, I'm doing this to try to train individual dimensions of y, one at a time, using the whole network.
Longer version to reproduce:
import tensorflow as tf
import numpy as np
import pandas as pd
batchSize = 128
eta = 0.8
tasks = 30
imageSize = 32**2
groups = 3
tasksPerGroup = 10
trainDatapoints = 10000
w = np.zeros([imageSize, groups * tasksPerGroup])
toyIndex = 0
for toyLoop in range(groups):
    m = np.ones([imageSize]) * np.random.randn(imageSize)
    for taskLoop in range(tasksPerGroup):
        w[:, toyIndex] = m * 0.1 * np.random.randn(1)
        toyIndex += 1
xRand = np.random.normal(0, 0.5, (trainDatapoints, imageSize))
taskLabels = np.matmul(xRand, w) + np.random.normal(0,0.5,(trainDatapoints, groups * tasksPerGroup))
DF = np.concatenate((xRand, taskLabels), axis=1)
trainDF = pd.DataFrame(DF[:trainDatapoints, ])
# define graph variables
x = tf.placeholder(tf.float32, [None, imageSize])
W = tf.Variable(tf.zeros([imageSize, tasks]))
b = tf.Variable(tf.zeros([tasks]))
ystar = tf.placeholder(tf.float32, [None, tasks])
ymask = tf.placeholder(tf.float32, [tasks, tasks])
dataLength = tf.cast(tf.shape(ystar)[0],dtype=tf.float32)
y1 = tf.matmul(x, W) + b
y2 = tf.matmul(y1,ymask)
dist = tf.norm(ystar-y2,axis=0)
mse = tf.reciprocal(dataLength) * tf.reduce_mean(tf.square(dist))
grads = tf.gradients(dist, [y2])
trainStep = tf.train.GradientDescentOptimizer(eta).minimize(mse)
# build graph
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
randTask = np.random.randint(0, 9)
ymaskIn = np.zeros([tasks, tasks])
ymaskIn[randTask, randTask] = 1
batch = trainDF.sample(batchSize)
batch_xs = batch.iloc[:, :imageSize]
batch_ys = np.zeros([batchSize, tasks])
batch_ys[:, randTask] = batch.iloc[:, imageSize + randTask]
gradOut = sess.run(grads, feed_dict={x: batch_xs, ystar: batch_ys, ymask: ymaskIn})
sess.run(trainStep, feed_dict={x: batch_xs, ystar: batch_ys, ymask:ymaskIn})
Here's a very simple reproduction:
import tensorflow as tf
with tf.Graph().as_default():
    y = tf.zeros(shape=[1], dtype=tf.float32)
    dist = tf.norm(y,axis=0)
    (grad,) = tf.gradients(dist, [y])
    with tf.Session():
        print(grad.eval())
Prints:
[ nan]
The issue is that tf.norm computes sum(x**2)**0.5. The gradient is x / sum(x**2) ** 0.5 (see e.g. https://math.stackexchange.com/a/84333), so when sum(x**2) is zero we're dividing by zero.
There's not much to be done in terms of a special case: the gradient as x approaches all zeros depends on which direction it's approaching from. For example if x is a single-element vector, the limit as x approaches 0 could either be 1 or -1 depending on which side of zero it's approaching from.
So in terms of solutions, you could just add a small epsilon:
import tensorflow as tf
def safe_norm(x, epsilon=1e-12, axis=None):
    return tf.sqrt(tf.reduce_sum(x ** 2, axis=axis) + epsilon)
with tf.Graph().as_default():
    y = tf.constant([0.])
    dist = safe_norm(y,axis=0)
    (grad,) = tf.gradients(dist, [y])
    with tf.Session():
        print(grad.eval())
Prints:
[ 0.]
Note that this is not actually the Euclidean norm. It's a good approximation as long as the squared norm of the input is much larger than epsilon.
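The same effect can be checked without TensorFlow by evaluating the analytic gradients directly. This is a minimal NumPy sketch; norm_grad and safe_norm_grad are hypothetical helper names, and they implement the formulas x / sqrt(sum(x**2)) and x / sqrt(sum(x**2) + epsilon) by hand rather than calling tf.gradients.

```python
import numpy as np

def norm_grad(x):
    """Analytic gradient of sqrt(sum(x**2)): x / sqrt(sum(x**2))."""
    # Silence the expected 0 / 0 warning so the nan can be inspected.
    with np.errstate(divide='ignore', invalid='ignore'):
        return x / np.sqrt(np.sum(x ** 2))

def safe_norm_grad(x, epsilon=1e-12):
    """Analytic gradient of sqrt(sum(x**2) + epsilon)."""
    return x / np.sqrt(np.sum(x ** 2) + epsilon)

zero = np.zeros(1)
print(norm_grad(zero))       # 0 / 0
print(safe_norm_grad(zero))  # 0 / sqrt(epsilon)

x = np.array([3.0, 4.0])
print(safe_norm_grad(x))     # close to x / 5 because sum(x**2) is far larger than epsilon
```

At the zero vector the plain formula evaluates 0 / 0, while the epsilon version evaluates 0 / sqrt(epsilon), which is why the gradient stays finite.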

Compute value of variable for multiple input values

I have a tensorflow graph which is trained. After training, I want to sample one variable for multiple intermediate values. Simplified:
a = tf.placeholder(tf.float32, [1])
b = a + 10
c = b * 10
Now I want to query c for values of b. Currently, I am using an outer loop
b_values = [0, 1, 2, 3, 4, 5]
samples = []
for b_value in b_values:
    samples += [sess.run(c,
                         feed_dict={b: [b_value]})]
This loop takes quite a bit of time, I think because b_values contains 5000 values in my case. Is there a way of running sess.run only once and passing all b_values at once? I cannot really modify the graph a->b->c, but I could add something to it if that helps.
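You can feed all the values in a single call, provided the placeholder is declared with an unknown batch dimension ([None, 1]) so that b accepts a whole batch. The script below times both approaches:
import tensorflow as tf
import numpy as np
import time
a = tf.placeholder(tf.float32, [None,1])
b = a + 10
c = b * 10
sess = tf.Session()
b_values = np.random.randint(500,size=(5000,1))
# One sess.run call per value
samples = []
t = time.time()
for b_value in b_values:
    samples += [sess.run(c,feed_dict={b: [b_value]})]
print(time.time()-t)
#print(samples)
# A single sess.run call for the whole batch
t=time.time()
samples = sess.run(c,feed_dict={b:b_values})
print(time.time()-t)
#print(samples)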
Output: (time in seconds)
0.874449968338
0.000532150268555
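The batched call returns the same values as the loop because c = b * 10 is applied row by row. A small NumPy sketch of that equivalence follows; c_from_b is a hypothetical stand-in for the graph edge b -> c, not a TensorFlow call, and the six sample values are taken from the question.

```python
import numpy as np

def c_from_b(b):
    """NumPy stand-in for the graph edge b -> c (c = b * 10)."""
    return np.asarray(b, dtype=np.float32) * 10

b_values = np.arange(6).reshape(-1, 1)  # shape (6, 1): one row per sample

# Six separate calls, each on a batch of one row, stacked afterwards.
looped = np.concatenate([c_from_b([row]) for row in b_values])

# One call on the whole batch.
batched = c_from_b(b_values)

print(np.array_equal(looped, batched), batched.shape)
```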
Hope this helps !