I have seen working examples like the following:
# GOOD CODE
import tensorflow as tf
# Build a graph.
a = tf.constant(5.0)
b = tf.constant(6.0)
c = a * b
# Launch the graph in a session.
sess = tf.compat.v1.Session()
# Evaluate the tensor `c`.
print(sess.run(c))
This is fine. I think I get it. Building the graph simply defines the graph architecture but doesn't execute the graph. Execution only happens when the session is called. But then I thought that the following would work:
# BAD CODE
import tensorflow as tf
# Build a graph.
a = tf.constant(5.0)
b = tf.constant(6.0)
c = a * b
# Launch the graph in a session.
sess = tf.compat.v1.Session()
# Evaluate the tensor `c`.
sess.run(c)
print(c)
But it doesn't, all I see is...
Tensor("mul:0", shape=(), dtype=float32)
...and I cannot work out why. I have two ideas:
Idea 1: Maybe somehow the variable c only existed within the context of the session and somehow expired the moment sess.run(c) was completed? Can the code be modified such that the variable c is kept alive after the sess.run(c) call?
Idea 2: Somehow TF thought - "this guy is not doing anything with c (at least not immediately) so I'm not going to execute anything right now"... and so c has not been evaluated at all.
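For reference, here is a minimal sketch of the difference between the two snippets. It assumes TensorFlow 2.x with eager execution switched off so that the Session API behaves as in the code above; the name c_value is mine and not part of the question. sess.run(c) returns the computed value and leaves c itself untouched, so the result has to be captured:

```python
import tensorflow as tf

# Assumption: TensorFlow 2.x, run in graph mode to match the snippets above.
tf.compat.v1.disable_eager_execution()

# Build a graph.
a = tf.constant(5.0)
b = tf.constant(6.0)
c = a * b  # c is a symbolic handle to the "mul" node, not a number

with tf.compat.v1.Session() as sess:
    # sess.run() returns the evaluated result; it does not change `c`.
    c_value = sess.run(c)

print(c)        # still the symbolic Tensor
print(c_value)  # the computed number
```

Under these assumptions c never "expires": it stays a description of the computation, and each call to sess.run(c) evaluates it again.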
I have been following an LRP implementation written in PyTorch and wanted to try it out with TensorFlow and Keras. I am using the same model and weights (VGG16) in Keras and was able to run the forward pass and the element-wise division successfully using
# keras-tensorflow implementation
z = incr(clasifierLayers[l](A[l])) # forward pass step(1)
s = (R[l+1]/z) # Element wise division step(2)
But I am having trouble recreating the backward pass. In the original LRP code, which uses PyTorch, the backward pass is computed using
# pyTorch implementation
(z*s).sum().backward(); c = A[l].grad
When I tried to replicate the backward pass in TensorFlow, my gradient comes back as None. Here is my code for the backward pass.
def getGradients(product,layer,l):
    with tf.GradientTape() as tape:
        tape.watch(product)
        a=layers[l](A[l])
    gradient = tape.gradient(product, a)
    return gradient
c = getGradients((z*s).numpy().sum(),layers[l],l) # backward pass step(3)
Can someone tell me what's wrong with this implementation?
Thanks in advance.
I tried to replicate the issue with one layer and performing an LRP backward step, here is the code:
import tensorflow as tf
x = tf.ones((1,10))
layer=tf.keras.layers.Dense(10)
y=layer(x)
with tf.GradientTape() as tape:
    tape.watch(x)
    z = tf.keras.layers.Dense(10)(x)+1e-9
    s = y/z
    s = tf.reshape(s, z.shape)
c = tape.gradient(tf.reduce_sum(z*s), x)
y*c
This code works, in the sense that it returns the gradients to c.
I did not test it with a dataset, so do not know if it works as it should. Nonetheless, I think the problem with your code is that you should have the first block:
# keras-tensorflow implementation
z = incr(clasifierLayers[l](A[l])) # forward pass step(1)
s = (R[l+1]/z) # Element wise division step(2)
inside the GradientTape scope and ask for the gradients with respect to A[l].
Edit:
I forgot to avoid gradients being propagated through s. The gradient computation should be done as follows:
c = tape.gradient(tf.reduce_sum(z*s.numpy()), x)
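As an alternative to calling .numpy(), the same effect can probably be had with tf.stop_gradient, which keeps s as a tensor but treats it as a constant during differentiation. The following is only a sketch under assumptions: it uses a single Dense layer, and relevance is a hypothetical stand-in for R[l+1] that does not come from the question.

```python
import tensorflow as tf

x = tf.ones((1, 10))
layer = tf.keras.layers.Dense(10)

with tf.GradientTape() as tape:
    tape.watch(x)                         # x is a plain tensor, so watch it explicitly
    z = layer(x) + 1e-9                   # forward pass, step (1)
    relevance = tf.ones_like(z)           # hypothetical stand-in for R[l+1]
    s = tf.stop_gradient(relevance / z)   # step (2), treated as a constant
    objective = tf.reduce_sum(z * s)

c = tape.gradient(objective, x)           # backward pass, step (3)
relevance_in = x * c                      # relevance redistributed onto the input
print(relevance_in.shape)
```

The forward pass has to happen inside the tape scope, otherwise there is nothing recorded to differentiate and the gradient comes back as None.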
# Build a graph.
a = tf.constant(5.0)
b = tf.constant(6.0)
c = a * b
# Launch the graph in a session.
sess = tf.compat.v1.Session()
# Evaluate the tensor `c`.
print(sess.run(c))
The code above is taken from the TensorFlow Core r2.0 documentation, but it gives the above error.
The reason is that TensorFlow Core r2.0 enables eager execution by default, so there is no need to create a tf.compat.v1.Session() and call .run().
If we do want to use tf.compat.v1.Session(), we need to call tf.compat.v1.disable_eager_execution() at the start of the program. After that, tf.compat.v1.Session() and .run() can be used as before.
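A minimal sketch of the first option, assuming TensorFlow 2.x with its default eager execution left on. No session is involved; the operations run immediately and the result can be read straight from the tensor:

```python
import tensorflow as tf

# Assumption: TensorFlow 2.x with eager execution enabled (the default).
a = tf.constant(5.0)
b = tf.constant(6.0)
c = a * b          # computed right away, no graph/session step

print(c.numpy())   # the value as a NumPy scalar
```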
Alternatively, since TensorFlow Core r2.0 enables eager execution by default, we can leave that setting alone
and just change our code so that the graph is built inside the session block:
# Launch the graph in a session.
with tf.compat.v1.Session() as ses:
    # Build a graph.
    a = tf.constant(5.0)
    b = tf.constant(6.0)
    c = a * b
    # Evaluate the tensor `c`.
    print(ses.run(c))
This gives the output without any errors
One more thing: if you enable or disable eager execution explicitly, remember that the call has to be made at program startup.
For more details, please go through the documentation.
If you run into any issues, please feel free to ask.
By the way, I am just a beginner in TensorFlow and Keras.
Thank You !
I tried to implement the following code.
import tensorflow as tf
a = tf.placeholder(tf.int32)
b = tf.placeholder(tf.int32)
def initw(a,b):
    return tf.Variable(tf.sign(tf.random_uniform(shape=[a,b],minval=-1.0,maxval=1.0)))
bla = initw(a,b)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([bla], feed_dict={a:2, b:2}))
But I keep getting an error which states:
ValueError: initial_value must have a shape specified: Tensor("Sign:0",shape=(?, ?), dtype=float32)
Can someone tell me what I am doing wrong here? I really don't see what causes the error.
EDIT:
I want to use initw(a,b) to initialize the weights of a network. I want to be able to do something like:
weights = {
    "h1": tf.get_variable("h1", initializer=initw(a,b).initialized_value())
}
Where a and b are the height and width of a matrix.
In my eyes the error message is actually quite precise, but I understand your confusion. You probably do not really understand how TensorFlow works under the hood. You might want to start reading here.
The shape of a variable must be known when the graph is built, before runtime. A placeholder may leave a dimension unspecified (typically the batch dimension, which is filled in at runtime), but the initial value of a variable needs a fully specified shape.
In your case you are trying to use placeholders to specify the dimensions of a variable, which is impossible because those values are not known while the graph is being constructed.
I don't know what you are trying to do with this but I would guess there is a way to achieve what you need. You can actually use the length of the batch dimension dynamically to draw a uniform vector of that size.
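To illustrate that last point, here is a rough sketch that draws one uniform value per row of a batch whose size is only known at runtime. It assumes TensorFlow 2.x running in graph mode through tf.compat.v1; the placeholder x and its width of 3 are made up for the example.

```python
import numpy as np
import tensorflow as tf

# Assumption: TensorFlow 2.x, graph mode via the compat.v1 API.
tf.compat.v1.disable_eager_execution()

# Hypothetical input: the batch dimension is left unspecified.
x = tf.compat.v1.placeholder(tf.float32, shape=[None, 3])

# tf.shape() is evaluated at runtime, so it sees the actual batch size.
noise = tf.random.uniform(shape=tf.shape(x)[:1], minval=-1.0, maxval=1.0)

with tf.compat.v1.Session() as sess:
    values = sess.run(noise, feed_dict={x: np.zeros((5, 3), dtype=np.float32)})

print(values.shape)
```

This works because noise is an ordinary tensor computed on each run, not a variable, so its shape does not have to be fixed up front.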
Edit: After you updated the question I feel like I was right about my suspicion. There is no need for a and b to be placeholders, just make them Python variables, like this:
import tensorflow as tf
# Matrix shape must be known in advance, but can of course still be specified
# in some settings file or at the beginning of the Python script
A = 2
B = 2
W = tf.Variable(tf.sign(tf.random_uniform(shape=(A, B), minval=-1.0,
                                          maxval=1.0)))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(W))
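If you are on TensorFlow 2.x, the same idea might look like the sketch below. It assumes eager execution, and init_sign_weights is a hypothetical helper name of mine; rows and cols are ordinary Python integers, which is what lets the variable have a fully defined shape.

```python
import numpy as np
import tensorflow as tf

def init_sign_weights(rows, cols):
    """Hypothetical helper: a variable of random -1/+1 entries (0 only if a draw is exactly 0)."""
    uniform = tf.random.uniform(shape=(rows, cols), minval=-1.0, maxval=1.0)
    return tf.Variable(tf.sign(uniform))

# Plain Python ints, known before any TensorFlow op is created.
W = init_sign_weights(2, 2)
print(W.numpy())
```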
The following code uses the TensorFlow library and runs far slower than the equivalent NumPy code. I am aware that I am calling a function that uses TensorFlow inside a Python for loop (which I will parallelize with Python multiprocessing later), but the code as is runs extremely slowly.
Could someone please help me make this code run faster? Thanks.
from math import *
import numpy as np
import sys
from multiprocessing import Pool
import tensorflow as tf
def Trajectory_Fun(tspan, a, b, session=None, server=None):
    # Open tensorflow session
    if session is None:
        if server is None:
            sess = tf.Session()
        else:
            sess = tf.Session(server.target)
    else:
        sess = session
    B = np.zeros(np.size(tspan), dtype=np.float64)
    B[0] = b
    for i, t in enumerate(tspan):
        r = np.random.rand(1)
        if r>a:
            c = sess.run(tf.trace(tf.random_normal((4, 4), r, 1.0)))
        else:
            c = 0.0 # sess.run(tf.trace(tf.random_normal((4, 4), 0.0, 1.0)))
        B[i] = c
    # Close tensorflow session
    if session is None:
        sess.close()
    return B
def main(argv):
    # Parameters
    tspan = np.arange(0.0, 1000.0)
    a = 0.1
    b = 0.0
    # Run test program
    B = Trajectory_Fun(tspan, a, b, None, None)
    print('Done!')
if __name__ == "__main__":
    main(sys.argv[1:])
As stated in your question, this program will give poor performance because it creates several new TensorFlow graph nodes in every iteration of the loop. The underlying assumption in TensorFlow is (approximately) that you'll build a graph once and then call sess.run() on (various parts of) it multiple times. The first time you run a graph is relatively expensive, because TensorFlow has to build various data structures and optimize the execution of the graph across multiple devices.
However, TensorFlow caches this work, so subsequent uses are much cheaper.
You can make this program much faster by constructing the graph once and using (e.g.) a tf.placeholder() op to feed in the value that changes in each iteration. For example, the following should do the trick:
B = np.zeros(np.size(tspan), dtype=np.float64)
B[0] = b
# Define the TensorFlow graph once and reuse it in each iteration of the for loop.
r_placeholder = tf.placeholder(tf.float32, shape=[])
out_t = tf.trace(tf.random_normal((4, 4), r_placeholder, 1.0))
with tf.Session() as sess:
    for i, t in enumerate(tspan):
        # Draw a scalar so that it matches the placeholder's shape of [].
        r = np.random.rand()
        if r > a:
            c = sess.run(out_t, feed_dict={r_placeholder: r})
        else:
            c = 0.0
        B[i] = c
return B
You could potentially make this even more efficient by using a TensorFlow loop and making fewer calls to sess.run(), but the general principle is the same: reuse the same graph multiple times to get the benefit of TensorFlow.
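To check that the graph really is built only once, here is a self-contained sketch. It assumes TensorFlow 2.x running in graph mode through tf.compat.v1 (so the names differ slightly from the TF1 calls above), and the names mean_ph, trace_op and trajectory are mine. It counts the graph's operations before and after the loop:

```python
import numpy as np
import tensorflow as tf

# Assumption: TensorFlow 2.x, graph mode via the compat.v1 API.
tf.compat.v1.disable_eager_execution()

# Build the graph once, outside the loop.
mean_ph = tf.compat.v1.placeholder(tf.float32, shape=[])
trace_op = tf.linalg.trace(tf.random.normal((4, 4), mean_ph, 1.0))

def trajectory(sess, tspan, a):
    """Run the prebuilt trace_op once per time step where r > a."""
    out = np.zeros(np.size(tspan), dtype=np.float64)
    for i, _ in enumerate(tspan):
        r = np.random.rand()  # scalar, matches the placeholder shape []
        out[i] = sess.run(trace_op, feed_dict={mean_ph: r}) if r > a else 0.0
    return out

with tf.compat.v1.Session() as sess:
    ops_before = len(sess.graph.get_operations())
    B = trajectory(sess, np.arange(0.0, 50.0), a=0.1)
    ops_after = len(sess.graph.get_operations())

print(ops_before == ops_after)  # no nodes were added inside the loop
```

In the original version the same count would grow with every iteration, since each tf.trace(tf.random_normal(...)) call adds fresh nodes.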
How can I choose to execute a portion of the graph based on a condition?
I have a part of my network which is to be executed only if a placeholder value is provided in feed_dict. An alternate path is taken if the value is not provided. How do I go about implementing this using tensorflow?
Here are the relevant portions of my code:
sess.run(accuracy, feed_dict={inputs: mnist.test.images, outputs: mnist.test.labels})
N = tf.shape(outputs)
cost = 0
if N > 0:
    y_N = tf.slice(h_c, [0, 0], N)
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(y_N, outputs, name='xentropy')
    cost = tf.reduce_mean(cross_entropy, name='xentropy_mean')
In the above code, I'm looking for something to use in the place of if N > 0:
Hrm. It's possible that what you want is tf.control_flow_ops.cond()
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/control_flow_ops.py#L597
But that's not exported into the tf namespace, and I'm answering without checking how guaranteed-stable this interface is, but it's used in released models, so go for it. :)
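For readers on a newer release: the op is available there as tf.cond. Below is a rough sketch of how the check from the question might be written with it, assuming TensorFlow 2.x in graph mode through tf.compat.v1. The placeholder shapes and the function names labelled_cost and no_label_cost are invented for the example. Note that the placeholder still has to be fed; "no labels" is expressed as an empty batch.

```python
import numpy as np
import tensorflow as tf

# Assumption: TensorFlow 2.x, graph mode via the compat.v1 API.
tf.compat.v1.disable_eager_execution()

# Hypothetical shapes: 3 classes, batch size unknown until runtime.
logits = tf.compat.v1.placeholder(tf.float32, shape=[None, 3])
labels = tf.compat.v1.placeholder(tf.float32, shape=[None, 3])

def labelled_cost():
    # Ops created inside the branch function only run when the branch is taken.
    xent = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    return tf.reduce_mean(xent)

def no_label_cost():
    return tf.constant(0.0)

has_labels = tf.shape(labels)[0] > 0
cost = tf.cond(has_labels, labelled_cost, no_label_cost)

with tf.compat.v1.Session() as sess:
    one_hot = np.array([[1, 0, 0], [0, 1, 0]], dtype=np.float32)
    full = sess.run(cost, feed_dict={logits: np.zeros((2, 3), np.float32),
                                     labels: one_hot})
    empty = sess.run(cost, feed_dict={logits: np.zeros((0, 3), np.float32),
                                      labels: np.zeros((0, 3), np.float32)})

print(full, empty)
```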
However: Because you actually know in advance what path you want when you construct the feed_dict, you could also take a different approach of invoking a separate path through your model. The standard way to do this is to, e.g., set up code like:
def model(input, n_greater_than):
    ... cleverness ...
    if n_greater_than:
        ... other cleverness...
    return tf.reduce_mean(input)
out1 = model(input, True)
out2 = model(input, False)
And then pull the out1 or out2 node depending on what you know when you're about to run your computation and set the feed_dict. Remember that, as long as both calls reference the same variables (create them outside the model() func), you'll basically have two separate paths through the same model.
You can see an example of this in the convolutional mnist example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/mnist/convolutional.py#L165
I'm a fan of doing it this way without introducing control flow dependencies if you can.
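A concrete, if toy, version of this pattern could look like the sketch below. It assumes TensorFlow 2.x in graph mode through tf.compat.v1. The shared weights variable, the doubling step and the name with_extra_step are placeholders of mine for the "cleverness" in the outline above.

```python
import numpy as np
import tensorflow as tf

# Assumption: TensorFlow 2.x, graph mode via the compat.v1 API.
tf.compat.v1.disable_eager_execution()

# Shared variable, created outside model() so both paths use the same weights.
weights = tf.Variable(tf.ones((3, 1)))

def model(inputs, with_extra_step):
    hidden = tf.matmul(inputs, weights)
    if with_extra_step:          # ordinary Python branch, resolved while building the graph
        hidden = hidden * 2.0    # stand-in for the "other cleverness"
    return tf.reduce_mean(hidden)

inputs = tf.compat.v1.placeholder(tf.float32, shape=[None, 3])
out1 = model(inputs, True)
out2 = model(inputs, False)

with tf.compat.v1.Session() as sess:
    sess.run(weights.initializer)
    batch = np.ones((4, 3), dtype=np.float32)
    # Pick the output node based on what is known at call time.
    v1 = sess.run(out1, feed_dict={inputs: batch})
    v2 = sess.run(out2, feed_dict={inputs: batch})

print(v1, v2)
```

Only the nodes needed for the requested output are executed, so asking for out2 never touches the extra step.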
Here is a simple example, that can get you started. It executes different parts of the graph based on the shape of the tensor:
import tensorflow as tf
a = tf.Variable([[3.0, 3.0], [3.0, 3.0]])
b = tf.Variable([[1.0, 1.0], [2.0, 2.0]])
l = tf.shape(a)
# tf.sub was renamed to tf.subtract in TensorFlow 1.0
add_op, sub_op = tf.add(a, b), tf.subtract(a, b)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
# Evaluate the shape first, then choose in Python which node to run.
t = sess.run(l)
print(sess.run(sub_op if t[0] == 3 else add_op))
sess.close()
Change 3 to 2 to see the tensors being subtracted instead. As you can see, I created the nodes for add, sub and shape up front, then evaluated the shape and ran only the relevant part of the graph.