TensorFlow AssertionError "gradients list should have been aggregated by now"

I have a function f that is internally using some tf.while_loops and tf.gradients to compute the value y = f(x). Something like this
def f(x):
    ...
    def body(g, x):
        # Compute the gradient here
        grad = tf.gradients(g, x)[0]
        ...
        return ...
    return tf.while_loop(cond, body, parallel_iterations=1)
There are a few hundred lines of code. But I believe that those are the important points...
Now when I evaluate f(x), I get exactly the value I expect ..
y = known output of f(x)

with tf.Session() as sess:
    fx = f(x)
    print("Error = ", y - sess.run(fx, feed_dict))  # Prints 0
However, when I try to evaluate the gradient of f(x) with respect to x, that is,
grads = tf.gradients( fx, x )[0]
I get the error
AssertionError: gradients list should have been aggregated by now.
Here is the full trace:
File "C:/Dropbox/bob/tester.py", line 174, in <module>
grads = tf.gradients(y, x)[0]
File "C:\Anaconda36\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 649, in gradients
return [_GetGrad(grads, x) for x in xs]
File "C:\Anaconda36\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 649, in <listcomp>
return [_GetGrad(grads, x) for x in xs]
File "C:\Anaconda36\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 727, in _GetGrad
"gradients list should have been aggregated by now.")
AssertionError: gradients list should have been aggregated by now.
Could somebody please outline likely causes for this error? I have no idea where to even start looking for the issue...
Some observations:
Note that I have set the parallel iterations for the while loop to 1. This should mean that there are no errors due to reading and writing from multiple threads.
If I discard the while loop, and just have f return body(), then the code runs:
# The following does not crash, but we removed the while_loop, so the output is incorrect
def f(x):
    ...
    def body(g, x):
        # Compute the gradient here
        grad = tf.gradients(g, x)[0]
        ...
        return ...
    return body(...)
Obviously, the output is incorrect, but at least the gradients are computed.

I came across a similar issue. Some patterns I noted:
If the x used in tf.gradients was used in a manner that required dimension broadcasting in body, I got this error. If I changed it to one that didn't require broadcasting, tf.gradients returned [None]. I didn't test this extensively, so this pattern may not be consistent across all examples.
Both cases (returning [None] and raising this assertion error) can be resolved by differentiating tf.identity(y) rather than just y: grads = tf.gradients(tf.identity(y), xs). I have absolutely no idea why this works.
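A minimal sketch of that workaround, assuming f and x are built as in the question (TF 1.x graph mode):

fx = f(x)

# Differentiating fx directly may return [None] or raise the assertion error,
# whereas inserting an identity op between the loop output and the gradient
# call has been reported to avoid both symptoms.
grads = tf.gradients(tf.identity(fx), x)[0]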


Why are gradients disconnected

Consider the following code
@tf.function
def get_derivatives(function_to_diff, X):
    f = function_to_diff(X)
    ## Derivatives
    W = X[:,0]
    Z = X[:,1]
    V = X[:,2]
    df_dW = tf.gradients(f, X[:,0])
    return df_dW
I wanted get_derivatives to return the partial derivative of function_to_diff with respect to the first element of X.
However, when I run
def test_function(X):
    return tf.pow(X[:,0], 2) * X[:,1] * X[:,2]

get_derivatives(test_function, X)
I get None.
If I use unconnected_gradients='zero' for tf.gradients, I'd get zeros. In other words, the gradients are disconnected.
Questions
Why are the gradients disconnected?
How can I get the derivative with respect to the first element of X, i.e. how can I restore the connection? I know that if I wrote
def test_function(x, y, z):
    return tf.pow(x, 2) * y * z

@tf.function
def get_derivatives(function_to_diff, x, y, z):
    f = function_to_diff(x, y, z)
    df_dW = tf.gradients(f, x)
    return df_dW
This could fix the problem. What if my function can only take in one argument, i.e. what if my function looks like test_function(X)? For example, test_function could be a trained neural network that takes in only one argument.
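For illustration, one common way to keep the graph connected when the function only accepts X is to differentiate with respect to the whole tensor X and then select the column of interest; X[:,0] creates a new slice op that is not on the path from X to f(X), which is why the gradient with respect to the slice is disconnected. A minimal sketch of that idea:

@tf.function
def get_derivatives(function_to_diff, X):
    f = function_to_diff(X)
    dfdX = tf.gradients(f, X)[0]  # same shape as X, e.g. (batch, 3)
    df_dW = dfdX[:, 0]            # derivative with respect to the first column
    return df_dW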

Optimization (scipy.optimize) L-BFGS-B wrapper args treating array elements as one variable

I am unable to understand the source of this error:
line 327, in function_wrapper
return function(*(wrapper_args + args))
TypeError: SSVOptionPriceObjFunc() missing 1 required positional argument: 'marketVolSurface'
The relevant code is below:
x0 = [1.0, 0.0]  # (lambda0, rho)
x0 = np.asarray(x0)
args = (spot, 0.01*r, daysInYear, mktPrices, volSurface)
# constraints: lambda0 > 0, -1 <= rho <= 1
boundsHere = ((0, None), (-1, 1))
res = minimize(SSVOptionPriceObjFunc, x0, args, method='L-BFGS-B', jac=None,
               bounds=boundsHere, options={'xtol': 1e-8, 'disp': True})
The function to be minimized is below. The first two arguments are the free variables, while the other five are fixed as parameters.
def SSVOptionPriceObjFunc(lambda0, rho, spot, spotInterestRate, daysInYear,
                          marketPrices, marketVolSurface):
My intention is to find the (lambda0, rho) giving a minimum. From the debugger, it seems that my initial guess x0 is interpreted as a single variable, not as a vector, giving the error about a missing positional argument. I have tried passing x0 as a list, tuple, and ndarray; all fail. Can someone spot an error, or suggest a workaround? Thank you in advance.
Update: I have found a solution: use a wrapper function from the functools package to set the parameters.
import functools as ft

SSVOptionPriceObjFuncWrapper = ft.partial(SSVOptionPriceObjFunc, spot=spot,
    spotInterestRate=0.01 * r, daysInYear=daysInYear, marketPrices=mktPrices,
    marketVolSurface=volSurface)
Then pass SSVOptionPriceObjFuncWrapper to the minimizer with args = None
Thank you for the replies.
Take the documented minimize inputs seriously. It's your job to write the function to fit what minimize does, not the other way around.
scipy.optimize.minimize(fun, x0, args=(), ...)

fun : callable
    The objective function to be minimized.

        fun(x, *args) -> float

    where x is a 1-D array with shape (n,) and args is a tuple of the fixed
    parameters needed to completely specify the function.
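Concretely, the free variables have to arrive packed into a single 1-D array, and the fixed parameters arrive through args. A sketch of what that looks like for the objective in the question (the function body is omitted):

from scipy.optimize import minimize

def SSVOptionPriceObjFunc(params, spot, spotInterestRate, daysInYear,
                          marketPrices, marketVolSurface):
    lambda0, rho = params  # unpack the 1-D optimization vector
    ...                    # compute and return the objective as a float

res = minimize(SSVOptionPriceObjFunc, x0,
               args=(spot, 0.01*r, daysInYear, mktPrices, volSurface),
               method='L-BFGS-B', bounds=boundsHere)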

tf.nn.softmax behaving strangely

I am learning LSTM with TensorFlow with eager execution enabled. However, while implementing the LSTM I have noticed a behaviour of tf.nn.softmax that has me stuck. Here is a section of my code:
class RNN_LSTM(object):
    def __init__(self,hidden_size):
        data=open('Shakespear.txt', 'r').read()
        self.data = data.split()
        vocab_size=len(list(set(self.data)))
        self.words =list(set(self.data))
        self.hidden_size=hidden_size
        self.input_size=vocab_size+hidden_size
        self.vocab_size=vocab_size
        self.W1=tf.Variable(tf.random.uniform((self.hidden_size,self.input_size),dtype=tf.dtypes.float32,name="W1")*0.1)
        self.b1=tf.Variable(tf.random.uniform((self.hidden_size,1),dtype=tf.dtypes.float32,name="b1"))
        self.W2=tf.Variable(tf.random.uniform((self.hidden_size,self.input_size),dtype=tf.dtypes.float32,name="W2")*0.1)
        self.b2=tf.Variable(tf.random.uniform((self.hidden_size,1),dtype=tf.dtypes.float32,name="b2")*0.1)
        self.W3=tf.Variable(tf.random.uniform((self.hidden_size,self.input_size),dtype=tf.dtypes.float32,name="W3")*0.1)
        self.b3=tf.Variable(tf.random.uniform((self.hidden_size,1),dtype=tf.dtypes.float32,name="b3")*0.1)
        self.W4=tf.Variable(tf.random.uniform((hidden_size,self.input_size),dtype=tf.dtypes.float32,name="W4")*0.1)
        self.b4=tf.Variable(tf.random.uniform((self.hidden_size,1),dtype=tf.dtypes.float32,name="b4")*0.1)
        self.W5=tf.Variable(tf.random.uniform((self.vocab_size,self.hidden_size),dtype=tf.dtypes.float32,name="W5")*0.1)
        self.b5=tf.Variable(tf.random.uniform((self.vocab_size,1),dtype=tf.dtypes.float32,name="b5")*0.1)
        self.learning_rate=1e-1
        self.sequence_length=50
        #self.M_c=tf.Variable(tf.zeros((self.input_size,1)),name="M_c")

    def one_hot_encoding(self,x,hprev):
        M_c=tf.Variable(tf.zeros((self.input_size,1)),name="M_c")
        vocab=tf.Variable(tf.zeros((self.vocab_size,1)))
        #hprev=tf.Variable(tf.zeros((self.hidden_size,1)))
        vocab=vocab.numpy()
        vocab[x]=1
        M_c=tf.concat((hprev,vocab),axis=0)
        return M_c

    def feedforward(self,M_c,p_s):
        ft=tf.sigmoid( tf.matmul(self.W1,M_c)+self.b1)
        it=tf.sigmoid(tf.matmul(self.W2,M_c)+self.b2)
        gt=tf.math.tanh(tf.matmul(self.W3,M_c)+self.b3)
        cs=tf.multiply(ft,p_s)+tf.multiply(it,gt)
        ot=tf.nn.sigmoid(tf.matmul(self.W4,M_c)+self.b4)
        ht=tf.multiply(ot,tf.math.tanh(cs))
        output=self.softmax(tf.matmul(self.W5,ht)+self.b5)
        return ht,output,cs

    def sample_text(self,hprev,begin,p_s,n):
        vocab=tf.Variable(tf.zeros((self.vocab_size,1)),tf.float32)
        vocab=vocab.numpy()
        vocab[begin]=1
        letters=[]
        for i in range(n):
            M=tf.Variable(tf.zeros((self.input_size,1)),name="M")
            M=tf.assign(M,tf.concat((hprev,vocab),axis=0))
            ft=tf.nn.sigmoid(tf.matmul(self.W1,M)+self.b1)
            it=tf.nn.sigmoid(tf.matmul(self.W2,M)+self.b2)
            gt=tf.math.tanh(tf.matmul(self.W3,M)+self.b3)
            cs=tf.multiply(ft,p_s)+tf.multiply(it,gt)
            p_s=cs
            ot=tf.sigmoid(tf.matmul(self.W4,M)+self.b4)
            ht=tf.multiply(ot,tf.math.tanh(cs))
            ht=tf.reshape(ht,(self.hidden_size,1))
            output=tf.matmul(self.W5,ht)+self.b5
            p=self.softmax(output)
            #print(p.numpy())
            p=tf.reshape(p,(1,self.vocab_size))
            samples = tf.random.categorical(p,1)
            sample_selected=tf.cast(samples[0][0].numpy(),tf.int32)
            selection_sample_np=[i for i in range(self.vocab_size)]
            selection_sample_tf=tf.convert_to_tensor(selection_sample_np)
            selected_next_letter=selection_sample_tf[sample_selected]
            trial=tf.cast(selected_next_letter,tf.int32)
            k=tf.Variable(tf.zeros((self.vocab_size,1)),tf.int32)
            k[selected_next_letter,0].assign(1)
            letters.append(selected_next_letter)
            hprev=ht
        return letters

    def process_input(self):
        char_to_ix={ch:ix for ix,ch in enumerate(self.words)}
        ix_to_char={ix:ch for ix,ch in enumerate(self.words)}
        return char_to_ix,ix_to_char

    def softmax(self,z):
        return tf.math.exp(z-max(z))/tf.math.reduce_sum(tf.math.exp(z-max(z)))

    def AggregatorNew(self):
        losses,iterations=[],[]
        char_to_ix,ix_to_char=self.process_input()
        mem1=tf.Variable(tf.zeros_like(self.W1))
        mem2=tf.Variable(tf.zeros_like(self.W2))
        mem3=tf.Variable(tf.zeros_like(self.W3))
        mem4=tf.Variable(tf.zeros_like(self.W4))
        mem5=tf.Variable(tf.zeros_like(self.W5))
        mem6=tf.Variable(tf.zeros_like(self.b1))
        mem7=tf.Variable(tf.zeros_like(self.b2))
        mem8=tf.Variable(tf.zeros_like(self.b3))
        mem9=tf.Variable(tf.zeros_like(self.b4))
        mem10=tf.Variable(tf.zeros_like(self.b5))
        dW1=tf.Variable(tf.zeros_like(self.W1))
        dW2=tf.Variable(tf.zeros_like(self.W2))
        dW3=tf.Variable(tf.zeros_like(self.W3))
        dW4=tf.Variable(tf.zeros_like(self.W4))
        dW5=tf.Variable(tf.zeros_like(self.W4))
        db1=tf.Variable(tf.zeros_like(self.b1))
        db2=tf.Variable(tf.zeros_like(self.b2))
        db3=tf.Variable(tf.zeros_like(self.b3))
        db4=tf.Variable(tf.zeros_like(self.b4))
        db5=tf.Variable(tf.zeros_like(self.b5))
        n=0
        p=0
        self.loss=tf.Variable(0,dtype=tf.dtypes.float32,name="loss")
        smooth_loss =-tf.math.log(1.0/self.vocab_size)*self.sequence_length
        while(1):
            try:
                with DelayedKeyboardInterrupt():
                    if p+self.sequence_length+1>= len(self.data) or n == 0:
                        hprev=tf.Variable(np.zeros((self.hidden_size,1)),dtype=tf.float32,name="hprev")
                        p_s=tf.Variable(tf.zeros((self.hidden_size,1)),name="p_s")
                        p=0
                    inputs=[char_to_ix[ch] for ch in self.data[p:p+self.sequence_length]]
                    targets=[char_to_ix[ch] for ch in self.data[p+1:p+self.sequence_length+1]]
                    sample_ix = self.sample_text(hprev,inputs[0],p_s,200)
                    list_of_strings=[ix_to_char[ix.numpy()] for ix in sample_ix]
                    list_of_strings_tf=tf.convert_to_tensor(list_of_strings)
                    txt = tf.strings.join(list_of_strings_tf,separator=" ")
                    print ('----\n %s \n----' % (txt.numpy(), ))
                    #loss=tf.reduce_mean(xentropy,name="loss")
                    with tf.GradientTape() as g:
                        for x, y in zip(inputs,targets):
                            M_c=self.one_hot_encoding(x,hprev)
                            hprev,output,p_s=self.feedforward(M_c,p_s)
                            activation=output[y]
                            loss=-(tf.math.log(activation))
                    dW1,dW2,dW3,dW4,dW5,db1,db2,db3,db4,db5=g.gradient(loss,[self.W1,self.W2,self.W3,self.W4,self.W5,self.b1,self.b2,self.b3,self.b4,self.b5])
                    smooth_loss = smooth_loss * 0.999 + loss * 0.001
            except KeyboardInterrupt:
                sample_ix = self.sample_text(hprev,inputs[0],p_s,200)
                txt = ''.join(ix_to_char[ix] for ix in sample_ix)
                print ('----\n %s \n----' % (txt, ))
                break
When I use self.softmax(), it gives me probability values in the output of feedforward; however, when I use tf.nn.softmax(), all values of output are strangely 1.
Second question: Is TensorFlow generally slower on CPU compared to a pure Python implementation, or am I implementing TensorFlow wrongly?
If you are using tf.nn.softmax() and you don't specify the axis, it defaults to the last axis (axis=-1, which is axis=1 for these 2-D column vectors). Since that axis has size 1 here, each softmax is computed over a single element, so every value of the output tensor is 1. In my case I was getting wrong values just because of not providing the axis, i.e. tf.nn.softmax(logits, axis=0).
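A small eager-mode illustration of that behaviour on a column vector like the ones in the question (the numbers are just an example):

import tensorflow as tf
tf.enable_eager_execution()

logits = tf.constant([[1.0], [2.0], [3.0]])  # shape (3, 1), a column vector

# Default: softmax over the last axis, which has size 1 here -> every entry is 1.
print(tf.nn.softmax(logits).numpy().ravel())           # [1. 1. 1.]

# Softmax over axis 0, i.e. over the vocabulary dimension -> proper probabilities.
print(tf.nn.softmax(logits, axis=0).numpy().ravel())   # roughly [0.09  0.245 0.665]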

while_loop error in Tensorflow

I tried to use while_loop in TensorFlow, but when I try to return the target output from the callable in the while loop, it gives me an error because the shape grows every iteration.
The output should contain (0 or 1) values based on the data values (the input array): if a data value is larger than 5, return 1, else return 0. The returned value must be appended to output.
This is the code:
import numpy as np
import tensorflow as tf

data = np.random.randint(10, size=(30))
data = tf.constant(data, dtype=tf.float32)

global output
output = tf.constant([], dtype=tf.float32)
i = tf.constant(0)
c = lambda i: tf.less(i, 30)

def b(i):
    i = tf.add(i, 1)
    cond = tf.cond(tf.greater(data[i-1], tf.constant(5.)), lambda: tf.constant(1.0), lambda: tf.constant([0.0]))
    output = tf.expand_dims(cond, axis=i-1)
    return i, output

r, out = tf.while_loop(c, b, [i])
print(out)

sess = tf.Session()
sess.run(out)
The error:
r, out = tf.while_loop(c, b, [i])
ValueError: The two structures don't have the same number of elements.
First structure (1 elements): [<tf.Tensor 'while/Identity:0' shape=() dtype=int32>]
Second structure (2 elements): [<tf.Tensor 'while/Add:0' shape=() dtype=int32>, <tf.Tensor 'while/ExpandDims:0' shape=unknown dtype=float32>]
I use tensorflow-1.1.3 and python-3.5
How can I change my code to give me the target result?
EDIT:
I edited the code based on @mrry's answer, but I still have an issue: the output is incorrect.
The output should be the sum of the numbers.
a = tf.ones([10, 4])
print(a)
a = tf.reduce_sum(a, axis=1)
i = tf.constant(0)
c = lambda i, _: tf.less(i, 10)

def Smooth(x):
    return tf.add(x, 2)

summ = tf.constant(0.)

def b(i, _):
    global summ
    summ = tf.add(summ, tf.cast(Smooth(a[i]), tf.float32))
    i = tf.add(i, 1)
    return i, summ

r, smooth_l1 = tf.while_loop(c, b, [i, smooth_l1])
print(smooth_l1)

sess = tf.Session()
print(sess.run(smooth_l1))
The output is 6.0 (wrong).
The tf.while_loop() function requires that the following four lists have the same length, and the same type for each element:
The list of arguments to the cond function (c in this case).
The list of arguments to the body function (b in this case).
The list of return values from the body function.
The list of loop_vars representing the loop variables.
Therefore, if your loop body has two outputs, you must add a corresponding argument to b and c, and a corresponding element to loop_vars:
c = lambda i, _: tf.less(i, 30)

def b(i, _):
    i = tf.add(i, 1)
    cond = tf.cond(tf.greater(data[i-1], tf.constant(5.)),
                   lambda: tf.constant(1.0),
                   lambda: tf.constant([0.0]))
    # NOTE: This line fails with a shape error, because the output of `cond` has
    # a rank of either 0 or 1, but axis may be as large as 28.
    output = tf.expand_dims(cond, axis=i-1)
    return i, output

# NOTE: Use a shapeless `tf.placeholder_with_default()` because the shape
# of the output will vary from one iteration to the next.
r, out = tf.while_loop(c, b, [i, tf.placeholder_with_default(0., None)])
As noted in the comments, the body of the loop (specifically the call to tf.expand_dims()) seems to be incorrect and this program won't work as-is, but hopefully this is enough to get you started.
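One possible way (a sketch, not part of the answer above) to accumulate a per-element 0/1 output is to use a tf.TensorArray as a loop variable instead of trying to build a growing tensor with tf.expand_dims:

import numpy as np
import tensorflow as tf

data = tf.constant(np.random.randint(10, size=30), dtype=tf.float32)

i0 = tf.constant(0)
ta0 = tf.TensorArray(dtype=tf.float32, size=30)

def cond(i, ta):
    return tf.less(i, 30)

def body(i, ta):
    flag = tf.cond(tf.greater(data[i], 5.0),
                   lambda: tf.constant(1.0),
                   lambda: tf.constant(0.0))
    return i + 1, ta.write(i, flag)   # write one value per iteration

_, ta_final = tf.while_loop(cond, body, [i0, ta0])
out = ta_final.stack()                # shape (30,): one 0/1 value per input element

with tf.Session() as sess:
    print(sess.run(out))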
If you see this error:
ValueError: The two structures don't have the same number of elements.
in a while_loop, it means that the inputs to and outputs from the while loop have different structures.
I solved it by making sure that I return the same structure of loop_vars from my while-loop body function; the condition function must also accept the same loop vars.
Here is an example:
loop_vars = [i, loss, batch_size, smaller_str_lens]

def condition(*loop_vars):
    i = loop_vars[0]
    batch_size = loop_vars[2]
    return tf.less(i, batch_size)

def body(*loop_vars):
    i, loss, batch_size, smaller_str_lens = loop_vars
    tf.print("The loop passed here")
    ## logic here
    i = tf.add(i, 1)
    return i, loss, batch_size, smaller_str_lens

loss = tf.while_loop(condition, body, loop_vars)[1]
The body func must return loop vars, and the condition func must accept loop vars
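For concreteness, a self-contained toy version of that pattern (TF 1.x, with made-up values):

import tensorflow as tf

i0 = tf.constant(0)
loss0 = tf.constant(0.0)
batch_size = tf.constant(5)

def condition(i, loss, batch_size):
    return tf.less(i, batch_size)

def body(i, loss, batch_size):
    loss = loss + tf.cast(i, tf.float32)  # stand-in for the real per-step logic
    return i + 1, loss, batch_size        # same structure as loop_vars

_, final_loss, _ = tf.while_loop(condition, body, [i0, loss0, batch_size])

with tf.Session() as sess:
    print(sess.run(final_loss))           # 0 + 1 + 2 + 3 + 4 = 10.0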

How to use maxout activation function in tensorflow?

I want to use the maxout activation function in TensorFlow, but I don't know which function to use.
I sent a pull request for maxout, here is the link:
https://github.com/tensorflow/tensorflow/pull/5528
Code is as follows:
def maxout(inputs, num_units, axis=None):
    shape = inputs.get_shape().as_list()
    if axis is None:
        # Assume that channel is the last dimension
        axis = -1
    num_channels = shape[axis]
    if num_channels % num_units:
        raise ValueError('number of features({}) is not a multiple of num_units({})'
                         .format(num_channels, num_units))
    shape[axis] = -1
    shape += [num_channels // num_units]
    outputs = tf.reduce_max(tf.reshape(inputs, shape), -1, keep_dims=False)
    return outputs
Here is how it works: the feature axis is reshaped into num_units groups, and the maximum is taken over each group.
I don't think there is a built-in maxout activation, but nothing stops you from making it yourself. You could do something like the following.
with tf.variable_scope('maxout'):
    layer_input = ...
    layer_output = None
    for i in range(n_maxouts):
        W = tf.get_variable('W_%d' % i, (n_input, n_output))
        b = tf.get_variable('b_%d' % i, (n_output,))
        y = tf.matmul(layer_input, W) + b
        if layer_output is None:
            layer_output = y
        else:
            layer_output = tf.maximum(layer_output, y)
Note that this is code I just wrote in my browser so there may be syntax errors but you should get the general idea. You simply perform a number of linear transforms and take the maximum across all the transforms.
How about this code?
This seems to work in my test.
def max_out(input_tensor, output_size):
    shape = input_tensor.get_shape().as_list()
    if shape[1] % output_size == 0:
        return tf.transpose(tf.reduce_max(tf.split(input_tensor, output_size, 1), axis=2))
    else:
        raise ValueError("Output size or input tensor size is not fine. Please check it. The remainder needs to be zero.")
I referred to the diagram on the following page.
From version 1.4 on you can use tf.contrib.layers.maxout.
Maxout is a layer that computes N*M outputs for an N*1 input and then returns the maximum over each group of M values, so the final output has shape N*1 as well. Basically, it uses multiple linear fittings to mimic a complex function.
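A minimal usage sketch of that layer (assuming TF 1.x with contrib available; the layer sizes are just for illustration):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 128])      # batch of 128-dimensional features
h = tf.layers.dense(x, 60)                       # 60 linear units
y = tf.contrib.layers.maxout(h, num_units=20)    # max over groups of 3 -> shape [None, 20]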