Tensorflow: Random selection of masks - tensorflow

I know that this stackoverflow thread already gives some nice examples about conditionals in tensorflow, but I'm still struggling how to solve my issue of randomly selecting among several different masks in tensorflow.
Right now I can only select between two mask tensors a and b:
rand_num = tf.random_uniform([], minval=0, maxval=2.0, dtype=tf.float32, seed=None)
def if_true():
return b
def if_false():
return a
mask_sel = tf.cond(tf.less(rand_num , tf.constant(1.0)),if_true,if_false)
(I still find it weird that one needs to define these two helper functions, but not using them weirdly throws an error.)
Now the question: Lets say I have 4 mask tensors (a,b,c,d) or more to randomly select, what would be the best way to do that in tensorflow?
In python that would be
rand_num = np.random.uniform(low=0,high=4.0)
if (rand_num < 1.0):
mask_sel = a
elif(rand_num < 2.0):
mask_sel = b
elif(rand_num < 3.0):
mask_sel = c
else
mask_sel = d

About the helper functions, they are useful because they allow tensorflow to know which operations will run under each condition, this way it can optimize by running only the selected branch and ignoring the other. Operations outside the helper functions but used by any of them will always be run before tf.cond runs.
The other options is to use tf.select; you won't need the helper functions here but it will always evaluate both sides before running tf.select which can be inefficient if you don't need to.
Now for the main problem 'selecting from more than 2 tesnors', you can use multiple options:
1- Recursively nesting tf.cond operations:
def select_from_list(selector, tensor_list):
length = len(tensor_list)
if length == 0:
raise ValueError('List is empty')
elif length == 1:
return tensor_list[0]
else:
half = length // 2
return tf.cond(tf.less(selector, float(half)), lambda: select_from_list(selector, tensor_list[:half]), lambda: select_from_list(selector - half, tensor_list[half:]))
2- Using tf.case:
def select_from_list(selector, tensor_list):
length = len(tensor_list)
if length == 0:
raise ValueError('List is empty')
elif length == 1:
return tensor_list[0]
else:
def fn(tensor):
return lambda: tensor
pred_fn_pairs = [(tf.less(selector, float(i+1)), fn(tensor)) for i, tensor in enumerate(tensor_list)]
return tf.case(pred_fn_pairs, default=lambda:tensor_list[-1])
You can test any of them using:
def test(selector, value_list, sess):
return select_from_list(float(selector), [tf.constant(value) for value in value_list]).eval(session = sess)
sess = tf.Session()
test(3.5, [4,2,6,7,5], sess)
This should return 7

Related

Why are gradients disconnected

Consider the following code
#tf.function
def get_derivatives(function_to_diff,X):
f = function_to_diff(X)
## Derivatives
W = X[:,0]
Z = X[:,1]
V = X[:,2]
df_dW = tf.gradients(f, X[:,0])
return df_dW
I wanted get_derivatives to return the partial derivative of function_to_diff with respect to the first element of X.
However, when I run
def test_function(X):
return tf.pow(X[:,0],2) * X[:,1] * X[:,2]
get_derivatives(test_function,X)
I get None.
If I use unconnected_gradients='zero' for tf.graidents, I'd get zeros. In other words, the gradients are disconnected.
Questions
Why are the gradients disconnected?
How can I get the derivative with respect to the first element of X, i.e. how can I restore the connection? I know that if I wrote
def test_function(x,y,z)
return tf.pow(x,2) * y * z
#tf.function
def get_derivatives(function_to_diff,x,y,z):
f = function_to_diff(x,y,z)
df_dW = tf.gradients(f, x)
return df_dW
This could fix the problem. What if my function can only take in one argument, i.e. what if my function looks like test_function(X)? For example, test_function could be a trained neural network that takes in only one argument.

How to calculate cosine similarity given sparse matrix data in TensorFlow?

I'm supposed to change part of a python script on the GitHub website. This code is an attention-based similarity measure, but I want to turn it to cosine similarity.
The respective code is in the layers.py file (inside the call method).
Attention-Based:
def __call__(self, inputs):
x = inputs
# dropout
if self.sparse_inputs:
x = sparse_dropout(x, 1-self.dropout, self.num_features_nonzero)
else:
x = tf.nn.dropout(x, 1-self.dropout)
# graph learning
h = dot(x, self.vars['weights'], sparse=self.sparse_inputs)
N = self.num_nodes
edge_v = tf.abs(tf.gather(h,self.edge[0]) - tf.gather(h,self.edge[1]))
edge_v = tf.squeeze(self.act(dot(edge_v, self.vars['a'])))
sgraph = tf.SparseTensor(indices=tf.transpose(self.edge), values=edge_v, dense_shape=[N, N])
sgraph = tf.sparse_softmax(sgraph)
return h, sgraph
I edited the above code to what I believe are my requirements (cosine similarity). However, when I run the following code, like so:
def __call__(self, inputs):
x = inputs
# dropout
if self.sparse_inputs:
x = sparse_dropout(x, 1-self.dropout, self.num_features_nonzero)
else:
x = tf.nn.dropout(x, 1-self.dropout)
# graph learning
h = dot(x, self.vars['weights'], sparse=self.sparse_inputs)
N = self.num_nodes
h_norm = tf.nn.l2_normalize(h)
edge_v = tf.matmul(h_norm, tf.transpose(h_norm))
h_norm_1 = tf.norm(h_norm)
edge_v /= h_norm_1 * h_norm_1
edge_v = dot(edge_v, self.vars['a']) # It causes an error when I add this line
zero = tf.constant(0, dtype=tf.float32)
where = tf.not_equal(edge_v, zero)
indices = tf.where(where)
values = tf.gather_nd(edge_v, indices)
sgraph = tf.SparseTensor(indices, values, dense_shape= [N,N])
return h, sgraph
The script shows some runtime errors:
Screenshot of error message
I suspect the error here is related to line 226:
edge_v = dot(edge_v, self.vars['a']) # It causes an error when I add this line
Any admonition on how to accomplish this successfully?
Link of the script on GitHub:
https://github.com/jiangboahu/GLCN-tf
Note: I don't want to use built-in functions, because I think they are not precise to do this job.
ETA: It appears that there are some answers around but they seem to tackle different problems, as far, as I understood them.
Thanks a bunch in advance
What's the dot? Have you imported the method?
It should either be:
edge_v = tf.keras.backend.dot(edge_v, self.vars['a'])
or
edge_v = tf.tensordot(edge_v, self.vars['a'])

tf.nn.softmax behaving strangely

I am learning LSTM with tensorflow with enable_eager_execution. However when implementing LSTM, I have noticed the behaviour of tf.nn.softmax
that has made me stuck. Here is a section of my code
class RNN_LSTM(object):
def __init__(self,hidden_size):
data=open('Shakespear.txt', 'r').read()
self.data = data.split()
vocab_size=len(list(set(self.data)))
self.words =list(set(self.data))
self.hidden_size=hidden_size
self.input_size=vocab_size+hidden_size
self.vocab_size=vocab_size
self.W1=tf.Variable(tf.random.uniform((self.hidden_size,self.input_size),dtype=tf.dtypes.float32,name="W1")*0.1)
self.b1=tf.Variable(tf.random.uniform((self.hidden_size,1),dtype=tf.dtypes.float32,name="b1"))
self.W2=tf.Variable(tf.random.uniform((self.hidden_size,self.input_size),dtype=tf.dtypes.float32,name="W2")*0.1)
self.b2=tf.Variable(tf.random.uniform((self.hidden_size,1),dtype=tf.dtypes.float32,name="b2")*0.1)
self.W3=tf.Variable(tf.random.uniform((self.hidden_size,self.input_size),dtype=tf.dtypes.float32,name="W3")*0.1)
self.b3=tf.Variable(tf.random.uniform((self.hidden_size,1),dtype=tf.dtypes.float32,name="b3")*0.1)
self.W4=tf.Variable(tf.random.uniform((hidden_size,self.input_size),dtype=tf.dtypes.float32,name="W4")*0.1)
self.b4=tf.Variable(tf.random.uniform((self.hidden_size,1),dtype=tf.dtypes.float32,name="b4")*0.1)
self.W5=tf.Variable(tf.random.uniform((self.vocab_size,self.hidden_size),dtype=tf.dtypes.float32,name="W5")*0.1)
self.b5=tf.Variable(tf.random.uniform((self.vocab_size,1),dtype=tf.dtypes.float32,name="b5")*0.1)
self.learning_rate=1e-1
self.sequence_length=50
#self.M_c=tf.Variable(tf.zeros((self.input_size,1)),name="M_c")
def one_hot_encoding(self,x,hprev):
M_c=tf.Variable(tf.zeros((self.input_size,1)),name="M_c")
vocab=tf.Variable(tf.zeros((self.vocab_size,1)))
#hprev=tf.Variable(tf.zeros((self.hidden_size,1)))
vocab=vocab.numpy()
vocab[x]=1
M_c=tf.concat((hprev,vocab),axis=0)
return M_c
def feedforward(self,M_c,p_s):
ft=tf.sigmoid( tf.matmul(self.W1,M_c)+self.b1)
it=tf.sigmoid(tf.matmul(self.W2,M_c)+self.b2)
gt=tf.math.tanh(tf.matmul(self.W3,M_c)+self.b3)
cs=tf.multiply(ft,p_s)+tf.multiply(it,gt)
ot=tf.nn.sigmoid(tf.matmul(self.W4,M_c)+self.b4)
ht=tf.multiply(ot,tf.math.tanh(cs))
output=self.softmax(tf.matmul(self.W5,ht)+self.b5)
return ht,output,cs
def sample_text(self,hprev,begin,p_s,n):
vocab=tf.Variable(tf.zeros((self.vocab_size,1)),tf.float32)
vocab=vocab.numpy()
vocab[begin]=1
letters=[]
for i in range(n):
M=tf.Variable(tf.zeros((self.input_size,1)),name="M")
M=tf.assign(M,tf.concat((hprev,vocab),axis=0))
ft=tf.nn.sigmoid(tf.matmul(self.W1,M)+self.b1)
it=tf.nn.sigmoid(tf.matmul(self.W2,M)+self.b2)
gt=tf.math.tanh(tf.matmul(self.W3,M)+self.b3)
cs=tf.multiply(ft,p_s)+tf.multiply(it,gt)
p_s=cs
ot=tf.sigmoid(tf.matmul(self.W4,M)+self.b4)
ht=tf.multiply(ot,tf.math.tanh(cs))
ht=tf.reshape(ht,(self.hidden_size,1))
output=tf.matmul(self.W5,ht)+self.b5
p=self.softmax(output)
#print(p.numpy())
p=tf.reshape(p,(1,self.vocab_size))
samples = tf.random.categorical(p,1)
sample_selected=tf.cast(samples[0][0].numpy(),tf.int32)
selection_sample_np=[i for i in range(self.vocab_size)]
selection_sample_tf=tf.convert_to_tensor(selection_sample_np)
selected_next_letter=selection_sample_tf[sample_selected]
trial=tf.cast(selected_next_letter,tf.int32)
k=tf.Variable(tf.zeros((self.vocab_size,1)),tf.int32)
k[selected_next_letter,0].assign(1)
letters.append(selected_next_letter)
hprev=ht
return letters
def process_input(self):
char_to_ix={ch:ix for ix,ch in enumerate(self.words)}
ix_to_char={ix:ch for ix,ch in enumerate(self.words)}
return char_to_ix,ix_to_char
def softmax(self,z):
return tf.math.exp(z-max(z))/tf.math.reduce_sum(tf.math.exp(z-max(z)))
def AggregatorNew(self):
losses,iterations=[],[]
char_to_ix,ix_to_char=self.process_input()
mem1=tf.Variable(tf.zeros_like(self.W1))
mem2=tf.Variable(tf.zeros_like(self.W2))
mem3=tf.Variable(tf.zeros_like(self.W3))
mem4=tf.Variable(tf.zeros_like(self.W4))
mem5=tf.Variable(tf.zeros_like(self.W5))
mem6=tf.Variable(tf.zeros_like(self.b1))
mem7=tf.Variable(tf.zeros_like(self.b2))
mem8=tf.Variable(tf.zeros_like(self.b3))
mem9=tf.Variable(tf.zeros_like(self.b4))
mem10=tf.Variable(tf.zeros_like(self.b5))
dW1=tf.Variable(tf.zeros_like(self.W1))
dW2=tf.Variable(tf.zeros_like(self.W2))
dW3=tf.Variable(tf.zeros_like(self.W3))
dW4=tf.Variable(tf.zeros_like(self.W4))
dW5=tf.Variable(tf.zeros_like(self.W4))
db1=tf.Variable(tf.zeros_like(self.b1))
db2=tf.Variable(tf.zeros_like(self.b2))
db3=tf.Variable(tf.zeros_like(self.b3))
db4=tf.Variable(tf.zeros_like(self.b4))
db5=tf.Variable(tf.zeros_like(self.b5))
n=0
p=0
self.loss=tf.Variable(0,dtype=tf.dtypes.float32,name="loss")
smooth_loss =-tf.math.log(1.0/self.vocab_size)*self.sequence_length
while(1):
try:
with DelayedKeyboardInterrupt():
if p+self.sequence_length+1>= len(self.data) or n == 0:
hprev=tf.Variable(np.zeros((self.hidden_size,1)),dtype=tf.float32,name="hprev")
p_s=tf.Variable(tf.zeros((self.hidden_size,1)),name="p_s")
p=0
inputs=[char_to_ix[ch] for ch in self.data[p:p+self.sequence_length]]
targets=[char_to_ix[ch] for ch in self.data[p+1:p+self.sequence_length+1]]
sample_ix = self.sample_text(hprev,inputs[0],p_s,200)
list_of_strings=[ix_to_char[ix.numpy()] for ix in sample_ix]
list_of_strings_tf=tf.convert_to_tensor(list_of_strings)
txt = tf.strings.join(list_of_strings_tf,separator=" ")
print ('----\n %s \n----' % (txt.numpy(), ))
#loss=tf.reduce_mean(xentropy,name="loss")
with tf.GradientTape() as g:
for x, y in zip(inputs,targets):
M_c=self.one_hot_encoding(x,hprev)
hprev,output,p_s=self.feedforward(M_c,p_s)
activation=output[y]
loss=-(tf.math.log(activation))
dW1,dW2,dW3,dW4,dW5,db1,db2,db3,db4,db5=g.gradient(loss,[self.W1,self.W2,self.W3,self.W4,self.W5,self.b1,self.b2,self.b3,self.b4,self.b5])
smooth_loss = smooth_loss * 0.999 + loss * 0.001
except KeyboardInterrupt:
sample_ix = self.sample_text(hprev,inputs[0],p_s,200)
txt = ''.join(ix_to_char[ix] for ix in sample_ix)
print ('----\n %s \n----' % (txt, ))
break
when I use self.softmax() it gives me probability values in the output in the feedforward, however when I use tf.nn.softmax() all values of output are strangely 1.
Second question: Is tensorflow generally slower in cpu as compared to a pure python implementation or i am implementing tensorlow wrongly?
If you are using tf.nn.softmax(), and you don't specify the axis, it defaults to tf.nn.softmax(logits ,axis=1) hence giving a tensor ouput where all values are 1s . In my case I was getting wrong values just because of not providing axis i.e tf.nn.softmax(logits,axis=0)

How to use the tf.case api of TensorFlow correctly?

I want to design a follow function for expanding any 1D/2D/3D matrix to a 4D matrix.
import tensorflow as tf
def inputs_2_4D(inputs):
_ranks = tf.rank(inputs)
return tf.case({tf.equal(_ranks, 3): lambda: tf.expand_dims(inputs, 3),
tf.equal(_ranks, 2): lambda: tf.expand_dims(tf.expand_dims(inputs, 0), 3),
tf.equal(_ranks, 1): lambda: tf.expand_dims(tf.expand_dims(tf.expand_dims(inputs, 0), 0), 3)},
default=lambda: tf.identity(inputs))
def run():
with tf.Session() as sess:
mat_1d = tf.constant([1, 1])
mat_2d = tf.constant([[1, 1]])
mat_3d = tf.constant([[[1, 1]]])
mat_4d = tf.constant([[[[1, 1]]]])
result = inputs_2_4D(mat_1d)
print(result.eval())
The function, however, cannot run well. It can only perform to output a 4-D matrix when the mat_3d and mat-4d tensors are passed into it. There will be some errors information if a 1D or 2D matrix are passed to the function.
When passing mat_3dormat_4dinto inputs_2_4D(), they can be expanded to a 4D matrix or original matrix:
mat_3d -----> [[[[1]
[1]]]]
mat_4d -----> [[[[1 1]]]]
When mat_1dormat_2dmatrixes are passed into inputs_2_4D, error information:
ValueError: dim 3 not in the interval [-2, 1]. for 'case/cond/ExpandDims' (op: 'ExpandDims') with input shapes: [2], [] and with computed input tensors: input[1] = <3>.
I tested another similar function before. That function can run correctly.
import tensorflow as tf
def test_2_4D(inputs):
_ranks = tf.rank(inputs)
return tf.case({tf.equal(_ranks, 3): lambda: tf.constant(3),
tf.equal(_ranks, 2): lambda: tf.constant(2),
tf.equal(_ranks, 1): lambda: tf.constant(1)},
default=lambda: tf.identity(inputs))
def run():
with tf.Session() as sess:
mat_1d = tf.constant([1, 1])
mat_2d = tf.constant([[1, 1]])
mat_3d = tf.constant([[[1, 1]]])
mat_4d = tf.constant([[[[1, 1]]]])
result = test_2_4D(mat_3d)
print(result.eval())
This function can correctly output the corresponding results when passing all of matrixes.
test_2_4D() RESULTS:
mat_1d -----> 1
mat_2d -----> 2
mat_3d -----> 3
mat_4d -----> [[[[1 1]]]]
I don't know why the correct branch in inputs_2_4D() cannot be found while the tf.equal() in each branch were executed. I feel that the 1st and 2nd branches in the function seem to still work if the input matrix is "mat_1d" or "mat_2d". So, the program will crash down. Please help me to analyze this problem!
I think I worked out what the problem is here. Turns out all condition/function pairs are evaluated. This can be revealed by giving the ops different names. The problem is that if your input is, say, rank 2, Tensorflow seems to still evaluate tf.equal(_ranks, 3): lambda: tf.expand_dims(inputs, 3). This leads to a crash because it cannot expand dim 3 for a rank-2 tensor (the maximum allowed value is 2).
This actually makes sense since with tf.case you're basically saying "I don't know which of these cases is going to be true at runtime, so check which one is appropriate and execute the corresponding function". However this means that Tensorflow needs to prepare execution paths for all possible cases, which in this case leads to invalid computations (trying to expand invalid dimensions).
At this point it would be nice to know a little more about your problem, i.e. why exactly you need that function. If you have different inputs and you simply want to bring them all to 4D, but each input always has the same dimensionality, consider simply using Python if-statements. Example:
inputs3d = tf.constant([[[1,1]]]) # this is always 3D
inputs2d = tf.constant([[1,1]]) # this is alwayas 2D
...
def inputs_2_4D(inputs):
_rank = len(inputs.shape.as_list())
if _rank == 3:
return tf.expand_dims(inputs, 3)
elif _rank == 2:
return tf.expand_dims(tf.expand_dims(inputs, 0), 3)
...
This will check the input rank while the graph is being built (not at runtime like tf.case) and really only prepare those expand_dims ops that are appropriate for the given input.
However if you have a single inputs tensor and this could have different ranks at different times of your program this would require a different solution. Please let us know which problem you're trying to solve!
I have implement the functionality I want through 2 ways. Now, I provide my code to share.
The 1st method based on tf.cond:
def inputs_2_4D(inputs):
_rank1d = tf.rank(inputs)
def _1d_2_2d(): return tf.expand_dims(inputs, 0)
def _greater_than_1d(): return tf.identity(inputs)
_tmp_2d = tf.cond(_rank1d < 2, _1d_2_2d, _greater_than_1d)
_rank2d = tf.rank(_tmp_2d)
def _2d_2_3d(): return tf.expand_dims(_tmp_2d, 0)
def _greater_than_2d(): return tf.identity(_tmp_2d)
_tmp_3d = tf.cond(_rank2d < 3, _2d_2_3d, _greater_than_2d)
_rank3d = tf.rank(_tmp_3d)
def _3d_2_4d(): return tf.expand_dims(_tmp_3d, 3)
def _greater_than_3d(): return tf.identity(_tmp_3d)
return (tf.cond(_rank3d < 4, _3d_2_4d, _greater_than_3d))
The 2nd method based on tf.case with tf.cond:
def inputs_2_4D_1(inputs):
_rank = tf.rank(inputs)
def _assign_original(): return tf.identity(inputs)
def _dummy(): return tf.expand_dims(inputs, 0)
_1d = tf.cond(tf.equal(_rank, 1), _assign_original, _dummy)
_2d = tf.cond(tf.equal(_rank, 2), _assign_original, _dummy)
_3d = tf.cond(tf.equal(_rank, 3), _assign_original, _dummy)
def _1d_2_4d(): return tf.expand_dims(tf.expand_dims(tf.expand_dims(_1d, 0), 0), 3)
def _2d_2_4d(): return tf.expand_dims(tf.expand_dims(_2d, 0), 3)
def _3d_2_4d(): return tf.expand_dims(_3d, 3)
return (tf.case({tf.equal(_rank, 1): _1d_2_4d,
tf.equal(_rank, 2): _2d_2_4d,
tf.equal(_rank, 3): _3d_2_4d},
default=_assign_original))
I think the efficiency of the 2nd method should be less than the 1st method's, because the function _dummy() always wastes 2 operations when allocating inputs into _1d,_2d,_3d respectively.

How to use maxout activation function in tensorflow?

I want to use maxout activation function in tensorflow, but I don't know which function should use.
I sent a pull request for maxout, here is the link:
https://github.com/tensorflow/tensorflow/pull/5528
Code is as follows:
def maxout(inputs, num_units, axis=None):
shape = inputs.get_shape().as_list()
if axis is None:
# Assume that channel is the last dimension
axis = -1
num_channels = shape[axis]
if num_channels % num_units:
raise ValueError('number of features({}) is not a multiple of num_units({})'
.format(num_channels, num_units))
shape[axis] = -1
shape += [num_channels // num_units]
outputs = tf.reduce_max(tf.reshape(inputs, shape), -1, keep_dims=False)
return outputs
Here is how it works:
I don't think there is a maxout activation but there is nothing stopping yourself from making it yourself. You could do something like the following.
with tf.variable_scope('maxout'):
layer_input = ...
layer_output = None
for i in range(n_maxouts):
W = tf.get_variable('W_%d' % d, (n_input, n_output))
b = tf.get_variable('b_%d' % i, (n_output,))
y = tf.matmul(layer_input, W) + b
if layer_output is None:
layer_output = y
else:
layer_output = tf.maximum(layer_output, y)
Note that this is code I just wrote in my browser so there may be syntax errors but you should get the general idea. You simply perform a number of linear transforms and take the maximum across all the transforms.
How about this code?
This seems to work in my test.
def max_out(input_tensor,output_size):
shape = input_tensor.get_shape().as_list()
if shape[1] % output_size == 0:
return tf.transpose(tf.reduce_max(tf.split(input_tensor,output_size,1),axis=2))
else:
raise ValueError("Output size or input tensor size is not fine. Please check it. Reminder need be zero.")
I refer the diagram in the following page.
From version 1.4 on you can use tf.contrib.layers.maxout.
Maxout is a layer such that it calculates N*M output for a N*1 input, and then it returns the maximum value across the column, i.e., the final output has shape N*1 as well. Basically it uses multiple linear fittings to mimic a complex function.