How to use maxout activation function in tensorflow? - tensorflow

I want to use maxout activation function in tensorflow, but I don't know which function should use.

I sent a pull request for maxout, here is the link:
https://github.com/tensorflow/tensorflow/pull/5528
Code is as follows:
def maxout(inputs, num_units, axis=None):
shape = inputs.get_shape().as_list()
if axis is None:
# Assume that channel is the last dimension
axis = -1
num_channels = shape[axis]
if num_channels % num_units:
raise ValueError('number of features({}) is not a multiple of num_units({})'
.format(num_channels, num_units))
shape[axis] = -1
shape += [num_channels // num_units]
outputs = tf.reduce_max(tf.reshape(inputs, shape), -1, keep_dims=False)
return outputs
Here is how it works:

I don't think there is a maxout activation but there is nothing stopping yourself from making it yourself. You could do something like the following.
with tf.variable_scope('maxout'):
layer_input = ...
layer_output = None
for i in range(n_maxouts):
W = tf.get_variable('W_%d' % d, (n_input, n_output))
b = tf.get_variable('b_%d' % i, (n_output,))
y = tf.matmul(layer_input, W) + b
if layer_output is None:
layer_output = y
else:
layer_output = tf.maximum(layer_output, y)
Note that this is code I just wrote in my browser so there may be syntax errors but you should get the general idea. You simply perform a number of linear transforms and take the maximum across all the transforms.

How about this code?
This seems to work in my test.
def max_out(input_tensor,output_size):
shape = input_tensor.get_shape().as_list()
if shape[1] % output_size == 0:
return tf.transpose(tf.reduce_max(tf.split(input_tensor,output_size,1),axis=2))
else:
raise ValueError("Output size or input tensor size is not fine. Please check it. Reminder need be zero.")
I refer the diagram in the following page.

From version 1.4 on you can use tf.contrib.layers.maxout.

Maxout is a layer such that it calculates N*M output for a N*1 input, and then it returns the maximum value across the column, i.e., the final output has shape N*1 as well. Basically it uses multiple linear fittings to mimic a complex function.

Related

Why are gradients disconnected

Consider the following code
#tf.function
def get_derivatives(function_to_diff,X):
f = function_to_diff(X)
## Derivatives
W = X[:,0]
Z = X[:,1]
V = X[:,2]
df_dW = tf.gradients(f, X[:,0])
return df_dW
I wanted get_derivatives to return the partial derivative of function_to_diff with respect to the first element of X.
However, when I run
def test_function(X):
return tf.pow(X[:,0],2) * X[:,1] * X[:,2]
get_derivatives(test_function,X)
I get None.
If I use unconnected_gradients='zero' for tf.graidents, I'd get zeros. In other words, the gradients are disconnected.
Questions
Why are the gradients disconnected?
How can I get the derivative with respect to the first element of X, i.e. how can I restore the connection? I know that if I wrote
def test_function(x,y,z)
return tf.pow(x,2) * y * z
#tf.function
def get_derivatives(function_to_diff,x,y,z):
f = function_to_diff(x,y,z)
df_dW = tf.gradients(f, x)
return df_dW
This could fix the problem. What if my function can only take in one argument, i.e. what if my function looks like test_function(X)? For example, test_function could be a trained neural network that takes in only one argument.

How to calculate cosine similarity given sparse matrix data in TensorFlow?

I'm supposed to change part of a python script on the GitHub website. This code is an attention-based similarity measure, but I want to turn it to cosine similarity.
The respective code is in the layers.py file (inside the call method).
Attention-Based:
def __call__(self, inputs):
x = inputs
# dropout
if self.sparse_inputs:
x = sparse_dropout(x, 1-self.dropout, self.num_features_nonzero)
else:
x = tf.nn.dropout(x, 1-self.dropout)
# graph learning
h = dot(x, self.vars['weights'], sparse=self.sparse_inputs)
N = self.num_nodes
edge_v = tf.abs(tf.gather(h,self.edge[0]) - tf.gather(h,self.edge[1]))
edge_v = tf.squeeze(self.act(dot(edge_v, self.vars['a'])))
sgraph = tf.SparseTensor(indices=tf.transpose(self.edge), values=edge_v, dense_shape=[N, N])
sgraph = tf.sparse_softmax(sgraph)
return h, sgraph
I edited the above code to what I believe are my requirements (cosine similarity). However, when I run the following code, like so:
def __call__(self, inputs):
x = inputs
# dropout
if self.sparse_inputs:
x = sparse_dropout(x, 1-self.dropout, self.num_features_nonzero)
else:
x = tf.nn.dropout(x, 1-self.dropout)
# graph learning
h = dot(x, self.vars['weights'], sparse=self.sparse_inputs)
N = self.num_nodes
h_norm = tf.nn.l2_normalize(h)
edge_v = tf.matmul(h_norm, tf.transpose(h_norm))
h_norm_1 = tf.norm(h_norm)
edge_v /= h_norm_1 * h_norm_1
edge_v = dot(edge_v, self.vars['a']) # It causes an error when I add this line
zero = tf.constant(0, dtype=tf.float32)
where = tf.not_equal(edge_v, zero)
indices = tf.where(where)
values = tf.gather_nd(edge_v, indices)
sgraph = tf.SparseTensor(indices, values, dense_shape= [N,N])
return h, sgraph
The script shows some runtime errors:
Screenshot of error message
I suspect the error here is related to line 226:
edge_v = dot(edge_v, self.vars['a']) # It causes an error when I add this line
Any admonition on how to accomplish this successfully?
Link of the script on GitHub:
https://github.com/jiangboahu/GLCN-tf
Note: I don't want to use built-in functions, because I think they are not precise to do this job.
ETA: It appears that there are some answers around but they seem to tackle different problems, as far, as I understood them.
Thanks a bunch in advance
What's the dot? Have you imported the method?
It should either be:
edge_v = tf.keras.backend.dot(edge_v, self.vars['a'])
or
edge_v = tf.tensordot(edge_v, self.vars['a'])

How to use an optimizer within a forward pass in PyTorch

I want to use an optimizer within the forward pass of a custom defined Function, but it doesn't work. My code is as follows:
class MyFct(Function):
#staticmethod
def forward(ctx, *args):
input, weight, bias = args[0], args[1], args[2]
y = torch.tensor([[0]], dtype=torch.float, requires_grad=True) #initial guess
loss_fn = lambda y_star: (input + weight - y_star)**2
learning_rate = 1e-4
optimizer = torch.optim.Adam([y], lr=learning_rate)
for t in range(5000):
y_star = y
print(y_star)
loss = loss_fn(y_star)
if t % 100 == 99:
print(t, loss.item())
optimizer.zero_grad()
loss.backward()
optimizer.step()
return y_star
And that's my test inputs:
x = torch.tensor([[2]], dtype=torch.float, requires_grad=True)
w = torch.tensor([[2]], dtype=torch.float, requires_grad=True)
y = torch.tensor([[6]], dtype=torch.float)
fct= MyFct.apply
y_hat = fct(x, w, None)
I always get the RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.
Also, I've tested the optimization outside of the forward and it works, so I guess it's something with the context? According to the documentation "Tensor arguments that track history (i.e., with requires_grad=True) will be converted to ones that don’t track history before the call, and their use will be registered in the graph", see https://pytorch.org/docs/stable/notes/extending.html. Is this the problem? Is there a way to work around it?
I am new to PyTorch and I wonder what I'm overlooking. Any help and explanation is appreciated.
I think I found an answer here: https://github.com/pytorch/pytorch/issues/8847 , i.e. I need to wrap the oprimization with with torch.enable_grad():.
However, I still don't understand why it's necessary to convert the original Tensors to ones that don’t track history in forward().

Tensorflow: Random selection of masks

I know that this stackoverflow thread already gives some nice examples about conditionals in tensorflow, but I'm still struggling how to solve my issue of randomly selecting among several different masks in tensorflow.
Right now I can only select between two mask tensors a and b:
rand_num = tf.random_uniform([], minval=0, maxval=2.0, dtype=tf.float32, seed=None)
def if_true():
return b
def if_false():
return a
mask_sel = tf.cond(tf.less(rand_num , tf.constant(1.0)),if_true,if_false)
(I still find it weird that one needs to define these two helper functions, but not using them weirdly throws an error.)
Now the question: Lets say I have 4 mask tensors (a,b,c,d) or more to randomly select, what would be the best way to do that in tensorflow?
In python that would be
rand_num = np.random.uniform(low=0,high=4.0)
if (rand_num < 1.0):
mask_sel = a
elif(rand_num < 2.0):
mask_sel = b
elif(rand_num < 3.0):
mask_sel = c
else
mask_sel = d
About the helper functions, they are useful because they allow tensorflow to know which operations will run under each condition, this way it can optimize by running only the selected branch and ignoring the other. Operations outside the helper functions but used by any of them will always be run before tf.cond runs.
The other options is to use tf.select; you won't need the helper functions here but it will always evaluate both sides before running tf.select which can be inefficient if you don't need to.
Now for the main problem 'selecting from more than 2 tesnors', you can use multiple options:
1- Recursively nesting tf.cond operations:
def select_from_list(selector, tensor_list):
length = len(tensor_list)
if length == 0:
raise ValueError('List is empty')
elif length == 1:
return tensor_list[0]
else:
half = length // 2
return tf.cond(tf.less(selector, float(half)), lambda: select_from_list(selector, tensor_list[:half]), lambda: select_from_list(selector - half, tensor_list[half:]))
2- Using tf.case:
def select_from_list(selector, tensor_list):
length = len(tensor_list)
if length == 0:
raise ValueError('List is empty')
elif length == 1:
return tensor_list[0]
else:
def fn(tensor):
return lambda: tensor
pred_fn_pairs = [(tf.less(selector, float(i+1)), fn(tensor)) for i, tensor in enumerate(tensor_list)]
return tf.case(pred_fn_pairs, default=lambda:tensor_list[-1])
You can test any of them using:
def test(selector, value_list, sess):
return select_from_list(float(selector), [tf.constant(value) for value in value_list]).eval(session = sess)
sess = tf.Session()
test(3.5, [4,2,6,7,5], sess)
This should return 7

How can I feed last output y(t-1) as input for generating y(t) in tensorflow RNN?

I want to design a single layer RNN in Tensorflow such that last output (y(t-1)) is participated in updating the hidden state.
h(t) = tanh(W_{ih} * x(t) + W_{hh} * h(t) + **W_{oh}y(t - 1)**)
y(t) = W_{ho}*h(t)
How can I feed last input y(t - 1) as input for updating the hidden state?
Is y(t-1) the last input or output? In both cases it is not a straight fit with the TensorFlow RNN cell abstraction. If your RNN is simple you can just write the loop on your own, then you have full control. Another way that I would use is to pre-process your RNN input, e.g., do something like:
processed_input[t] = tf.concat(input[t], input[t-1])
Then call the RNN cell with processed_input and split there.
One possibility is to use tf.nn.raw_rnn which I found in this article. Check my answer to this related post.
I would call what you described an "autoregressive RNN". Here's an (incomplete) code snippet that shows how you can create one using tf.nn.raw_rnn:
import tensorflow as tf
LSTM_SIZE = 128
BATCH_SIZE = 64
HORIZON = 10
lstm_cell = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE, use_peepholes=True)
class RnnLoop:
def __init__(self, initial_state, cell):
self.initial_state = initial_state
self.cell = cell
def __call__(self, time, cell_output, cell_state, loop_state):
emit_output = cell_output # == None for time == 0
if cell_output is None: # time == 0
initial_input = tf.fill([BATCH_SIZE, LSTM_SIZE], 0.0)
next_input = initial_input
next_cell_state = self.initial_state
else:
next_input = cell_output
next_cell_state = cell_state
elements_finished = (time >= HORIZON)
next_loop_state = None
return elements_finished, next_input, next_cell_state, emit_output, next_loop_state
rnn_loop = RnnLoop(initial_state=initial_state_tensor, cell=lstm_cell)
rnn_outputs_tensor_array, _, _ = tf.nn.raw_rnn(lstm_cell, rnn_loop)
rnn_outputs_tensor = rnn_outputs_tensor_array.stack()
Here we initialize internal state of LSTM with some vector initial_state_tensor, and feed zero array as input at t=0. After that, the output of the current timestep is the input for the next timestep.