tensoflow sparse tensor efficient batching - tensorflow

I have a method (shown below) that gets a batch from a tensorflow SparseTensorValue. However, this method is rather slow (10-20 seconds for a batch of size 32), which is problematic because it's called thousands of times.
def get_batch(index, tensors, batch_size, nItems):
xs, ys = tensors
begin = (index * batch_size)
end = min((index+1)*batch_size, nItems)
y_b = ys[begin:end]
(inds, vals, dsize) = xs
nInds = [[ind[0] - begin, ind[1]] for ind in inds if begin <= ind[0] < end]
nInds = np.array(nInds)
nVals = vals[:nInds.shape[0]]
nDsize = (end - begin, dsize[1])
x_b = tf.SparseTensorValue(nInds, nVals, nDsize)
return (x_b, y_b)
Is there a way to make this method more efficient?

I'd recommend you to write your input pipeline using tf.data instead, then if anything you can offload this rebatching to another core and not block your main thread.

Related

Tensorflow: OOM when batch size too large

My script is failing due to too high memory usage. When I reduce the batch size it works.
#tf.function(autograph=not DEBUG)
def step(prev_state, input_b):
input_b = tf.reshape(input_b, shape=[1,input_b.shape[0]])
state = FastALIFStateTuple(v=prev_state[0], z=prev_state[1], b=prev_state[2], r=prev_state[3])
new_b = self.decay_b * state.b + (tf.ones(shape=[self.units],dtype=tf.float32) - self.decay_b) * state.z
thr = self.thr + new_b * self.beta
z = state.z
i_in = tf.matmul(input_b, W_in)
i_rec = tf.matmul(z, W_rec)
i_t = i_in + i_rec
I_reset = z * thr * self.dt
new_v = self._decay * state.v + (1 - self._decay) * i_t - I_reset
# Spike generation
is_refractory = tf.greater(state.r, .1)
zeros_like_spikes = tf.zeros_like(z)
new_z = tf.where(is_refractory, zeros_like_spikes, self.compute_z(new_v, thr))
new_r = tf.clip_by_value(state.r + self.n_refractory * new_z - 1,
0., float(self.n_refractory))
return [new_v, new_z, new_b, new_r]
#tf.function(autograph=not DEBUG)
def evolve_single(inputs):
accumulated_state = tf.scan(step, inputs, initializer=state0)
Z = tf.squeeze(accumulated_state[1]) # -> [T,units]
if self.model_settings['avg_spikes']:
Z = tf.reshape(tf.reduce_mean(Z, axis=0), shape=(1,-1))
out = tf.matmul(Z, W_out) + b_out
return out # - [BS,Num_labels]
# # - Using a simple loop
# out_store = []
# for i in range(fingerprint_3d.shape[0]):
# out_store.append(tf.squeeze(evolve_single(fingerprint_3d[i,:,:])))
# return tf.reshape(out_store, shape=[fingerprint_3d.shape[0],self.d_out])
final_out = tf.squeeze(tf.map_fn(evolve_single, fingerprint_3d)) # -> [BS,T,self.units]
return final_out
This code snippet is inside a tf.function, but I omitted it since I don't think it's relevant.
As can be seen, I run the code on fingerprint_3d, a tensor that has the dimension [BatchSize,Time,InputDimension], e.g. [50,100,20]. When I run this with BatchSize < 10 everything works fine, although tf.scan already uses a lot of memory for that.
When I now execute the code on a batch of size 50, suddenly I get an OOM, even though I am executing it in an iterative matter (here commented out).
How should I execute this code so that the Batch Size doesn't matter?
Is tensorflow maybe parallelizing my for loop so that it executed over multiple batches at once?
Another unrelated question is the following: What function instead of tf.scan should I use if I only want to accumulate one state variable, compared to the case for tf.scan where it just accumulates all the state variables? Or is that possible with tf.scan?
As mentioned in the discussions here, tf.foldl, tf.foldr, and tf.scan all require keeping track of all values for all iterations, which is necessary for computations like gradients. I am not aware of any ways to mitigate this issue; still, I would also be interested if anyone has a better answer than mine.
When I used
#tf.function
def get_loss_and_gradients():
with tf.GradientTape(persistent=False) as tape:
logits, spikes = rnn.call(fingerprint_input=graz_dict["train_input"], W_in=W_in, W_rec=W_rec, W_out=W_out, b_out=b_out)
loss = loss_normal(tf.cast(graz_dict["train_groundtruth"],dtype=tf.int32), logits)
gradients = tape.gradient(loss, [W_in,W_rec,W_out,b_out])
return loss, logits, spikes, gradients
it works.
When I remove the #tf.function decorator the memory blows up. So it really seems important that tensorflow can create a graph for you computations.

Tensorflow: Random selection of masks

I know that this stackoverflow thread already gives some nice examples about conditionals in tensorflow, but I'm still struggling how to solve my issue of randomly selecting among several different masks in tensorflow.
Right now I can only select between two mask tensors a and b:
rand_num = tf.random_uniform([], minval=0, maxval=2.0, dtype=tf.float32, seed=None)
def if_true():
return b
def if_false():
return a
mask_sel = tf.cond(tf.less(rand_num , tf.constant(1.0)),if_true,if_false)
(I still find it weird that one needs to define these two helper functions, but not using them weirdly throws an error.)
Now the question: Lets say I have 4 mask tensors (a,b,c,d) or more to randomly select, what would be the best way to do that in tensorflow?
In python that would be
rand_num = np.random.uniform(low=0,high=4.0)
if (rand_num < 1.0):
mask_sel = a
elif(rand_num < 2.0):
mask_sel = b
elif(rand_num < 3.0):
mask_sel = c
else
mask_sel = d
About the helper functions, they are useful because they allow tensorflow to know which operations will run under each condition, this way it can optimize by running only the selected branch and ignoring the other. Operations outside the helper functions but used by any of them will always be run before tf.cond runs.
The other options is to use tf.select; you won't need the helper functions here but it will always evaluate both sides before running tf.select which can be inefficient if you don't need to.
Now for the main problem 'selecting from more than 2 tesnors', you can use multiple options:
1- Recursively nesting tf.cond operations:
def select_from_list(selector, tensor_list):
length = len(tensor_list)
if length == 0:
raise ValueError('List is empty')
elif length == 1:
return tensor_list[0]
else:
half = length // 2
return tf.cond(tf.less(selector, float(half)), lambda: select_from_list(selector, tensor_list[:half]), lambda: select_from_list(selector - half, tensor_list[half:]))
2- Using tf.case:
def select_from_list(selector, tensor_list):
length = len(tensor_list)
if length == 0:
raise ValueError('List is empty')
elif length == 1:
return tensor_list[0]
else:
def fn(tensor):
return lambda: tensor
pred_fn_pairs = [(tf.less(selector, float(i+1)), fn(tensor)) for i, tensor in enumerate(tensor_list)]
return tf.case(pred_fn_pairs, default=lambda:tensor_list[-1])
You can test any of them using:
def test(selector, value_list, sess):
return select_from_list(float(selector), [tf.constant(value) for value in value_list]).eval(session = sess)
sess = tf.Session()
test(3.5, [4,2,6,7,5], sess)
This should return 7

How to use maxout activation function in tensorflow?

I want to use maxout activation function in tensorflow, but I don't know which function should use.
I sent a pull request for maxout, here is the link:
https://github.com/tensorflow/tensorflow/pull/5528
Code is as follows:
def maxout(inputs, num_units, axis=None):
shape = inputs.get_shape().as_list()
if axis is None:
# Assume that channel is the last dimension
axis = -1
num_channels = shape[axis]
if num_channels % num_units:
raise ValueError('number of features({}) is not a multiple of num_units({})'
.format(num_channels, num_units))
shape[axis] = -1
shape += [num_channels // num_units]
outputs = tf.reduce_max(tf.reshape(inputs, shape), -1, keep_dims=False)
return outputs
Here is how it works:
I don't think there is a maxout activation but there is nothing stopping yourself from making it yourself. You could do something like the following.
with tf.variable_scope('maxout'):
layer_input = ...
layer_output = None
for i in range(n_maxouts):
W = tf.get_variable('W_%d' % d, (n_input, n_output))
b = tf.get_variable('b_%d' % i, (n_output,))
y = tf.matmul(layer_input, W) + b
if layer_output is None:
layer_output = y
else:
layer_output = tf.maximum(layer_output, y)
Note that this is code I just wrote in my browser so there may be syntax errors but you should get the general idea. You simply perform a number of linear transforms and take the maximum across all the transforms.
How about this code?
This seems to work in my test.
def max_out(input_tensor,output_size):
shape = input_tensor.get_shape().as_list()
if shape[1] % output_size == 0:
return tf.transpose(tf.reduce_max(tf.split(input_tensor,output_size,1),axis=2))
else:
raise ValueError("Output size or input tensor size is not fine. Please check it. Reminder need be zero.")
I refer the diagram in the following page.
From version 1.4 on you can use tf.contrib.layers.maxout.
Maxout is a layer such that it calculates N*M output for a N*1 input, and then it returns the maximum value across the column, i.e., the final output has shape N*1 as well. Basically it uses multiple linear fittings to mimic a complex function.

Using TensorArrays in the context of a while_loop to accumulate values

Below I have an implementation of a Tensorflow RNN Cell, designed to emulate Alex Graves' algorithm ACT in this paper: http://arxiv.org/abs/1603.08983.
At a single timestep in the sequence called via rnn.rnn(with a static sequence_length parameter, so the rnn is unrolled dynamically - I am using a fixed batch size of 20), we recursively call ACTStep, producing outputs of size(1,200) where the hidden dimension of the RNN cell is 200 and we have a batch size of 1.
Using the while loop in Tensorflow, we iterate until the accumulated halting probability is high enough. All of this works reasonably smoothly, but I am having problems accumulating states, probabilities and outputs within the while loop, which we need to do in order to create weighted combinations of these as the final cell output/state.
I have tried using a simple list, as below, but this fails when the graph is compiled as the outputs are not in the same frame(is it possible to use the "switch" function in control_flow_ops to forward the tensors to the point at which they are required, ie the add_n function just before we return the values?). I have also tried using the TensorArray structure, but I am finding this difficult to use as it seems to destroy shape information and replacing it manually hasn't worked. I also haven't been able to find much documentation on TensorArrays, presumably as they are, I imagine, mainly for internal TF use.
Any advice on how it might be possible to correctly accumulate the variables produced by ACTStep would be much appreciated.
class ACTCell(RNNCell):
"""An RNN cell implementing Graves' Adaptive Computation time algorithm"""
def __init__(self, num_units, cell, epsilon, max_computation):
self.one_minus_eps = tf.constant(1.0 - epsilon)
self._num_units = num_units
self.cell = cell
self.N = tf.constant(max_computation)
#property
def input_size(self):
return self._num_units
#property
def output_size(self):
return self._num_units
#property
def state_size(self):
return self._num_units
def __call__(self, inputs, state, scope=None):
with vs.variable_scope(scope or type(self).__name__):
# define within cell constants/ counters used to control while loop
prob = tf.get_variable("prob", [], tf.float32,tf.constant_initializer(0.0))
counter = tf.get_variable("counter", [],tf.float32,tf.constant_initializer(0.0))
tf.assign(prob,0.0)
tf.assign(counter, 0.0)
# the predicate for stopping the while loop. Tensorflow demands that we have
# all of the variables used in the while loop in the predicate.
pred = lambda prob,counter,state,input,\
acc_state,acc_output,acc_probs:\
tf.logical_and(tf.less(prob,self.one_minus_eps), tf.less(counter,self.N))
acc_probs = []
acc_outputs = []
acc_states = []
_,iterations,_,_,acc_states,acc_output,acc_probs = \
control_flow_ops.while_loop(pred,
self.ACTStep,
[prob,counter,state,input,acc_states,acc_outputs,acc_probs])
# TODO:fix last part of this, need to use the remainder.
# TODO: find a way to accumulate the regulariser
# here we take a weighted combination of the states and outputs
# to use as the actual output and state which is passed to the next timestep.
next_state = tf.add_n([tf.mul(x,y) for x,y in zip(acc_probs,acc_states)])
output = tf.add_n([tf.mul(x,y) for x,y in zip(acc_probs,acc_outputs)])
return output, next_state
def ACTStep(self,prob,counter,state,input, acc_states,acc_outputs,acc_probs):
output, new_state = rnn.rnn(self.cell, [input], state, scope=type(self.cell).__name__)
prob_w = tf.get_variable("prob_w", [self.cell.input_size,1])
prob_b = tf.get_variable("prob_b", [1])
p = tf.nn.sigmoid(tf.matmul(prob_w,new_state) + prob_b)
acc_states.append(new_state)
acc_outputs.append(output)
acc_probs.append(p)
return [tf.add(prob,p),tf.add(counter,1.0),new_state, input,acc_states,acc_outputs,acc_probs]
I'm going to preface this response that this is NOT a complete solution, but rather some commentary on how to improve your cell.
To start off, in your ACTStep function, you call rnn.rnn for one timestep (as defined by [input]. If you're doing a single timestep, it is probably more efficient to simple use the actual self.cell call function. You'll see this same mechanism used in tensorflow rnncell wrappers
You mentioned that you have tried using TensorArrays. Did you pack and unpack the tensorarrays appropriately? Here is a repo where you'll find under model.py the tensorarrays are packed and unpacked properly.
You also asked if there is a function in control_flow_ops that will require all the tensors to be accumulated. I think you are looking for tf.control_dependencies
You can list all of your output tensors operations in control_dependicies and that will require tensorflow to compute all tensors up into that point.
Also, it looks like your counter variable is trainable. Are you sure you want this to be the case? If you're adding plus one to your counter, that probably wouldn't yield the correct result. On the other hand, you could have purposely kept it trainable to differentiate it at the end for the ponder cost function.
Also I believe the Remainder function should be in your script:
remainder = 1.0 - tf.add_n(acc_probs[:-1])
#note that there is a -1 in the list as you do not want to grab the last probability
Here is my version of your code edited:
class ACTCell(RNNCell):
"""An RNN cell implementing Graves' Adaptive Computation time algorithm
Notes: https://www.evernote.com/shard/s189/sh/fd165646-b630-48b7-844c-86ad2f07fcda/c9ab960af967ef847097f21d94b0bff7
"""
def __init__(self, num_units, cell, max_computation = 5.0, epsilon = 0.01):
self.one_minus_eps = tf.constant(1.0 - epsilon) #episolon is 0.01 as found in the paper
self._num_units = num_units
self.cell = cell
self.N = tf.constant(max_computation)
#property
def input_size(self):
return self._num_units
#property
def output_size(self):
return self._num_units
#property
def state_size(self):
return self._num_units
def __call__(self, inputs, state, scope=None):
with vs.variable_scope(scope or type(self).__name__):
# define within cell constants/ counters used to control while loop
prob = tf.constant(0.0, shape = [batch_size])
counter = tf.constant(0.0, shape = [batch_size])
# the predicate for stopping the while loop. Tensorflow demands that we have
# all of the variables used in the while loop in the predicate.
pred = lambda prob,counter,state,input,acc_states,acc_output,acc_probs:\
tf.logical_and(tf.less(prob,self.one_minus_eps), tf.less(counter,self.N))
acc_probs, acc_outputs, acc_states = [], [], []
_,iterations,_,_,acc_states,acc_output,acc_probs = \
control_flow_ops.while_loop(
pred,
self.ACTStep, #looks like he purposely makes the while loop here
[prob, counter, state, input, acc_states, acc_outputs, acc_probs])
'''mean-field updates for states and outputs'''
next_state = tf.add_n([x*y for x,y in zip(acc_probs,acc_states)])
output = tf.add_n([x*y for x,y in zip(acc_probs,acc_outputs)])
remainder = 1.0 - tf.add_n(acc_probs[:-1]) #you take the last off to avoid a negative ponder cost #the problem here is we need to take the sum of all the remainders
tf.add_to_collection("ACT_remainder", remainder) #if this doesnt work then you can do self.list based upon timesteps
tf.add_to_collection("ACT_iterations", iterations)
return output, next_state
def ACTStep(self,prob, counter, state, input, acc_states, acc_outputs, acc_probs):
'''run rnn once'''
output, new_state = rnn.rnn(self.cell, [input], state, scope=type(self.cell).__name__)
prob_w = tf.get_variable("prob_w", [self.cell.input_size,1])
prob_b = tf.get_variable("prob_b", [1])
halting_probability = tf.nn.sigmoid(tf.matmul(prob_w,new_state) + prob_b)
acc_states.append(new_state)
acc_outputs.append(output)
acc_probs.append(halting_probability)
return [p + prob, counter + 1.0, new_state, input,acc_states,acc_outputs,acc_probs]
def PonderCostFunction(self, time_penalty = 0.01):
'''
note: ponder is completely different than probability and ponder = roe
the ponder cost function prohibits the rnn from cycling endlessly on each timestep when not much is needed
'''
n_iterations = tf.get_collection_ref("ACT_iterations")
remainder = tf.get_collection_ref("ACT_remainder")
return tf.reduce_sum(n_iterations + remainder) #completely different from probability
This is a complicated paper to implement that I have been working on myself. I wouldn't mind collaborating with you to get it done in Tensorflow. If you're interested, please add me at LeavesBreathe on Skype and we can go from there.

How can I feed last output y(t-1) as input for generating y(t) in tensorflow RNN?

I want to design a single layer RNN in Tensorflow such that last output (y(t-1)) is participated in updating the hidden state.
h(t) = tanh(W_{ih} * x(t) + W_{hh} * h(t) + **W_{oh}y(t - 1)**)
y(t) = W_{ho}*h(t)
How can I feed last input y(t - 1) as input for updating the hidden state?
Is y(t-1) the last input or output? In both cases it is not a straight fit with the TensorFlow RNN cell abstraction. If your RNN is simple you can just write the loop on your own, then you have full control. Another way that I would use is to pre-process your RNN input, e.g., do something like:
processed_input[t] = tf.concat(input[t], input[t-1])
Then call the RNN cell with processed_input and split there.
One possibility is to use tf.nn.raw_rnn which I found in this article. Check my answer to this related post.
I would call what you described an "autoregressive RNN". Here's an (incomplete) code snippet that shows how you can create one using tf.nn.raw_rnn:
import tensorflow as tf
LSTM_SIZE = 128
BATCH_SIZE = 64
HORIZON = 10
lstm_cell = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE, use_peepholes=True)
class RnnLoop:
def __init__(self, initial_state, cell):
self.initial_state = initial_state
self.cell = cell
def __call__(self, time, cell_output, cell_state, loop_state):
emit_output = cell_output # == None for time == 0
if cell_output is None: # time == 0
initial_input = tf.fill([BATCH_SIZE, LSTM_SIZE], 0.0)
next_input = initial_input
next_cell_state = self.initial_state
else:
next_input = cell_output
next_cell_state = cell_state
elements_finished = (time >= HORIZON)
next_loop_state = None
return elements_finished, next_input, next_cell_state, emit_output, next_loop_state
rnn_loop = RnnLoop(initial_state=initial_state_tensor, cell=lstm_cell)
rnn_outputs_tensor_array, _, _ = tf.nn.raw_rnn(lstm_cell, rnn_loop)
rnn_outputs_tensor = rnn_outputs_tensor_array.stack()
Here we initialize internal state of LSTM with some vector initial_state_tensor, and feed zero array as input at t=0. After that, the output of the current timestep is the input for the next timestep.