How to implement a sliding window in tensorflow? - tensorflow

I have created a sliding window algorithm using numpy that slides over a wav audio file and feeds slices of it to my NN in tensorflow, which detects features in the audio slices. Once tensorflow does its thing, it returns its output to numpy land, where I reassemble the slices into an array of predictions that match each sample position of the original file:
import tensorflow as tf
import numpy as np
import nn
def slide_predict(layers, X, modelPath):
output = None
graph = tf.Graph()
with graph.as_default():
input_layer_size, hidden_layer_size, num_labels = layers
X_placeholder = tf.placeholder(tf.float32, shape=(None, input_layer_size), name='X')
Theta1 = tf.Variable(nn.randInitializeWeights(input_layer_size, hidden_layer_size), name='Theta1')
bias1 = tf.Variable(nn.randInitializeWeights(hidden_layer_size, 1), name='bias1')
Theta2 = tf.Variable(nn.randInitializeWeights(hidden_layer_size, num_labels), name='Theta2')
bias2 = tf.Variable(nn.randInitializeWeights(num_labels, 1), name='bias2')
hypothesis = nn.forward_prop(X_placeholder, Theta1, bias1, Theta2, bias2)
sess = tf.Session(graph=graph)
saver = tf.train.Saver()
init = tf.global_variables_initializer()
sess.run(init)
saver.restore(sess, modelPath)
window_size = layers[0]
pad_amount = (window_size * 2) - (X.shape[0] % window_size)
X = np.pad(X, (pad_amount, 0), 'constant')
for w in range(window_size):
start = w
end = -window_size + w
X_shifted = X[start:end]
X_matrix = X_shifted.reshape((-1, window_size))
prediction = sess.run(hypothesis, feed_dict={X_placeholder: X_matrix})
output = prediction if (output is None) else np.hstack((output, prediction))
sess.close()
output.shape = (X.size, -1)
return output
Unfortunately, this algorithm is quite slow. I placed some logs along the way and by far the slowest portion is the part where I actually run my tensorflow graph. This could be due to the actual tensorflow calculations being slow (if so, I'm probably just SOL), but I am wondering if a large part of the slowness isn't because I am transferring large audio files repeatedly back and forth in and out of tensorflow. So my questions are:
1) Is feeding a placeholder repeatedly like this going to be noticeably slower than feeding it once and calculating the values for X inside tensorflow?
2) If yes, whats the best way to implement a sliding window algorithm inside tensorflow to do this calculation?

The first issue is that your algorithm is has quadratic time complexity in window_size, because of calling np.hstack() in each iteration to build the output array, which copies both the current values of output and prediction into a new array:
for w in range(window_size):
# ...
output = prediction if (output is None) else np.hstack((output, prediction))
Instead of calling np.hstack() in every iteration, it would be more efficient to build a Python list of the prediction arrays, and call np.hstack() on them once, after the loop terminates:
output_list = []
for w in range(window_size):
# ...
prediction = sess.run(...)
output_list.append(prediction)
output = np.hstack(output_list)
The second issue is that feeding large values to TensorFlow can be inefficient, if the amount of computation in the sess.run() call is small, because those values are (currently) copied into C++ (and the results are copied out. One useful strategy for this is to try and move the sliding window loop into the TensorFlow graph, using the tf.map_fn() construct. For example, you could restructure your program as follows:
# NOTE: If you call this function often, you may want to (i) move the `np.pad()`
# into the graph as `tf.pad()`, and (ii) replace `X_t` with a placeholder.
X = np.pad(X, (pad_amount, 0), 'constant')
X_t = tf.convert_to_tensor(X)
def window_func(w):
start = w
end = w - window_size
X_matrix = tf.reshape(X_t[start:end], (-1, window_size))
return nn.forward_prop(X_matrix, Theta1, bias1, Theta2, bias2)
output_t = tf.map_fn(window_func, tf.range(window_size))
# ...
output = sess.run(output_t)

Related

keras - `sample_weight` results in NaN when zero passed - also not efficient for unbalanced data

I am designing a model with two outputs, y and dy, where I have much more training data for y than dy while the location (x) of those data points are the same (please check the image bellow).
I am handling this issue with sample_weight in keras.model.fit. There are two concerns:
If I pass 'zero' for a sample weight, after the first training, it results into NaN. I instead have to pass a very small number, which I am not sure how it affects the training.
This is inefficient if I have multiple outputs with many of them have available training data at very few locations. Because, all the training data will be included in the updates. Is there any other way to handle this case?
Note that Keras works fine training the model, however, I am looking for more efficient way to also be able to pass zero for unwanted weights.
Please check the code bellow:
import numpy as np
import keras as k
import tensorflow as tf
from matplotlib.pyplot import plot, show, legend
# Note this is needed to handle lambda layers as Keras' gradient does not work in this setup.
def custom_grad(y, x):
return tf.gradients(y, x, unconnected_gradients='zero', colocate_gradients_with_ops=True)
# Setting up keras model.
x = k.Input((1,), name='x', dtype='float32')
lay = k.layers.Dense(10, activation='tanh')(x)
lay = k.layers.Dense(10, activation='tanh')(lay)
y = k.layers.Dense(1, name='y')(lay)
dy = k.layers.Lambda(lambda f: custom_grad(f, x), name='dy')(y)
model = k.Model(x, [y, dy])
# Preparing training data.
num_samples = 10000
x_true = np.linspace(0.0, np.pi, num_samples)
y_true = np.sin(x_true)
dy_true = np.zeros_like(y_true)
# for dy, we only have values at certain points -
# say 10% of what is available for yfrom initial and the end.
percentage = 0.1
dy_ids = np.concatenate((np.arange(0, num_samples*percentage, dtype=int),
np.arange(num_samples*(1-percentage), 10000, dtype=int)))
dy_true[dy_ids] = np.cos(x_true[dy_ids])
# I use sample weight to circumvent unbalanced available data.
y_sample_weight = np.ones_like(y_true)
dy_sample_weight = np.zeros_like(y_true) + 1.0e-8
dy_sample_weight[dy_ids] = num_samples/dy_ids.size
assert abs(dy_sample_weight.sum() - num_samples) <= 1.0e-3
# training the model.
model.compile("adam", loss="mse")
model.fit(x_true, [y_true, dy_true],
sample_weight=[y_sample_weight, dy_sample_weight],
epochs=50, shuffle=True)
[y_pred, dy_pred] = model.predict(x_true)
# expected outputs.
plot(x_true, y_true, '.k', label='y true')
plot(x_true[dy_ids], dy_true[dy_ids], '.r', label='dy true')
plot(x_true, y_pred, '--b', label='y pred')
plot(x_true, dy_pred, '--b', label='dy pred')
legend()
show()

When is a random number generated in a Keras Lambda layer?

I would like to apply simple data augmentation (multiplication of the input vector by a random scalar) to a fully connected neural network implemented in Keras. Keras has nice functionality for image augmentation, but trying to use this seemed awkward and slow for my input (1-tensors), whose training data set fits in my computer's memory.
Instead, I imagined that I could achieve this using a Lambda layer, e.g. something like this:
x = Input(shape=(10,))
y = x
y = Lambda(lambda z: random.uniform(0.5,1.0)*z)(y)
y = Dense(units=5, activation='relu')(y)
y = Dense(units=1, activation='sigmoid')(y)
model = Model(x, y)
My question concerns when this random number will be generated. Will this fix a single random number for:
the entire training process?
each batch?
each training data point?
Using this will create a constant that will not change at all, because random.uniform is not a keras function. You defined this operation in the graph as constant * tensor and the factor will be constant.
You need random functions "from keras" or "from tensorflow". For instance, you can take K.random_uniform((1,), 0.5, 1.).
This will be changed per batch. You can test it by training this code for a lot of epochs and see the loss changing.
from keras.layers import *
from keras.models import Model
from keras.callbacks import LambdaCallback
import numpy as np
ins = Input((1,))
outs = Lambda(lambda x: K.random_uniform((1,))*x)(ins)
model = Model(ins,outs)
print(model.predict(np.ones((1,1))))
print(model.predict(np.ones((1,1))))
print(model.predict(np.ones((1,1))))
model.compile('adam','mae')
model.fit(np.ones((100000,1)), np.ones((100000,1)))
If you want it to change for each training sample, then get a fixed batch size and generate a tensor with random numbers for each sample: K.random_uniform((batch_size,), .5, 1.).
You should probably get better performance if you do it in your own generator and model.fit_generator(), though:
class MyGenerator(keras.utils.Sequence):
def __init__(self, inputs, outputs, batchSize, minRand, maxRand):
self.inputs = inputs
self.outputs = outputs
self.batchSize = batchSize
self.minRand = minRand
self.maxRand = maxRand
#if you want shuffling
def on_epoch_end(self):
indices = np.array(range(len(self.inputs)))
np.random.shuffle(indices)
self.inputs = self.inputs[indices]
self.outputs = self.outputs[indices]
def __len__(self):
leng,rem = divmod(len(self.inputs), self.batchSize)
return (leng + (1 if rem > 0 else 0))
def __getitem__(self,i):
start = i*self.batchSize
end = start + self.batchSize
x = self.inputs[start:end] * random.uniform(self.minRand,self.maxRand)
y = self.outputs[start:end]
return x,y

Why batch_normalization in tensorflow does not give expected results?

I would like to see the output of batch_normalization layer in a small example, but apparently I am doing something wrong so I get the same output as the input.
import tensorflow as tf
import keras.backend as K
K.set_image_data_format('channels_last')
X = tf.placeholder(tf.float32, shape=(None, 2, 2, 3)) # samples are 2X2 images with 3 channels
outp = tf.layers.batch_normalization(inputs=X, axis=3)
x = np.random.rand(4, 2, 2, 3) # sample set: 4 images
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init_op)
K.set_session(sess)
a = sess.run(outp, feed_dict={X:x, K.learning_phase(): 0})
print(a-x) # print the difference between input and normalized output
The input and output of the above code are almost identical. Can anyone point out the problem to me?
Remember that batch_normalization behaves differently at train and test time. Here, you have never "trained" your batch normalization, so the moving average it has learned is random but close to 0, and the moving variance factor close to 1, so the output is almost the same as the input. If you use K.learning_phase(): 1 you'll already see some differences (because it will normalize using the batch's average and standard deviation); if you first learn on a lot of examples and then test on some other ones you'll also see the normalization occuring, because the learnt mean and standard deviation will not be 0 and 1.
To see better the effects of batch norm, I'd also suggest you to multiply your input by a big number (say 100), so that you have a clear difference between unnormalized and normalized vectors, that will help you test what's going on.
EDIT: In your code as is, it seems that the update of the moving mean and moving variance is never done. You need to make sure the update ops are run, as indicated in batch_normalization's doc. The following lines should make it work:
outp = tf.layers.batch_normalization(inputs=X, axis=3, training=is_training, center=False, scale=False)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
outp = tf.identity(outp)
Below is my full working code (I got rid of Keras because I don't know it well, but you should be able to re-add it).
import tensorflow as tf
import numpy as np
X = tf.placeholder(tf.float32, shape=(None, 2, 2, 3)) # samples are 2X2 images with 3 channels
is_training = tf.placeholder(tf.bool, shape=()) # samples are 2X2 images with 3 channels
outp = tf.layers.batch_normalization(inputs=X, axis=3, training=is_training, center=False, scale=False)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
outp = tf.identity(outp)
x = np.random.rand(4, 2, 2, 3) * 100 # sample set: 4 images
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init_op)
initial = sess.run(outp, feed_dict={X:x, is_training: False})
for i in range(10000):
a = sess.run(outp, feed_dict={X:x, is_training: True})
if (i % 1000 == 0):
print("Step %i: " %i, a-x) # print the difference between input and normalized output
final = sess.run(outp, feed_dict={X: x, is_training: False})
print("initial: ", initial)
print("final: ", final)
assert not np.array_equal(initial, final)

How to find intermediate outputs of LSTM by running tf.nn.dynamic_rnn in tensorflow

I am new to tensorflow and have recently read about LSTM from various blogs like Understanding LSTM Networks, Colah, The Unreasonable Effectiveness of Recurrent Neural Networks, Karparthy etc.
I found this Code on the web:
import numpy as np
import tensorflow as tf
def length(sequence):
used = tf.sign(tf.reduce_max(tf.abs(sequence), reduction_indices=2))
length = tf.reduce_sum(used, reduction_indices=1)
length = tf.cast(length, tf.int32)
return length
num_neurons = 10
num_layers = 3
max_length = 8
frame_size = 5
# dropout = tf.placeholder(tf.float32)
cell = tf.contrib.rnn.LSTMCell(num_neurons, state_is_tuple= True)
# cell = DropoutWrapper(cell, output_keep_prob=dropout)
cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers)
sequence = tf.placeholder(tf.float32, [None, max_length, frame_size])
output, state = tf.nn.dynamic_rnn(
cell,
sequence,
dtype=tf.float32,
sequence_length=length(sequence),
)
if __name__ == '__main__':
sample = np.random.random((8, max_length, frame_size)) + 0.1
# sample[np.ix_([0,1],range(50,max_length))] = 0
# drop = 0.2
with tf.Session() as sess:
init_op = init_op = tf.global_variables_initializer()
sess.run(init_op)
o, s = sess.run([output, state], feed_dict={sequence: sample})
# print "Output shape is ", o.shape()
# print "state shape is ", s.shape()
print "Output is ", o
print "State is ", s
Pertaining to the above code with state_is_tuple= True, I have some doubts.
Q. What is the simple meaning of outputs and state which tf.nn.dynamic_rnn returns.
I read on the internet that output is the output of last layer at several time steps and
state is the final state.
My intermediate doubt is, what do we mean by "output of last layer at several time steps"
I looked into dynamic_rnn code as my main task is to find
(https://github.com/tensorflow/tensorflow/blob/r1.1/tensorflow/python/ops/rnn.py)
Q. ***All the intermediate output of LSTM by calling dynamic_rnn in the same fashion as the above code. How can I do it.
I also read dynamic_rnn internally calls _dynamic_rnn.
This _dynamic_rnn returns final_output and final_state. Apart from final_output. I want all the intermediate outputs.
My take is to write custom _dynamic_rnn as defined in
https://github.com/tensorflow/tensorflow/blob/r1.1/tensorflow/python/ops/rnn.py
Please help.

SGD converges but batch learning does not, simple regression in tensorflow

I have run into an issue where batch learning in tensorflow fails to converge to the correct solution for a simple convex optimization problem, whereas SGD converges. A small example is found below, in the Julia and python programming languages, I have verified that the same exact behaviour results from using tensorflow from both Julia and python.
I'm trying to fit the linear model y = s*W + B with parameters W and B
The cost function is quadratic, so the problem is convex and should be easily solved using a small enough step size. If I feed all data at once, the end result is just a prediction of the mean of y. If, however, I feed one datapoint at the time (commented code in julia version), the optimization converges to the correct parameters very fast.
I have also verified that the gradients computed by tensorflow differs between the batch example and summing up the gradients for each datapoint individually.
Any ideas on where I have failed?
using TensorFlow
s = linspace(1,10,10)
s = [s reverse(s)]
y = s*[1,4] + 2
session = Session(Graph())
s_ = placeholder(Float32, shape=[-1,2])
y_ = placeholder(Float32, shape=[-1,1])
W = Variable(0.01randn(Float32, 2,1), name="weights1")
B = Variable(Float32(1), name="bias3")
q = s_*W + B
loss = reduce_mean((y_ - q).^2)
train_step = train.minimize(train.AdamOptimizer(0.01), loss)
function train_critic(s,targets)
for i = 1:1000
# for i = 1:length(y)
# run(session, train_step, Dict(s_ => s[i,:]', y_ => targets[i]))
# end
ts = run(session, [loss,train_step], Dict(s_ => s, y_ => targets))[1]
println(ts)
end
v = run(session, q, Dict(s_ => s, y_ => targets))
plot(s[:,1],v, lab="v (Predicted value)")
plot!(s[:,1],y, lab="y (Correct value)")
gui();
end
run(session, initialize_all_variables())
train_critic(s,y)
Same code in python (I'm not a python user so this might be ugly)
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets
import tensorflow as tf
from tensorflow.python.framework.ops import reset_default_graph
s = np.linspace(1,10,50).reshape((50,1))
s = np.concatenate((s,s[::-1]),axis=1).astype('float32')
y = np.add(np.matmul(s,[1,4]), 2).astype('float32')
reset_default_graph()
rng = np.random
s_ = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None])
weight_initializer = tf.truncated_normal_initializer(stddev=0.1)
with tf.variable_scope('model'):
W = tf.get_variable('W', [2, 1],
initializer=weight_initializer)
B = tf.get_variable('B', [1],
initializer=tf.constant_initializer(0.0))
q = tf.matmul(s_, W) + B
loss = tf.reduce_mean(tf.square(tf.sub(y_ , q)))
optimizer = tf.train.AdamOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(loss)
num_epochs = 200
train_cost= []
with tf.Session() as sess:
init = tf.initialize_all_variables()
sess.run(init)
for e in range(num_epochs):
feed_dict_train = {s_: s, y_: y}
fetches_train = [train_op, loss]
res = sess.run(fetches=fetches_train, feed_dict=feed_dict_train)
train_cost = [res[1]]
print train_cost
The answer turned out to be that when I fed in the targets, I fed a vector and not an Nx1 matrix. The operation y_-q then turned into a broadcast operation and instead of returning the elementwise difference, it returned an NxN matrix with the desired difference along the diagonal. In Julia, I solved this by modifying the line
train_critic(s,y)
to
train_critic(s,reshape(y, length(y),1))
to ensure y being a matrix.
A subtle error that took me a very long time to find! Part of the confusion was that TensorFlow seems to treat vectors as row vectors and not as column vectors like Julia, hence the broadcast operation in y_-q