Hey there, I want to use a CNN + RNN to do a regression task on images, and I am not sure how to properly handle sequence length and states.
I thought about doing the following: use the CNN to extract features for one frame, put the flattened activation maps into the LSTM and save the state, then reduce the LSTM output to my regression value. For the next frame I would do the same, restoring the LSTM state from the previous iteration.
But that feels completely wrong, since I am building an RNN around my LSTM cell, which is not how it's supposed to work, right?
But if I input a sequence of frames into the LSTM (after applying the CNN to all of them), I get multiple outputs and a state. If I reuse that state, I don't see the point of the frame sequence at all. I am totally confused.
Currently I am doing this, but it is not working any better than just a CNN applied to every frame...
cnn_outputs = []
with tf.variable_scope('CNN'):
    for time_step in xrange(sequence_length):
        # Share the CNN weights across all time steps.
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        cnn_res = CNN(images[time_step], normalizer_params=normalizer_params,
                      regularizer=regularizer)
        cnn_outputs.append(cnn_res)
cnn_outputs = tf.pack(cnn_outputs)  # (time, batch, features), hence time_major below

with tf.variable_scope('RNN'):
    lstm_cell = LSTMBlockCell(128)
    lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)
    cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * 3)
    (rnn_outputs, state) = tf.nn.dynamic_rnn(cell, cnn_outputs,
                                             initial_state=initial_state,
                                             time_major=True, dtype=tf.float32)
    # Using only the last output of the sequence; I also tried taking every output into account.
    rnn_outputs = rnn_outputs[sequence_length - 1]
    rnn_outputs = layers.flatten(rnn_outputs)
Some fully connected layers then reduce rnn_outputs to my single value, roughly like the sketch below.
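A minimal sketch of that regression head (the layer sizes are my guesses; layers is tf.contrib.layers, as in the flatten call above):

# Hypothetical head: reduce the flattened LSTM output to one scalar per example.
fc = layers.fully_connected(rnn_outputs, 64, activation_fn=tf.nn.relu)
value = layers.fully_connected(fc, 1, activation_fn=None)  # linear output for regression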
Actually, what I want to do is something like this (except that I want to get a value for the currently received frame, without having any future frames): How do you pass video features from a CNN to an LSTM? But I have a hard time realizing this in TensorFlow.
I am a newbie in ML. I have a regression problem: the input layer consists of 7 parameters, and the output layer consists of 128 parameters (which are frequency bands). I have trained my model and saved it, as well as the input and output scalers (MinMaxScaler instances for scaling a new input array and inverse-transforming the prediction).
After training, I want to feed the model totally new input arrays without a pause. Imagine we receive new inputs of shape (1, 7) every second, and I forward them one by one to our loaded model.
import joblib
import numpy as np
import tensorflow as tf

# Load the trained model and the scalers fitted during training.
loaded_model = tf.keras.models.load_model("model")
inputScaler = joblib.load(input_sc)
outputScaler = joblib.load(output_sc)

new_input = inputScaler.fit_transform(new_input)
prediction = loaded_model.predict(new_input)
inversed_predictData = outputScaler.inverse_transform(prediction)
sum_pred = np.sum(inversed_predictData, axis=1)  # sum over the 128 frequency bands
sum_pred
We are getting a "new_input" every second and aiming to get a new prediction (sum_pred), but it always returns the first one.
I think the model has to be reset in some way.
I expect to get different prediction values because I am feeding the model a new dataset every time. (Btw: some parameters have very similar values, but they are still different.)
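One thing worth double-checking in the snippet above (my observation, not something established in the post): inputScaler.fit_transform refits the MinMaxScaler on every incoming single-row array. With one row, each feature's min equals its max, so every input is scaled to the same constant vector, which would yield identical predictions. A scaler fitted at training time should only transform at inference time:

# Suspected fix: reuse the scaler fitted at training time instead of
# refitting it on each incoming (1, 7) array.
new_input_scaled = inputScaler.transform(new_input)  # transform, not fit_transform
prediction = loaded_model.predict(new_input_scaled)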
I am reproducing the Sequential MNIST experiment from this paper, where they use a Recurrent Neural Network with 6 layers and apply batch normalization after each layer.
They seem to use sequence-wise normalization, meaning that the outputs are normalized not only across the batch but across time steps as well. This is a problem, because it means that I cannot modify the BasicRNNCell to do the batch normalization in the cell's call method: for that to work, the method would have to know what it will output at future time steps.
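Concretely, here is my illustration of what sequence-wise means for a (batch, time, features) tensor (hand-rolled, for exposition only):

# Statistics per feature, computed over the batch AND time axes together.
mean, variance = tf.nn.moments(outputs, axes=[0, 1])
normalized = (outputs - mean) / tf.sqrt(variance + 1e-5)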
So, my current solution is for each layer to:
Unroll the RNN layer
Add a batch normalization layer after it
In code it looks like this:
layer_input = network_input
for layer in range(6):
    # Each layer needs its own variable scope, otherwise the repeated
    # dynamic_rnn calls collide on variable names.
    with tf.variable_scope('layer_{}'.format(layer)):
        cell = BasicRNNCell(128)
        # dtype is required because no initial_state is supplied.
        layer_output, _ = tf.nn.dynamic_rnn(cell, layer_input, dtype=tf.float32)
        # Default axis=-1 normalizes over batch and time together, i.e. sequence-wise.
        layer_output = tf.layers.batch_normalization(layer_output)
        layer_input = layer_output
network_output = layer_output
My question: unrolling the RNN separately for every layer seems like a brute-force way to achieve sequence-wise batch normalization after each layer. Is there a more efficient way, for example one that uses MultiRNNCell?
I am trying to train an LSTM to behave like a controller. Essentially, this is a many-to-many problem. I have 7 input features, each being a sequence of 40 values. My output has two features, also being sequences of 40 values.
I have 2 layers. First layer has four LSTM cells, and second has two LSTM cells. The code is given below.
The code runs and produces output as expected, but I am unable to reduce the training error (mean squared error). The error just stops improving after the first 1000 epochs.
I tried using different batch sizes, but I get a high error even when the batch size is one. I tried the same network on a simple sine function, and there it works properly, i.e. the error decreases. Is this because my sequence length is too large, so that the vanishing gradient problem occurs? What can I do to improve the training error?
# Specify input and output features
Xfeatures = 7  # Number of input features
Yfeatures = 2  # Number of output features
num_steps = 40

# Reset everything to rerun in jupyter
tf.reset_default_graph()

# Placeholders for the inputs in a given iteration
# (named u_opt/u_NN to match the feed_dicts below).
u_opt = tf.placeholder(tf.float32, [train_batch_size, num_steps, Xfeatures])
u_NN = tf.placeholder(tf.float32, [train_batch_size, num_steps, Yfeatures])

with tf.name_scope('Normalization'):
    # L2 normalization for input data
    Xnorm = tf.nn.l2_normalize(u_opt, 0, epsilon=1e-12, name='Normalize')

lstm1 = tf.contrib.rnn.BasicLSTMCell(lstm1_size)
lstm2 = tf.contrib.rnn.BasicLSTMCell(lstm2_size)
stacked_lstm = tf.contrib.rnn.MultiRNNCell([lstm1, lstm2])
print(lstm1.output_size)
print(stacked_lstm.output_size)

LSTM_outputs, states = tf.nn.dynamic_rnn(stacked_lstm, Xnorm, dtype=tf.float32)

# Loss
mean_square_error = tf.losses.mean_squared_error(u_NN, LSTM_outputs)
train_step = tf.train.AdamOptimizer(learning_rate).minimize(mean_square_error)

# Initialization and training session
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    #print(sess.run([LSTM_outputs], feed_dict={u_opt: InputX1}))
    print(sess.run([mean_square_error], feed_dict={u_opt: InputX1, u_NN: InputY1}))
    for i in range(training_epochs):
        sess.run([train_step], feed_dict={u_opt: InputX1, u_NN: InputY1})
        if i % display_epoch == 0:
            print("Training loss is:", sess.run([mean_square_error], feed_dict={u_opt: InputX1, u_NN: InputY1}), "at iteration:", i)
    print(sess.run([mean_square_error], feed_dict={u_opt: InputX1, u_NN: InputY1}))
    print(sess.run([LSTM_outputs], feed_dict={u_opt: InputX1}))
What do you mean by "First layer has four LSTM cells, and second has two LSTM cells. The code is given below"? You probably mean the sizes of the cells' states.
Your code is not complete, but I can try to give you some advice.
If your training error is not going down, one possibility is that your net is not well dimensioned: your lstm1_size and lstm2_size may not be large enough to capture the characteristics of your data.
LSTMs help you accumulate the past of a given sequence in a state vector. Usually, the state vector is not used as the predictor itself but is projected to the output space with a standard feedforward layer, i.e. g(W*LSTM_outputs + b), where g is a non-linear activation. You can probably keep a single layer of recursion (a single LSTM layer) and then project the outputs of that layer with such a feedforward layer, as sketched below.
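A minimal sketch of that suggestion, reusing the names from the question (the tanh activation and the variable shapes are my assumptions):

# Single LSTM layer instead of the stack.
lstm = tf.contrib.rnn.BasicLSTMCell(lstm1_size)
LSTM_outputs, states = tf.nn.dynamic_rnn(lstm, Xnorm, dtype=tf.float32)

# Feedforward projection g(W * LSTM_outputs + b) applied at every time step.
W = tf.get_variable('W_out', [lstm1_size, Yfeatures])
b = tf.get_variable('b_out', [Yfeatures])
flat = tf.reshape(LSTM_outputs, [-1, lstm1_size])  # (batch*steps, lstm1_size)
projected = tf.tanh(tf.matmul(flat, W) + b)        # g = tanh here
predictions = tf.reshape(projected, [-1, num_steps, Yfeatures])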
I'm taking my first steps learning TF and have some trouble training RNNs.
My toy problem goes like this: a two-layer LSTM + dense layer network is fed with raw audio data and should test whether a certain frequency is present in the sound.
So the network should map float (audio data sequence) to float (pre-chosen frequency volume), one to one.
I've got this to work in Keras and have seen a similar TFLearn solution, but I would like to implement this in bare TensorFlow in a relatively efficient way.
What I've done:
lstm = rnn_cell.BasicLSTMCell(LSTM_SIZE, state_is_tuple=True, forget_bias=1.0)
lstm = rnn_cell.DropoutWrapper(lstm)
stacked_lstm = rnn_cell.MultiRNNCell([lstm] * 2, state_is_tuple=True)
# 'inputs' has shape (BATCH_SIZE, SEQUENCE_LEN, 1); the original snippet
# called it 'in', which is a reserved word in Python.
outputs, states = rnn.dynamic_rnn(stacked_lstm, inputs, dtype=tf.float32)
outputs = tf.transpose(outputs, [1, 0, 2])                   # (time, batch, features)
last = tf.gather(outputs, int(outputs.get_shape()[0]) - 1)   # last time step
network = tf.matmul(last, W) + b
# cost function, optimizer etc...
During training I fed this with (BATCH_SIZE, SEQUENCE_LEN, 1) batches, and it seems like the loss converged correctly, but I can't figure out how to predict with the trained network.
My (awful lot of) questions:
How do I make this network return a sequence straight from TensorFlow without going back to Python for each sample (feed a sequence and predict a sequence of the same size)?
If I do want to predict one sample at a time and iterate in Python, what is the correct way to do it?
During testing, is dynamic_rnn needed, or is it just used for unrolling for BPTT during training? Why does dynamic_rnn return all the backpropagation step tensors? These are the outputs of each layer of the unrolled network, right?
After some research:
How do I make this network return a sequence straight from TensorFlow without going back to Python for each sample (feed a sequence and predict a sequence of the same size)?
You can use state_saving_rnn:
class Saver():
    def __init__(self):
        self.d = {}
    def state(self, name):
        if not name in self.d:
            return tf.zeros([1, LSTM_SIZE], tf.float32)
        return self.d[name]
    def save_state(self, name, val):
        self.d[name] = val
        return tf.identity('save_state_name')  # <- important for control_dependencies

outputs, states = rnn.state_saving_rnn(
    stacked_lstm, inx, Saver(),
    ('lstmstate', 'lstmstate2', 'lstmstate3', 'lstmstate4'),
    sequence_length=[EVAL_SEQ_LEN])
# 4 states are for two layers of lstm; each has hidden and CEC variables to restore
network = [tf.matmul(outputs[-1], W) for i in xrange(EVAL_SEQ_LEN)]
One problem is that state_saving_rnn uses rnn() and not dynamic_rnn(), and therefore unrolls EVAL_SEQ_LEN steps at graph-construction time. You might want to re-implement state_saving_rnn with dynamic_rnn if you want to feed long sequences.
If I do want to predict one sample at a time and iterate in Python, what is the correct way to do it?
You can use dynamic_rnn and supply initial_state. This is probably just as efficient as state_saving_rnn; look at the state_saving_rnn implementation for reference. A sketch of the idea follows.
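Here is a minimal sketch of that approach, reusing the two-layer stacked_lstm from above (the placeholder plumbing and names are my assumptions):

# One placeholder pair (c, h) per LSTM layer for the previous state.
state_ph = tuple(
    rnn_cell.LSTMStateTuple(
        tf.placeholder(tf.float32, [1, LSTM_SIZE]),
        tf.placeholder(tf.float32, [1, LSTM_SIZE]))
    for _ in range(2))
outputs, next_state = rnn.dynamic_rnn(stacked_lstm, inputs,
                                      initial_state=state_ph)

# Python loop: start from zeros, then feed each new state back in.
current_state = tuple(
    rnn_cell.LSTMStateTuple(np.zeros((1, LSTM_SIZE), np.float32),
                            np.zeros((1, LSTM_SIZE), np.float32))
    for _ in range(2))
for sample in samples:  # each sample shaped (1, 1, 1): one batch, one time step
    feed = {inputs: sample}
    for ph, val in zip(state_ph, current_state):
        feed[ph.c] = val.c
        feed[ph.h] = val.h
    out_val, current_state = sess.run([outputs, next_state], feed_dict=feed)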
During testing, is dynamic_rnn needed, or is it just used for unrolling for BPTT during training? Why does dynamic_rnn return all the backpropagation step tensors? These are the outputs of each layer of the unrolled network, right?
dynamic_rnn does the unrolling at runtime, similarly to the compile-time rnn(). I guess it returns all the steps so that you can branch the graph off at other places, i.e. after fewer time steps. In a network that uses [one time-step input * current state -> one output, new state], like the one described above, it's not needed at test time, but it could be used for training with truncated backpropagation through time.
I have seen two different ways of calling an LSTM in TensorFlow, and I am confused about what the difference between them is and in which situation to use one or the other.
The first one is to create an LSTM cell and then call it immediately in a loop, like the code below:
lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
initial_state = tf.zeros([batch_size, lstm.state_size])
state = initial_state  # the original snippet never initialized 'state'
for i in range(num_steps):
    # The value of state is updated after processing each batch of words.
    output, state = lstm(words[:, i], state)
And the second one is to call the LSTM cell through rnn.rnn(), like below:
# Define an lstm cell with tensorflow
lstm = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
# Split the data because the rnn cell needs a list of inputs for the RNN inner loop
inputToLstmSplited = tf.split(0, n_steps, inputToLstm)  # n_steps * (batch_size, n_hidden)
# Apply the hidden-layer weights to each time step separately; the original
# snippet applied tf.matmul to the whole list, which does not work.
inputToLstmSplitedFiltered = [tf.matmul(x, weights['hidden']) + biases['hidden']
                              for x in inputToLstmSplited]
# Get the lstm cell output (feeding the filtered inputs, which were
# otherwise unused in the original snippet)
outputs, states = rnn.rnn(lstm, inputToLstmSplitedFiltered, initial_state=istate)
The second effectively does the same as the loop in the first, returning a list of all the outputs collected in the loop, plus the final state. It does it a bit more efficiently, though, and with a number of safety checks. It also supports useful features like variable sequence lengths. The first option is presented in the TensorFlow tutorials to give you an idea of how an RNN is unrolled, but the second option is preferred for "production" code.
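For example, a sketch of the variable-sequence-length feature mentioned above (the seq_lens values are hypothetical): rnn.rnn stops stepping each example after its given length and zero-pads the remaining outputs.

# Per-example sequence lengths within the batch (illustrative values, batch_size == 3).
seq_lens = tf.constant([5, 3, n_steps])
outputs, states = rnn.rnn(lstm, inputToLstmSplitedFiltered,
                          initial_state=istate, sequence_length=seq_lens)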