Random Initialisation of Hidden State of LSTM in keras - tensorflow

I used the following model for my music generation project. It is created as follows:
self.model.add(LSTM(self.hidden_size, input_shape=(self.input_length, self.notes_classes), return_sequences=True, recurrent_dropout=dropout))
self.model.add(LSTM(self.hidden_size, recurrent_dropout=dropout, return_sequences=True))
self.model.add(LSTM(self.hidden_size,return_sequences=True))
self.model.add(BatchNorm())
self.model.add(Dropout(dropout))
self.model.add(Dense(256))
self.model.add(Activation('relu'))
self.model.add(BatchNorm())
self.model.add(Dropout(dropout))
self.model.add(Dense(256))
self.model.add(Activation('relu'))
self.model.add(BatchNorm())
self.model.add(Dense(self.notes_classes))
self.model.add(Activation('softmax'))
After training this model to 70% accuracy, whenever I generate music it always produces the same kind of starting notes with little variation, whatever the input notes are. I think it may be possible to solve this by initialising the hidden state of the LSTM at the start of generation. How can I do that?

There are two states: state_h, which is the output of the last step, and state_c, which is the carry state or memory.
You should use a functional API model so that the model can have more than one input:
main_input = Input((self.input_length, self.notes_classes))
state_h_input = Input((self.hidden_size,))
state_c_input = Input((self.hidden_size,))  # the cell state has the same shape as state_h
# initial_state is passed when calling the layer, not to the LSTM constructor
out = LSTM(self.hidden_size, return_sequences=True, recurrent_dropout=dropout)(
    main_input, initial_state=[state_h_input, state_c_input])
# The following layers are unchanged; give them their own state inputs if you want to
out = LSTM(self.hidden_size,recurrent_dropout=dropout,return_sequences=True)(out)
out = LSTM(self.hidden_size,return_sequences=True)(out)
out = BatchNorm()(out)
out = Dropout(dropout)(out)
out = Dense(256)(out)
out = Activation('relu')(out)
out = BatchNorm()(out)
out = Dropout(dropout)(out)
out = Dense(256)(out)
out = Activation('relu')(out)
out = BatchNorm()(out)
out = Dense(self.notes_classes)(out)
out = Activation('softmax')(out)
self.model = Model([main_input, state_h_input, state_c_input], out)
Following this approach, it's even possible to use outputs of other layers as initial states, if you want trainable initial states.
The main change is that you now need to pass the states both for training and for predicting:
model.fit([original_inputs, state_h_data, state_c_data], y_train)
During training you can use zeros for the states.
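A minimal NumPy sketch of building the two extra state inputs for the three-input model above: zeros for training, as the answer suggests, and random values at generation time. make_initial_states is a hypothetical helper, and drawing the random states uniformly from [-scale, scale] is an assumption to tune, not something the answer prescribes.

```python
import numpy as np

def make_initial_states(batch_size, hidden_size, random=False, scale=0.5, rng=None):
    """Hypothetical helper: build (state_h, state_c) arrays for the LSTM's
    two extra inputs, each of shape (batch_size, hidden_size)."""
    shape = (batch_size, hidden_size)
    if not random:
        # Zero states, as suggested for training
        return np.zeros(shape, dtype=np.float32), np.zeros(shape, dtype=np.float32)
    rng = np.random.default_rng() if rng is None else rng
    # Random states for generation; the uniform range is an assumption to tune
    state_h = rng.uniform(-scale, scale, size=shape).astype(np.float32)
    state_c = rng.uniform(-scale, scale, size=shape).astype(np.float32)
    return state_h, state_c
```

For training you would pass make_initial_states(len(original_inputs), hidden_size) alongside original_inputs in model.fit; for generation you would call model.predict([seed_notes, h, c]) with states built using random=True, where seed_notes is a placeholder for your input batch.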

Related

Can I define models in a loop in Keras?

I am trying to train three different models with the same architecture on three different sets of data. This is what I have right now:
models_deaths = {}
models_deaths_histories = {}
epochs_per_severity = {"LARGE": 1000, "MEDIUM": 100, "SMALL": 5}
for severity in severity_map:
    inputs_deaths = Input(name='in_deaths' + severity, shape=(x_train[severity].shape[1], x_train[severity].shape[2]))
    x_deaths = Dense(64)(inputs_deaths)
    inputs_aux_deaths = Input(name='in_aux_deaths' + severity, shape=[x_aux_train[severity].shape[1]])
    x_deaths = ConditionalRNN(128, name='LSTM_deaths' + severity, cell='LSTM')([x_deaths, inputs_aux_deaths])
    predictions_deaths = Dense(1, name='output_deaths' + severity, activation='relu')(x_deaths)
    models_deaths[severity] = Model(inputs=[inputs_deaths, inputs_aux_deaths], outputs=predictions_deaths)
    models_deaths[severity].compile(optimizer='adam', loss='mean_squared_error', metrics=['mse'])
    models_deaths_histories[severity] = models_deaths[severity].fit(
        [x_train[severity], x_aux_train[severity]],
        y_train_deaths[severity],
        epochs=epochs_per_severity[severity],
        batch_size=256)
I give the layers different names in each iteration, so I thought it would be fine, but I'm getting odd performance, and I wonder whether the models are sharing the same layers in memory. Thoughts?

How to apply class weights in linear classifier for binary classification?

This is the linear classifier I am using for binary classification; here is a code snippet:
my_optimizer = tf.train.AdagradOptimizer(learning_rate = learning_rate)
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer,5.0)
# Create a linear classifier object
linear_classifier = tf.estimator.LinearClassifier(
    feature_columns=feature_columns,
    optimizer=my_optimizer
)
linear_classifier.train(input_fn = training_input_fn, steps = steps)
The dataset is imbalanced, with only two classes, yes/no: there are 36548 examples of class NO but only 4640 of class YES.
How can I balance this data? I have been searching around and found material on class weights, but I couldn't figure out how to create class weights and apply them to TensorFlow's train method.
Here is how I am calculating losses:
training_probabilities = linear_classifier.predict(input_fn = training_predict_input_fn)
training_probabilities = np.array([item['probabilities'] for item in training_probabilities])
validation_probabilities = linear_classifier.predict(input_fn=validation_predict_input_fn)
validation_probabilities = np.array([item['probabilities'] for item in validation_probabilities])
training_log_loss = metrics.log_loss(training_targets, training_probabilities)
validation_log_loss = metrics.log_loss(validation_targets, validation_probabilities)
I assume that you are using the log_loss function from sklearn to compute your loss. If so, you can add class weights through its sample_weight argument, passing an array that contains the weight to give each data point. sample_weight is a rolled-out version of class_weights: every example receives the weight of its class. You can compute the sample_weight array with sklearn's compute_sample_weight function.
Add the following lines to your code:
from sklearn.utils.class_weight import compute_sample_weight
sample_wts = compute_sample_weight("balanced", training_targets)
training_log_loss = metrics.log_loss(training_targets, training_probabilities, sample_weight=sample_wts)
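To see what the "balanced" option computes, here is a NumPy-only sketch of the formula scikit-learn documents for it: each class gets n_samples / (n_classes * count of that class), and each example inherits its class's weight. balanced_sample_weights is a hypothetical name, not a library function.

```python
import numpy as np

def balanced_sample_weights(targets):
    """Hypothetical helper reproducing the 'balanced' heuristic:
    weight(class k) = n_samples / (n_classes * count(class k)),
    rolled out so that every example gets the weight of its class."""
    targets = np.asarray(targets)
    classes, inverse, counts = np.unique(targets, return_inverse=True, return_counts=True)
    class_weights = len(targets) / (len(classes) * counts)
    return class_weights[inverse]
```

With the counts in the question, each NO example gets a weight of about 0.56 and each YES example about 4.44, so both classes contribute the same total weight to the loss. This only affects how the loss is evaluated; to weight training itself, tf.estimator.LinearClassifier accepts a weight_column argument naming a feature that holds per-example weights, so the same array would have to be added to the features your input function yields.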

How to merge 'Conv-BN-Scale' into a single 'Conv' layer in TensorFlow?

For faster inference, I want to merge 'Conv-BN-Scale' into a single 'Conv' layer in my TensorFlow model, but I cannot find a useful, complete example of how to do it.
Can anyone give some advice or a complete code example?
Thanks!
To merge the two layers, you can write a function that takes a tensor and returns the tensor obtained after both layers have been applied. Suppose your input tensor is X:
import tensorflow as tf
def MlConvBnScale(X, kernel, strides, mean, variance, offset, scale, padding='SAME', epsilon=1e-3):
    # Convolution; kernel has shape [height, width, in_channels, out_channels]
    convLout = tf.nn.conv2d(X, kernel, strides, padding)
    # Batch normalization with fixed statistics; offset (beta) and scale (gamma) play the role of the Scale layer:
    # scale * (convLout - mean) / sqrt(variance + epsilon) + offset
    return tf.nn.batch_normalization(convLout, mean, variance, offset, scale,
                                     variance_epsilon=epsilon)
This returns a tensor after performing both operations. tf.nn.batch_normalization does not create or initialize any variables, so mean, variance, offset and scale must be passed in, for example as the moving statistics and learned parameters of your trained model. If your input is a NumPy array rather than a tensor, you can convert it with tf.convert_to_tensor() (https://www.tensorflow.org/api_docs/python/tf/convert_to_tensor). If you are struggling with the kernel/filter and how it is applied, check out the thread "What does tf.nn.conv2d do in tensorflow?".
If you have any questions or run into trouble implementing it, leave a comment below and we will see.
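A wrapper like the one above still runs two operations at inference time. Merging the layers for speed usually means a different technique: folding the batch-norm statistics and the Scale layer's gamma/beta into the convolution's weights and bias, so that a single convolution remains. Below is a NumPy sketch of that folding, checked against a naive stride-1 'VALID' convolution. conv2d_valid, batch_norm and fold_batchnorm are hypothetical helpers, and NHWC inputs with kernels shaped (kh, kw, c_in, c_out), TensorFlow's layout, are assumed.

```python
import numpy as np

def conv2d_valid(x, w, b):
    """Naive stride-1 'VALID' convolution (cross-correlation, as in TF).
    x: (n, h, w, c_in); w: (kh, kw, c_in, c_out); b: (c_out,)."""
    n, h, wd, _ = x.shape
    kh, kw, _, c_out = w.shape
    out = np.zeros((n, h - kh + 1, wd - kw + 1, c_out))
    for i in range(h - kh + 1):
        for j in range(wd - kw + 1):
            patch = x[:, i:i + kh, j:j + kw, :]  # (n, kh, kw, c_in)
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out + b

def batch_norm(y, gamma, beta, mean, var, eps=1e-3):
    """Inference-time BN followed by Scale: gamma * (y - mean) / sqrt(var + eps) + beta."""
    return gamma * (y - mean) / np.sqrt(var + eps) + beta

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-3):
    """Return (w_folded, b_folded) such that
    conv2d_valid(x, w_folded, b_folded) == batch_norm(conv2d_valid(x, w, b), ...)."""
    s = gamma / np.sqrt(var + eps)        # one factor per output channel
    return w * s, (b - mean) * s + beta   # w * s broadcasts over the c_out axis
```

The folded arrays could then serve as the kernel and bias of a single tf.nn.conv2d followed by tf.nn.bias_add. eps must match the value the trained batch-norm layer used.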

Soft attention from scratch for video sequences

I am trying to implement soft attention for video sequence classification. Since there are many implementations and examples for NLP, I tried to follow this schema [1], adapted to video. Basically, it is an LSTM with an attention model in between.
[1] https://blog.heuritech.com/2016/01/20/attention-mechanism/
Here is the code for my attention layer; I am not sure it is implemented correctly:
def attention_layer(self, input, context):
    # Input is a Tensor: [batch_size, lstm_units]
    # Input (Seq_length, batch_size, lstm_units)
    # Context is a LSTMStateTuple: [batch_size, lstm_units]. Hidden_state, output = StateTuple
    hidden_state, _ = context
    weights_y = tf.get_variable("att_weights_Y", [self.lstm_units, self.lstm_units], initializer=tf.contrib.layers.xavier_initializer())
    weights_c = tf.get_variable("att_weights_c", [self.lstm_units, self.lstm_units], initializer=tf.contrib.layers.xavier_initializer())
    z_ = []
    for feat in input:
        # Equation => M = tanh(Wc c + Wy y)
        Wcc = tf.matmul(hidden_state, weights_c)
        Wyy = tf.matmul(feat, weights_y)
        m = tf.add(Wcc, Wyy)
        m = tf.tanh(m, name='M_matrix')
        # Equation => s = softmax(m)
        s = tf.nn.softmax(m, name='softmax_att')
        z = tf.multiply(feat, s)
        z_.append(z)
    out = tf.stack(z_, axis=1)
    out = tf.reduce_sum(out, 1)
    return out, s
Adding this layer between my LSTMs (or at the beginning of my two LSTMs) makes training very slow. More specifically, declaring the optimizer takes a long time:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
My questions are:
Is the implementation correct? If it is, is there a way to optimize it in order to make it train properly?
I was not able to make it work with the seq2seq APIs. Is there any TensorFlow API that allows me to tackle this specific issue?
Does it actually make sense to use this for sequence classification?
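As a point of comparison, here is a loop-free NumPy sketch of a common soft-attention formulation. It differs from the code above in two assumed ways: a scoring vector w (an extra parameter the question's code does not have) reduces each m_t to a single score, and the softmax runs over the time steps rather than over the units, so the weights s sum to 1 across the sequence. The context term W_c c is also computed once instead of once per step; avoiding the Python loop keeps a TensorFlow graph smaller, which may help with slow graph construction. Batch-major shapes (batch, T, units) and the name soft_attention are assumptions.

```python
import numpy as np

def soft_attention(y, c, W_y, W_c, w):
    """y: (batch, T, units) LSTM outputs; c: (batch, units) context vector.
    W_y, W_c: (units, units); w: (units,) scoring vector (assumed extra parameter).
    Returns the attended vector z (batch, units) and the weights s (batch, T)."""
    # m_t = tanh(W_c c + W_y y_t) for all t at once; the context term is computed once
    m = np.tanh(y @ W_y + (c @ W_c)[:, None, :])  # (batch, T, units)
    scores = m @ w                                 # (batch, T): one score per time step
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    s = np.exp(scores)
    s /= s.sum(axis=1, keepdims=True)              # softmax over time steps
    z = np.einsum("bt,btu->bu", s, y)              # weighted sum of the outputs
    return z, s
```

In TensorFlow the same steps would map onto tf.tanh, tf.nn.softmax over the time axis, and tf.reduce_sum over that axis.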

TensorFlow: LSTM State Saving/Updating within Graph

I am working on reinforcement learning and want to reduce the amount of data I feed through sess.run() during training, to speed up learning.
I was looking into the LSTM, and because I need to look forward and then reset the state to find proper Q values, I crafted the following solution with tf.case():
CurrentStateOption = tf.Variable(0, trainable=False, name='SavedState')
with tf.name_scope("LSTMLayer") as scope:
    initializer = tf.random_uniform_initializer(-.1, .1)
    lstm_cell_L1 = tf.nn.rnn_cell.LSTMCell(self.input_sizes, forget_bias=1.0, initializer=initializer, state_is_tuple=True)
    self.cell_L1 = tf.nn.rnn_cell.MultiRNNCell([lstm_cell_L1] * self.NumberLSTMLayers, state_is_tuple=True)
    self.state = self.cell_L1.zero_state(1, tf.float64)
    self.SavedState = self.cell_L1.zero_state(1, tf.float64)  # tf.Variable(state, trainable=False, name='SavedState')
    #SaveCond = tf.cond(tf.equal(CurrentStateOption,tf.constant(1)), self.SaveState, self.SameState)
    #RestoreCond = tf.cond(tf.equal(CurrentStateOption,tf.constant(-1)), self.RestoreState, self.SameState)
    #ZeroCond = tf.cond(tf.less(CurrentStateOption,tf.constant(-1)), self.ZeroState, self.SameState)
    self.state = tf.case({tf.equal(CurrentStateOption, tf.constant(1)): self.SaveState,
                          tf.equal(CurrentStateOption, tf.constant(-1)): self.RestoreState,
                          tf.less(CurrentStateOption, tf.constant(-1)): self.ZeroState},
                         default=self.SameState, exclusive=True)
    RunConditions = tf.group([SaveCond, RestoreCond, ZeroCond])
    self.Xinputs = [tf.concat(1, [Xinputs])]
    outputs, stateFINAL_L1 = rnn.rnn(self.cell_L1, self.Xinputs, initial_state=self.state, dtype=tf.float32)

def RestoreState(self):
    #self.state = self.state.assign(self.SavedState)
    self.state = self.SavedState
    return self.state

def ZeroState(self):
    self.state = self.cell_L1.zero_state(1, tf.float64)
    return self.state

def SaveState(self):
    #self.SavedState = self.SavedState.assign(self.state)
    self.SavedState = self.state
    return self.SavedState

def SameState(self):
    return self.state
In concept this seems to work well, as I can now feed an int to tell the LSTM graph what to do with the state. If I pass 1, it saves the state before executing; if I pass -1, it restores the last saved state; if I pass anything below -1, it zeroes the state; and if I pass 0, it uses the state left in the LSTM from the last run (inference). I have tried a few different approaches, including a simpler tf.cond() approach.
I think the issue stems from the tf.case() op needing tensors, while the LSTM state is a tuple (and the non-tuple state is going to be deprecated). This became clear when I tried to tf.assign() the value to the graph variable.
My end goal is to keep the "state" within the graph, but pass an int to instruct what to do with it. In the future I would like to have multiple "store" locations for various look-backs.
Any ideas on how to handle a tf.case()-style structure with tuples instead of tensors?
I believe having one tf.case() per element of the state tuple should work, since the state tuple is just a Python tuple of tensors.
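A framework-free sketch of that per-element idea, assuming the state is a nested tuple like MultiRNNCell's (one LSTMStateTuple(c, h) per layer): walk the tuples in parallel, apply the selection to each leaf, and rebuild the same structure. map_state and choose_leaf are hypothetical helpers, the namedtuple below stands in for TensorFlow's LSTMStateTuple, and plain values stand in for tensors. The "save" side effect of option 1 is not modeled.

```python
from collections import namedtuple

# Stand-in for TensorFlow's LSTMStateTuple, a namedtuple with fields (c, h)
LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])

def map_state(fn, *states):
    """Apply fn leaf by leaf to parallel nested state tuples and rebuild
    the structure of the first one (namedtuple types are preserved)."""
    first = states[0]
    if isinstance(first, tuple):
        children = [map_state(fn, *parts) for parts in zip(*states)]
        if hasattr(first, "_fields"):  # namedtuple: rebuild from positional fields
            return type(first)(*children)
        return tuple(children)
    return fn(*states)

def choose_leaf(option, current, saved, zero):
    """Per-leaf version of the question's integer protocol."""
    if option == -1:
        return saved    # restore the last saved state
    if option < -1:
        return zero     # reset to zeros
    return current      # 0 (continue) and 1 (save, then run) both use the current state
```

Inside a graph, the leaf function would instead build one tf.case per tensor, for example tf.case([(tf.equal(opt, -1), lambda: saved), (tf.less(opt, -1), lambda: zero)], default=lambda: current, exclusive=True), and newer TensorFlow versions provide tf.nest.map_structure for the tree walk.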