Can I define models in a loop in Keras?

I am trying to train three different models with the same architecture on three different sets of data. This is what I have right now:
models_deaths = {}
models_deaths_histories = {}
epochs_per_severity = {"LARGE": 1000, "MEDIUM": 100, "SMALL": 5}

for severity in severity_map:
    inputs_deaths = Input(name='in_deaths' + severity,
                          shape=(x_train[severity].shape[1], x_train[severity].shape[2]))
    x_deaths = Dense(64)(inputs_deaths)
    inputs_aux_deaths = Input(name='in_aux_deaths' + severity,
                              shape=[x_aux_train[severity].shape[1]])
    x_deaths = ConditionalRNN(128, name='LSTM_deaths' + severity, cell='LSTM')([x_deaths, inputs_aux_deaths])
    predictions_deaths = Dense(1, name='output_deaths' + severity, activation='relu')(x_deaths)
    models_deaths[severity] = Model(inputs=[inputs_deaths, inputs_aux_deaths], outputs=predictions_deaths)
    models_deaths[severity].compile(optimizer='adam', loss='mean_squared_error', metrics=['mse'])
    models_deaths_histories[severity] = models_deaths[severity].fit(
        [x_train[severity], x_aux_train[severity]],
        y_train_deaths[severity],
        epochs=epochs_per_severity[severity],
        batch_size=256)
I'm changing the names of the layers, so I thought it would be fine, but I'm getting weird performance, and I'm wondering if it's because it's reusing the same layers in memory. Thoughts?

Related

How can I extract the encoded part of a multi-modal autoencoder and convert the .h5 model to a numpy array?

I am making a deep multimodal autoencoder model which takes two inputs and produces two outputs (which are the reconstructed inputs). The two inputs have shapes (1000, 50) and (1000, 60) respectively, and the model has 3 hidden layers and aims to concatenate the two latent layers of input1 and input2.
I would like to extract the encoded part of my model and save the data as a numpy array.
Here is the complete code of the model:
import keras
from keras.layers import Input, Dense, concatenate
from keras.models import Model

# encoders for the two modalities
input_X = Input(shape=X[0].shape)
dense_X = Dense(40, activation='relu')(input_X)
dense1_X = Dense(20, activation='relu')(dense_X)
latent_X = Dense(2, activation='relu')(dense1_X)

input_X1 = Input(shape=X1[0].shape)
dense_X1 = Dense(40, activation='relu')(input_X1)
dense1_X1 = Dense(20, activation='relu')(dense_X1)
latent_X1 = Dense(2, activation='relu')(dense1_X1)

# concatenated latent representation
Concat_X_X1 = concatenate([latent_X, latent_X1])

# decoders for the two modalities
decoding_X = Dense(20, activation='relu')(Concat_X_X1)
decoding1_X = Dense(40, activation='relu')(decoding_X)
output_X = Dense(X[0].shape[0], activation='sigmoid')(decoding1_X)

decoding_X1 = Dense(20, activation='relu')(Concat_X_X1)
decoding1_X1 = Dense(40, activation='relu')(decoding_X1)
output_X1 = Dense(X1[0].shape[0], activation='sigmoid')(decoding1_X1)

multi_modal_autoencoder = Model([input_X, input_X1], [output_X, output_X1], name='multi_modal_autoencoder')
encoder = Model([input_X, input_X1], Concat_X_X1)
encoder.save('encoder.h5')

multi_modal_autoencoder.compile(optimizer=keras.optimizers.Adam(lr=0.001), loss='mse')
model = multi_modal_autoencoder.fit([X, X1], [X, X1], epochs=70, batch_size=150)
With the h5py package you can open your .h5 file and extract exactly what you want:
import h5py

f = h5py.File('encoder.h5', 'r')
keys = list(f.keys())          # top-level groups in the file
values = f.get('some_key')     # fetch one group or dataset by name
You can call .get hierarchically, as many times as needed, to go deeper into your .h5 file and extract what you need.
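As a rough sketch of such a drill-down (the group and dataset names below, such as 'model_weights' and 'dense_1/kernel:0', are placeholders that depend on how the file was saved and on your layer names):
import h5py
import numpy as np

with h5py.File('encoder.h5', 'r') as f:
    f.visit(print)                           # list every group/dataset path in the file
    weights = f.get('model_weights')         # first level: a group
    layer = weights.get('dense_1')           # second level: one layer's group
    kernel = layer.get('dense_1/kernel:0')   # third level: an actual weight dataset
    kernel_array = np.array(kernel)          # convert the HDF5 dataset to a numpy array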

Random Initialisation of Hidden State of LSTM in Keras

I used a model for my music generation project. The model is created as follows:
self.model.add(LSTM(self.hidden_size, input_shape=(self.input_length, self.notes_classes),
                    return_sequences=True, recurrent_dropout=dropout))
self.model.add(LSTM(self.hidden_size, recurrent_dropout=dropout, return_sequences=True))
self.model.add(LSTM(self.hidden_size, return_sequences=True))
self.model.add(BatchNorm())
self.model.add(Dropout(dropout))
self.model.add(Dense(256))
self.model.add(Activation('relu'))
self.model.add(BatchNorm())
self.model.add(Dropout(dropout))
self.model.add(Dense(256))
self.model.add(Activation('relu'))
self.model.add(BatchNorm())
self.model.add(Dense(self.notes_classes))
self.model.add(Activation('softmax'))
After training this model to about 70% accuracy, whenever I generate music it always gives the same kind of starting notes, with little variation, whatever the input notes are. I think it is possible to solve this by initialising the hidden state of the LSTM at the start of generation. How can I do that?
There are two states: state_h, which is the output of the last step, and state_c, which is the carry-on state or memory.
You should use a functional API model to have more than one input:
main_input = Input((self.input_length, self.notes_classes))
state_h_input = Input((self.hidden_size,))
state_c_input = Input((self.hidden_size,))   # the cell state has the same shape as the hidden state

# initial_state is passed when calling the layer, not in its constructor
out = LSTM(self.hidden_size, return_sequences=True, recurrent_dropout=dropout)(
    main_input, initial_state=[state_h_input, state_c_input])

# I'm not changing the following layers; they could get their own state inputs if you want
out = LSTM(self.hidden_size, recurrent_dropout=dropout, return_sequences=True)(out)
out = LSTM(self.hidden_size, return_sequences=True)(out)
out = BatchNorm()(out)
out = Dropout(dropout)(out)
out = Dense(256)(out)
out = Activation('relu')(out)
out = BatchNorm()(out)
out = Dropout(dropout)(out)
out = Dense(256)(out)
out = Activation('relu')(out)
out = BatchNorm()(out)
out = Dense(self.notes_classes)(out)
out = Activation('softmax')(out)

self.model = Model([main_input, state_h_input, state_c_input], out)
Following this approach, it's even possible to use outputs of other layers as initial states, if you want trainable initial states.
The big change is that you will need to pass the states for training and predicting:
model.fit([original_inputs, state_h_data, state_c_data], y_train)
You can use zeros for the states during training.
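As a rough sketch of what that might look like (original_inputs and y_train come from the fit call above; n_samples and seed_input are placeholders for your own data):
import numpy as np

n_samples = len(original_inputs)

# zero initial states during training, shape (n_samples, hidden_size)
state_h_data = np.zeros((n_samples, self.hidden_size))
state_c_data = np.zeros((n_samples, self.hidden_size))
self.model.fit([original_inputs, state_h_data, state_c_data], y_train)

# at generation time, random initial states give different starting notes
state_h_seed = np.random.normal(size=(1, self.hidden_size))
state_c_seed = np.random.normal(size=(1, self.hidden_size))
prediction = self.model.predict([seed_input, state_h_seed, state_c_seed])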

Soft attention from scratch for video sequences

I am trying to implement soft attention for video sequence classification. As there are a lot of implementations and examples for NLP, I tried following this scheme [1], but for video: basically an LSTM with an attention model in between.
[1] https://blog.heuritech.com/2016/01/20/attention-mechanism/
My code for the attention layer is the following, which I am not sure is implemented correctly.
def attention_layer(self, input, context):
    # Input is a Tensor: [batch_size, lstm_units]
    # Input (seq_length, batch_size, lstm_units)
    # Context is an LSTMStateTuple: [batch_size, lstm_units]. hidden_state, output = StateTuple
    hidden_state, _ = context
    weights_y = tf.get_variable("att_weights_Y", [self.lstm_units, self.lstm_units],
                                initializer=tf.contrib.layers.xavier_initializer())
    weights_c = tf.get_variable("att_weights_c", [self.lstm_units, self.lstm_units],
                                initializer=tf.contrib.layers.xavier_initializer())
    z_ = []
    for feat in input:
        # Equation => M = tanh(Wc c + Wy y)
        Wcc = tf.matmul(hidden_state, weights_c)
        Wyy = tf.matmul(feat, weights_y)
        m = tf.add(Wcc, Wyy)
        m = tf.tanh(m, name='M_matrix')
        # Equation => s = softmax(m)
        s = tf.nn.softmax(m, name='softmax_att')
        z = tf.multiply(feat, s)
        z_.append(z)
    out = tf.stack(z_, axis=1)
    out = tf.reduce_sum(out, 1)
    return out, s
Adding this layer in between my LSTMs (or at the beginning of my 2 LSTMs) makes training very slow. More specifically, it takes a long time when I declare my optimizer:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
My questions are:
Is the implementation correct? If it is, is there a way to optimize it in order to make it train properly?
I was not able to make it work with the seq2seq APIs. Is there any API in TensorFlow that allows me to tackle this specific issue?
Does it actually make sense to use this for sequence classification?

Why do I get a high prediction score for an image of an untrained category?

I'm using TensorFlow for image classification over 5 categories (5 car parts). After training for 100 epochs, during prediction, when I tested an image (which does not even look like any of my trained categories) it matched one of those 5 categories with a score of more than 98%. (I have 1200 training images per category.)
(For example, I trained my model with wheel, mirror, door, steering, headlamp. My test image is a lily flower. My output is 99% for wheel.) Why?
Refer to the parameters in my code:
import os
import tensorflow as tf


def imagerecog(features, labels, mode, params):
    input_layer = features["images"]
    assert input_layer.shape[1:] == params['input_shape']
    convs = []
    pools = []
    for i in range(params["conv_layers"]):
        if i == 0:
            convs.append(tf.layers.conv2d(inputs=input_layer, filters=params['filters'][i],
                                          kernel_size=params['kernel_size'], strides=[1, 1],
                                          activation=tf.nn.relu, padding="same", name="conv%d" % i))
        else:
            convs.append(tf.layers.conv2d(inputs=pools[i - 1], filters=params['filters'][i],
                                          kernel_size=params['kernel_size'], strides=[1, 1],
                                          activation=tf.nn.relu, padding="same", name="conv%d" % i))
        pools.append(tf.layers.max_pooling2d(inputs=convs[i], pool_size=[2, 2], strides=[2, 2]))
    flat = tf.layers.flatten(pools[-1])
    dense1 = tf.layers.dense(inputs=flat, units=params["hidden_units"], name="dense1", activation=tf.nn.relu)
    dropout = tf.layers.dropout(inputs=dense1, rate=params["drop_rate"],
                                training=mode == tf.estimator.ModeKeys.TRAIN, name="dropout")
    logits = tf.layers.dense(inputs=dropout, units=params["n_classes"], name="logits")
    probs = tf.nn.sigmoid(logits, name="probs")
    top_5_scores, top_5_class = tf.nn.top_k(probs, k=2, name="scores")
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode,
                                          predictions={"classes": top_5_class, "scores": top_5_scores})
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    if mode == tf.estimator.ModeKeys.EVAL:
        acc = tf.metrics.accuracy(labels=labels, predictions=top_5_class[:, 0])
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops={"accuracy": acc})
    opt = tf.train.AdamOptimizer().minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=opt)


def inp_fn(folder, image_size):
    classes = os.listdir(folder)

    def fn():
        images = []
        labels = []
        for i, cls in enumerate(classes):
            imgs = os.listdir(folder + "/" + cls)
            print(cls, i)
            for img in imgs:
                img = tf.image.decode_jpeg(tf.read_file(folder + "/" + cls + "/" + img), 3, name="jpeg_decode")
                img = tf.image.rgb_to_grayscale(img)
                img = tf.image.resize_images(img, image_size)
                images.append(img)
                labels.append(i)
        return tf.data.Dataset.from_tensor_slices(({"images": images}, labels)).batch(100)

    return fn


params = {"input_shape": [200, 300, 1],
          "conv_layers": 3,
          "filters": [20, 20, 20],
          "kernel_size": [5, 5],
          "hidden_units": 9000,
          "drop_rate": 0.4,
          "n_classes": 5}

epoch = 100
for a in range(epoch):
    print("Epoch=", a)
    estim.train(inp_fn("train", params['input_shape'][:-1]))


def pred_inp_fn(folder, image_size):
    def fn():
        files = os.listdir(folder)
        images = []
        for file in files:
            img = tf.image.decode_jpeg(tf.read_file(folder + "/" + file), 3)
            img = tf.image.rgb_to_grayscale(img)
            img = tf.image.resize_images(img, image_size)
            images.append(img)
        return tf.data.Dataset.from_tensor_slices({"images": images}).batch(100)

    return fn


results = estim.predict(pred_inp_fn("predict", params['input_shape'][:-1]))
for res in results:
    print(res)
Well, because you did not train for that category. This is an ever-present issue with neural networks (and some other ML techniques): the response of a model to unseen classes of inputs (in the case of classification) is not an even probability distribution "by default", but something unpredictable, and frequently a strong response for one of the classes (possibly the most frequent one, but not necessarily). If you think about it, all of your training examples belonged 100% to a single class, so the model will tend to give answers with the score concentrated in a single category. I wrote another answer to a similar question with a couple of alternatives for modelling a "none-of-the-others" class, and you can probably look up more literature on the topic. You can also look into other kinds of models, like the object detection API, if they better suit your needs. The point is that you cannot expect your model to exhibit a behavior for which it was not explicitly trained.
You trained your model on only 5 classes. So your model is just like a baby who thinks there are only five objects in the world and tries to relate anything to one of these objects: the one it thinks is the most similar one.
One solution would be to train your model on 6 classes instead of 5, where the 6th class is the "unknown" class which includes any other object in the world (other than the five classes).
You can easily gather training data for the 6th class (it can be images of anything except the other 5 classes) and retrain your model.
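As a minimal sketch of that change against the question's own code (assuming the extra images go into a new train/unknown folder; the folder name is just a placeholder):
params = {"input_shape": [200, 300, 1],
          "conv_layers": 3,
          "filters": [20, 20, 20],
          "kernel_size": [5, 5],
          "hidden_units": 9000,
          "drop_rate": 0.4,
          "n_classes": 6}  # 5 car parts + 1 "unknown" catch-all class

# inp_fn() already enumerates the sub-folders of "train" as classes, so placing
# the miscellaneous images in train/unknown is enough for them to become the
# 6th label; then retrain exactly as before:
for a in range(epoch):
    estim.train(inp_fn("train", params['input_shape'][:-1]))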

In what order does TensorFlow evaluate nodes in a computation graph?

I am having a strange bug in TensorFlow. Consider the following code, part of a simple feed-forward neural network:
output = (tf.matmul(layer_3,w_out) + b_out)
prob = tf.nn.sigmoid(output);
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits = output, targets = y_, name=None))
optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(loss, var_list = model_variables)
(Notice that prob is not used to define the loss function. This is because sigmoid_cross_entropy applies sigmoid internally in its definition)
I later run the optimizer in the following line:
result,step_loss,_ = sess.run(fetches = [output,loss,optimizer],feed_dict = {x_ : np.array([[x,y,x*x,y*y,x*y]]), y_ : [[1,0]]});
The above works just fine. However, if I instead run the following line, the network seems to perform terribly, even though there shouldn't be any difference!
result,step_loss,_ = sess.run(fetches = [prob,loss,optimizer],feed_dict = {x_ : np.array([[x,y,x*x,y*y,x*y]]), y_ : [[1,0]]});
I have a feeling it has something to do with the order in which TF computes the nodes in the graph during a session, but I'm not sure. What could the issue be?
It's not an issue with the graph, it's just that you are looking at different things.
In the first example you provide:
result,step_loss,_ = sess.run(fetches = [output,loss,optimizer],feed_dict = {x_ : np.array([[x,y,x*x,y*y,x*y]]), y_ : [[1,0]]})
you are saving the result of running the output op in the result Python variable.
In the second one:
result,step_loss,_ = sess.run(fetches = [prob,loss,optimizer],feed_dict = {x_ : np.array([[x,y,x*x,y*y,x*y]]), y_ : [[1,0]]})
you are saving the result of the prob op in the result Python variable.
Since the two ops are different, it is to be expected that the values they return will differ.
You could run
logits, activation, step_loss, _ = sess.run(fetches = [output, prob, loss, optimizer], ...)
to check your results.
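As a rough sketch of such a check, reusing the question's own feed_dict and assuming numpy is imported as np (the assertion only illustrates that prob is the element-wise sigmoid of output, so the two fetches differ only in what is returned, not in what the optimizer does):
logits, activation, step_loss, _ = sess.run(fetches=[output, prob, loss, optimizer],
                                            feed_dict={x_: np.array([[x, y, x*x, y*y, x*y]]), y_: [[1, 0]]})
# prob == sigmoid(output), verified numerically on the returned arrays
assert np.allclose(activation, 1.0 / (1.0 + np.exp(-logits)))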