ConvLSTM2D prediction is the same as the image at t-1 - tensorflow

I am trying to predict the next image in a sequence of images, and I'm not sure why LSTMs aren't cutting it for me: my predicted image always seems to be a copy of the image at the previous timestep. I had similar results when using Conv3D on my images, and I'm not sure why that is either. My input has been normalized to the range [0,1], and I multiply my output by 255 because my Ys weren't normalized.
This is my LSTM model:
lstm = tf.keras.models.Sequential([
    tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1), input_shape=X.shape[1:]),
    tf.keras.layers.GaussianNoise(0.05),
    tf.keras.layers.ConvLSTM2D(25, padding='same', kernel_size=(3, 3), return_sequences=True, stateful=False),
    tf.keras.layers.ConvLSTM2D(25, padding='same', kernel_size=(3, 3), return_sequences=True),
    tf.keras.layers.ConvLSTM2D(25, padding='same', kernel_size=(3, 3), return_sequences=False),
    tf.keras.layers.Conv2D(1, padding='same', kernel_size=(1, 1), trainable=False),
    tf.keras.layers.Lambda(lambda x: tf.keras.backend.squeeze(x, axis=-1)),
    tf.keras.layers.Lambda(lambda x: 255. * tf.clip_by_value(x, 0., 1.))
])
And this is my Conv3D model:
conv = tf.keras.models.Sequential([
    tf.keras.layers.Lambda(lambda x: tf.expand_dims(x, axis=-1), input_shape=X.shape[1:]),
    tf.keras.layers.GaussianNoise(0.05),
    tf.keras.layers.Conv3D(25, padding='same', data_format='channels_last', kernel_size=(5, 3, 3)),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Conv3D(25, padding='same', data_format='channels_last', kernel_size=(5, 3, 3)),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Conv3D(1, padding='same', data_format='channels_last', kernel_size=(1, 1, 1), trainable=True),
    tf.keras.layers.Lambda(lambda x: tf.keras.backend.squeeze(x, axis=-1)),
    tf.keras.layers.Conv2D(1, kernel_size=(1, 1), data_format='channels_first', trainable=True),
    tf.keras.layers.Lambda(lambda x: tf.keras.backend.squeeze(x, axis=-3)),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Lambda(lambda x: 255. * tf.clip_by_value(x, 0., 1.))
])
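A quick sanity check for the "copy of t-1" symptom is to compare the trained model against the persistence baseline that simply repeats the last input frame. A minimal sketch, assuming X has shape (samples, time, height, width) as implied by the expand_dims Lambda, and Y (a hypothetical name here) holds the ground-truth next frames on the [0, 255] scale the model outputs:

import numpy as np

# Persistence baseline: predict that frame t equals frame t-1.
baseline = 255. * X[:, -1]   # last input frame, rescaled like the model output
pred = lstm.predict(X)

print('baseline MSE:', np.mean((Y - baseline) ** 2))
print('model MSE:   ', np.mean((Y - pred) ** 2))
# Nearly identical numbers mean the network has collapsed onto the baseline.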
I tried making an SSIM loss function, but it makes my model predict that the next image will be extremely bright, and it performs much worse than simply using MSE.
This is the loss function I made. I know it looks extreme, but all my images are very similar in structure to each other, so I believe the harshness is warranted. There were no NaN errors during training.
def custom_err(y_true, y_pred):
    # SSIM has range [-1, 1], with -1 being the worst and 1 being the best
    def SSIM(y_true, y_pred):
        ssim = tf.image.ssim(tf.expand_dims(y_true, -1), tf.expand_dims(y_pred, -1), 255.)
        return ssim
    ssim = SSIM(y_true, y_pred)
    return 10 ** (abs(ssim - 1) * 20) - 1
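For comparison, a more conventional SSIM-based loss is the DSSIM form (1 - SSIM) / 2, which stays bounded in [0, 1] instead of growing exponentially. A minimal sketch using the same tf.image.ssim call as above:

def dssim_loss(y_true, y_pred):
    # DSSIM: 0 when the images match perfectly, 1 when SSIM is -1
    ssim = tf.image.ssim(tf.expand_dims(y_true, -1),
                         tf.expand_dims(y_pred, -1), 255.)
    return tf.reduce_mean((1.0 - ssim) / 2.0)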


Categorical_crossentropy loss function has a value of 0.0000e+00 for a BiLSTM sentiment analysis model

Model
This is the graph of my model (image not reproduced here).
Code:
def model_creation(vocab_size, embedding_dim, embedding_matrix,
                   rnn_units, batch_size, train_embed=False):
    model = Sequential([
        Embedding(vocab_size, embedding_dim,
                  weights=[embedding_matrix], trainable=train_embed, mask_zero=True),
        Bidirectional(LSTM(rnn_units, return_sequences=True, dropout=0.5)),
        Bidirectional(LSTM(rnn_units, dropout=0.25)),
        Dense(1, activation="softmax")
    ])
    return model
The embedding layer receives an embedding matrix with values from Word2Vec.
Embedding Matrix
This is the code for the embedding matrix:
def create_embedding_matrix(encoder, dict_w2v):
    embedding_dim = 50
    embedding_matrix = np.zeros((encoder.vocab_size, embedding_dim))
    for word in encoder.tokens:
        embedding_vector = dict_w2v.get(word)
        if embedding_vector is not None:  # dictionary contains the word
            token_id = encoder.encode(word)[0]
            embedding_matrix[token_id] = embedding_vector
    return embedding_matrix
Dataset
I'm using the Amazon product dataset https://jmcauley.ucsd.edu/data/amazon/
This is what the dataframe looks like (screenshot not reproduced here).
I'm only interested in overall and reviewText:
overall is my label and reviewText is my feature.
overall has a range of [1,5].
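For reference, categorical_crossentropy expects one-hot targets, so ratings in [1,5] would need to be encoded along these lines (a minimal sketch with hypothetical variable names):

import tensorflow as tf

overall = [5, 1, 3]  # example ratings in [1, 5]
labels = tf.one_hot([r - 1 for r in overall], depth=5)  # shape (3, 5)
# A model trained on these targets needs 5 output units, one per class.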
Problem
During training with categorical_crossentropy, the loss is at 0.0000e+00; I don't think the loss can be minimized well, so accuracy stays at 0.1172.
Did I configure my model wrong, or is there some other problem? How do I fix this loss issue? Please tell me if it's not clear enough and I'll provide more information; I'm not sure what the problem is.

Create a weighted MSE loss function in Tensorflow

I want to train a recurrent neural network using Tensorflow. My model outputs a 1 by 100 vector for each training sample. Assume that y = [y_1, y_2, ..., y_100] is my output for training sample x and the expected output is y'= [y'_1, y'_2, ..., y'_100].
I wish to write a custom loss function that calculates the loss of this specific sample as follows:
Loss = 1/sum(weights) * sqrt(w_1*(y_1-y'_1)^2 + ... + w_100*(y_100-y'_100)^2)
where weights = [w_1, ..., w_100] is a given weight array.
Could someone help me with implementing such a custom loss function? (I also use mini-batches while training)
I want to underline that you have two possibilities depending on your problem:
[1] If the weights are equal for all your samples:
You can build a loss wrapper. Here is a dummy example:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

n_sample = 200
X = np.random.uniform(0, 1, (n_sample, 10))
y = np.random.uniform(0, 1, (n_sample, 100))
W = np.random.uniform(0, 1, (100,)).astype('float32')

def custom_loss_wrapper(weights):
    def loss(true, pred):
        sum_weights = tf.reduce_sum(weights) * tf.cast(tf.shape(pred)[0], tf.float32)
        resid = tf.sqrt(tf.reduce_sum(weights * tf.square(true - pred)))
        return resid / sum_weights
    return loss

inp = Input((10,))
x = Dense(256)(inp)
pred = Dense(100)(x)

model = Model(inp, pred)
model.compile('adam', loss=custom_loss_wrapper(W))
model.fit(X, y, epochs=3)
[2] If the weights differ between samples:
You should build your model using add_loss in order to dynamically take the weights of each sample into account. Here is a dummy example:
n_sample = 200
X = np.random.uniform(0, 1, (n_sample, 10))
y = np.random.uniform(0, 1, (n_sample, 100))
W = np.random.uniform(0, 1, (n_sample, 100))

def custom_loss(true, pred, weights):
    sum_weights = tf.reduce_sum(weights)
    resid = tf.sqrt(tf.reduce_sum(weights * tf.square(true - pred)))
    return resid / sum_weights

inp = Input((10,))
true = Input((100,))
weights = Input((100,))
x = Dense(256)(inp)
pred = Dense(100)(x)

model = Model([inp, true, weights], pred)
model.add_loss(custom_loss(true, pred, weights))
model.compile('adam', loss=None)
model.fit([X, y, W], y=None, epochs=3)
When using add_loss you should pass all the tensors involved in the loss as input layers and use them inside the loss computation.
At inference time you can compute predictions as usual, simply removing true and weights from the inputs:
final_model = Model(model.input[0], model.output)
final_model.predict(X)
You can implement a custom weighted MSE in the following way:
import numpy as np
from tensorflow.keras import backend as K

def custom_mse(class_weights):
    def weighted_mse(gt, pred):
        # Formula: (w_1*(y_1-y'_1)^2 + ... + w_100*(y_100-y'_100)^2) / sum(weights)
        return K.sum(class_weights * K.square(gt - pred)) / K.sum(class_weights)
    return weighted_mse
y_true = np.array([[0., 1., 1., 0.], [0., 0., 1., 1.]])
y_pred = np.array([[0., 1., 0., 1.], [1., 0., 1., 1.]])
weights = np.array([0.25, 0.50, 1., 0.75])

print(y_true.shape, y_pred.shape, weights.shape)
# (2, 4) (2, 4) (4,)

loss = custom_mse(class_weights=weights)
loss(y_true, y_pred).numpy()
# 0.8
Using it with model compilation:
model.compile(loss=custom_mse(weights))
This will compute MSE with the provided weights. However, in your question you quote a sqrt, from which I presume you meant root MSE (RMSE). For that, you can apply K.sqrt(K.sum(...)) / K.sum(...) inside the custom function of custom_mse.
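A minimal sketch of that RMSE variant, following the formula in the question:

def custom_rmse(class_weights):
    def weighted_rmse(gt, pred):
        # sqrt(w_1*(y_1-y'_1)^2 + ... + w_n*(y_n-y'_n)^2) / sum(weights)
        return K.sqrt(K.sum(class_weights * K.square(gt - pred))) / K.sum(class_weights)
    return weighted_rmse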
FYI, you may also be interested in class_weight and sample_weight in Model.fit. From the docs:
class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.

sample_weight: Optional Numpy array of weights for the training samples, used for weighting the loss function (during training only). You can either pass a flat (1D) Numpy array with the same length as the input samples (1:1 mapping between weights and samples), or in the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. This argument is not supported when x is a dataset, generator, or keras.utils.Sequence instance; instead, provide the sample weights as the third element of x.
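A minimal illustration of sample_weight, reusing the toy model from the first example above (recompiled with a built-in loss so Keras applies the per-sample weights itself):

model.compile('adam', loss='mse')
sample_w = np.ones(n_sample)
sample_w[:50] = 2.0  # e.g. weight the first 50 samples twice as much
model.fit(X, y, sample_weight=sample_w, epochs=3)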
And also loss_weights in Model.compile, from the docs:
loss_weights: Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs. The loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients. If a list, it is expected to have a 1:1 mapping to the model's outputs. If a dict, it is expected to map output names (strings) to scalar coefficients.
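And a minimal illustration of loss_weights for a hypothetical two-output model whose outputs are named 'out_a' and 'out_b':

model.compile('adam',
              loss={'out_a': 'mse', 'out_b': 'mae'},
              loss_weights={'out_a': 1.0, 'out_b': 0.5})
# total loss = 1.0 * mse(out_a) + 0.5 * mae(out_b)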
A class version of the weighted mean squared error loss function:

class WeightedMSE(object):
    def __init__(self):
        pass

    def __call__(self, y_true, y_pred, weights):
        sum_weights = tf.reduce_sum(weights)
        resid = tf.reduce_sum(weights * tf.square(y_true - y_pred))
        return resid / sum_weights
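Usage, with the arrays from the earlier example (the weights are passed per call rather than at construction):

wmse = WeightedMSE()
print(wmse(y_true, y_pred, weights).numpy())  # 0.8, matching the functional version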

Neural Network isn't learning anything meaningful using Triplet Loss

I am working on a triplet loss based model for this Kaggle competition.
Short description: in this competition, we have been challenged to build an algorithm to identify individual whales in images by analyzing a database containing more than 25,000 images, gathered from research institutions and public contributors.
https://www.kaggle.com/c/humpback-whale-identification?rvi=1
I have decided to use a Siamese network architecture and train it to give me encodings which I can then use to calculate the distance between two pictures of whales. If this distance is below a particular threshold, the two pictures belong to the same whale; if it is greater, they aren't the same whale.
This is the triplet loss function I used (learnt from Andrew Ng's deep learning specialization), but I also normalized the encodings to make the loss function more interpretable (easier to determine the margin and split point) across different models, if that makes sense. (First I tried it without the normalization, and when that didn't work I tried normalizing.) I have also tried varying alpha (the margin) from 0.2 to 0.6.
from tensorflow.nn import l2_normalize as norm_l2

def triplet_loss(y_true, y_pred, alpha=0.3):
    """
    Arguments:
    y_true -- true labels, required when you define a loss in Keras; not needed in this function.
    y_pred -- python list containing three objects:
        anchor -- the encodings for the anchor images, of shape (None, 128)
        positive -- the encodings for the positive images, of shape (None, 128)
        negative -- the encodings for the negative images, of shape (None, 128)
    Returns:
    loss -- real number, value of the loss
    """
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    anchor, positive, negative = norm_l2(anchor), norm_l2(positive), norm_l2(negative)
    # Step 1: compute the (encoding) distance between the anchor and the positive
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis=-1)
    # Step 2: compute the (encoding) distance between the anchor and the negative
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis=-1)
    # Step 3: subtract the two previous distances and add alpha
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    # Step 4: take the maximum of basic_loss and 0.0; sum over the training examples
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
    return loss
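A quick sanity check of this function on dummy encodings (illustrative only, assuming eager execution): with positives identical to the anchors, the negative distance should comfortably exceed alpha and the loss should collapse to zero.

a = tf.random.normal((4, 128))
p = tf.identity(a)              # positives identical to the anchors
n = tf.random.normal((4, 128))  # unrelated negatives
print(triplet_loss(None, [a, p, n]).numpy())  # ~0.0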
This is an example of one of the model architectures I tried out. I have tried using pretrained FaceNet, ResNet, DenseNet and Xception so far, freezing different numbers of layers in each.
R = tf.keras.applications.ResNet50(include_top=False, weights='imagenet', input_shape=(224, 224, 3))

lr = 0.0001
optimizer = Adam(learning_rate=lr)
R.compile(optimizer=optimizer, loss=triplet_loss)

for layer in R.layers[0:30]:
    layer.trainable = False

em_Rmodel = Sequential([
    R,
    GlobalAveragePooling2D(),
    # tf.keras.layers.GlobalMaxPooling2D(),
    Dense(512, activation='relu'),
    bn(),  # BatchNormalization
    Dense(256, activation='sigmoid'),
    Dense(128, activation='sigmoid')
])
def make_tripletModel(model):
    # I was manually changing the input shape to fit the default shape of pretrained networks
    A = Input(shape=(224, 224, 3), name='anchor')
    P = Input(shape=(224, 224, 3), name='anchorPositive')
    N = Input(shape=(224, 224, 3), name='anchorNegative')
    enc_A = model(A)
    enc_P = model(P)
    enc_N = model(N)
    tripletModel = Model(inputs=[A, P, N], outputs=[enc_A, enc_P, enc_N])
    return tripletModel

tripletModel = make_tripletModel(em_Rmodel)
I have been training using semi-hard triplets and have also been augmenting the data properly to generate more training images.
This is the batch generator that I used for training. crop_batch is a function that crops images to show only the whale's tail, by which one can identify a whale. It uses a DenseNet trained on more than 1000 images of whale tails and their surrounding bounding boxes, and it does the job sufficiently well.
def batch_generator_RN(batch_size=batch_size, ishape=(256, 256, 3), model_input_shape=(224, 224, 3)):
    triplet_generator = get_triplets()
    y_val = np.zeros((batch_size, 2, 1))
    anchors = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))
    positives = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))
    negatives = np.zeros((batch_size, ishape[0], ishape[1], ishape[2]))
    while True:
        for i in range(batch_size):
            anchors[i], positives[i], negatives[i] = next(triplet_generator)
        anc = crop_batch(anchors, batch_size=batch_size, img_shape=model_input_shape)
        pos = crop_batch(positives, batch_size=batch_size, img_shape=model_input_shape)
        neg = crop_batch(negatives, batch_size=batch_size, img_shape=model_input_shape)
        x_data = {'anchor': anc,
                  'anchorPositive': pos,
                  'anchorNegative': neg}
        yield (x_data, [y_val, y_val, y_val])
And finally, this is in general how I have been trying to train these models. I have tried reducing and increasing the learning rate; batch_size = 16.
lr = 0.0001
optimizer = Adam(learning_rate=lr)
tripletModel.compile(optimizer=optimizer, loss=triplet_loss)

es = EarlyStopping(monitor='loss', patience=20, min_delta=0.05, restore_best_weights=True)
# mc = ModelCheckpoint('Rmodel.h5', monitor='loss', save_best_only=True, save_weights_only=True)
rlr = ReduceLROnPlateau(monitor='loss', min_delta=0.05, factor=0.1, patience=5, verbose=1, min_lr=0)

gen = batch_generator(batch_size)
tripletModel.fit(gen, steps_per_epoch=64, epochs=40, callbacks=[es, rlr])
So after training all these models, in some of them the triplet loss does go down for a while but then plateaus, and the model learns nothing meaningful (which basically means that just by looking at the distance between two embeddings I can't figure out whether they are the same whale or not). In other models, immediately after the first or second epoch the weights converge and don't change at all, and nothing is learned.
I have tried a very wide range of learning rates, and I am pretty sure that isn't the problem.
Please tell me if I should add all the code files for you to understand the problem better. The only reason I haven't done so yet is that I haven't cleaned the code up, but I will gladly do so if required. Thanks.
When you say that it doesn't learn anything, do you mean that the loss reaches a plateau and stops decreasing, or that it does decrease significantly but the predicted embeddings of both same and different whales end up similar in value?
The triplet_loss() and batch_generator_RN() functions are correct; the problem is not related to the data generation.
However, I suspect that your learning rate is too high while you freeze a lot of layers, i.e. numerous trainable parameters are frozen, which may leave your network unable to converge.
My suggestion is to unfreeze all the layers, decrease the learning rate to 0.00001, and start training again, regardless of the architecture you use (Xception/ResNet etc.).
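A sketch of that suggestion, reusing the names from the question (the exact learning rate is indicative):

# Unfreeze every layer of the backbone, then recompile with a smaller learning rate.
for layer in R.layers:
    layer.trainable = True

tripletModel.compile(optimizer=Adam(learning_rate=1e-5), loss=triplet_loss)
tripletModel.fit(gen, steps_per_epoch=64, epochs=40, callbacks=[es, rlr])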

Loss not converging in Polynomial regression in Tensorflow

import numpy as np
import tensorflow as tf

# input data
x_input = np.linspace(0, 10, 1000)
y_input = x_input + np.power(x_input, 2)

# model parameters
W = tf.Variable(tf.random_normal([2, 1]), name='weight')
# bias
b = tf.Variable(tf.random_normal([1]), name='bias')

# placeholders
X = tf.placeholder(tf.float32, shape=[None, 2])
Y = tf.placeholder(tf.float32)

x_modified = np.zeros([1000, 2])
x_modified[:, 0] = x_input
x_modified[:, 1] = np.power(x_input, 2)

# model
Y_pred = tf.add(tf.matmul(X, W), b)

# loss
loss = tf.reduce_mean(tf.square(Y_pred - Y))

# training algorithm
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# initializing the variables
init = tf.initialize_all_variables()

# starting the session
sess = tf.Session()
sess.run(init)

epoch = 100
for step in range(epoch):
    _, c = sess.run([optimizer, loss], feed_dict={X: x_modified, Y: y_input})
    if step % 50 == 0:
        print(c)

print("Model parameters:")
print(sess.run(W))
print("bias: %f" % sess.run(b))
I'm trying to implement polynomial regression (quadratic) in TensorFlow, but the loss isn't converging. Could anyone please help me out with this? Similar logic works for linear regression.
First, there is a problem in your shapes for Y_pred and Y:
Y has unknown shape, and is fed with an array of shape (1000,)
Y_pred has shape (1000, 1)
Y - Y_pred will then broadcast to shape (1000, 1000)
This small snippet proves my point:
a = tf.zeros([1000])        # shape (1000,)
b = tf.zeros([1000, 1])     # shape (1000, 1)
print((a - b).get_shape())  # prints (1000, 1000)
You should use consistent shapes:
y_input = y_input.reshape((1000, 1))
Y = tf.placeholder(tf.float32, shape=[None, 1])
Anyway, the loss is exploding because you have very high input values (between 0 and 100; you should normalize them) and thus a very high loss (around 2000 at the beginning of training).
The gradient is very high, the parameters explode, and the loss goes to infinity.
The quickest fix is to lower your learning rate (1e-5 converges for me, albeit very slowly at the end). You can raise it again after the loss converges to around 1.
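Putting the fixes together, a sketch in the question's TF 1.x style (the learning rate here is indicative; with normalized features a value around 0.01 is plausible):

# consistent shapes
y_input = y_input.reshape((1000, 1))
Y = tf.placeholder(tf.float32, shape=[None, 1])

# normalize the features so the gradients stay bounded
x_modified = (x_modified - x_modified.mean(axis=0)) / x_modified.std(axis=0)

# rebuild the loss against the reshaped target
loss = tf.reduce_mean(tf.square(Y_pred - Y))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)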

Gradients are always zero

I have written an algorithm using the TensorFlow framework and ran into the problem that tf.train.Optimizer.compute_gradients(loss) returns zero for all weights. Another problem is that if I make the batch size larger than about 5, tf.histogram_summary for the weights throws an error saying that some of the values are NaN.
I cannot provide a reproducible example here, because my code is quite bulky and I am not good enough at TF to make it shorter. I will try to paste some fragments here.
Main loop:
images_ph = tf.placeholder(tf.float32, shape=some_shape)
labels_ph = tf.placeholder(tf.float32, shape=some_shape)

output = inference(BATCH_SIZE, images_ph)
loss = loss(labels_ph, output)
train_op = train(loss, global_step)

session = tf.Session()
session.run(tf.initialize_all_variables())

for i in range(MAX_STEPS):
    images, labels = train_dataset.get_batch(BATCH_SIZE, yolo.INPUT_SIZE, yolo.OUTPUT_SIZE)
    session.run([loss, train_op], feed_dict={images_ph: images, labels_ph: labels})
Train op (here is where the problem occurs):
def train(total_loss, global_step):
    opt = tf.train.AdamOptimizer()
    grads = opt.compute_gradients(total_loss)
    # Here the gradients are zeros
    for grad, var in grads:
        if grad is not None:
            tf.histogram_summary("gradients/" + var.op.name, grad)
    return opt.apply_gradients(grads, global_step=global_step)
Loss (the loss is calculated correctly, since it changes from sample to sample):
def loss(labels, output):
    return tf.reduce_mean(tf.squared_difference(labels, output))
Inference: a set of convolutional layers with ReLU, followed by 3 fully connected layers with a sigmoid activation in the last layer. All weights are initialized from truncated normal distributions. All labels are fixed-length vectors of real numbers in the range [0,1].
Thanks in advance for any help! If you have a hypothesis about my problem, please share it and I will try it. I can also share the whole code if you like.
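One way to narrow this down is to inspect the raw gradient magnitudes at runtime. A sketch in the same TF 1.x style, assuming the (grad, var) list returned by compute_gradients is kept around as grads:

import numpy as np

grad_vars = [(g, v) for g, v in grads if g is not None]
grad_values = session.run([g for g, _ in grad_vars],
                          feed_dict={images_ph: images, labels_ph: labels})
for (_, var), val in zip(grad_vars, grad_values):
    print(var.op.name, float(np.abs(val).max()))
# All-zero maxima in the layers before the final sigmoid would point at
# saturated activations (a common cause of vanishing gradients).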