I'm trying to do transfer learning with Inception v1, but the classifier performs poorly when predicting a single image. What is wrong?
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from skimage.transform import resize
m = tf.keras.Sequential([hub.KerasLayer("https://tfhub.dev/google/imagenet/inception_v1/classification/4")]) # load the tensorflow hub model
m.build([None, 224, 224, 3])
rimg = resize(img, output_shape=(1,224,224,3),anti_aliasing=True) # resize and reshape the image to [1,224,224,3]
rimg = ((rimg-np.min(rimg))/(np.max(rimg)-np.min(rimg))).astype(np.float32) # normalize the image to a [0,1] range and cast the result to float32
logits = m(rimg) # feed the image into the model to obtain the logits
probs = np.exp(logits)/(np.sum(np.exp(logits))) # convert logits to probabilities
You're applying min-max normalization, but you should divide each pixel value by 255 instead. The lowest possible pixel value (0) should map to 0 and the highest (255) should map to 1. An image like [64, 128] should therefore map to approximately [0.25, 0.5], whereas your normalization maps it to [0, 1].
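A quick numeric check of the two scalings, as a minimal sketch using the two-pixel example from the answer (the names `pixels`, `scaled`, and `minmax` are only for illustration):

```python
import numpy as np

# Two-pixel example from the answer, treated as 8-bit intensities.
pixels = np.array([64, 128], dtype=np.uint8).astype(np.float32)

# Fixed-range scaling: 0 maps to 0 and 255 maps to 1, regardless of image content.
scaled = pixels / 255.0

# Min-max scaling: the darkest pixel becomes 0 and the brightest becomes 1,
# so the absolute brightness of the image is lost.
minmax = (pixels - pixels.min()) / (pixels.max() - pixels.min())

print(scaled)  # roughly [0.25, 0.50]
print(minmax)  # [0., 1.]
```

One caveat worth checking in the question's pipeline: with its default `preserve_range=False`, `skimage.transform.resize` already converts integer images to floats in [0, 1], so inspect the range of `rimg` before dividing by 255 a second time.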
I'm working with time series, and understand that keras.layers.Masking and keras.layers.Embedding are useful to create a mask value in the network which indicates timesteps to 'skip'. The mask value is propagated throughout the network to be used by any layers that support it.
The Keras documentation doesn't specify any further impacts of the mask value. My expectation is that the mask would be applied through all functions in model training and evaluation, but I don't see any evidence in support of this.
Does the mask value impact back-propagation?
Does the mask value impact the loss function or the metrics?
Would it be wise or foolish to use the sample_weight parameter of model.fit() to tell Keras to 'ignore' the masked timesteps in the loss function?
I've performed some experiments to answer these questions.
Here's my sample code:
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
# Fix the random seed for repeatable results
np.random.seed(5)
tf.random.set_seed(5)
x = np.array([[[3, 0], [1, 4], [3, 2], [4, 0], [4, 5]],
[[1, 2], [3, 1], [1, 3], [5, 1], [3, 5]]], dtype='float64')
# Choose some values to be masked out
mask = np.array([[False, False, True, True, True],
[ True, True, False, False, True]]) # True:keep. False:ignore
samples, timesteps, features_in = x.shape
features_out = 1
y_true = np.random.rand(samples, timesteps, features_out)
# y_true[~mask] = 1e6 # TEST MODIFICATION
# Apply the mask to x
mask_value = 0 # Set to any value
x[~mask] = [mask_value] * features_in
input_tensor = keras.Input(shape=(timesteps, features_in))
this_layer = input_tensor
this_layer = keras.layers.Masking(mask_value=mask_value)(this_layer)
this_layer = keras.layers.Dense(10)(this_layer)
this_layer = keras.layers.Dense(features_out)(this_layer)
model = keras.Model(input_tensor, this_layer)
model.compile(loss='mae', optimizer='adam')
model.fit(x=x, y=y_true, epochs=100, verbose=0)
y_pred = model.predict(x)
print("y_pred = ")
print(y_pred)
print("model weights = ")
print(model.get_weights()[1])
print(f"{'model.evaluate':>14s} = {model.evaluate(x, y_true, verbose=0):.5f}")
# See if the loss computed by model.evaluate() is equal to the masked loss
error = y_true - y_pred
masked_loss = np.abs(error[mask]).mean()
unmasked_loss = np.abs(error).mean()
print(f"{'masked loss':>14s} = {masked_loss:.5f}")
print(f"{'unmasked loss':>14s} = {unmasked_loss:.5f}")
Which outputs
y_pred =
[[[-0.28896046]
[-0.28896046]
[ 0.1546848 ]
[-1.1596009 ]
[ 1.5819632 ]]
[[ 0.59000516]
[-0.39362794]
[-0.28896046]
[-0.28896046]
[ 1.7996234 ]]]
model weights =
[-0.06686568 0.06484845 -0.06918766 0.06470951 0.06396528 0.06470013
0.06247645 -0.06492618 -0.06262784 -0.06445726]
model.evaluate = 0.60170
masked loss = 1.00283
unmasked loss = 0.90808
mask and loss calculation
Surprisingly, the 'mae' (mean absolute error) loss calculation does NOT exclude the masked timesteps from the calculation. Instead, it assumes that these timesteps have zero loss - a perfect prediction. Therefore, every masked timestep actually reduces the calculated loss!
To explain in more detail: the above sample code input x has 10 timesteps. 4 of them are removed by the mask, so 6 valid timesteps remain. The 'mean absolute error' loss calculation sums the losses for the 6 valid timesteps, then divides by 10 instead of dividing by 6. This looks like a bug to me.
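The arithmetic can be checked against the numbers printed above. This is only a sketch of the explanation just given; it reuses the mask and the printed losses and does not call Keras:

```python
import numpy as np

# Mask from the sample code above (True: keep, False: ignore).
mask = np.array([[False, False, True, True, True],
                 [True, True, False, False, True]])

masked_loss = 1.00283    # mean absolute error over the 6 valid timesteps (printed above)
reported_loss = 0.60170  # value printed by model.evaluate() above

# Sum the losses of the valid timesteps, then divide by ALL timesteps (10)
# instead of by the number of valid timesteps (6).
reconstructed = masked_loss * mask.sum() / mask.size

print(f"{reconstructed:.5f}")  # 0.60170
```

The reconstructed value matches what model.evaluate() reported, which is consistent with masked timesteps being counted as zero-loss entries in the mean.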
output values are masked
Output values of masked timesteps do not impact the model training or evaluation (as it should be).
This can be easily tested by setting:
y_true[~mask] = 1e6
The model weights, predictions and losses remain exactly the same.
input values are masked
Input values of masked timesteps do not impact the model training or evaluation (as it should be).
Similarly, I can change mask_value from 0 to any other number, and the resulting model weights, predictions, and losses remain exactly the same.
In summary:
Q1: Effectively yes - the mask impacts the loss function, which is used through backpropagation to update the weights.
Q2: Yes, but the mask impacts the loss in an unexpected way.
Q3: Initially foolish - the mask should already be applied to the loss calculation. However, perhaps sample_weights could be valuable to correct the unexpected method of the loss calculation...
Note that I'm using Tensorflow 2.7.0.
I have been struggling with a related issue, namely applying a mask to a multi-output model where some samples are missing labels for some of the outputs. Here, I construct features, labels, and sample_weights from a dataset; labels and sample_weights are dictionaries with equivalent keys. The weights are 0 or 1 for each sample, indicating whether it should contribute to the relevant loss.
I had hoped that sample_weights would contribute to the loss the way they do when I pass the metric equivalents of the losses via weighted_metrics in model.compile.
I've found that sample_weight does not seem to address this problem. I can tell from the training metrics that the task_loss values differ from the task_metric values when sample weights are used.
I've given up on this and decided to go ahead and use masking. The masked loss values are low in your case (and in mine) because TensorFlow treats the model output at those points as perfect. I hope this means it sees no gradient for these points, so the parameters aren't tuned in response.
I am trying to create an object localization model to detect license plate in an image of a car. I used VGG16 model and excluded the top layer to add my own dense layers, with the final layer having 4 nodes and sigmoid activation to get (xmin, ymin, xmax, ymax).
I used the functions provided by Keras to read the image and resize it to (224, 224, 3), and also used the preprocess_input() function to process the input. I also tried to process the image manually by resizing with padding to maintain proportions, and normalizing the input by dividing by 255.
Nothing seems to work when I train. I get 0% train and test accuracy. Below is my code for this model.
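from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.models import Model

def get_custom(output_size, optimizer, loss):
    # Frozen VGG16 backbone without its classification head
    vgg = VGG16(weights="imagenet", include_top=False, input_tensor=Input(shape=IMG_DIMS))
    vgg.trainable = False
    flatten = vgg.output
    flatten = Flatten()(flatten)
    # Regression head that outputs the box coordinates in (0, 1)
    bboxHead = Dense(128, activation="relu")(flatten)
    bboxHead = Dense(32, activation="relu")(bboxHead)
    bboxHead = Dense(output_size, activation="sigmoid")(bboxHead)
    model = Model(inputs=vgg.input, outputs=bboxHead)
    model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
    return model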
X and y were of shapes (616, 224, 224, 3) and (616, 4) respectively. I divided the coordinates by the length of the respective sides so each value in y is in range (0,1).
I'll link my python notebook below from github so you can see the full code. I am using google colab to train the model.
https://github.com/gauthamramesh3110/image_processing_scripts/blob/main/License_Plate_Detection.ipynb
Thanks in advance. I am really in need of help here.
If you're doing an object localization task, you shouldn't use 'accuracy' as your metric, because the docs of compile() say:
When you pass the strings 'accuracy' or 'acc', we convert this to one
of tf.keras.metrics.BinaryAccuracy,
tf.keras.metrics.CategoricalAccuracy,
tf.keras.metrics.SparseCategoricalAccuracy based on the loss function
used and the model output shape
You should use tf.keras.metrics.MeanAbsoluteError, IoU (Intersection over Union), or mAP (Mean Average Precision) instead.
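For reference, IoU for a single pair of boxes can be computed as below. This is a minimal plain-Python sketch, assuming boxes in the question's (xmin, ymin, xmax, ymax) layout; the function name `iou` is illustrative and not a Keras API:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Averaging this value over predicted and ground-truth boxes gives a number that, unlike classification accuracy, reflects how close the regressed coordinates are.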
I am aware that I can visualize the weights of the layers in a histogram using TensorBoard (see "Understanding TensorBoard (weight) histograms").
My question: is it possible to "split" a fully connected layer into two separate histograms? I have input coming from 2 sources that is concatenated before going through a fully connected layer, and I want to see the weight distribution for each of the 2 sources. Below is a simple example where a and b are concatenated before being passed through a fully connected layer.
a is of size 1024 and b of size 256. The out layer has 1024 units.
out = tf.matmul(tf.concat(values=(a, b), axis=1), weight) + bias
Assuming your weight to have shape 1280 x 1024, you can first split your weight as
weight_a = tf.slice(weight, [0, 0], [1024, 1024])
weight_b = tf.slice(weight, [1024, 0], [256, 1024])  # the third argument is the slice size, not the end index
Now, you can visualize weight_a and weight_b.
The slicing can be generalized as well but since you explicitly specified the size of each tensor, the above is the quickest method.
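To see why the two slices correspond to the two sources, here is a small NumPy sketch (NumPy stands in for TensorFlow so it runs without a graph; the random data is made up, and the sizes are the ones stated in the question):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(2, 1024))         # first source, batch of 2
b = rng.normal(size=(2, 256))          # second source, batch of 2
weight = rng.normal(size=(1280, 1024))

# Rows of the weight matrix line up with the columns of the concatenated input,
# so the split point is simply the width of `a`.
split_at = a.shape[1]
weight_a = weight[:split_at]           # shape (1024, 1024), multiplies a
weight_b = weight[split_at:]           # shape (256, 1024), multiplies b

full = np.concatenate([a, b], axis=1) @ weight
parts = a @ weight_a + b @ weight_b
print(np.allclose(full, parts))
```

In TensorFlow the same generalization can be written with `tf.split(weight, [1024, 256], axis=0)`, which returns both pieces in one call.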
It is common practice in convolutional neural networks to oversample a given image during inference,
i.e. to create a batch from different transformations of the same image (most commonly different crops and mirroring), pass the entire batch through the network, and average (or apply another reducing function) over the results to get a single prediction (caffe example).
How can this approach be implemented in tensorflow?
You can take a look at the TF cnn tutorial. In particular, the function distorted_inputs does the image preprocessing step.
In short, there are a couple of TF functions in the tf.image package that help with distorting the images. You can use either them or regular numpy functions to create an extra dimension for the output, for which you can average the results:
Before:
input_place = tf.placeholder(tf.float32, [None, 256, 256, 3])
prediction = some_model(input_place) # size: [None]
sess.run(prediction, feed_dict={input_place: batch_of_images})
After:
input_place = tf.placeholder(tf.float32, [None, NUM_OF_DISTORTIONS, 256, 256, 3])
prediction = some_model(input_place) # make sure it is of size [None, NUM_OF_DISTORTIONS]
new_prediction = tf.reduce_mean(prediction, axis=1)
# np.zeros takes the shape as a single tuple
new_batch = np.zeros((len(batch_of_images), NUM_OF_DISTORTIONS, 256, 256, 3))
for i in range(len(batch_of_images)):
    for f in range(len(distortion_functions)):
        new_batch[i, f, :, :, :] = distortion_functions[f](batch_of_images[i])
sess.run(new_prediction, feed_dict={input_place: new_batch})
Take a look at TF's image-related functions. You could apply those transformations at test time to some input image, and stack all of them together to make a batch.
I imagine you could also do this using OpenCV or some other image processing tool. I don't see a need to do it in the computation graph. You could create the batches beforehand, and pass it through in feed_dict.
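Building the batch outside the graph could look roughly like this. It is a NumPy-only sketch: `oversample` and `dummy_predict` are hypothetical names, and `dummy_predict` merely stands in for running the real network on the batch:

```python
import numpy as np

def oversample(image, crop):
    """Return 10 views: four corner crops, the center crop, and their mirrors."""
    h, w = image.shape[:2]
    offsets = [(0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop),
               ((h - crop) // 2, (w - crop) // 2)]
    crops = [image[y:y + crop, x:x + crop] for y, x in offsets]
    mirrored = [c[:, ::-1] for c in crops]  # horizontal flip of each crop
    return np.stack(crops + mirrored)       # shape: (10, crop, crop, channels)

def dummy_predict(batch):
    # Hypothetical stand-in for the model: one score per view.
    return batch.mean(axis=(1, 2, 3))

image = np.random.rand(256, 256, 3)
views = oversample(image, crop=224)
averaged = dummy_predict(views).mean()      # reduce over the views
print(views.shape, averaged)
```

With a real model, `views` would be passed through feed_dict (or the model's predict call) and the per-view outputs averaged in the same way.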
I'm trying to use TensorFlow's CTC implementation in the contrib package (tf.contrib.ctc.ctc_loss), without success.
First of all, does anyone know where I can read a good step-by-step tutorial? TensorFlow's documentation is very poor on this topic.
Do I have to provide ctc_loss with labels that have the blank label interleaved, or not?
I was not able to overfit my network even with a training set of length 1 over 200 epochs. :(
How can I calculate the label error rate using tf.edit_distance?
Here is my code:
with graph.as_default():
    max_length = X_train.shape[1]
    frame_size = X_train.shape[2]
    max_target_length = y_train.shape[1]
    # Batch size x time steps x data width
    data = tf.placeholder(tf.float32, [None, max_length, frame_size])
    data_length = tf.placeholder(tf.int32, [None])
    # Batch size x max_target_length
    target_dense = tf.placeholder(tf.int32, [None, max_target_length])
    target_length = tf.placeholder(tf.int32, [None])
    # Generating sparse tensor representation of target
    target = ctc_label_dense_to_sparse(target_dense, target_length)
    # Applying LSTM, returning output for each timestep (y_rnn1,
    # [batch_size, max_time, cell.output_size]) and the final state of shape
    # [batch_size, cell.state_size]
    y_rnn1, h_rnn1 = tf.nn.dynamic_rnn(
        tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True, num_proj=num_classes), # num_proj=num_classes
        data,
        dtype=tf.float32,
        sequence_length=data_length,
    )
    # For sequence labelling, we want a prediction for each timestamp.
    # However, we share the weights for the softmax layer across all timesteps.
    # How do we do that? By flattening the first two dimensions of the output tensor.
    # This way time steps look the same as examples in the batch to the weight matrix.
    # Afterwards, we reshape back to the desired shape
    # Reshaping
    logits = tf.transpose(y_rnn1, perm=(1, 0, 2))
    # Get the loss by calculating ctc_loss
    # Also calculates
    # the gradient. This class performs the softmax operation for you, so inputs
    # should be e.g. linear projections of outputs by an LSTM.
    loss = tf.reduce_mean(tf.contrib.ctc.ctc_loss(logits, target, data_length))
    # Define our optimizer with learning rate
    optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss)
    # Decoding using beam search
    decoded, log_probabilities = tf.contrib.ctc.ctc_beam_search_decoder(logits, data_length, beam_width=10, top_paths=1)
Thanks!
Update (06/29/2016)
Thank you, @jihyeon-seo! So the input of the RNN has shape [num_batch, max_time_step, num_features]. We use dynamic_rnn to perform the recurrent computation on that input, which outputs a tensor of shape [num_batch, max_time_step, num_hidden]. After that, we need an affine projection at each timestep with shared weights, so we reshape to [num_batch*max_time_step, num_hidden], multiply by a weight matrix of shape [num_hidden, num_classes], add a bias, undo the reshape, and transpose (giving [max_time_steps, num_batch, num_classes], the layout ctc_loss expects). The result is the input to the ctc_loss function. Did I do everything correctly?
This is the code:
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)
# Reshaping to share weights across timesteps
x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])
self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1
# Reshaping
self._logits = tf.reshape(self._logits, [max_length, -1, num_classes])
# Calculating loss
loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)
self.cost = tf.reduce_mean(loss)
Update (07/11/2016)
Thank you, @Xiv. Here is the code after the bug fix:
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)
# Reshaping to share weights across timesteps
x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])
self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1
# Reshaping
self._logits = tf.reshape(self._logits, [-1, max_length, num_classes])
self._logits = tf.transpose(self._logits, (1,0,2))
# Calculating loss
loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)
self.cost = tf.reduce_mean(loss)
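The difference between the two versions of the reshape can be illustrated with NumPy, whose reshape and transpose follow the same row-major semantics. This is a toy sketch with made-up sizes, not the model code itself:

```python
import numpy as np

num_batch, max_length, num_classes = 2, 3, 4

# Batch-major activations, laid out as [num_batch, max_length, num_classes].
batch_major = np.arange(num_batch * max_length * num_classes).reshape(
    num_batch, max_length, num_classes)

# Flattened for the shared projection: [num_batch * max_length, num_classes].
flat = batch_major.reshape(-1, num_classes)

# First version: reshaping straight to time-major mixes up batch and time.
wrong = flat.reshape(max_length, -1, num_classes)

# Fixed version: restore the batch-major shape first, then swap the axes.
right = flat.reshape(-1, max_length, num_classes).transpose(1, 0, 2)

# In `right`, element [t, b] is timestep t of sequence b; in `wrong` it is not.
print(np.array_equal(right[1, 0], batch_major[0, 1]))  # True
print(np.array_equal(wrong[1, 0], batch_major[0, 1]))  # False
```

Both results have the shape [max_length, num_batch, num_classes], so the first version raises no shape error, which may be why such a bug is easy to miss.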
Update (07/25/16)
I published on GitHub part of my code, working with one utterance. Feel free to use! :)
I'm trying to do the same thing.
Here's what I found you may be interested in.
It was really hard to find the tutorial for CTC, but this example was helpful.
And for the blank label, CTC layer assumes that the blank index is num_classes - 1, so you need to provide an additional class for the blank label.
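As a small illustration of that convention, here is a plain-Python sketch. The three-symbol vocabulary and the helper names `encode` and `collapse` are hypothetical; `collapse` shows the usual CTC rule of merging repeats and then dropping blanks:

```python
# Hypothetical vocabulary; the real label set depends on the task.
vocabulary = ["a", "b", "c"]
num_classes = len(vocabulary) + 1  # one extra class for the blank
blank_index = num_classes - 1      # the blank takes the last index

def encode(text):
    """Map a transcript to label indices; the targets contain no blanks."""
    return [vocabulary.index(ch) for ch in text]

def collapse(path):
    """Collapse a frame-level path: merge repeated labels, then drop blanks."""
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank_index:
            out.append(label)
        prev = label
    return out

target = encode("abba")            # [0, 1, 1, 0]
path = [0, 0, 3, 1, 3, 1, 1, 0]    # frame-level labels; 3 is the blank
print(collapse(path) == target)    # True
```

Under this reading, the labels passed to the loss are the plain transcript indices; the blank only appears in frame-level paths, where it separates the two consecutive "b" labels.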
Also, the CTC loss applies the softmax itself. In your code, the RNN layer is connected directly to the CTC loss layer. The output of the RNN layer has already passed through its internal activation, so you need to add one more layer (it can be the output layer) without an activation function, then add the CTC loss layer.
See here for an example with bidirectional LSTM, CTC, and edit distance implementations, training a phoneme recognition model on the TIMIT corpus. If you train on that corpus's training set, you should be able to get phoneme error rates down to 20-25% after 120 epochs or so.
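On the question of computing a label error rate, the quantity itself can be sketched in plain Python. The function names are illustrative, and this version normalizes by the total reference length, which is one common convention and an assumption here:

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def label_error_rate(hyps, refs):
    """Total edit distance divided by the total number of reference labels."""
    total_edits = sum(edit_distance(h, r) for h, r in zip(hyps, refs))
    total_length = sum(len(r) for r in refs)
    return total_edits / total_length

print(edit_distance("kitten", "sitting"))         # 3
print(label_error_rate([[1, 2, 3]], [[1, 2, 4]]))
```

In TensorFlow, `tf.edit_distance(hypothesis, truth, normalize=True)` returns a per-sequence distance normalized by the length of the truth sequence, so averaging its output gives a per-sequence mean rather than this length-weighted total.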