How do I simply resize the MNIST dataset with the cv2 library?

I'm trying to make a handwritten digit recognition program with machine learning. As the title says, I want to simply resize the MNIST dataset with the cv2 library, and I want the resize to be done before normalizing the data, so that it's the resized data that goes through my ANN (Artificial Neural Network) model. But I'm not really sure how to do that. This is the code I have:
train = keras.utils.normalize(train)
test = keras.utils.normalize(test)

model = Sequential()
modell = model.add   # alias so the layers below can be added via modell(...)
modell(Flatten())
modell(Dense(200, activation='sigmoid'))
modell(Dense(200, activation='sigmoid'))
modell(Dense(10, activation="sigmoid"))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])  # compile is needed before fit

model.fit(img, lab, epochs=1)
print(model.evaluate(test, test))
pred = model.predict(test[:1])
p = np.argmax(pred, axis=1)
print("Machine", p)
print("Correct", test[:1])
I have tried to put this into my code, right before normalizing my data:
for i in train_img:
    photo = cv2.resize(train_img[i], (19, 19))
    img.append(photo)
But it gives me this error:
TypeError: 'tuple' object is not callable
How do I do it?

In your code, i iterates over the images in train_img. Therefore i is an image and cannot be used as an index.
So try this:
new_train_img = []
for img in train_img:
    new_train_img.append(cv2.resize(img, (14, 14)))
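For completeness, here is a minimal sketch of the full resize-then-normalize flow, assuming you load MNIST via keras.datasets.mnist (the 14x14 target size is only an example):

import cv2
import numpy as np
from tensorflow import keras

# Load MNIST: 28x28 grayscale digits plus integer labels
(train_img, train_lab), (test_img, test_lab) = keras.datasets.mnist.load_data()

# Resize every image first...
train_img = np.array([cv2.resize(img, (14, 14)) for img in train_img])
test_img = np.array([cv2.resize(img, (14, 14)) for img in test_img])

# ...then normalize, so the resized data is what goes into the ANN
train_img = keras.utils.normalize(train_img)
test_img = keras.utils.normalize(test_img)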

Related

Bounding Box regression using Keras transfer learning gives 0% accuracy. The output layer with Sigmoid activation only outputs 0 or 1

I am trying to create an object localization model to detect a license plate in an image of a car. I used the VGG16 model and excluded the top layer to add my own dense layers, with the final layer having 4 nodes and sigmoid activation to get (xmin, ymin, xmax, ymax).
I used the functions provided by keras to read the image and resize it to (224, 224, 3), and also used the preprocess_input() function to process the input. I also tried to manually process the image by resizing with padding to maintain proportions, and normalizing the input by dividing by 255.
Nothing seems to work when I train. I get 0% train and test accuracy. Below is my code for this model.
def get_custom(output_size, optimizer, loss):
    vgg = VGG16(weights="imagenet", include_top=False, input_tensor=Input(shape=IMG_DIMS))
    vgg.trainable = False
    flatten = vgg.output
    flatten = Flatten()(flatten)
    bboxHead = Dense(128, activation="relu")(flatten)
    bboxHead = Dense(32, activation="relu")(bboxHead)
    bboxHead = Dense(output_size, activation="sigmoid")(bboxHead)
    model = Model(inputs=vgg.input, outputs=bboxHead)
    model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
    return model
X and y were of shapes (616, 224, 224, 3) and (616, 4) respectively. I divided the coordinates by the length of the respective sides so each value in y is in range (0,1).
I'll link my python notebook below from github so you can see the full code. I am using google colab to train the model.
https://github.com/gauthamramesh3110/image_processing_scripts/blob/main/License_Plate_Detection.ipynb
Thanks in advance. I am really in need of help here.
If you're doing an object localization task then you shouldn't use 'accuracy' as your metric, because the docs for compile() say:
When you pass the strings 'accuracy' or 'acc', we convert this to one of tf.keras.metrics.BinaryAccuracy, tf.keras.metrics.CategoricalAccuracy, tf.keras.metrics.SparseCategoricalAccuracy based on the loss function used and the model output shape.
You should use tf.keras.metrics.MeanAbsoluteError, IoU (Intersection over Union) or mAP (mean Average Precision) instead.
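For example, a minimal sketch of compiling the bounding-box head with a regression-style metric instead of 'accuracy' (the mean-squared-error loss here is only an assumption; keep whatever loss you already pass into get_custom):

import tensorflow as tf

# Mean absolute error on the normalized (0-1) box coordinates is a meaningful number,
# unlike classification accuracy on a 4-value regression output.
model.compile(loss="mse",
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=[tf.keras.metrics.MeanAbsoluteError()])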

fine-tuning huggingface DistilBERT for multi-class classification on custom dataset yields weird output shape on prediction

I'm trying to fine-tune huggingface's implementation of distilbert for multi-class classification (100 classes) on a custom dataset following the tutorial at https://huggingface.co/transformers/custom_datasets.html.
I'm doing so using Tensorflow and fine-tuning in native TensorFlow; that is, I use the following part of the tutorial for dataset creation:
import tensorflow as tf

train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    train_labels
))
val_dataset = tf.data.Dataset.from_tensor_slices((
    dict(val_encodings),
    val_labels
))
test_dataset = tf.data.Dataset.from_tensor_slices((
    dict(test_encodings),
    test_labels
))
And this one for fine-tuning:
from transformers import TFDistilBertForSequenceClassification
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3, batch_size=16)
Everything seems to go fine with fine-tuning, but when I try to predict on the test dataset (2000 examples) using model.predict(test_dataset), the model seems to yield one prediction per token rather than one prediction per sequence...
That is, instead of getting an output of shape (1, 2000, 100), I get an output of shape (1, 1024000, 100), where 1024000 is the number of test examples (2000) * the sequence length (512).
Any hint on what's going on here?
(Sorry if this is naive, I'm very new to tensorflow).
I had exactly the same problem. I do not know why it's happening, as it should be the right code going by the tutorial.
But for me it worked to create numpy arrays out of the train_encodings and pass them directly to the fit method instead of creating the Dataset:
x1 = np.array(list(dict(train_encodings).values()))[0]   # input_ids
x2 = np.array(list(dict(train_encodings).values()))[1]   # attention_mask
model.fit([x1, x2], train_labels, epochs=20)
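The same trick should work at prediction time; here is a rough sketch, assuming test_encodings was built as in the question (how the output is wrapped depends on your transformers version, so the logits extraction below is defensive):

import numpy as np

# Build the same two arrays from the test encodings: input_ids and attention_mask
t1 = np.array(list(dict(test_encodings).values()))[0]
t2 = np.array(list(dict(test_encodings).values()))[1]

outputs = model.predict([t1, t2])
# Newer transformers versions return an object with .logits; older ones a tuple or plain array
logits = getattr(outputs, "logits", outputs[0] if isinstance(outputs, tuple) else outputs)
predicted_classes = np.argmax(logits, axis=-1)   # one class per sequence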

Issues with saving and loading a tensorflow model which uses a hugging face transformer model as its first layer

Hi, I am having some serious problems saving and loading a tensorflow model which is a combination of a hugging face transformer plus some custom layers to do classification. I am using the latest Huggingface transformers tensorflow keras version. The idea is to extract features using distilbert and then run the features through a CNN to do classification and extraction. I have got everything to work as far as getting the correct classifications.
The problem is in saving the model once trained and then loading the model again.
I am using tensorflow keras and tensorflow version 2.2.
Following is the code to design the model, train it, evaluate it and then save and load it:
bert_config = DistilBertConfig(dropout=0.2, attention_dropout=0.2, output_hidden_states=False)
bert_config.output_hidden_states = False
transformer_model = TFDistilBertModel.from_pretrained(DISTIL_BERT, config=bert_config)

input_ids_in = tf.keras.layers.Input(shape=(BERT_LENGTH,), name='input_token', dtype='int32')
input_masks_in = tf.keras.layers.Input(shape=(BERT_LENGTH,), name='masked_token', dtype='int32')

embedding_layer = transformer_model(input_ids_in, attention_mask=input_masks_in)[0]
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(50, return_sequences=True, dropout=0.1,
                         recurrent_dropout=0, recurrent_activation="sigmoid",
                         unroll=False, use_bias=True, activation="tanh"))(embedding_layer)
x = tf.keras.layers.GlobalMaxPool1D()(x)

outputs = []
# lots of code here to define the dense layers to generate the outputs
# .....
# .....

model = Model(inputs=[input_ids_in, input_masks_in], outputs=outputs)

for model_layer in model.layers[:3]:
    logger.info(f"Setting layer {model_layer.name} to not trainable")
    model_layer.trainable = False

rms_optimizer = RMSprop(learning_rate=0.001)
model.compile(loss=SigmoidFocalCrossEntropy(), optimizer=rms_optimizer)

# the code to fit the model (which works)
# then code to evaluate the model (which also works)
# finally saving the model. This too works.
tf.keras.models.save_model(model, save_url, overwrite=True, include_optimizer=True, save_format="tf")
However, when I try to load the saved model using the following
tf.keras.models.load_model(
    path, custom_objects={"Addons>SigmoidFocalCrossEntropy": SigmoidFocalCrossEntropy})
I get the following load error
ValueError: The two structures don't have the same nested structure.
First structure: type=TensorSpec str=TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs')
Second structure: type=dict str={'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='inputs/input_ids')}
More specifically: Substructure "type=dict str={'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='inputs/input_ids')}" is a sequence, while substructure "type=TensorSpec str=TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs')" is not
Entire first structure:
.
Entire second structure:
{'input_ids': .}
I believe the issue is that the TFDistilBertModel layer can be called with a dictionary input from DistilBertTokenizer.encode(), and that happens to be the first layer. So on load the model compiler expects that to be the input signature to the model's call. However, the inputs defined for the model are two tensors of shape (None, 128).
So how do I tell the load function or the save function to assume the correct signatures?
I solved the issue.
The issue was that the object transformer_model in the above code is itself not a layer. So if we want to embed it inside another keras layer we should use the internal keras layer that is wrapped in the model.
So changing the line
embedding_layer = transformer_model(input_ids_in, attention_mask=input_masks_in)[0]
to
embedding_layer = transformer_model.distilbert([input_ids_in, input_masks_in])[0]
makes everything work. Hope this helps someone else. It took a long time of debugging through the tf.keras code to figure this one out, although in hindsight it is obvious. :)
I ran into the same problem, coincidentally, just yesterday. My solution is very similar to yours: I assumed the problem was due to how tensorflow keras processes custom models, so the idea was to use the layers of the custom model inside my model. This has the advantage of not calling the layer explicitly by its name (in my case, it is useful for easily building more generic models using different pretrained encoders):
sent_encoder = getattr(transformers, self.model_name).from_pretrained(self.shortcut_weights).layers[0]
I haven't explored all the models of HuggingFace, but the few that I tested seem to be a custom model with only one custom layer.
Your solution also works like a charm; in fact, both solutions are the same if "distilbert" refers to ".layers[0]".
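For reference, here is a compressed sketch of the fixed wiring end to end, including save and reload. The one-unit sigmoid head is only a placeholder for the elided dense layers, and 'distilbert-base-uncased' plus a 128-token length stand in for DISTIL_BERT and BERT_LENGTH:

import tensorflow as tf
from transformers import DistilBertConfig, TFDistilBertModel
from tensorflow_addons.losses import SigmoidFocalCrossEntropy

bert_config = DistilBertConfig(dropout=0.2, attention_dropout=0.2)
transformer_model = TFDistilBertModel.from_pretrained("distilbert-base-uncased", config=bert_config)

input_ids_in = tf.keras.layers.Input(shape=(128,), name="input_token", dtype="int32")
input_masks_in = tf.keras.layers.Input(shape=(128,), name="masked_token", dtype="int32")

# Call the inner keras layer instead of the wrapper model, as in the answer above
embedding_layer = transformer_model.distilbert([input_ids_in, input_masks_in])[0]
x = tf.keras.layers.GlobalMaxPool1D()(embedding_layer)
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # placeholder head

model = tf.keras.Model(inputs=[input_ids_in, input_masks_in], outputs=out)
model.compile(loss=SigmoidFocalCrossEntropy(), optimizer="rmsprop")

tf.keras.models.save_model(model, "saved_model_dir", save_format="tf")
reloaded = tf.keras.models.load_model(
    "saved_model_dir",
    custom_objects={"Addons>SigmoidFocalCrossEntropy": SigmoidFocalCrossEntropy})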

Keras models in tensorflow

I'm building an image processing network in tensorflow and I want to make use of a texture loss. Texture loss seems simple to implement if you have a pretrained model loaded.
I'm using TF to build the computational graph for my model and I want to incorporate the Keras.application.VGG19 model to get the output from layer 'block4_conv4'.
The problem is: I have two TF tensors, target and result, from my main model. How do I feed them into keras VGG19 in the same session to compute their difference and use it in the main loss for my model?
It seems the following code does the trick:

with tf.variable_scope("") as scope:
    phi_func = VGG19(include_top=False, weights=None, input_shape=(128, 128, 3))
    text_1 = phi_func(predicted)
    scope.reuse_variables()
    text_2 = phi_func(x)
    text_loss = tf.reduce_mean((text_1 - text_2)**2)

Right after the session is created I call phi_func.load_weights(path) to initialize the weights.
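If you are on TF 2.x (eager) instead, or specifically want the 'block4_conv4' activations mentioned in the question, a minimal sketch of the same idea could look like this (the preprocessing call assumes the inputs are in the 0-255 range):

import tensorflow as tf
from tensorflow.keras.applications import VGG19

# Feature extractor that ends at the layer named in the question
vgg = VGG19(include_top=False, weights="imagenet", input_shape=(128, 128, 3))
phi = tf.keras.Model(inputs=vgg.input, outputs=vgg.get_layer("block4_conv4").output)
phi.trainable = False

def texture_loss(result, target):
    # result/target: batches of images; preprocess them the way VGG expects
    f1 = phi(tf.keras.applications.vgg19.preprocess_input(result))
    f2 = phi(tf.keras.applications.vgg19.preprocess_input(target))
    return tf.reduce_mean(tf.square(f1 - f2))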

Making predictions with a TensorFlow model

I followed the given mnist tutorials and was able to train a model and evaluate its accuracy. However, the tutorials don't show how to make predictions given a model. I'm not interested in accuracy, I just want to use the model to predict a new example and in the output see all the results (labels), each with its assigned score (sorted or not).
In the "Deep MNIST for Experts" example, see this line:
We can now implement our regression model. It only takes one line! We
multiply the vectorized input images x by the weight matrix W, add the
bias b, and compute the softmax probabilities that are assigned to
each class.
y = tf.nn.softmax(tf.matmul(x,W) + b)
Just pull on node y and you'll have what you want.
feed_dict = {x: [your_image]}
classification = sess.run(y, feed_dict)
print(classification)
This applies to just about any model you create - you'll have computed the prediction probabilities as one of the last steps before computing the loss.
As @dga suggested, you need to run your new instance of the data through your already trained model.
Here is an example:
Assume you went through the first tutorial and calculated the accuracy of your model (the model is this: y = tf.nn.softmax(tf.matmul(x, W) + b)). Now you grab your model and apply a new data point to it. In the following code I compute the output vector, get the position of its maximum value, show the image, and print that predicted digit.
from matplotlib import pyplot as plt
from random import randint

num = randint(0, mnist.test.images.shape[0])
img = mnist.test.images[num]

classification = sess.run(tf.argmax(y, 1), feed_dict={x: [img]})
plt.imshow(img.reshape(28, 28), cmap=plt.cm.binary)
plt.show()
print('NN predicted', classification[0])
2.0 Compatible Answer: Suppose you have built a Keras Model as shown below:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Then train and evaluate the model using the code below:
model.fit(train_images, train_labels, epochs=10)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
After that, if you want to predict the class of a particular image, you can do it using the below code:
predictions_single = model.predict(img)
If you want to predict the classes of a set of Images, you can use the below code:
predictions = model.predict(new_images)
where new_images is an Array of Images.
For more information, refer to this Tensorflow Tutorial.
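One detail worth noting: model.predict expects a batch, so a single image needs an extra batch dimension first (img and test_images follow the tutorial's naming):

import numpy as np

# Add a batch dimension to a single 28x28 image before predicting
img = np.expand_dims(test_images[0], 0)    # shape: (1, 28, 28)
predictions_single = model.predict(img)    # shape: (1, 10), one softmax score per digit
print(np.argmax(predictions_single[0]))    # the predicted digit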
The question is specifically about the Google MNIST tutorial, which defines a predictor but doesn't apply it. Using guidance from Jonathan Hui's TensorFlow Estimator blog post, here is code which exactly fits the Google tutorial and does predictions:
import numpy as np
from matplotlib import pyplot as plt

images = mnist.test.images[0:10]

predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": images},
    num_epochs=1,
    shuffle=False)

for image, p in zip(images, mnist_classifier.predict(input_fn=predict_input_fn)):
    print(np.argmax(p['probabilities']))
    plt.imshow(image.reshape(28, 28), cmap=plt.cm.binary)
    plt.show()