MobileNetV2 preprocessing inconsistence on TF 2.2 - tensorflow

In this segmentation tutorial, the preprocessing normalizes image values into [0, 1].
However, according to the document of MobileNetV2 (https://www.tensorflow.org/api_docs/python/tf/keras/applications/mobilenet_v2/preprocess_input), the preprocess step normalizes data to the interval [-1, 1].
Which preprocessing is the correct one and why?

If you want to train your own network from scratch, you can apply any normalization you see fit, or even no normalization at all, it's your choice!
If, instead, you want to reuse the pretrained models (e.g.: by setting weights='imagenet' in the definition of MobileNetV2), then you should use the specific preprocessing in https://www.tensorflow.org/api_docs/python/tf/keras/applications/mobilenet_v2/preprocess_input, since this model has been trained with this specific preprocessing (normalization to [-1, 1]).
Although you should, you could also treat the MobileNetV2 pretrained model as a static blackbox transformation, and plug in any normalization you want. The downside: you could almost surely make better use of this black box by applying the standard normalization.

Related

Should I delete last 7 layers of VGG16 as I am going to use it as a pretrained model for a signature verification task?

As far as I know, cnn's last layers identify objects as a whole, this is irrelevant to the dataset with signatures. Thus, I want to remove them and add additional layers on top of the model, freezing the VGG16 from training. How would the removal of layers potentially affect the model's performance, or should I just leave and delete only dense layers?
I need to add additional layers on top anyway for the school report about the effect of convolutional layers' configurations on the model's performance.
p.s my dataset is really small it contains nearly 700 samples, which is extremely small n i know that(i tried augmenting data)
I have a dataset with Chinese signatures, but I thought that it is better to train it separately//
I am not proficient in this field and I started my acquaintance from deep learning, so pls correct me if you noticed any misconception in my explanation?/
Easiest way is to use VGG with include_top=False, weights='imagenet, and set pooling = max. This will instantiate the model with imagenet weights, the top classification layer is removed and the output of the VGG model is a flat vector you can feed directly into a dense layer. My typical code for this is shown below. In the final layer class_count is the number of classes in the training data.
base_model=tf.keras.applications.VGG16(include_top=False, weights="imagenet",input_shape=img_shape, pooling='max')
x=base_model.output
x=keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001 )(x)
x = Dense(256, kernel_regularizer = regularizers.l2(l = 0.016),activity_regularizer=regularizers.l1(0.006),
bias_regularizer=regularizers.l1(0.006) ,activation='relu')(x)
x=Dropout(rate=.45, seed=123)(x)
output=Dense(class_count, activation='softmax')(x)
model=Model(inputs=base_model.input, outputs=output)
How would the removal of layers potentially affect the model's performance, or should I just leave and delete only dense layers?
This is hard to answer because what performance are you talking about? VGG16 originally were build to Imagenet problem with 1000 classes, so if you use it without any modifications probably won't work at all.
Now, if you are talking about transfer learning, so yes, the last dense layers could be replaced to classify your dataset, because the model created with cnn layers in VGG16 is a good pattern recognizer. The fully connected layers at the end work as a classifier for this patterns and you should replace it and train it again for your specific problem. VGG16 has 3 dense layers (FC1, FC2 and FC3) at end, keras only allow you to remove all three, so if you want replace just the last one, you will need to remove all three and rebuild the FC1 and FC2.
The key is what you are going to train after that, you could:
Use original weights (imagenet) in cnn layers and start you trainning from that, just finetunning with a small learning rate. A good choice when you dataset is similar to original and you have a good amount of it.
Use original weights (imagenet) in cnn layers, but freeze them, and just training the weights in the dense layers you replaced. A good choice when your dataset is small.
Don't use the original weights and retrain all the model. Usually not a good choice, because you will need to be an expert to tunning the parameters, tons of data and computacional power to make it work.

Purpose of pooling layer after text embedding layer

I'm following the tutorial on the tensorflow site (https://www.tensorflow.org/tutorials/text/word_embeddings#create_a_simple_model) to learn word embeddings, and a confusion that I have is about the purpose of having a Globalaveragepooling layer right after the embedding layer as follows:
model = keras.Sequential([
layers.Embedding(encoder.vocab_size, embedding_dim),
layers.GlobalAveragePooling1D(),
layers.Dense(16, activation='relu'),
layers.Dense(1)
])
I understand what pooling means and how it's done. If someone can explain why we need a pooling layer, and what would change if we didn't use it, I'd appreciate it.
The purpose of this tutorial is to get you to understand word-embeddings through a simple toy task: binary sentiment analysis.
To start with, they make you code a simple model: take the average of all embeddings in a sentence and add a feed-forward neural net to classify this aggregated input. GlobalAveragePooling1D does this averaging.
Obviously in the real world you'd want to use more complex models as RNNs, LSTMs, bidirectional models, atrous-convolution-based models or Transformers but that's not the point in this tutorial.
The "simple model" they mention being a feed-forward neural net, it expects a fixed input dimension so when you have sequential data of variable length you need to address this somehow: averaging, padding, cropping etc. Here they average with this GlobalAveragePooling1D layer

Tensorflow embeddings

I know what embeddings are and how they are trained. Precisely, while referring to the tensorflow's documentation, I came across two different articles. I wish to know what exactly is the difference between them.
link 1: Tensorflow | Vector Representations of words
In the first tutorial, they have explicitly trained embeddings on a specific dataset. There is a distinct session run to train those embeddings. I can then later on save the learnt embeddings as a numpy object and use the
tf.nn.embedding_lookup() function while training an LSTM network.
link 2: Tensorflow | Embeddings
In this second article however, I couldn't understand what is happening.
word_embeddings = tf.get_variable(“word_embeddings”,
[vocabulary_size, embedding_size])
embedded_word_ids = tf.gather(word_embeddings, word_ids)
This is given under the training embeddings sections. My doubt is: does the gather function train the embeddings automatically? I am not sure since this op ran very fast on my pc.
Generally: What is the right way to convert words into vectors (link1 or link2) in tensorflow for training a seq2seq model? Also, how to train the embeddings for a seq2seq dataset, since the data is in the form of separate sequences for my task unlike (a continuous sequence of words refer: link 1 dataset)
Alright! anyway, I have found the answer to this question and I am posting it so that others might benefit from it.
The first link is more of a tutorial that steps you through the process of exactly how the embeddings are learnt.
In practical cases, such as training seq2seq models or Any other encoder-decoder models, we use the second approach where the embedding matrix gets tuned appropriately while the model gets trained.

DeepLearning Anomaly Detection for images

I am still relatively new to the world of Deep Learning. I wanted to create a Deep Learning model (preferably using Tensorflow/Keras) for image anomaly detection. By anomaly detection I mean, essentially a OneClassSVM.
I have already tried sklearn's OneClassSVM using HOG features from the image. I was wondering if there is some example of how I can do this in deep learning. I looked up but couldn't find one single code piece that handles this case.
The way of doing this in Keras is with the KerasRegressor wrapper module (they wrap sci-kit learn's regressor interface). Useful information can also be found in the source code of that module. Basically you first have to define your Network Model, for example:
def simple_model():
#Input layer
data_in = Input(shape=(13,))
#First layer, fully connected, ReLU activation
layer_1 = Dense(13,activation='relu',kernel_initializer='normal')(data_in)
#second layer...etc
layer_2 = Dense(6,activation='relu',kernel_initializer='normal')(layer_1)
#Output, single node without activation
data_out = Dense(1, kernel_initializer='normal')(layer_2)
#Save and Compile model
model = Model(inputs=data_in, outputs=data_out)
#you may choose any loss or optimizer function, be careful which you chose
model.compile(loss='mean_squared_error', optimizer='adam')
return model
Then, pass it to the KerasRegressor builder and fit with your data:
from keras.wrappers.scikit_learn import KerasRegressor
#chose your epochs and batches
regressor = KerasRegressor(build_fn=simple_model, nb_epoch=100, batch_size=64)
#fit with your data
regressor.fit(data, labels, epochs=100)
For which you can now do predictions or obtain its score:
p = regressor.predict(data_test) #obtain predicted value
score = regressor.score(data_test, labels_test) #obtain test score
In your case, as you need to detect anomalous images from the ones that are ok, one approach you can take is to train your regressor by passing anomalous images labeled 1 and images that are ok labeled 0.
This will make your model to return a value closer to 1 when the input is an anomalous image, enabling you to threshold the desired results. You can think of this output as its R^2 coefficient to the "Anomalous Model" you trained as 1 (perfect match).
Also, as you mentioned, Autoencoders are another way to do anomaly detection. For this I suggest you take a look at the Keras Blog post Building Autoencoders in Keras, where they explain in detail about the implementation of them with the Keras library.
It is worth noticing that Single-class classification is another way of saying Regression.
Classification tries to find a probability distribution among the N possible classes, and you usually pick the most probable class as the output (that is why most Classification Networks use Sigmoid activation on their output labels, as it has range [0, 1]). Its output is discrete/categorical.
Similarly, Regression tries to find the best model that represents your data, by minimizing the error or some other metric (like the well-known R^2 metric, or Coefficient of Determination). Its output is a real number/continuous (and the reason why most Regression Networks don't use activations on their outputs). I hope this helps, good luck with your coding.

Why slim.nets.vgg use conv2d instead of fully_connected layers?

In tensorflow/models/slim/nets, here is the link of relative snippets of vgg. I'am curious about why slim.nets.vgg use conv2d instead of fully_connected layers, although it works out the same way actually? Is it for the purpose of speed?
I appreciate some explanations. Thanks!
After a while, I think there is at least one reason that it can avoid weights converting mistakes.
Tensorflow/slim as well as other high-level libraries allows tensor formats either BHWC(batch_size, height, width, channel. Same below) by default or BCHW(for better performance).
When converting weights between these two formats, the weights [in_channel, out_channel] of first fc (fully connected layer, after conv layer) have to been reshaped to [last_conv_channel, height, width, out_channel] for example, then transposed to [height, width, last_conv_channel, out_channel] and reshaped again to [in_channel, out_channel].
And if you use conv weights rather than fully connected weights, such transformation will explicitly applied on fc layer (conv weights actually). Surely will avoid converting mistakes.