Increasing the image input resolution for a pretrained ResNet model in CNTK - cntk

I am using CNTK's model edit language to load a pre-trained Resnet model, add a new last layer, and refine the model on some new dataset.
I would also like to change the network architecture to accept higher-resolution images as input (which is possible since my net is fully convolutional without the last fc layer). Does anyone know how to do that, or can point me to a relevant link?
Thanks!
P.S. This is what my current .mel file looks like (for an AlexNet):
#load pre-trained model
model1 = LoadModel("$OrigModel$") #, format=cntk)
SetDefaultModel(model1)
#parameters from original ndl file
cMap4 = 512
fcWScale = 1.13
fcBValue = 0
labelDimNew=10
#add new final layer
newL = DnnLayer(cMap4, $labelDimNew$, pool5, fcWScale, fcBValue)
labelsNew = Input($labelDimNew$, tag="label")
SetInput(ce, 0, labelsNew)
SetInput(ce, 1, newL.z)
SetInput(err, 0, labelsNew)
SetInput(err, 1, newL.z)
SetProperty(newL.z, "output", "true")
#remove old final layer (note: make sure these deletes happen in reverse order)
DeleteNode(OutputNodes.z)
DeleteNode(OutputNodes.t)
DeleteNode(OutputNodes.b)
DeleteNode(OutputNodes.W)
DeleteNode(labels)
#rename nodes to have same name as before. this might not be necessary.
Rename(labelsNew, labels)
Rename(newL.*, OutputNodes.*)

One option would be to have an adapter function that takes the higher-resolution input and plugs its output into the input of your pre-trained model. In Python this should be straightforward. I am not sure how to extend that to .mel, though.
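A minimal sketch of the adapter idea in plain NumPy, assuming the simplest possible adapter: a fixed 2x2 average pooling that maps a higher-resolution CHW image down to the size the pre-trained network was built for. The names `avg_pool_adapter`, `pretrained_model` and `adapted_model` are hypothetical and are not CNTK APIs; in a real setup the adapter would more likely be a trainable strided convolution placed in front of the loaded network.

```python
import numpy as np

def avg_pool_adapter(image, factor=2):
    """Downsample a CHW image by averaging non-overlapping factor x factor blocks."""
    c, h, w = image.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")
    blocks = image.reshape(c, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(2, 4))

def pretrained_model(image):
    # Hypothetical stand-in for the loaded network: it only checks the input shape.
    assert image.shape == (3, 224, 224)
    return image.mean()

def adapted_model(high_res_image):
    # The adapter's output is plugged into the input of the pre-trained model.
    return pretrained_model(avg_pool_adapter(high_res_image, factor=2))

high_res = np.ones((3, 448, 448), dtype=np.float32)
score = adapted_model(high_res)
```

The pre-trained weights stay untouched in this arrangement; only the adapter decides how the extra resolution is used.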

Related

Why is FinBert with Tensorflow showing different predictions on local computer vs on HuggingFace's web interface?

To set the context, if I go to https://huggingface.co/ProsusAI/finbert and input the following sentence in their hosted API form
Stocks rallied and the British pound gained.
I get the sentiment as 89.8% positive, 6.7% neutral and the rest negative, which is as one would expect.
However, if I download the TensorFlow version of the model from https://huggingface.co/ProsusAI/finbert/tree/main along with the respective JSON files and run it locally, I get the output
array([[0.2945392 , 0.4717328 , 0.23372805]]) which corresponds to a ~30% positive sentiment.
The code I am using locally is as follows (modfin is the local folder where I have stored the t5_model.h5 along with the other files):
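import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
sentences = ["Stocks rallied and the British pound gained."]
model = TFAutoModelForSequenceClassification.from_pretrained("C:/Users/Downloads/modfin",config="C:/Users/Downloads/modfin/config.json",num_labels=3)
tokenizer = AutoTokenizer.from_pretrained("C:/Users/Downloads/modfin",config="C:/Users/Downloads/modfin/tokenizer_config.json")
inputs = tokenizer(sentences, padding = True, truncation = True, return_tensors='tf')
outputs = model(**inputs)
# outputs[0] holds the logits; softmax turns them into class probabilities
tf.nn.softmax(outputs[0]).numpy()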
For the model I also get the following warning:
All model checkpoint layers were used when initializing TFBertForSequenceClassification.
Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at C:/Users/Downloads/modfin and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
This is strange, since I would assume a pre-trained model such as FinBERT should already be fine-tuned. When I replace TFAutoModelForSequenceClassification with TFAutoModel, I see that the transformer chosen automatically is 'TFBertModel', whose output is a 10 x 768 tensor which I am not able to interpret as sentiment classes. Any help here would be greatly appreciated.

Switch between the heads of a model during inference

I have 200 neural networks which I trained using transfer learning on text. They all share the same weights except for their heads, which are trained on different tasks. Is it possible to merge those networks into a single TensorFlow model such that when I call it with input (text, i) it returns the prediction for task i? The idea is to store the shared weights only once to save on model size, and to evaluate only the head of the task we want to predict in order to save on computation. The important bit is to wrap all of that into a TensorFlow model, as I want to make it easier to serve on google-ai-platform.
Note: It is fine to train all the heads independently, I just want to put all of them together into a single model for the inference part
You probably have a model like the following:
# Create the model
inputs = Input(shape=(height, width, channels), name='data')
x = layers.Conv2D(...)(inputs)
# ...
x = layers.GlobalAveragePooling2D(name='penultimate_layer')(x)
x = layers.Dense(num_class, name='task0', ...)(x)
model = models.Model(inputs=inputs, outputs=[x])
Until now the model only has one output. You can add multiple outputs at model creation, or later on. You can add a new head like this:
last_layer = model.get_layer('penultimate_layer').output
output_heads = []
taskID = 0
while True:
    try:
        head = model.get_layer("task"+str(taskID))
        output_heads.append(head.output)
        taskID += 1
    except ValueError:
        # get_layer raises ValueError when no layer with that name exists
        break
# add new head
new_head = layers.Dense(num_class, name='task'+str(taskID), ...)(last_layer)
output_heads.append(new_head)
model = models.Model(inputs=model.input, outputs=output_heads)
Now since every head has a name you can load your specific weights, calling the head by name. The weights to load are the weights of the last layer of (an)other_model. You should have something like this:
model.get_layer("task0").set_weights(other_model.layers[-1].get_weights())
When you want to obtain predictions, all you need to know is the task ID of the head you want to look at:
taskID=0 # obtain predictions from head 0
outputs = model(test_data, training=False)
predictions = outputs[taskID]
If you want to train new heads later on, while still sharing the same backbone, you just have to freeze the other heads, otherwise even those will be trained, and you don't want that:
for layer in model.layers:
    if "task" in layer.name:
        layer.trainable = False
# code to add the new head ...
Training new tasks (that is, a new set of classes) at a later time is called task-incremental learning. The major issue with this is catastrophic forgetting: the model can easily lose prior knowledge while training on new tasks. Even if the heads are frozen, the backbone obviously isn't. If you do this, you'll have to apply some technique to mitigate the forgetting.
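The model above evaluates every head on each call. A minimal NumPy sketch of the "only evaluate head i" idea from the question, assuming all heads are dense layers of the same shape so their weights can be stacked into one array and indexed by task ID. All names (`backbone`, `head_w`, `predict`) and the random weights are hypothetical; in TensorFlow the same indexing could presumably be expressed with `tf.gather` on the stacked weights, which is an assumption and is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
num_tasks, in_dim, feat_dim, num_class = 200, 8, 16, 3

# Hypothetical shared backbone: one fixed projection followed by ReLU.
backbone_w = rng.normal(size=(in_dim, feat_dim))

# One weight matrix and one bias per head, stacked along the first axis.
head_w = rng.normal(size=(num_tasks, feat_dim, num_class))
head_b = rng.normal(size=(num_tasks, num_class))

def backbone(x):
    return np.maximum(x @ backbone_w, 0.0)

def predict(x, task_id):
    """Run the shared backbone once, then only the head selected by task_id."""
    features = backbone(x)
    return features @ head_w[task_id] + head_b[task_id]

x = rng.normal(size=(4, in_dim))   # batch of 4 inputs
logits = predict(x, task_id=7)     # one row of logits per input
```

The shared weights are stored once, and the cost per call is one backbone pass plus a single head, regardless of how many heads exist.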

OpenVino converted model not returning same score values as original model (Sigmoid)

I've converted a Keras model for use with OpenVino. The original Keras model used sigmoid to return scores ranging from 0 to 1 for binary classification. After converting the model for use with OpenVino, the scores are all near 0.99 for both classes but seem slightly lower for one of the classes.
For example, test1.jpg and test2.jpg (from opposite classes) yield scores of 0.00320357 and 0.9999, respectively.
With OpenVino, the same images yield scores of 0.9998982 and 0.9962392, respectively.
Edit* One suspicion is that the input array is still accepted by the OpenVino model but is somehow changed in shape or "scrambled" and therefore is never a match for class one? In other words, if you fed it random noise, the score would also always be 0.9999. Maybe I'd have to somehow get the OpenVino model to accept the original shape (1,180,180,3) instead of (1,3,180,180) so I don't have to force the input into a different shape than the one the original model accepted? That's weird though because I specified the shape when making the xml and bin for openvino:
python3 /opt/intel/openvino_2021/deployment_tools/model_optimizer/mo_tf.py --saved_model_dir /Users/.../Desktop/.../model13 --output_dir /Users/.../Desktop/... --input_shape=\[1,180,180,3]
However, I know from error messages that the inference engine is expecting (1,3,180,180) for some unknown reason. Could that be the problem? The other suspicion is something wrong with how the original model was frozen. I'm exploring different ways to freeze the original model (keras model converted to pb) in case the problem is related to that.
I checked to make sure the Sigmoid activation function is being used in the OpenVino implementation (same activation as the Keras model) and it looks like it is. Why, then, are the values not the same? Any help would be much appreciated.
The code for the OpenVino inference is:
import openvino
from openvino.inference_engine import IECore, IENetwork
from skimage import io
import sys
import numpy as np
import os
def loadNetwork(model_xml, model_bin):
    ie = IECore()
    network = ie.read_network(model=model_xml, weights=model_bin)
    input_placeholder_key = list(network.input_info)[0]
    input_placeholder = network.input_info[input_placeholder_key]
    output_placeholder_key = list(network.outputs)[0]
    output_placeholder = network.outputs[output_placeholder_key]
    return network, input_placeholder_key, output_placeholder_key
batch_size = 1
channels = 3
IMG_HEIGHT = 180
IMG_WIDTH = 180
#loadNetwork('saved_model.xml','saved_model.bin')
image_path = 'test.jpg'
def load_source(path_to_image):
    image = io.imread(path_to_image)
    img = np.resize(image,(180,180))
    return img
img_new = load_source('test2.jpg')
#Batch?
def classify(image):
    device = 'CPU'
    network, input_placeholder_key, output_placeholder_key = loadNetwork('saved_model.xml','saved_model.bin')
    ie = IECore()
    exec_net = ie.load_network(network=network, device_name=device)
    res = exec_net.infer(inputs={input_placeholder_key: image})
    print(res)
    res = res[output_placeholder_key]
    return res
result = classify(img_new)
print(result)
result = result[0]
top_result = np.argmax(result)
print(top_result)
print(result[top_result])
And the result:
{'StatefulPartitionedCall/model/dense/Sigmoid': array([[0.9962392]], dtype=float32)}
[[0.9962392]]
0
0.9962392
Generally, TensorFlow is one of the few frameworks that uses the NHWC layout by default, while most others use NCHW. The OpenVINO Inference Engine therefore follows the majority and uses the NCHW layout, so a model must be converted to NCHW in order to work with the Inference Engine.
The conversion of the native model format into IR involves the process where the Model Optimizer performs the necessary transformation to convert the shape to the layout required by the Inference Engine (N,C,H,W). Using the --input_shape parameter with the correct input shape of the model should suffice.
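To illustrate the layout difference on the input side, a minimal NumPy sketch, assuming the image has already been properly resized to 180x180 with 3 channels. The array here is dummy data, not the asker's image. Moving the channel axis with `np.transpose` keeps each pixel's values together, whereas reshaping (or `np.resize`) to the target shape only reinterprets the flat buffer and scrambles the image.

```python
import numpy as np

# Dummy batch in TensorFlow's usual NHWC layout: (batch, height, width, channels).
nhwc = np.arange(1 * 180 * 180 * 3, dtype=np.float32).reshape(1, 180, 180, 3)

# Move the channel axis in front of height and width to get NCHW.
nchw = np.transpose(nhwc, (0, 3, 1, 2))

print(nchw.shape)  # (1, 3, 180, 180)
```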
Besides, most TensorFlow models are trained with images in RGB order. In this case, inference results using the Inference Engine samples may be incorrect. By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with --reverse_input_channels argument.
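Swapping the channel order by hand can be sketched in NumPy as follows, assuming an image in HWC layout whose last axis holds the colour channels. The tiny 2x2 image is made-up data. Which direction is needed depends on how the image was loaded and on the order the model was trained with.

```python
import numpy as np

# Dummy 2x2 RGB image in HWC layout; each pixel is (R, G, B).
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [10, 20, 30]]], dtype=np.uint8)

# Reverse the last (channel) axis to swap RGB <-> BGR.
bgr = rgb[..., ::-1]

print(bgr[1, 1])  # [30 20 10]
```

Applying the same reversal twice returns the original order, so the operation is safe to toggle while debugging.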
I suggest you validate this by inferring your model with the Hello Classification Python Sample instead since this is one of the official samples provided to test the model's functionality.
You may refer to the "Intel Math Kernel Library for Deep Neural Network" documentation for a deeper explanation of the input shape.

How to do fine-tuning in tensorflow with notop layers and define my own input image size

There are many examples of how to do fine-tuning with TensorFlow. Almost all of these examples resize the images to the size that the existing model requires; for example, 224×224 is the input size that VGG19 needs. However, in Keras, we can change the input size by setting include_top to False:
base_model = VGG19(include_top=False, weights="imagenet", input_shape=(input_size, input_size, input_channels))
Then we do not have to fix the image size to 224×224 anymore. Can we do this kind of fine-tuning using the official pre-trained models in TensorFlow? I have not found a solution so far. Can anyone help me?
Yes, it is possible to do this kind of fine-tuning. You would just have to ensure that you also fine-tune some of the first few layers (to account for changed input) of the original network in addition to the last few layers (to account for changed output).
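Why the convolutional part accepts other input sizes can be sketched with a little arithmetic. This assumes the standard VGG19 layout without the top: 3x3 convolutions with 'same' padding (which preserve height and width) and five 2x2 max-pooling layers with stride 2 (each of which halves them, rounding down). The helper name `vgg19_notop_output_shape` is hypothetical and not part of Keras.

```python
def vgg19_notop_output_shape(height, width):
    """Output shape (H, W, C) of VGG19 without the top for a given input size."""
    for _ in range(5):      # five max-pooling stages
        height //= 2        # 2x2 pooling, stride 2, rounds down
        width //= 2
    return height, width, 512  # the last conv block has 512 channels

print(vgg19_notop_output_shape(224, 224))  # (7, 7, 512)
print(vgg19_notop_output_shape(320, 480))  # (10, 15, 512)
```

Only the fully connected top depends on a fixed 7x7x512 feature map, which is why dropping it frees the input size.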
I work with TensorFlow using Keras. If you are open to that, then there is a code snippet that shows the general fine-tuning flow here:
https://keras.io/applications/
Specifically, I had to write the following code to make it work for my case:
import keras
from keras.layers import Input
from keras.models import Model
#img_width,img_height is the size of your new input, 3 is the number of channels
input_tensor = Input(shape=(img_width, img_height, 3))
base_model = keras.applications.vgg19.VGG19(include_top=False, weights='imagenet', input_tensor=input_tensor)
#instantiate whatever other layers you need
#predictions is the new logistic layer added to account for new classes
model = Model(inputs=base_model.inputs, outputs=predictions)
Hope this helps.

Use inception v3 with batch of images in tensorflow

In one of my computer vision projects, I use the public pre-trained Inception-v3 available here: http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz. This network is at the beginning of my classification chain (a lot of processing is performed on the logits produced by the network). I would like to feed this network a batch of images (instead of processing images sequentially) in order to make it faster.
However, the provided network had been "frozen", and it can only process one image at a time.
Is there any solution to "unfreeze" a graph and adapt it so that I can use it on batch of images?
(N.B : I found related topics on the internet, but they all suggest to take a more recent network available for instance here :
http://download.tensorflow.org/models/image/imagenet/inception-v3-2016-03-01.tar.gz. This is not what I would like to do since a lot of stuff has been tuned on the output of the frozen model.)
Not sure if this is too late, but here is the code snippet that I used:
# First load the model into an old graph
proto_file = ... # downloaded inception protofile
graph_def = tf.GraphDef.FromString(open(proto_file, 'rb').read())
to_delete = {"DecodeJpeg", "Cast", "ExpandDims", "pool_3/_reshape", "softmax"}
graph_def = delete_ops_from_graph(graph_def, to_delete)
new_graph = tf.Graph()
with new_graph.as_default():
    x = tf.placeholder(tf.uint8, [None, None, None, 3], name="batched_inputs")
    x_cast = tf.cast(x, dtype=tf.float32)
    y = tf.import_graph_def(graph_def, input_map={"ExpandDims:0": x_cast}, return_elements=["pool_3:0"],name="")
...
Now new_graph is a graph that has a batch dimension (it takes in a 4-D NHWC tensor). Note that this works well if you want to use inception-2015-12-05.tgz as a feature extractor. You would need to take the output from output = new_graph.get_tensor_by_name("pool_3:0")
For the definition of delete_ops_from_graph, see Tensorflow: delete nodes from graph
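Building the 4-D input for such a placeholder can be sketched in NumPy, assuming all images in a batch share the same height and width and are stored as uint8 RGB arrays. The 299x299 size and the constant-valued images are made-up data for illustration.

```python
import numpy as np

# Three dummy RGB images of identical size.
images = [np.full((299, 299, 3), fill_value=i, dtype=np.uint8) for i in range(3)]

# Stack along a new leading axis to get an NHWC batch.
batch = np.stack(images, axis=0)

print(batch.shape, batch.dtype)  # (3, 299, 299, 3) uint8
```

An array like `batch` is what would be fed to the `batched_inputs` placeholder; images of different sizes would have to be resized to a common size first, since `np.stack` requires identical shapes.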