Do we have to transform input before evaluation using CNTK C++ Library? - cntk

I've trained a CNTK model and now I want to evaluate it in my C++ code using the CNTK Library. The training configuration had these lines:
featNorm = features - 128
p1 = MyLayer (featNorm, 32, 0.1557/256)
Does this mean that I have to subtract 128 from the pixel values when I create the NDArrayView that I pass as input data for evaluation, or is this stored somewhere in the model so that the CNTK Library will normalize the input data itself?

Because the normalization was done as part of the graph, you don't need to subtract 128 in your C++ code.

Related

OpenVino converted model not returning same score values as original model (Sigmoid)

I've converted a Keras model for use with OpenVino. The original Keras model used sigmoid to return scores ranging from 0 to 1 for binary classification. After converting the model for use with OpenVino, the scores are all near 0.99 for both classes but seem slightly lower for one of the classes.
For example, with the original Keras model, test1.jpg and test2.jpg (from opposite classes) yield scores of 0.00320357 and 0.9999, respectively.
With OpenVino, the same images yield scores of 0.9998982 and 0.9962392, respectively.
Edit: One suspicion is that the input array is still accepted by the OpenVino model but is somehow changed in shape or "scrambled", and therefore never matches class one. In other words, if you fed it random noise, the score would also always be 0.9999. Maybe I'd have to get the OpenVino model to accept the original shape (1,180,180,3) instead of (1,3,180,180), so I don't have to force the input into a different shape than the one the original model accepted? That's odd, though, because I specified the shape when creating the xml and bin for OpenVino:
python3 /opt/intel/openvino_2021/deployment_tools/model_optimizer/mo_tf.py --saved_model_dir /Users/.../Desktop/.../model13 --output_dir /Users/.../Desktop/... --input_shape=\[1,180,180,3]
However, I know from error messages that the inference engine is expecting (1,3,180,180) for some unknown reason. Could that be the problem? The other suspicion is that something is wrong with how the original model was frozen. I'm exploring different ways to freeze the original model (a Keras model converted to pb) in case the problem is related to that.
I checked to make sure the Sigmoid activation function is being used in the OpenVino implementation (same activation as the Keras model) and it looks like it is. Why, then, are the values not the same? Any help would be much appreciated.
The code for the OpenVino inference is:
import openvino
from openvino.inference_engine import IECore, IENetwork
from skimage import io
import sys
import numpy as np
import os

def loadNetwork(model_xml, model_bin):
    ie = IECore()
    network = ie.read_network(model=model_xml, weights=model_bin)
    input_placeholder_key = list(network.input_info)[0]
    input_placeholder = network.input_info[input_placeholder_key]
    output_placeholder_key = list(network.outputs)[0]
    output_placeholder = network.outputs[output_placeholder_key]
    return network, input_placeholder_key, output_placeholder_key

batch_size = 1
channels = 3
IMG_HEIGHT = 180
IMG_WIDTH = 180

#loadNetwork('saved_model.xml','saved_model.bin')

image_path = 'test.jpg'

def load_source(path_to_image):
    image = io.imread(path_to_image)
    img = np.resize(image, (180, 180))
    return img

img_new = load_source('test2.jpg')

#Batch?

def classify(image):
    device = 'CPU'
    network, input_placeholder_key, output_placeholder_key = loadNetwork('saved_model.xml', 'saved_model.bin')
    ie = IECore()
    exec_net = ie.load_network(network=network, device_name=device)
    res = exec_net.infer(inputs={input_placeholder_key: image})
    print(res)
    res = res[output_placeholder_key]
    return res

result = classify(img_new)
print(result)
result = result[0]
top_result = np.argmax(result)
print(top_result)
print(result[top_result])
And the result:
{'StatefulPartitionedCall/model/dense/Sigmoid': array([[0.9962392]], dtype=float32)}
[[0.9962392]]
0
0.9962392
Generally, TensorFlow is the only major framework that uses the NHWC layout by default, while most others use NCHW. The OpenVINO Inference Engine therefore follows the majority and works with the NCHW layout, so a model must be converted to NCHW in order to work with the Inference Engine.
During conversion of the native model format into IR, the Model Optimizer performs the transformations needed to bring the input into the layout required by the Inference Engine (N, C, H, W). Using the --input_shape parameter with the correct input shape of the model should suffice.
Besides, most TensorFlow models are trained with images in RGB order. In that case, inference results obtained with the Inference Engine samples may be incorrect, because by default the samples and demos expect input with the channels in BGR order. If you trained your model to work with RGB order, you need to either manually rearrange the channel order in the sample or demo application, or reconvert your model with the Model Optimizer using the --reverse_input_channels argument.
I suggest you validate this by inferring your model with the Hello Classification Python Sample, since this is one of the official samples provided to test a model's functionality.
You may refer to this "Intel Math Kernel Library for Deep Neural Network" documentation for a deeper explanation regarding the input shape.
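As a hedged illustration of the preprocessing described above (the resize call, the optional RGB-to-BGR flip, and the transpose are assumptions about how the original Keras model expects its input, not something taken from the question):

import numpy as np
from skimage import io
from skimage.transform import resize

def prepare_input(path_to_image, height=180, width=180):
    # Read the image as HWC in RGB order
    image = io.imread(path_to_image)
    # Resize while keeping all 3 channels; np.resize would just repeat/truncate
    # the raw buffer instead of resampling the image
    image = resize(image, (height, width, 3), preserve_range=True).astype(np.float32)
    # If the IR was converted without --reverse_input_channels, flip RGB -> BGR here:
    # image = image[:, :, ::-1]
    # NHWC -> NCHW plus a batch dimension, as the Inference Engine expects
    return np.transpose(image, (2, 0, 1))[np.newaxis, ...]  # shape (1, 3, 180, 180)

With such an input, exec_net.infer(inputs={input_placeholder_key: prepare_input('test2.jpg')}) receives data in the (1, 3, 180, 180) layout that the IR reports.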

batch_size and max_time in LSTM Tensorflow

Background:
I am trying to model a multi-layered LSTM in TensorFlow. I am using the common function to unroll the LSTM:
tf.nn.dynamic_rnn
Here I am using time_major = True, so my data has to be in the format [max_time, batch_size, depth].
According to my understanding, max_time is the number of time steps in the series. My input is [224], which I pass through an FC layer at the beginning to bring it to the size of the labels.
Question:
I am using a data pipeline to get the labels in batches of 32 one-hot vectors (length = 70).
A 70-length vector corresponds to 1 time step.
So how can I make the input [32, 32, 70]? Presently I have [32, 70] (batch_size, num_classes).
Please correct my understanding if it is wrong.
Can I just pass [1, 32, 70], so that TensorFlow detects on its own that each batch has a time step of 1?
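For reference, a minimal sketch of the shapes under discussion, assuming each batch is treated as a single time step (the cell size of 128 is an arbitrary illustrative value):

import tensorflow as tf

# One-hot labels coming from the data pipeline: [batch_size, num_classes] = [32, 70]
labels_batch = tf.placeholder(tf.float32, shape=[32, 70])

# With time_major=True, tf.nn.dynamic_rnn expects [max_time, batch_size, depth].
# Treating each batch as one time step gives [1, 32, 70]:
inputs = tf.expand_dims(labels_batch, axis=0)

cell = tf.nn.rnn_cell.LSTMCell(num_units=128)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32, time_major=True)
# outputs has shape [1, 32, 128]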

Tensorflow Lite for variable sized input

I have a model much like the TensorFlow speech commands demo, except that it takes a variable-sized 1D array as input. Now I find it difficult to convert this model to TF Lite using tflite_convert, which requires an input_shape for the input.
It's said that TF Lite requires a fixed-size input for efficiency and that you can resize the input during inference as part of your model. However, I think that would involve truncating the input, which I don't want. Is there any way to make this work with TF Lite?
You can convert your model using a fixed shape, as in --input_shape=64, and then resize the input tensor at inference time:
interpreter->ResizeInputTensor(interpreter->inputs()[0], {128});
interpreter->AllocateTensors();
// ... populate your input tensors with 128 entries ...
interpreter->Invoke();
// ... read your output tensor ...
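If you drive the converted model from Python rather than C++, the same resize-at-inference idea looks roughly like this (a sketch; the model path, the new length of 128, and the dummy input are assumptions):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="speech_model.tflite")  # assumed path
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

# Resize the input tensor from the fixed conversion shape to the length you need
interpreter.resize_tensor_input(input_index, [128])
interpreter.allocate_tensors()

interpreter.set_tensor(input_index, np.zeros([128], dtype=np.float32))  # your 1D samples here
interpreter.invoke()
result = interpreter.get_tensor(output_index)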

Semantic Segmentation with Encoder-Decoder CNNs

Apologies for any misuse of technical terms.
I am working on a semantic segmentation project using CNNs; I am trying to implement an Encoder-Decoder type architecture, so the output is the same size as the input.
How do you design the labels?
What loss function should one apply? Especially in the situation of heavy class imbalance (where the ratio between the classes varies from image to image).
The problem deals with two classes (objects of interest and background). I am using Keras with the TensorFlow backend.
So far, I am designing the expected outputs to have the same dimensions as the input images, applying pixel-wise labeling. The final layer of the model has either a softmax activation (for 2 classes) or a sigmoid activation (to express the probability that the pixels belong to the objects class). I am having trouble designing a suitable objective function for such a task, of the type:
function(y_pred, y_true),
in agreement with Keras.
Please try to be specific about the dimensions of the tensors involved (input/output of the model). Any thoughts and suggestions are much appreciated. Thank you!
Actually, when you use a TensorFlow backend you can simply apply one of the predefined Keras objectives in the following manner:
output = Convolution2D(number_of_classes, # 1 for binary case
                       filter_height,
                       filter_width,
                       activation="softmax")(input_to_output) # or "sigmoid" for binary
...
model.compile(loss="categorical_crossentropy", ...) # or "binary_crossentropy" for binary
Then feed either a one-hot encoded feature map or a matrix of shape (image_height, image_width) with integer-encoded classes (remember that in this case you should use sparse_categorical_crossentropy as the loss).
To deal with the class imbalance (I guess it's because of a background class), I strongly recommend that you carefully read the answers to this Stack Overflow question.
I suggest starting with a base architecture used in practice, like this one for nerve segmentation: https://github.com/EdwardTyantov/ultrasound-nerve-segmentation. There, a Dice loss is used as the loss function; it works very well for two-class problems, as has been shown in the literature: https://arxiv.org/pdf/1608.04117.pdf.
Another loss function that has been widely used for such problems is cross entropy. For problems like yours, long and short skip connections are most commonly deployed to stabilize training, as described in the paper above.
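For reference, a minimal sketch of such a Dice loss in Keras, assuming a sigmoid output and masks of shape (HEIGHT, WIDTH, 1); the smoothing constant is a common convention rather than something taken from the papers above:

from keras import backend as K

def dice_loss(y_true, y_pred, smooth=1.0):
    # Flatten the per-pixel predictions and ground-truth masks
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    # The Dice coefficient lies in [0, 1]; subtract from 1 so that lower is better
    return 1.0 - (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

# model is the segmentation model discussed above:
# model.compile(optimizer="adam", loss=dice_loss)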
Two ways :
You could try 'flattening':
model.add(Reshape((NUM_CLASSES, HEIGHT * WIDTH)))  # incoming shape: HEIGHT x WIDTH x NUM_CLASSES
model.add(Permute((2, 1)))                         # now it'll be (HEIGHT * WIDTH) x NUM_CLASSES
# Use some activation here - model.activation()
# You can use global averaging or softmax
One-hot encoding every pixel:
In this case your final layer should upsample/unpool/deconvolve back to HEIGHT x WIDTH x NUM_CLASSES, so your output is essentially of shape (HEIGHT, WIDTH, NUM_CLASSES); see the sketch below.
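A minimal sketch of the second option, assuming the binary (sigmoid) variant, channels-last ordering, and purely illustrative sizes and layer counts:

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, UpSampling2D

HEIGHT, WIDTH = 128, 128  # illustrative, not from the question

model = Sequential()
# Toy encoder: one convolution + pooling stage
model.add(Convolution2D(16, 3, 3, activation="relu", border_mode="same",
                        input_shape=(HEIGHT, WIDTH, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))        # -> (HEIGHT/2, WIDTH/2, 16)
# Toy decoder: upsample back to the input resolution and predict per pixel
model.add(UpSampling2D(size=(2, 2)))             # -> (HEIGHT, WIDTH, 16)
model.add(Convolution2D(1, 1, 1, activation="sigmoid", border_mode="same"))  # -> (HEIGHT, WIDTH, 1)
model.compile(loss="binary_crossentropy", optimizer="adam")

Labels are then binary masks with the same (HEIGHT, WIDTH, 1) shape as the output.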

What is the best way to run saved model with different batch size in TensorFlow?

I trained the Cifar10 example model from TensorFlow's repository with batch_size 128 and it worked fine. Then I froze the graph and managed to run it from C++, just like they do in their C++ label image example.
The only problem was that I had to artificially generate a tensor of shape [128, image_height, image_width, channels] to classify a single image from C++, because the saved model expects a batch of 128 samples as input, since that is the number of samples that comes from the queue.
I tried training the Cifar10 example with batch_size = 1, and then I managed to classify examples one by one when running the model from C++, but that doesn't seem like a great solution. I also tried manually changing the tensor shapes in the saved graph file, but that didn't work.
My question is: what is the best way to train a model with a fixed batch size (like 32, 64, 128, etc.) and then save it so that it can be used with a batch of arbitrary size? If that's not possible, how do I save the model so that samples can be classified one by one?
It sounds like the problem is that TensorFlow is "baking in" the batch size to other tensors in the graph (e.g. if the graph contains tf.shape(t) for some tensor t whose shape depends on the batch size, the batch size might be stored in the graph as a constant). The solution is to change your program slightly so that tf.train.batch() returns tensors with a variable batch size.
The tf.train.batch() method accepts a tf.Tensor for the batch_size argument. Perhaps the simplest way to modify your program for variable-sized batches would be to define a placeholder for the batch size:
# Define a scalar tensor for the batch size, so that you can alter it at
# Session.run()-time.
batch_size_tensor = tf.placeholder(tf.int32, shape=[])
input_tensors = tf.train.batch(..., batch_size=batch_size_tensor, ...)
This would prevent the batch size from being baked into your GraphDef, so you should be able to feed values of any batch size in C++. However, this modification would require you to feed a value for the batch size on every step, which is slightly tedious.
Assuming that you always want to train with batch size 128, but retain the flexibility to change the batch size later, you could use a tf.placeholder_with_default() to specify that the batch size should be 128 when you don't feed an alternative value:
# Define a scalar tensor for the batch size, so that you can alter it at
# Session.run()-time.
batch_size_tensor = tf.placeholder_with_default(128, shape=[])
input_tensors = tf.train.batch(..., batch_size=batch_size_tensor, ...)
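As a quick usage sketch (assuming input_tensors and batch_size_tensor are defined as above, and that the queue runners feeding tf.train.batch() have been started), overriding the default at inference time would look roughly like this:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    # Training steps use the default batch size of 128; to pull a single
    # example for inference, override the placeholder-with-default value:
    single_example = sess.run(input_tensors, feed_dict={batch_size_tensor: 1})

    coord.request_stop()
    coord.join(threads)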
Is there a reason you need a fixed batch size in the graph?
I think a good way is to build a graph with a variable batch size, by putting None as the first dimension. During training, you can then pass the batch size flag to your data provider so it feeds the desired amount of data in each iteration.
After the model is trained, you can export the graph using tf.train.Saver(), which exports the metagraph. For inference, you can load the exported files and just evaluate with any number of examples, even just one.
Note that this is different from the frozen graph.
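For example, a minimal sketch of the variable-batch-size approach this answer describes (the placeholder name, the toy logits layer, and the checkpoint path are all illustrative, not taken from the Cifar10 example):

import numpy as np
import tensorflow as tf

# Leave the batch dimension as None so the same graph accepts 1, 32 or 128 images
images = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name="images")
logits = tf.layers.dense(tf.layers.flatten(images), 10, name="logits")

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, "/tmp/cifar10_model")  # writes the checkpoint and the metagraph

    # Inference on a single example works with the very same graph
    single_logits = sess.run(logits, feed_dict={images: np.zeros([1, 32, 32, 3], np.float32)})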