CNTK Transfer Learning with LSTM: appending pretrained network to another network - cntk

I have a pretrained Seq-to-Seq slot tagger network as which in its simplest form is follows:
Network_1 = Sequential ([
Embedding(emb_dim)
Recurrence(LSTM(LSTM_dim))
Dense(num_labels)
])
I would like to use the output of this as initial layers in another network. Basically I would like to concatenate the embeddings from the network_1 (pretrained) to an embedding layer in the network_2 as follows:
Network_2 = Sequential ([
Concat_embeddings ( Embedding(emb_dim), Network_1_embed() )
Recurrence(LSTM(LSTM_dim))
(Label('encoded_h'), Label('encoded_c'))
])
def Network_1_embed():
loaded_model = load_model(path_to_network_1_saved_model);
cloned_model = loaded_model.clone(CloneMethod.freeze);
return cloned_model
def Concat_embeddings(emb1, emb2):
X=Placeholder();
return splice(emb1(X), emb2(X))
This is giving me the following error
ValueError: Times: The 1 leading dimensions of the right operand with shape '[50360]' do not match the left operand's trailing dimensions with shape '[293]'
For reference, we get [293] since emb_dim=256, and num_network_1_labels=37, while [50360] is the vocabulary size of the network_2 input. The Network_1 also had the same vocabulary mapping when being trained, so it can take the same input, and output a 37 dimensional vector for each token.
How do I make this work?
Thanks

I think your problem is that you are using the entire Network_1 as the embedding, instead of just its embedding layer.
One way would be to define embed separately and train it through Network_1:
embed = Embedding(emb_dim)
Network_1 = Sequential ([
embed,
Recurrence(LSTM(LSTM_dim)),
Dense(num_labels)
])
Then train Network_1, but save embed:
embed.save(EMBED_PATH)
Explanation: Since Network_1 just invokes embed, they share parameters, so that training Network_1 will train embed's parameters. Saving embed then gives you the embedding layer trained by Network_1. Quite straight-forward, actually.
Then, to train your second model (in a second script), load embed from disk and just use it:
Network_1_embed = load_model(EMBED_PATH)
Network_2 = Sequential ([
( Embedding(emb_dim), Network_1_embed() ),
splice,
Recurrence(LSTM(LSTM_dim)),
(Label('encoded_h'), Label('encoded_c'))
])
Note the use of a function tuple as the first item passed to Sequential(). The tuple means to apply both functions to the same input, and generates two outputs, which are then the input to the subsequent function, splice.
To keep embed constant, clone it with Freeze option as you already did in your example.
(I am not in front of a computer with the latest CNTK and cannot test this, so it is possible that I made a mistake.)

Related

OpenVino converted model not returning same score values as original model (Sigmoid)

I've converted a Keras model for use with OpenVino. The original Keras model used sigmoid to return scores ranging from 0 to 1 for binary classification. After converting the model for use with OpenVino, the scores are all near 0.99 for both classes but seem slightly lower for one of the classes.
For example, test1.jpg and test2.jpg (from opposite classes) yield scores of 0.00320357 and 0.9999, respectively.
With OpenVino, the same images yield scores of 0.9998982 and 0.9962392, respectively.
Edit* One suspicion is that the input array is still accepted by the OpenVino model but is somehow changed in shape or "scrambled" and therefore is never a match for class one? In other words, if you fed it random noise, the score would also always be 0.9999. Maybe I'd have to somehow get the OpenVino model to accept the original shape (1,180,180,3) instead of (1,3,180,180) so I don't have to force the input into a different shape than the one the original model accepted? That's weird though because I specified the shape when making the xml and bin for openvino:
python3 /opt/intel/openvino_2021/deployment_tools/model_optimizer/mo_tf.py --saved_model_dir /Users/.../Desktop/.../model13 --output_dir /Users/.../Desktop/... --input_shape=\[1,180,180,3]
However, I know from error messages that the inference engine is expecting (1,3,180,180) for some unknown reason. Could that be the problem? The other suspicion is something wrong with how the original model was frozen. I'm exploring different ways to freeze the original model (keras model converted to pb) in case the problem is related to that.
I checked to make sure the Sigmoid activation function is being used in the OpenVino implementation (same activation as the Keras model) and it looks like it is. Why, then, are the values not the same? Any help would be much appreciated.
The code for the OpenVino inference is:
import openvino
from openvino.inference_engine import IECore, IENetwork
from skimage import io
import sys
import numpy as np
import os
def loadNetwork(model_xml, model_bin):
ie = IECore()
network = ie.read_network(model=model_xml, weights=model_bin)
input_placeholder_key = list(network.input_info)[0]
input_placeholder = network.input_info[input_placeholder_key]
output_placeholder_key = list(network.outputs)[0]
output_placeholder = network.outputs[output_placeholder_key]
return network, input_placeholder_key, output_placeholder_key
batch_size = 1
channels = 3
IMG_HEIGHT = 180
IMG_WIDTH = 180
#loadNetwork('saved_model.xml','saved_model.bin')
image_path = 'test.jpg'
def load_source(path_to_image):
image = io.imread(path_to_image)
img = np.resize(image,(180,180))
return img
img_new = load_source('test2.jpg')
#Batch?
def classify(image):
device = 'CPU'
network, input_placeholder_key, output_placeholder_key = loadNetwork('saved_model.xml','saved_model.bin')
ie = IECore()
exec_net = ie.load_network(network=network, device_name=device)
res = exec_net.infer(inputs={input_placeholder_key: image})
print(res)
res = res[output_placeholder_key]
return res
result = classify(img_new)
print(result)
result = result[0]
top_result = np.argmax(result)
print(top_result)
print(result[top_result])
And the result:
{'StatefulPartitionedCall/model/dense/Sigmoid': array([[0.9962392]], dtype=float32)}
[[0.9962392]]
0
0.9962392
Generally, Tensorflow is the only network with the shape NHWC while most others use NCHW. Thus, the OpenVINO Inference Engine satisfies the majority of networks and uses the NCHW layout. Model must be converted to NCHW layout in order to work with Inference Engine.
The conversion of the native model format into IR involves the process where the Model Optimizer performs the necessary transformation to convert the shape to the layout required by the Inference Engine (N,C,H,W). Using the --input_shape parameter with the correct input shape of the model should suffice.
Besides, most TensorFlow models are trained with images in RGB order. In this case, inference results using the Inference Engine samples may be incorrect. By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with --reverse_input_channels argument.
I suggest you validate this by inferring your model with the Hello Classification Python Sample instead since this is one of the official samples provided to test the model's functionality.
You may refer to this "Intel Math Kernel Library for Deep Neural Network" for deeper explanation regarding the input shape.

Keras: Custom loss function with training data not directly related to model

I am trying to convert my CNN written with tensorflow layers to use the keras api in tensorflow (I am using the keras api provided by TF 1.x), and am having issue writing a custom loss function, to train the model.
According to this guide, when defining a loss function it expects the arguments (y_true, y_pred)
https://www.tensorflow.org/guide/keras/train_and_evaluate#custom_losses
def basic_loss_function(y_true, y_pred):
return ...
However, in every example I have seen, y_true is somehow directly related to the model (in the simple case it is the output of the network). In my problem, this is not the case. How do implement this if my loss function depends on some training data that is unrelated to the tensors of the model?
To be concrete, here is my problem:
I am trying to learn an image embedding trained on pairs of images. My training data includes image pairs and annotations of matching points between the image pairs (image coordinates). The input feature is only the image pairs, and the network is trained in a siamese configuration.
I am able to implement this successfully with tensorflow layers and train it sucesfully with tensorflow estimators.
My current implementations builds a tf Dataset from a large database of tf Records, where the features is a dictionary containing the images and arrays of matching points. Before I could easily feed these arrays of image coordinates to the loss function, but here it is unclear how to do so.
There is a hack I often use that is to calculate the loss within the model, by means of Lambda layers. (When the loss is independent from the true data, for instance, and the model doesn't really have an output to be compared)
In a functional API model:
def loss_calc(x):
loss_input_1, loss_input_2 = x #arbirtray inputs, you choose
#according to what you gave to the Lambda layer
#here you use some external data that doesn't relate to the samples
externalData = K.constant(external_numpy_data)
#calculate the loss
return the loss
Using the outputs of the model itself (the tensor(s) that are used in your loss)
loss = Lambda(loss_calc)([model_output_1, model_output_2])
Create the model outputting the loss instead of the outputs:
model = Model(inputs, loss)
Create a dummy keras loss function for compilation:
def dummy_loss(y_true, y_pred):
return y_pred #where y_pred is the loss itself, the output of the model above
model.compile(loss = dummy_loss, ....)
Use any dummy array correctly sized regarding number of samples for training, it will be ignored:
model.fit(your_inputs, np.zeros((number_of_samples,)), ...)
Another way of doing it, is using a custom training loop.
This is much more work, though.
Although you're using TF1, you can still turn eager execution on at the very beginning of your code and do stuff like it's done in TF2. (tf.enable_eager_execution())
Follow the tutorial for custom training loops: https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough
Here, you calculate the gradients yourself, of any result regarding whatever you want. This means you don't need to follow Keras standards of training.
Finally, you can use the approach you suggested of model.add_loss.
In this case, you calculate the loss exaclty the same way I did in the first answer. And pass this loss tensor to add_loss.
You can probably compile a model with loss=None then (not sure), because you're going to use other losses, not the standard one.
In this case, your model's output will probably be None too, and you should fit with y=None.

Why can't I use my dataset anymore after using InceptionV3?

I'm currently working on video-captioning (frame-sequence to natural language).
I recently started using tf.data.Dataset class instead of feed_dict argument in tensorflow.
My goal is to feed this frames to a pretrained CNN (inceptionv3), extract the feature vector and then feed it to my RNN seq2seq network.
I've got a problem of tensorflow types after mapping my Dataset with the inception model: the dataset is then totally unusable, neither via dataset.batch() or dataset.take(). I can't even make a one shot iterator !
Here is how I proceed to build my Dataset:
Step 1: I first extract the same number of frames for every videos. I store all of it into a numpy array. Its shape is (nb_videos, nb_frames, width, height, channels)
Note that in this dataset, every video has the same size and has 3 color channels.
Step 2: Then I create a tf.data.Dataset object using this big numpy array
Note that printing this dataset via python gives:
With n_videos=2; width=240; height=320; channels=3
I already don't understand what "DataAdapter" stands for
At this point; I can create a one shot iterator but using dataset.batch(1) returns:
I don't understand why "?" and not "1" shape..
Step 3: I use the map function on dataset to resize all the frames of all the videos to 299*299*3 (required to use InceptionV3)
At this point, I can use the data in my dataset and make a one shot iterator.
Step 4: I use the map function again to extract every features using InceptionV3 pretrained model.
The problem occurs at this point:
Printing the dataset gives:
Ok looks good
However, it's now impossible to make a one shot iterator for this dataset
Step1 :
X_train_slice, Y_train = build_dataset(number_of_samples)
Step 2:
X_train = tf.data.Dataset.from_tensor_slices(X_train_slice)
Step 3:
def format_video(video):
frames = tf.image.resize_images(video, (299,299))
frames = tf.keras.applications.inception_v3.preprocess_input(frames)
return frames
X_train = X_train.map(lambda video: format_video(video))
Step 4:
Inception model:
image_model = tf.keras.applications.InceptionV3(include_top=False,
weights='imagenet')
new_input = image_model.input
hidden_layer = image_model.layers[-1].output
image_features_extract_model = tf.keras.Model(new_input, hidden_layer)
For the tf.reduce_mean; see how-to-get-pool3-features-of-inception-v3-model-using-keras (SO)
def extract_video_features(video):
batch_features = image_features_extract_model(video)
batch_features = tf.reduce_mean(batch_features, axis=(1, 2))
return batch_features
X_train = X_train.map(lambda video: extract_video_features(video))
Creating the iterator:
iterator = X_train.make_one_shot_iterator()
Here is the output:
ValueError: Failed to create a one-shot iterator for a dataset.
`Dataset.make_one_shot_iterator()` does not support datasets that capture
stateful objects, such as a `Variable` or `LookupTable`. In these cases, use
`Dataset.make_initializable_iterator()`. (Original error: Cannot capture a
stateful node (name:conv2d/kernel, type:VarHandleOp) by value.)
I don't really get it: it asks me to use a initializable_iterator but this kind of iterator is dedicated for placeholder. Here, I've got raw data !
You're using the pipelines wrong.
The idea of tf.data is to provide input pipelines to a model, not to contain the model itself. What you're trying to do it fit the model as a step of the pipeline (your step 4), but, as the error shows, this won't work.
What you should do instead is build the model as you are doing and then call model.predict on the input data, to obtain the features you want (as computed values). If you want to add further computation, add it in the model, since the predict call will run the model and return the values of the output layers.
Side note: image_features_extract_model = tf.keras.Model(new_input, hidden_layer) is completely irrelevant, given the choice you made for input and output tensors: the input is image_model's input and the output is image_model's output, so image_features_extract_model is identical to image_model.
The final code should be:
X_train_slice, Y_train = build_dataset(number_of_samples)
X_train = tf.data.Dataset.from_tensor_slices(X_train_slice)
def format_video(video):
frames = tf.image.resize_images(video, (299,299))
frames = tf.keras.applications.inception_v3.preprocess_input(frames)
return frames
X_train = X_train.map(lambda video: format_video(video))
image_model = tf.keras.applications.InceptionV3(include_top=False,
weights='imagenet')
bottlenecks = image_model.predict(X_train)
# Do something with your bottlenecks

Feeding individual examples into TensorFlow graph trained on files?

I'm new to TensorFlow and am getting a bit tripped up on the mechanics of reading data. I set up a TensorFlow graph on the mnist data, but I'd like to modify it so that I can run one program to train it + save the model out, and run another to load said graph, make predictions, and compute test accuracy.
Where I'm getting confused is how to bypass the original I/O system in the training graph and "inject" an image to predict or an (image, label) tuple of test data for accuracy testing. To read the training data, I'm using this code:
_, input_data = util.read_examples(
paths_to_files,
batch_size,
shuffle=shuffle,
num_epochs=None)
feature_map = {
'label': tf.FixedLenFeature(
shape=[], dtype=tf.int64, default_value=[-1]),
'image': tf.FixedLenFeature(
shape=[NUM_PIXELS * NUM_PIXELS], dtype=tf.int64),
}
example = tf.parse_example(input_data, features=feature_map)
I then feed example to a convolution layer, etc. and generate the output.
Now imagine that I train my graph with that code specifying the input, save out the graph and weights, and then restore the graph and weights in another script for prediction -- I'd like to take (say) 10 images and feed them to the graph to generate predictions. How do I "inject" those 10 images so that the predictions come out the other end?
I played around with feed dictionaries and placeholders, but I'm not sure if they're the right things for me to use... it seems like they rely on having data in memory, as opposed to reading from a queue of test data, for example.
Thanks!
A feed dictionary with placeholders would make sense if you wanted to perform a small number of inferences/evaluations (i.e. enough to fit in memory) - e.g. if you were serving a simple model or running small eval loops.
If you specifically want to infer or evaluate large batches then you should use the same approach you've used for training, but with a different path to your test/eval/live data. e.g.
_, eval_data = util.read_examples(
paths_to_files, # CHANGE THIS BIT
batch_size,
shuffle=shuffle,
num_epochs=None)
You can use this as a normal python variable and set up successive, dependent steps to use this as a provided variable. e.g.
def get_example(data):
return tf.parse_example(data, features=feature_map)
sess.run([get_example(path_to_your_data)])

Tensorflow RNN sequence training

I'm making my first steps learning TF and have some trouble training RNNs.
My toy problem goes like this: a two layers LSTM + dense layer network is fed with raw audio data and should test whether a certain frequency is present in the sound.
so the network should 1 to 1 map float(audio data sequence) to float(pre-chosen frequency volume)
I've got this to work on Keras and seen a similar TFLearn solution but would like to implement this on bare Tensorflow in a relatively efficient way.
what i've done:
lstm = rnn_cell.BasicLSTMCell(LSTM_SIZE,state_is_tuple=True,forget_bias=1.0)
lstm = rnn_cell.DropoutWrapper(lstm)
stacked_lstm = rnn_cell.MultiRNNCell([lstm] * 2,state_is_tuple=True)
outputs, states = rnn.dynamic_rnn(stacked_lstm, in, dtype=tf.float32)
outputs = tf.transpose(outputs, [1, 0, 2])
last = tf.gather(outputs, int(outputs.get_shape()[0]) - 1)
network= tf.matmul(last, W) + b
# cost function, optimizer etc...
during training I fed this with (BATCH_SIZE, SEQUENCE_LEN,1) batches and it seems like the loss converged correctly but I can't figure out how to predict with the trained network.
My (awful lot of) questions:
how do i make this network return a sequence right from Tensorflow without going back to python for each sample(feed a sequence and predict a sequence of the same size)?
If I do want to predict one sample at a time and iterate in python what is the correct way to do it?
During testing is dynamic_rnn needed or it's just used for unrolling for BPTT during training? why is dynamic_rnn returning all the back propagation steps Tensors? these are the outputs of each layer of the unrolled network right?
after some research:
how do i make this network return a sequence right from Tensorflow
without going back to python for each sample(feed a sequence and
predict a sequence of the same size)?
you can use state_saving_rnn
class Saver():
def __init__(self):
self.d = {}
def state(self, name):
if not name in self.d:
return tf.zeros([1,LSTM_SIZE],tf.float32)
return self.d[name]
def save_state(self, name, val):
self.d[name] = val
return tf.identity('save_state_name') #<-important for control_dependencies
outputs, states = rnn.state_saving_rnn(stacked_lstm, inx, Saver(),
('lstmstate', 'lstmstate2', 'lstmstate3', 'lstmstate4'),sequence_length=[EVAL_SEQ_LEN])
#4 states are for two layers of lstm each has hidden and CEC variables to restore
network = [tf.matmul(outputs[-1], W) for i in xrange(EVAL_SEQ_LEN)]
one problem is that state_saving_rnn is using rnn() and not dynamic_rnn() therefore unroll at compile time EVAL_SEQ_LEN steps you might want to re-implement state_saving_rnn with dynamic_rnn if you want to input long sequences
If I do want to predict one sample at a time and iterate in python what is the correct way to do it?
you can use dynamic_rnn and supply initial_state. this is probably just as efficient as state_saving_rnn. look at state_saving_rnn implementations for reference
During testing is dynamic_rnn needed or it's just used for unrolling for BPTT during training? why is dynamic_rnn returning all the back propagation steps Tensors? these are the outputs of each layer of the unrolled network right?
dynamic_rnn does do unrolling at runtime similarly to compile time rnn(). I guess it returns all the steps for you to branch the graph in some other places - after less time steps. in a network that use [one time step input * current state -> one output, new state] like the one described above it's not needed in testing but could be used for training truncated time back propagation