How does Tensorflow or Keras handle model weight inititialization and when does it happen? - tensorflow

After reading the answer to this question I am a bit confused as to when exactly TensorFlow initializes the weight and bias variables.
As per the answers, Compile defines the loss function, the optimizer and the metrics. That's all.
Since the compile() method doesn't initialize it then that would suggest that it happens during the fit() method run.
However the issue with that is, in case of loading models or loading weights how would fit() know that the weights, its presented with, are actually useful and should not be thrown away and then assigned random values in place of those.
We pass the type of intitializer in the argument kernel_initializer while declaring the layer. For example:
dense02 = tf.keras.layers.Dense(units=10,
kernel_initializer='glorot_uniform',
bias_initializer='zeros')
So an obvious question would be whether the weights are initialized layer by layer during the first epoch forward pass or does it happen for all layers before the first epoch.
(What I am trying to say is that if there say 5 Dense layers in the model, then does the initialization happen say a layer at a time, i.e. the first Dense layer gets initialized then the forward pass happens for that layer, then the second layer is initialized and the forward pass for second Dense layer happens and so on)
Another aspect is regarding transfer learning, when stacking custom layers on top of a trained model, the trained model layers have the weights, while the layers that I added wouldn't have any useful layers. So how would TensorFlow know to only initialize the variables of the layers I added and not the mess up the layers of the transferred model (provided, I don't have trainable=False)
How does TensorFlow or Keras handle weight initialization?

The weights are initialized when the model is created (when each layer in model is initialized), i.e before the compile() and fit():
import tensorflow as tf
from tensorflow.keras import models, layers
inputs = layers.Input((3, ))
outputs = layers.Dense(units=10,
kernel_initializer='glorot_uniform',
bias_initializer='zeros')(inputs)
model = models.Model(inputs=inputs, outputs=outputs)
for layer in model.layers:
print("Config:\n{}\nWeights:\n{}\n".format(layer.get_config(), layer.get_weights()))
Outputs:
Config:
{'batch_input_shape': (None, 3), 'dtype': 'float32', 'sparse': False, 'ragged': False, 'name': 'input_1'}
Weights:
[]
Config:
{'name': 'dense', 'trainable': True, 'dtype': 'float32', 'units': 10, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'class_name': 'GlorotUniform', 'config': {'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}
Weights:
[array([[-0.60352975, 0.08275259, -0.6521113 , -0.5860774 , -0.42276743,
-0.3142944 , -0.28118378, 0.07770532, -0.5644444 , -0.47069687],
[ 0.4611913 , 0.35170448, -0.62191975, 0.5837332 , -0.3390234 ,
-0.4033073 , 0.03493106, -0.06078851, -0.53159714, 0.49872506],
[ 0.43685734, 0.6160207 , 0.01610583, -0.3673877 , -0.14144647,
-0.3792309 , 0.05478126, 0.602067 , -0.47438127, 0.36463356]],
dtype=float32), array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)]

After doing a bit more research, even though Mr. For Example's answer is correct, lets get a bit more deep into how initialization works in TensorFlow Keras.
As per the tf.keras.layers.Layer Doc, we can create variables in the following two methods:
__init__(self, ...): Defines custom layer attributes, and creates layer state variable that do not depend on input shapes, using add_weight()
build(self, input_shape): This method can be used to create weights that depend on the shape(s) of the input(s), using add_weight()
kb
The below code shows an example of a basic layer with 2 variables that does the computation: y = w . x + b:
class SimpleDense(Layer):
def __init__(self, units=32):
super(SimpleDense, self).__init__()
self.units = units
def build(self, input_shape): # Create the state of the layer (weights)
w_init = tf.random_normal_initializer()
self.w = tf.Variable(
initial_value=w_init(shape=(input_shape[-1], self.units),
dtype='float32'),
trainable=True)
b_init = tf.zeros_initializer()
self.b = tf.Variable(
initial_value=b_init(shape=(self.units,), dtype='float32'),
trainable=True)
def call(self, inputs): # Defines the computation from inputs to outputs
return tf.matmul(inputs, self.w) + self.b
# Instantiates the layer.
linear_layer = SimpleDense(4)
# This will also call `build(input_shape)` and create the weights.
y = linear_layer(tf.ones((2, 2)))
assert len(linear_layer.weights) == 2
# These weights are trainable, so they're listed in `trainable_weights`:
assert len(linear_layer.trainable_weights) == 2
The most interesting thing to note in the above code is when the build method is called.
The build() is called when the layer (after it has been initialized) is assigned some sort of input whether it be actual values or just a TensorFlow placeholder.
When using a Keras Sequential model, we add a layer to the model, it automatically assigns the input placeholder to the layer and there by initializing it at the same time.
Thus we see the weights before the calling of compile() or the fit() methods of the Keras Model. (Note that __call__() will automatically build the layer (if it has not been built yet) by calling build())
Regarding Transfer Learning, when we are loading the transferred model, we are loading already built layers, so the build method is not called again when you add the layers to your own model.
In other words, the layers, of the transferred model, already have had the input placeholder assigned to it and the build() method has already been called when the transferred model was being trained.
Useful References:
Keras Layer Doc
TF Tutorial: Custom Layers

Related

calling a model inside a model with tensorflow : layers do not support float type inputs

for learning i tried to to use tensorflow probability to fit a simple 1 dimensional function (x,y) but i have realized i do something wrong when i call a model inside another model and i really need to understand what.
here is my model
mlp1 = tf.keras.Sequential(
[tf.keras.layers.Input(shape=(1), dtype=tf.float32),
Dense(20, activation="relu", name="layer1"),
Dense(20, activation="relu", name="layer2"),
Dense(1, activation="softplus",name="layer3"),
]
)
class subq0(tf.keras.Model):
def __init__(self):
super().__init__()
self.mlp1=mlp1()
def call(self, inputs):
sigma= self.mlp1(inputs)
return sigma
polynom = subq0()
optimizer = tf.optimizers.SGD(learning_rate=0.0001,momentum=0.9)
polynom.compile(optimizer=optimizer, loss= "mse" )
polynom.build(input_shape=(1,))
polynom.summary()
polynomial.fit(X_train, y_train , epochs= 100, verbose=0, batch_size=64 ,validation_data=(X_val,y_val) )
the error i get is
ValueError: You cannot build your model by calling build if your
layers do not support float type inputs. Instead, in order to
instantiate and build your model, call your model on real tensor data
(of the correct dtype).
The actual error from call is:
'tensorflow.python.framework.ops.EagerTensor' object is not callable.

Upgraded to Tensorflow 2.5 now get a Lambda Layer error when using pretrained Keras Applications Models

I followed this tutorial to build a siamese network for my problem.
I was using Tensorflow 2.4.1 and now upgraded
This code worked wonderfully before
base_cnn = resnet.ResNet50(
weights="imagenet", input_shape=target_shape + (3,), include_top=False
)
flatten = layers.Flatten()(base_cnn.output)
dense1 = layers.Dense(512, activation="relu")(flatten)
dense1 = layers.BatchNormalization()(dense1)
dense2 = layers.Dense(256, activation="relu")(dense1)
dense2 = layers.BatchNormalization()(dense2)
output = layers.Dense(256)(dense2)
embedding = Model(base_cnn.input, output, name="Embedding")
trainable = False
for layer in base_cnn.layers:
if layer.name == "conv5_block1_out":
trainable = True
layer.trainable = trainable
Now each resnet layer or mobilenet or efficient net (tried them all)
throws these errors:
WARNING:tensorflow:
The following Variables were used a Lambda layer's call (tf.nn.convolution_620), but
are not present in its tracked objects:
<tf.Variable 'stem_conv/kernel:0' shape=(3, 3, 3, 48) dtype=float32>
It is possible that this is intended behavior, but it is more likely
an omission. This is a strong indication that this layer should be
formulated as a subclassed Layer rather than a Lambda layer.
It compiles and seems to fit.
But do we have to initialize the models somewhat differently in 2.5?
Thanks for any pointers!
I'm not sure what's the main reason for your issue as it's not reproducible generally. But here are some notes about that warning message. The traceback shown in your question is not from ResNet but from EfficientNet.
Now, we know that the Lambda layer exists so that arbitrary expressions can be used as a Layer when constructing Sequential and Functional API models. Lambda layers are best suited for simple operations or quick experimentation. While it is possible to use Variables with Lambda layers, this practice is discouraged as it can easily lead to bugs. For example:
import tensorflow as tf
x_input = tf.range(12.).numpy().reshape(-1, 4)
weights = tf.Variable(tf.random.normal((4, 2)), name='w')
bias = tf.ones((1, 2), name='b')
# lambda custom layer
mylayer1 = tf.keras.layers.Lambda(lambda x: tf.add(tf.matmul(x, weights),
bias), name='lambda1')
mylayer1(x_input)
WARNING:tensorflow:
The following Variables were used a Lambda layer's call (lambda1), but
are not present in its tracked objects:
<tf.Variable 'w:0' shape=(4, 2) dtype=float32, numpy=
array([[-0.753139 , -1.1668463 ],
[-1.3709341 , 0.8887151 ],
[ 0.3157893 , 0.01245957],
[-1.3878908 , -0.38395467]], dtype=float32)>
It is possible that this is intended behavior, but it is more likely
an omission. This is a strong indication that this layer should be
formulated as a subclassed Layer rather than a Lambda layer.
<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[ -3.903028 , 0.7617702],
[-16.687727 , -1.8367348],
[-29.472424 , -4.43524 ]], dtype=float32)>
It's because the mylayer1 layer doesn't trace the tf.Variables directly and so that those parameter won't appear in mylayer1.trainable_weights.
mylayer1.trainable_weights
[]
In general, Lambda layers can be convenient for simple stateless computation, but anything more complex should use a subclass Layer instead. From your traceback, it seems like there can be such a possible scenario with the step_conv layer.
for layer in EfficientNetB0(weights=None).layers:
if layer.name == 'stem_conv':
print(layer)
<tensorflow.python.keras.layers.convolutional.Conv2D object..
Quick surveying on source code of tf.compat.v1.nn.conv2d, lead to a lambda expression that might be the cause.
Here there is no need to revert back to TF2.4.1. I would always recommend try with latest version because it addressed many of the performance issues and new features.
I was able to execute above code without any issues in TF2.5.
import tensorflow as tf
print(tf.__version__)
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, Model
img_width, img_height = 224, 224
target_shape = (img_width, img_height, 3)
base_cnn = ResNet50(
weights="imagenet", input_shape=target_shape, include_top=False
)
flatten = layers.Flatten()(base_cnn.output)
dense1 = layers.Dense(512, activation="relu")(flatten)
dense1 = layers.BatchNormalization()(dense1)
dense2 = layers.Dense(256, activation="relu")(dense1)
dense2 = layers.BatchNormalization()(dense2)
output = layers.Dense(256)(dense2)
embedding = Model(base_cnn.input, output, name="Embedding")
trainable = False
for layer in base_cnn.layers:
if layer.name == "conv5_block1_out":
trainable = True
layer.trainable = trainable
Output:
2.5.0
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94773248/94765736 [==============================] - 1s 0us/step
As per #Olli, Restarting and clearing the session the kernel has resolved the problem.
pip install tensorflow==2.3.0 , worked for me instead of tf 2.5
I was facing the issue related to using Lambda layer

models.set_weights() gives None

I am trying to build an ensamble DNN model. I train e.g. 5 models, take the weights and average them. After that I wanted to clone a first model and assign the new weights. But it does not work.
The Model is built like this:
def build_DNN_model(self):
# initialize the DNN
ann = tf.keras.models.Sequential()
# add first hidden layer
num_neurons = self.num_neurons
ann.add(tf.keras.layers.Dense(units=num_neurons, activation='relu', kernel_initializer=tf.constant_initializer(1.)))
ann.add(tf.keras.layers.Dropout(0.5))
# add second hidden layer
ann.add(tf.keras.layers.Dense(units=num_neurons, activation='relu'))
ann.add(tf.keras.layers.Dropout(0.5))
# add output layer
ann.add(tf.keras.layers.Dense(units=1))
# compile
ann.compile(optimizer='adam', loss='mean_squared_error')
return ann
Then the model is fitted to the data, actually I do 5 models, and fit all of them to the same data.
After that I create a list of KerasModel Objects, called "members".
And now I would like to assign my new weights to a clone of one of the models. But even if I do that:
members[0].set_weights(members[0].get_weights())
it returns me None.
I use Tensoflow 2 version.
I would appreciate your help very much.
You should define the input shape in your first layer of the model
after doing this I simply create 2 models like yours (m1,m2) and assign to m2 the same weights to m1... they are the same
def build_DNN_model(input_dim):
# initialize the DNN
ann = tf.keras.models.Sequential()
# add first hidden layer
num_neurons = 32
ann.add(tf.keras.layers.Dense(units=num_neurons, activation='relu',
kernel_initializer=tf.constant_initializer(1.),
input_dim=input_dim))
ann.add(tf.keras.layers.Dropout(0.5))
# add second hidden layer
ann.add(tf.keras.layers.Dense(units=num_neurons, activation='relu'))
ann.add(tf.keras.layers.Dropout(0.5))
# add output layer
ann.add(tf.keras.layers.Dense(units=1))
# compile
ann.compile(optimizer='adam', loss='mean_squared_error')
return ann
m1 = build_DNN_model((100))
m2 = build_DNN_model((100))
m2.set_weights(m1.get_weights())
# check the weights
[(w1==w2).all() for w1,w2 in zip(m1.get_weights(),m2.get_weights())]
# [True, True, True, True, True, True]
the notebook
EDIT1: assign random weights to m1:
m1.set_weights([np.random.uniform(0,1, i.shape) for i in m1.get_weights()])
EDIT2: here you find the working implementation of model_weight_ensemble in your contest from https://machinelearningmastery.com/polyak-neural-network-model-weight-ensemble/
Creating a simple model:
def create_model1():
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(13,)))
model.add(tf.keras.layers.Dense(units = 6, activation='relu', name = 'd1'))
model.add(tf.keras.layers.Dense(units = 2, activation='softmax', name = 'd2'))
return model
Model Architecture:
Looking at layers:
model.layers
Ouput:
[<tensorflow.python.keras.layers.core.Dense at 0x2193acc95c8>,
<tensorflow.python.keras.layers.core.Dense at 0x2193ad3ad08>]
Looking at the weights of 2nd dense layer:
model.layers[1].weights
Output:
[<tf.Variable 'd2/kernel:0' shape=(6, 2) dtype=float32, numpy=
array([[ 0.11061734, 0.61788374],
[ 0.31208295, 0.19295567],
[-0.6812483 , 0.05383837],
[ 0.39284903, 0.69312006],
[-0.519426 , 0.67820543],
[-0.7337165 , 0.11025453]], dtype=float32)>,
<tf.Variable 'd2/bias:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>]
Setting weights:
new_weights = [tf.random.uniform(shape = (6,2)), tf.random.uniform(shape = (2,))]
model.layers[1].set_weights(new_weights)
For setting weights the shape of new_weights should match the shape of weights of that particular layer.
Here, new_weights is a list containing two values. 1st element is the weight of the kernel and 2nd element is the weight for bias.

Keras TimeDistributed Not Masking CNN Model

For the sake of example, I have an input consisting of 2 images,of total shape (2,299,299,3). I'm trying to apply inceptionv3 on each image, and then subsequently process the output with an LSTM. I'm using a masking layer to exclude a blank image from being processed (specified below).
The code is:
import numpy as np
from keras import backend as K
from keras.models import Sequential,Model
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D, BatchNormalization, \
Input, GlobalAveragePooling2D, Masking,TimeDistributed, LSTM,Dense,Flatten,Reshape,Lambda, Concatenate
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.applications import inception_v3
IMG_SIZE=(299,299,3)
def create_base():
base_model = inception_v3.InceptionV3(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(base_model.output)
base_model=Model(base_model.input,x)
return base_model
base_model=create_base()
#Image mask to ignore images with pixel values of -1
IMAGE_MASK = -2*np.expand_dims(np.ones(IMG_SIZE),0)
final_input=Input((2,IMG_SIZE[0],IMG_SIZE[1],IMG_SIZE[2]))
final_model = Masking(mask_value = -2.)(final_input)
final_model = TimeDistributed(base_model)(final_model)
final_model = Lambda(lambda x: x, output_shape=lambda s:s)(final_model)
#final_model = Reshape(target_shape=(2, 2048))(final_model)
#final_model = Masking(mask_value = 0.)(final_model)
final_model = LSTM(5,return_sequences=False)(final_model)
final_model = Model(final_input,final_model)
#Create a sample test image
TEST_IMAGE = np.ones(IMG_SIZE)
#Create a test sample input, consisting of a normal image and a masked image
TEST_SAMPLE = np.concatenate((np.expand_dims(TEST_IMAGE,axis=0),IMAGE_MASK))
inp = final_model.input # input placeholder
outputs = [layer.output for layer in final_model.layers] # all layer outputs
functors = [K.function([inp]+ [K.learning_phase()], [out]) for out in outputs]
layer_outs = [func([np.expand_dims(TEST_SAMPLE,0), 1.]) for func in functors]
This does not work correctly. Specifically, the model should mask the IMAGE_MASK part of the input, but it instead processes it with inception (giving a nonzero output). here are the details:
layer_out[-1] , the LSTM output is fine:
[array([[-0.15324114, -0.09620268, -0.01668587, 0.07938149, -0.00757846]], dtype=float32)]
layer_out[-2] and layer_out[-3] , the LSTM input is wrong, it should have all zeros in the second array:
[array([[[ 0.37713543, 0.36381325, 0.36197218, ..., 0.23298527,
0.43247852, 0.34844452],
[ 0.24972123, 0.2378867 , 0.11810347, ..., 0.51930511,
0.33289322, 0.33403745]]], dtype=float32)]
layer_out[-4], the input to the CNN is correctly masked:
[[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
...,
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]]],
[[[-0., -0., -0.],
[-0., -0., -0.],
[-0., -0., -0.],
...,
[-0., -0., -0.],
[-0., -0., -0.],
[-0., -0., -0.]],
Note that the code seems to work correctly with a simpler base_model such as:
def create_base():
input_layer=Input(IMG_SIZE)
base_model=Flatten()(input_layer)
base_model=Dense(2048)(base_model)
base_model=Model(input_layer,base_model)
return base_model
I have exhausted most online resources on this. Permutations of this question have been asked on Keras's github, such as here, here and here, but I can't seem to find any concrete resolution.
The links suggest that the issues seem to be stemming from a combination of TimeDistributed being applied to BatchNormalization, and the hacky fixes of either the Lambda identity layer, or Reshape layers remove errors but don't seem to output the correct model.
I've tried to force the base model to support masking via:
base_model.__setattr__('supports_masking',True)
and I've also tried applying an identity layer via:
TimeDistributed(Lambda(lambda x: base_model(x), output_shape=lambda s:s))(final_model)
but none of these seem to work. Note that I would like the final model to be trainable, in particular the CNN part of it should remain trainable.
Not entirely sure this will work, but based on the comment made here, with a newer version of tensorflow + keras it should work:
final_model = TimeDistributed(Flatten())(final_input)
final_model = Masking(mask_value = -2.)(final_model)
final_model = TimeDistributed(Reshape(IMG_SIZE))(final_model)
final_model = TimeDistributed(base_model)(final_model)
final_model = Model(final_input,final_model)
I took a look at the source code of masking, and I noticed Keras creates a mask tensor that only reduces the last axis. As long as you're dealing with 5D tensors, it will cause no problem, but when you reduce the dimensions for the LSTM, this masking tensor becomes incompatible.
Doing the first flatten step, before masking, will assure that the masking tensor works properly for 3D tensors. Then you expand the image again to its original size.
I'll probably try to install newer versions soon to test it myself, but these installing procedures have caused too much trouble and I'm in the middle of something important here.
On my machine, this code compiles, but that strange error appears in prediction time (see link at the first line of this answer).
Creating a model for predicting the intermediate layers
I'm not sure, by the code I've seen, that the masking function is kept internally in tensors. I don't know exactly how it works, but it seems to be managed separately from the building of the functions inside the layers.
So, try using a keras standard model to make the predictions:
inp = final_model.input # input placeholder
outputs = [layer.output for layer in final_model.layers] # all layer outputs
fullModel = Model(inp,outputs)
layerPredictions = fullModel.predict(np.expand_dims(TEST_SAMPLE,0))
print(layerPredictions[-2])
It seems to be working as intended. Masking in Keras doesn't produce zeros as you would expect, it instead skips the timesteps that are masked in upstream layers such as LSTM and loss calculation. In case of RNNs, Keras (at least tensorflow) is implemented such that the states from the previous step are carried over, tensorflow_backend.py. This is done in part to preserve the shapes of tensors when dynamic input is given.
If you really want zeros you will have to implement your own layer with a similar logic to Masking and return zeros explicitly. To solve your problem, you need a mask before the final LSTM layer using the final_input:
class MyMask(Masking):
"""Layer that adds a mask based on initial input."""
def compute_mask(self, inputs, mask=None):
# Might need to adjust shapes
return K.any(K.not_equal(inputs[0], self.mask_value), axis=-1)
def call(self, inputs):
# We just return input back
return inputs[1]
def compute_output_shape(self, input_shape):
return input_shape[1]
final_model = MyMask(mask_value=-2.)([final_input, final_model])
You probably can attach the mask in a simpler manner but this custom class essentially adds a mask based on your initial inputs and outputs a Keras tensor that now has a mask.
Your LSTM will ignore in your example the second image. To confirm you can return_sequences=Trueand check that the output for 2 images are identical.
I'm trying implement the same thing, I want my LSTM sequences to have variable sizes. However I can't even implement your original model. I obtain the following error: TypeError: Layer input_1 does not support masking, but was passed an input_mask: Tensor("time_distributed_1/Reshape_1:0", shape=(?, 100, 100), dtype=bool) I'm using tensorflow 1.10 and keras 2.2.2
I solved the problem by adding a second input, a mask to specify which timesteps to take into account for the LSTM. That way the image sequence always has the same number of timesteps, the CNN always generates an output, but some of them are ignored for the LSTM input. However, the missing images need to be chosen carefully so that the batch normalization is not affected.
def LSTM_CNN(params):
resnet = ResNet50(include_top=False, weights='imagenet', pooling = 'avg')
input_layer = Input(shape=(params.numFrames, params.height, params.width, 3))
input_mask = Input(shape=(params.numFrames,1))
curr_layer = TimeDistributed(resnet)(input_layer)
resnetOutput = Dropout(0.5)(curr_layer)
curr_layer = multiply([resnetOutput,input_mask])
cnn_output = curr_layer
curr_layer = Masking(mask_value=0.0)(curr_layer)
lstm_out = LSTM(256, dropout=0.5)(curr_layer)
output = Dense(output_dim=params.numClasses, activation='sigmoid')(lstm_out)
model = Model([input_layer, input_mask], output)
return model

Calling a Keras model on a TensorFlow tensor but keep weights

In Keras as a simplified interface to TensorFlow: tutorial they describe how one can call a Keras model on a TensorFlow tensor.
from keras.models import Sequential
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=784))
model.add(Dense(10, activation='softmax'))
# this works!
x = tf.placeholder(tf.float32, shape=(None, 784))
y = model(x)
They also say:
Note: by calling a Keras model, your are reusing both its architecture and its weights. When you are calling a model on a tensor, you are creating new TF ops on top of the input tensor, and these ops are reusing the TF Variable instances already present in the model.
I interpret this as that the weights of the model will be the same in y as in model. However, for me it seems like the weights in the resulting Tensorflow node are reinitialized. A minimal example can be seen below:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
# Create model with weight initialized to 1
model = Sequential()
model.add(Dense(1, input_dim=1, kernel_initializer='ones',
bias_initializer='zeros'))
model.compile(loss='binary_crossentropy', optimizer='adam',
metrics=['accuracy'])
# Save the weights
model.save_weights('file')
# Create another identical model except with weight initialized to 0
model2 = Sequential()
model2.add(Dense(1, input_dim=1, kernel_initializer='zeros',
bias_initializer='zeros'))
model2.compile(loss='binary_crossentropy', optimizer='adam',
metrics=['accuracy'])
# Load the weight from the first model
model2.load_weights('file')
# Call model with Tensorflow tensor
v = tf.Variable([[1, ], ], dtype=tf.float32)
node = model2(v)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
print(sess.run(node), model2.predict(np.array([[1, ], ])))
# Prints (array([[ 0.]], dtype=float32), array([[ 1.]], dtype=float32))
Why I want to do this:
I want to use a trained network in another minimization scheme were the network "punish" places in the search space that are not allowed. So if you have ideas not involving this specific approach, that is also very appreciated.
Finally found the answer. There are two problems in the example from the question.
1:
The first and most obvious was that I called the tf.global_variables_intializer() function which will re-initialize all variables in the session. Instead I should have called the tf.variables_initializer(var_list) where var_list is a list of variables to initialize.
2:
The second problem was that Keras did not use the same session as the native Tensorflow objects. This meant that to be able to run the tensorflow object model2(v) with my session sess it needed to be reinitialized. Again Keras as a simplified interface to tensorflow: Tutorial was able to help
We should start by creating a TensorFlow session and registering it with Keras. This means that Keras will use the session we registered to initialize all variables that it creates internally.
import tensorflow as tf
sess = tf.Session()
from keras import backend as K
K.set_session(sess)
If we apply these changes to the example provided in my question we get the following code that does exactly what is expected from it.
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense
sess = tf.Session()
# Register session with Keras
K.set_session(sess)
model = Sequential()
model.add(Dense(1, input_dim=1, kernel_initializer='ones',
bias_initializer='zeros'))
model.compile(loss='binary_crossentropy', optimizer='adam',
metrics=['accuracy'])
model.save_weights('test')
model2 = Sequential()
model2.add(Dense(1, input_dim=1, kernel_initializer='zeros',
bias_initializer='zeros'))
model2.compile(loss='binary_crossentropy', optimizer='adam',
metrics=['accuracy'])
model2.load_weights('test')
v = tf.Variable([[1, ], ], dtype=tf.float32)
node = model2(v)
init = tf.variables_initializer([v, ])
sess.run(init)
print(sess.run(node), model2.predict(np.array([[1, ], ])))
# prints: (array([[ 1.]], dtype=float32), array([[ 1.]], dtype=float32))
Conclusion:
The lesson is that when mixing Tensorflow and Keras, make sure everything uses the same session.
Thanks for asking this question, and answering it, it helped me! In addition to setting the same tf session in the Keras backend, it is also important to note that if you want to load a Keras model from a file, you need to run a global variable initializer op before you load the model.
sess = tf.Session()
# make sure keras has the same session as this code
tf.keras.backend.set_session(sess)
# Do this BEFORE loading a keras model
init_op = tf.global_variables_initializer()
sess.run(init_op)
model = models.load_model('path/to/your/model.h5')