I am trying to build an ensemble DNN model. I train, e.g., 5 models, take their weights, and average them. After that I want to clone the first model and assign the new weights to the clone, but it does not work.
The Model is built like this:
def build_DNN_model(self):
    # initialize the DNN
    ann = tf.keras.models.Sequential()
    # add first hidden layer
    num_neurons = self.num_neurons
    ann.add(tf.keras.layers.Dense(units=num_neurons, activation='relu', kernel_initializer=tf.constant_initializer(1.)))
    ann.add(tf.keras.layers.Dropout(0.5))
    # add second hidden layer
    ann.add(tf.keras.layers.Dense(units=num_neurons, activation='relu'))
    ann.add(tf.keras.layers.Dropout(0.5))
    # add output layer
    ann.add(tf.keras.layers.Dense(units=1))
    # compile
    ann.compile(optimizer='adam', loss='mean_squared_error')
    return ann
Then the model is fitted to the data, actually I do 5 models, and fit all of them to the same data.
After that I create a list of KerasModel Objects, called "members".
And now I would like to assign my new weights to a clone of one of the models. But even if I do that:
members[0].set_weights(members[0].get_weights())
it returns None.
I am using TensorFlow 2.
I would appreciate your help very much.
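You should define the input shape in the first layer of the model. Note that set_weights() always returns None, because it updates the model in place, so that return value is not an error in itself.
After doing this, I simply created two models like yours (m1, m2) and assigned the weights of m1 to m2; the weights are then identical: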
def build_DNN_model(input_dim):
    # initialize the DNN
    ann = tf.keras.models.Sequential()
    # add first hidden layer
    num_neurons = 32
    ann.add(tf.keras.layers.Dense(units=num_neurons, activation='relu',
                                  kernel_initializer=tf.constant_initializer(1.),
                                  input_dim=input_dim))
    ann.add(tf.keras.layers.Dropout(0.5))
    # add second hidden layer
    ann.add(tf.keras.layers.Dense(units=num_neurons, activation='relu'))
    ann.add(tf.keras.layers.Dropout(0.5))
    # add output layer
    ann.add(tf.keras.layers.Dense(units=1))
    # compile
    ann.compile(optimizer='adam', loss='mean_squared_error')
    return ann
m1 = build_DNN_model((100))
m2 = build_DNN_model((100))
m2.set_weights(m1.get_weights())
# check the weights
[(w1==w2).all() for w1,w2 in zip(m1.get_weights(),m2.get_weights())]
# [True, True, True, True, True, True]
EDIT1: assign random weights to m1:
m1.set_weights([np.random.uniform(0,1, i.shape) for i in m1.get_weights()])
EDIT2: here you can find a working implementation of model_weight_ensemble for your context, from https://machinelearningmastery.com/polyak-neural-network-model-weight-ensemble/
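For the averaging step itself, here is a minimal sketch. It is not the code from the linked article; it uses NumPy arrays as stand-ins for the lists returned by get_weights(), and the helper name average_weights is hypothetical.

```python
import numpy as np

def average_weights(member_weights):
    """Average corresponding arrays across several weight lists.

    member_weights: one entry per model; each entry is the list of arrays
    that model.get_weights() would return for that model.
    """
    # zip(*...) groups the k-th array of every member together
    return [np.mean(np.stack(arrays, axis=0), axis=0)
            for arrays in zip(*member_weights)]

# Stand-ins for members[i].get_weights(): one kernel and one bias per member
rng = np.random.default_rng(0)
member_weights = [[rng.normal(size=(4, 3)), rng.normal(size=(3,))]
                  for _ in range(5)]

avg = average_weights(member_weights)

# With real, already-built Keras models the result would be assigned in place:
#   clone = tf.keras.models.clone_model(members[0])
#   clone.set_weights(avg)   # returns None; the clone itself is updated
```

The averaged list has the same length and the same array shapes as each member's list, which is what set_weights() expects.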
Creating a simple model:
def create_model1():
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(13,)))
    model.add(tf.keras.layers.Dense(units = 6, activation='relu', name = 'd1'))
    model.add(tf.keras.layers.Dense(units = 2, activation='softmax', name = 'd2'))
    return model
Model Architecture:
Looking at layers:
model.layers
Output:
[<tensorflow.python.keras.layers.core.Dense at 0x2193acc95c8>,
<tensorflow.python.keras.layers.core.Dense at 0x2193ad3ad08>]
Looking at the weights of 2nd dense layer:
model.layers[1].weights
Output:
[<tf.Variable 'd2/kernel:0' shape=(6, 2) dtype=float32, numpy=
array([[ 0.11061734, 0.61788374],
[ 0.31208295, 0.19295567],
[-0.6812483 , 0.05383837],
[ 0.39284903, 0.69312006],
[-0.519426 , 0.67820543],
[-0.7337165 , 0.11025453]], dtype=float32)>,
<tf.Variable 'd2/bias:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>]
Setting weights:
new_weights = [tf.random.uniform(shape = (6,2)), tf.random.uniform(shape = (2,))]
model.layers[1].set_weights(new_weights)
When setting weights, the shapes in new_weights must match the shapes of the weights of that particular layer.
Here, new_weights is a list containing two values: the first element is the kernel and the second element is the bias.
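The shape requirement can be checked before calling set_weights(). The sketch below is only an illustration with NumPy arrays standing in for layer.get_weights(); the helper shapes_match is a hypothetical name, not part of Keras.

```python
import numpy as np

def shapes_match(current, new):
    """Return True if both lists have the same length and per-array shapes."""
    return len(current) == len(new) and all(
        np.shape(c) == np.shape(n) for c, n in zip(current, new))

# Stand-in for model.layers[1].get_weights(): kernel (6, 2) and bias (2,)
current = [np.zeros((6, 2)), np.zeros((2,))]

good = [np.ones((6, 2)), np.ones((2,))]   # same shapes: acceptable
bad = [np.ones((2, 6)), np.ones((2,))]    # transposed kernel: would be rejected

print(shapes_match(current, good), shapes_match(current, bad))
```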
When extracting a layer's output as in the TensorFlow Sequential model guide example below, does the input x go through my_first_layer before going into my_intermediate_layer? Or does it go directly into my_intermediate_layer without passing through my_first_layer?
If it goes directly into my_intermediate_layer, the input to my_intermediate_layer lacks the transformation done by the my_first_layer Conv2D. That does not seem right to me, because the input should go through all the preceding layers.
Please help me understand which layers x goes through.
Feature extraction with a Sequential model
initial_model = keras.Sequential(
[
keras.Input(shape=(250, 250, 3)),
layers.Conv2D(32, 5, strides=2, activation="relu", name="my_first_layer"),
layers.Conv2D(32, 3, activation="relu", name="my_intermediate_layer"),
layers.Conv2D(32, 3, activation="relu"),
]
)
# The model goes through the training.
...
# Feature extractor
feature_extractor = keras.Model(
inputs=initial_model.inputs,
outputs=initial_model.get_layer(name="my_intermediate_layer").output,
)
# Call feature extractor on test input.
x = tf.ones((1, 250, 250, 3))
features = feature_extractor(x)
Keras offers a higher-level API that runs on top of the TensorFlow machine learning platform. It provides two classes for defining a neural network model: the Sequential class and the Model class.
Sequential Class:
It groups a linear stack of layers into a model, such that each layer has one input tensor and one output tensor. As the name suggests, the layers execute in sequence. Layers can be added to the model one at a time, as shown below (schema 1):
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(8, input_shape=(16,)))
model.add(tf.keras.layers.Dense(4))
A Sequential model can also be defined by passing a list of layers to the constructor, as shown below (schema 2):
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential(
[
layers.Dense(2, activation="relu", name="layer1"),
layers.Dense(3, activation="relu", name="layer2"),
layers.Dense(4, name="layer3"),
]
)
# Call model on a test input
x = tf.ones((3, 3))
y = model(x)
Model Class:
It allows the user to build a custom model from many layers, as shown below:
import tensorflow as tf
inputs = tf.keras.Input(shape=(3,))
x = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
outputs = tf.keras.layers.Dense(5, activation=tf.nn.softmax)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
It also lets you create a new functional API model with additional layers, as follows:
inputs = keras.Input(shape=(None, None, 3))
processed = keras.layers.RandomCrop(width=32, height=32)(inputs)
conv = keras.layers.Conv2D(filters=2, kernel_size=3)(processed)
pooling = keras.layers.GlobalAveragePooling2D()(conv)
feature = keras.layers.Dense(10)(pooling)
Note: The input tensors support only dicts, lists, or tuples, but not lists of lists or dicts of dicts.
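On the feature-extractor question itself: a model built from initial_model.inputs to an intermediate layer's output runs every layer on the path between them, so x passes through my_first_layer first. One way to convince yourself is shape arithmetic. The sketch below is plain Python, assumes the default 'valid' padding of Conv2D, and uses a hypothetical helper conv_output_size.

```python
def conv_output_size(size, kernel, stride=1):
    # 'valid' padding: floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

# Path through both layers: 250 -> my_first_layer -> my_intermediate_layer
after_first = conv_output_size(250, kernel=5, stride=2)
after_intermediate = conv_output_size(after_first, kernel=3)

# Hypothetical path that skipped my_first_layer entirely
skipping_first = conv_output_size(250, kernel=3)

print(after_first, after_intermediate, skipping_first)
```

If features.shape from the example is (1, 121, 121, 32), that matches the path through both layers and not the path that skips the first one.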
I hope that this helps.
I am building a Siamese network using Keras(TensorFlow) where the target is a binary column, i.e., match or mismatch(1 or 0). But the model fit method throws an error saying that the y_pred is not compatible with the y_true shape. I am using the binary_crossentropy loss function.
Here is the error I see:
Here is the code I am using:
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[tf.keras.metrics.Recall()])
history = model.fit([X_train_entity_1.todense(),X_train_entity_2.todense()],np.array(y_train),
epochs=2,
batch_size=32,
verbose=2,
shuffle=True)
My Input data shapes are as follows:
Inputs:
X_train_entity_1.shape is (700,2822)
X_train_entity_2.shape is (700,2822)
Target:
y_train.shape is (700,1)
In the error, y_pred is a variable that was created internally. Why is the y_pred dimension 2822 when I have a binary target? The dimension 2822 actually matches the input size, but how should I understand this?
Here is the model I created:
in_layers = []
out_layers = []
for i in range(2):
    input_layer = Input(shape=(1,))
    embedding_layer = Embedding(embed_input_size+1, embed_output_size)(input_layer)
    lstm_layer_1 = Bidirectional(LSTM(1024, return_sequences=True,recurrent_dropout=0.2, dropout=0.2))(embedding_layer)
    lstm_layer_2 = Bidirectional(LSTM(512, return_sequences=True,recurrent_dropout=0.2, dropout=0.2))(lstm_layer_1)
    in_layers.append(input_layer)
    out_layers.append(lstm_layer_2)
merge = concatenate(out_layers)
dense1 = Dense(256, activation='relu', kernel_initializer='he_normal', name='data_embed')(merge)
drp1 = Dropout(0.4)(dense1)
btch_norm1 = BatchNormalization()(drp1)
dense2 = Dense(32, activation='relu', kernel_initializer='he_normal')(btch_norm1)
drp2 = Dropout(0.4)(dense2)
btch_norm2 = BatchNormalization()(drp2)
output = Dense(1, activation='sigmoid')(btch_norm2)
model = Model(inputs=in_layers, outputs=output)
model.summary()
Since my data is very sparse, I used todense. And there the type is as follows:
type(X_train_entity_1) is scipy.sparse.csr.csr_matrix
type(X_train_entity_1.todense()) is numpy.matrix
type(X_train_entity_2) is scipy.sparse.csr.csr_matrix
type(X_train_entity_2.todense()) is numpy.matrix
Summary of last few layers as follows:
The shape in the Input layer is mismatched. The input shape needs to match the shape of a single element passed as x, i.e. dataset.shape[1:]. Your dataset has shape (700, 2822), that is, 700 samples of size 2822, so your input shape should be (2822,).
Change:
input_layer = Input(shape=(1,))
To:
input_layer = Input(shape=(2822,))
You need to set return_sequences in the lstm_layer_2 to False:
lstm_layer_2 = Bidirectional(LSTM(512, return_sequences=False, recurrent_dropout=0.2, dropout=0.2))(lstm_layer_1)
Otherwise, you will still have the timesteps of your input. That is why you have the shape (None, 2822, 1). You can also add a Flatten layer prior to your output layer, but I would recommend setting return_sequences=False.
Note that a Dense layer computes the dot product between the inputs and the kernel along the last axis of the inputs.
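The effect of the last-axis dot product on shapes can be seen with plain NumPy. This is only an illustrative sketch: the sizes are made up apart from 2822, and taking the last timestep is a crude stand-in for what return_sequences=False does to the shape.

```python
import numpy as np

batch, timesteps, features, units = 4, 2822, 8, 1

# A 3-D input like the output of an LSTM with return_sequences=True
x = np.ones((batch, timesteps, features))
kernel = np.ones((features, units))
bias = np.zeros((units,))

# The dot product contracts only the last axis, so timesteps survive
y = x @ kernel + bias
print(y.shape)          # (4, 2822, 1)

# Keeping a single timestep first gives one prediction per sample
last_step = x[:, -1, :] @ kernel + bias
print(last_step.shape)  # (4, 1)
```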
After reading the answer to this question I am a bit confused as to when exactly TensorFlow initializes the weight and bias variables.
As per the answers, Compile defines the loss function, the optimizer and the metrics. That's all.
Since the compile() method doesn't initialize them, that would suggest that it happens during the fit() run.
However, the issue with that is: when loading models or weights, how would fit() know that the weights it is presented with are actually useful and should not be thrown away and replaced with random values?
We pass the type of initializer in the kernel_initializer argument when declaring the layer. For example:
dense02 = tf.keras.layers.Dense(units=10,
kernel_initializer='glorot_uniform',
bias_initializer='zeros')
So an obvious question would be whether the weights are initialized layer by layer during the first epoch forward pass or does it happen for all layers before the first epoch.
(What I am trying to say is that if there say 5 Dense layers in the model, then does the initialization happen say a layer at a time, i.e. the first Dense layer gets initialized then the forward pass happens for that layer, then the second layer is initialized and the forward pass for second Dense layer happens and so on)
Another aspect concerns transfer learning: when stacking custom layers on top of a trained model, the trained model's layers have useful weights, while the layers that I added do not. So how would TensorFlow know to initialize only the variables of the layers I added and not mess up the layers of the transferred model (provided I don't set trainable=False)?
How does TensorFlow or Keras handle weight initialization?
The weights are initialized when the model is created (when each layer in model is initialized), i.e before the compile() and fit():
import tensorflow as tf
from tensorflow.keras import models, layers
inputs = layers.Input((3, ))
outputs = layers.Dense(units=10,
kernel_initializer='glorot_uniform',
bias_initializer='zeros')(inputs)
model = models.Model(inputs=inputs, outputs=outputs)
for layer in model.layers:
    print("Config:\n{}\nWeights:\n{}\n".format(layer.get_config(), layer.get_weights()))
Outputs:
Config:
{'batch_input_shape': (None, 3), 'dtype': 'float32', 'sparse': False, 'ragged': False, 'name': 'input_1'}
Weights:
[]
Config:
{'name': 'dense', 'trainable': True, 'dtype': 'float32', 'units': 10, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'class_name': 'GlorotUniform', 'config': {'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}
Weights:
[array([[-0.60352975, 0.08275259, -0.6521113 , -0.5860774 , -0.42276743,
-0.3142944 , -0.28118378, 0.07770532, -0.5644444 , -0.47069687],
[ 0.4611913 , 0.35170448, -0.62191975, 0.5837332 , -0.3390234 ,
-0.4033073 , 0.03493106, -0.06078851, -0.53159714, 0.49872506],
[ 0.43685734, 0.6160207 , 0.01610583, -0.3673877 , -0.14144647,
-0.3792309 , 0.05478126, 0.602067 , -0.47438127, 0.36463356]],
dtype=float32), array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)]
After doing a bit more research: even though Mr. For Example's answer is correct, let's go a bit deeper into how initialization works in TensorFlow Keras.
As per the tf.keras.layers.Layer doc, we can create variables in the following two methods:
__init__(self, ...): Defines custom layer attributes and creates layer state variables that do not depend on input shapes, using add_weight()
build(self, input_shape): Creates weights that depend on the shape(s) of the input(s), using add_weight()
The below code shows an example of a basic layer with 2 variables that does the computation: y = w . x + b:
import tensorflow as tf
from tensorflow.keras.layers import Layer

class SimpleDense(Layer):
    def __init__(self, units=32):
        super(SimpleDense, self).__init__()
        self.units = units
    def build(self, input_shape): # Create the state of the layer (weights)
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(
            initial_value=w_init(shape=(input_shape[-1], self.units),
                                 dtype='float32'),
            trainable=True)
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(self.units,), dtype='float32'),
            trainable=True)
    def call(self, inputs): # Defines the computation from inputs to outputs
        return tf.matmul(inputs, self.w) + self.b
# Instantiates the layer.
linear_layer = SimpleDense(4)
# This will also call `build(input_shape)` and create the weights.
y = linear_layer(tf.ones((2, 2)))
assert len(linear_layer.weights) == 2
# These weights are trainable, so they're listed in `trainable_weights`:
assert len(linear_layer.trainable_weights) == 2
The most interesting thing to note in the above code is when the build() method is called.
build() is called when the layer (after it has been instantiated) is first given some sort of input, whether actual values or just a TensorFlow placeholder.
When we add a layer to a Keras Sequential model whose input shape is known, the model automatically passes an input placeholder to the layer, thereby building it at the same time.
Thus we can see the weights before calling the compile() or fit() methods of the Keras model. (Note that __call__() automatically builds the layer, if it has not been built yet, by calling build().)
Regarding transfer learning: when we load the transferred model, we are loading layers that are already built, so build() is not called again when you add those layers to your own model.
In other words, the layers of the transferred model have already had an input placeholder assigned to them, and build() had already been called when the transferred model was being trained.
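The build-once behaviour described above can be mimicked in a few lines. This is a toy sketch in NumPy, not Keras code: the class LazyDense and its build_calls counter are hypothetical and exist only to show that weights are created on the first call and reused afterwards.

```python
import numpy as np

class LazyDense:
    """Toy layer that creates its weights on first call, like Keras build()."""

    def __init__(self, units):
        self.units = units
        self.built = False
        self.build_calls = 0  # only here to make the behaviour observable

    def build(self, input_shape):
        # Weights depend on the last input dimension, so they are created here
        rng = np.random.default_rng(0)
        self.w = rng.normal(scale=0.05, size=(input_shape[-1], self.units))
        self.b = np.zeros(self.units)
        self.built = True
        self.build_calls += 1

    def __call__(self, inputs):
        # Build only if the layer has not been built yet
        if not self.built:
            self.build(inputs.shape)
        return inputs @ self.w + self.b

layer = LazyDense(4)
y1 = layer(np.ones((2, 2)))          # first call triggers build()
w_after_first_call = layer.w.copy()
y2 = layer(np.ones((2, 2)))          # second call reuses the same weights
```

An already-built layer keeps its weights when it is called again, which is the same reason pretrained layers are not re-initialized when they are reused in a new model.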
Useful References:
Keras Layer Doc
TF Tutorial: Custom Layers
In this TF tutorial, the U-Net model is divided into 2 parts. The first is the contraction path, for which they use MobileNet, and it is not trainable. In the second part, I am not able to understand which layers are being trained. As far as I can see, only the last layer, Conv2DTranspose, seems trainable. Am I right?
And if I am, how can a single layer do such a complex task as segmentation?
Tutorial link: https://www.tensorflow.org/tutorials/images/segmentation
The code for the Image Segmentation Model, from the Tutorial is shown below:
def unet_model(output_channels):
    inputs = tf.keras.layers.Input(shape=[128, 128, 3])
    x = inputs
    # Downsampling through the model
    skips = down_stack(x)
    x = skips[-1]
    skips = reversed(skips[:-1])
    # Upsampling and establishing the skip connections
    for up, skip in zip(up_stack, skips):
        x = up(x)
        concat = tf.keras.layers.Concatenate()
        x = concat([x, skip])
    # This is the last layer of the model
    last = tf.keras.layers.Conv2DTranspose(
        output_channels, 3, strides=2,
        padding='same') #64x64 -> 128x128
    x = last(x)
    return tf.keras.Model(inputs=inputs, outputs=x)
The first part of the model, the downsampling path, uses not the entire MobileNet architecture but only the outputs of the layers
'block_1_expand_relu', # 64x64
'block_3_expand_relu', # 32x32
'block_6_expand_relu', # 16x16
'block_13_expand_relu', # 8x8
'block_16_project'
of the pre-trained MobileNet model, which are non-trainable.
The second part of the model (the part you are interested in), which comes before the final Conv2DTranspose layer, is the upsampling path, defined in the list
up_stack = [
pix2pix.upsample(512, 3), # 4x4 -> 8x8
pix2pix.upsample(256, 3), # 8x8 -> 16x16
pix2pix.upsample(128, 3), # 16x16 -> 32x32
pix2pix.upsample(64, 3), # 32x32 -> 64x64
]
It means that it is accessing a Function named upsample from the Module, pix2pix. The code for the Module, pix2pix is present in this Github Link.
Code for the function, upsample is shown below:
def upsample(filters, size, norm_type='batchnorm', apply_dropout=False):
    """Upsamples an input.
    Conv2DTranspose => Batchnorm => Dropout => Relu
    Args:
        filters: number of filters
        size: filter size
        norm_type: Normalization type; either 'batchnorm' or 'instancenorm'.
        apply_dropout: If True, adds the dropout layer
    Returns:
        Upsample Sequential Model
    """
    initializer = tf.random_normal_initializer(0., 0.02)
    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
                                        padding='same',
                                        kernel_initializer=initializer,
                                        use_bias=False))
    if norm_type.lower() == 'batchnorm':
        result.add(tf.keras.layers.BatchNormalization())
    elif norm_type.lower() == 'instancenorm':
        result.add(InstanceNormalization())
    if apply_dropout:
        result.add(tf.keras.layers.Dropout(0.5))
    result.add(tf.keras.layers.ReLU())
    return result
This means that the second part of the model consists of the upsampling blocks, whose functionality is defined above, with 512, 256, 128 and 64 filters. Each block contains its own Conv2DTranspose layer, so the final Conv2DTranspose is not the only trainable layer.
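The size comments in up_stack and unet_model can be reproduced with simple arithmetic. The sketch below is plain Python and assumes only what the code above shows: every transposed convolution uses strides=2 and padding='same', which doubles the spatial size. The helper name transposed_conv_size is hypothetical.

```python
def transposed_conv_size(size, stride=2):
    # With padding='same', a transposed convolution scales the size by the stride
    return size * stride

size = 4                 # spatial size at the bottleneck (4x4)
sizes = [size]
for filters in (512, 256, 128, 64):   # the four blocks in up_stack
    size = transposed_conv_size(size)
    sizes.append(size)

# The final Conv2DTranspose in unet_model doubles the size once more
final = transposed_conv_size(size)

print(sizes, final)
```

So five transposed convolutions in total bring the 4x4 bottleneck back to the 128x128 input resolution, four of them inside the upsampling blocks.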
I'm working on creating a Lambda layer after the Convolution Layer using Tensorflow 2.x.
I have a function named "custom_layer" which takes in the tensor output from the previous convolution layer.
I need to extract each feature maps of the convolution layer from this tensor and perform mathematical operations.
Finally, the outputs have to be combined into a single tensor and returned to be used in the next layer.
# Lambda layer
def custom_layer(tensor):
    # perform operation on individual feature maps
    # return the combined tensor output
    return tensor
model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(filters= 64, kernel_size= (3,3), input_shape = (28,28,1), activation = 'relu', name = 'conv2D_1'),
tf.keras.layers.Lambda(custom_layer, name="lambda_layer"),
tf.keras.layers.Conv2D(filters= 64, kernel_size= (3,3), activation = 'relu', name = 'conv2D_2'),
tf.keras.layers.MaxPooling2D(pool_size = (2,2), name = 'MaxPool2D_1'),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(10, activation= 'softmax')])
Using tf.print(tensor) I was able to view the tensor output (feature maps). But I'm not able to figure out a method to access those individual feature maps.
The issue is solved. I used tf.py_function() within the custom_layer(tensor) function.
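For reference, the split, transform and recombine pattern can be sketched without tf.py_function. The example below uses NumPy as a stand-in for tensors; in TensorFlow the analogous calls would be tf.unstack(tensor, axis=-1) and tf.stack(maps, axis=-1). The helper per_feature_map and the mean-centering operation are hypothetical choices for illustration, not the poster's actual operation.

```python
import numpy as np

def per_feature_map(tensor, fn):
    """Apply fn to each feature map of a (batch, height, width, channels) array."""
    # Split along the channel axis: each map has shape (batch, height, width)
    maps = [tensor[..., c] for c in range(tensor.shape[-1])]
    processed = [fn(m) for m in maps]
    # Recombine into a single array with channels last
    return np.stack(processed, axis=-1)

x = np.arange(2 * 3 * 3 * 4, dtype=np.float32).reshape(2, 3, 3, 4)

# Example operation: subtract each feature map's own spatial mean
out = per_feature_map(x, lambda m: m - m.mean(axis=(1, 2), keepdims=True))
print(out.shape)
```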