Are weights and trainable variables the same in Keras? - tensorflow

That's all, I need to know if they are the same or correspond to different concepts.
When I use the model.summary() method, it reports the number of trainable parameters, and I need to know whether those are the same thing as the weights.

"Almost".
In most cases, yes, they are. But there are layers that use non-trainable weights for other purposes.
For instance, a BatchNormalization layer has four weight variables:
mean: not trainable with backpropagation, but learnable from taking statistics from the data
variance: not trainable with backpropagation, but learnable from taking statistics from the data
scale: trainable with backpropagation
offset: trainable with backpropagation
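You can see this directly by inspecting a BatchNormalization layer's weights. A minimal sketch (the feature size is arbitrary):
import tensorflow as tf

# Build a standalone BatchNormalization layer on an arbitrary feature size.
bn = tf.keras.layers.BatchNormalization()
bn.build(input_shape=(None, 8))

for w in bn.weights:
    status = "trainable" if w.trainable else "non-trainable"
    print(w.name, w.shape, status)
# gamma (scale) and beta (offset) are trainable;
# moving_mean and moving_variance are updated from batch statistics instead.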

Related

efficientnet.tfkeras vs tf.keras.applications.efficientnet

I am trying to use EfficientNet to train on my custom dataset.
I found out that, with all other code/data/config the same, efficientnet.tfkeras.EfficientNetB0 gives ~90% training/prediction accuracy while tf.keras.applications.efficientnet.EfficientNetB0 only gives ~70% accuracy.
I would have guessed both are the same implementation of EfficientNet, or am I missing something here?
I am using the latest efficientnet package and TensorFlow 2.3.0.
with strategy.scope():
    model = tf.keras.Sequential([
        efficientnet.tfkeras.EfficientNetB0(  # tf.keras.applications.efficientnet.EfficientNetB0
            input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3),
            weights='imagenet',
            include_top=False
        ),
        L.GlobalAveragePooling2D(),
        L.Dense(1, activation='sigmoid')
    ])
    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['binary_crossentropy']
    )
    model.summary()
I ran into the same problem for EfficientNetB4 and encountered the following:
The total number of parameters is not equal. The trainable parameters are equal, but the non-trainable parameters are not: efficientnet.tfkeras has 7 fewer non-trainable parameters than the tf.keras.applications model.
The number of layers is not equal: efficientnet.tfkeras has fewer layers than the tf.keras.applications model.
The differing layers are at the very beginning, the most noteworthy being the normalization and rescaling layers, which are present in the tf.keras.applications model but not in the efficientnet.tfkeras model. You can observe this yourself using the model.summary() method.
When applying these layers manually, by using model.layers[i](array), it turns out that they do rescale the image by dividing it by 255 and then normalize it according to:
(input_image - IMAGENET_MEAN) / sqrt(IMAGENET_VARIANCE)
Thus, it turns out this image normalization is built into the model. If you also perform this normalization yourself on the input image, the image will be normalized twice, resulting in extremely small pixel values, and the model will therefore have a hard time learning.
TL;DR: Do not normalize the input image yourself, as it is built into the tf.keras.applications model; input images should have values in the range 0-255.
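If in doubt, you can check whether a given model has the preprocessing built in by looking at its first layers. A minimal sketch, assuming TensorFlow 2.3+ (the input size is arbitrary, and weights=None is used only to skip the weight download):
import tensorflow as tf

# Load the tf.keras.applications variant just to inspect its architecture.
model = tf.keras.applications.efficientnet.EfficientNetB0(
    input_shape=(224, 224, 3), weights=None, include_top=False
)

# The first few layers include Rescaling and Normalization,
# i.e. the model itself expects raw pixel values in the 0-255 range.
for layer in model.layers[:5]:
    print(type(layer).__name__, layer.name)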

Build layers with fixed weights in TensorFlow

I want to build a fully-connected (dense) layer for a regression task. I usually do it with TF2, using Keras API like:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(units=2, activation='sigmoid', input_shape=(1, )))
model.add(tf.keras.layers.Dense(units=2, activation='linear'))
model.compile(optimizer='adam', loss='mae')
model.fit(inp_data, out_data, epochs=1000)
Now I want to build a custom layer. The layer is composed of, say, 10 units, of which 8 units have predefined, fixed, untrainable weights and biases, and 2 units have randomly chosen weights and biases that are to be trained by the network. Does anyone have an idea how I can define this in TensorFlow?
Keras layers may receive a trainable parameter, True by default, to indicate whether you want them to be trained. Non-trainable layers will just keep the value they are given by the initializer. If I understand correctly, you want to have one layer which is only partially trainable. That is not possible as such with existing layers. Maybe you could do it with a custom layer class, but you can get equivalent behavior by using two simple layers and then concatenating them (as long as your activation works element-wise; and even if it doesn't, as in a softmax layer, you could apply that activation after the concatenation). This is how it could work:
inputs = tf.keras.Input(shape=(1,))
# This is the trainable part of the layer (2 units)
layer_train = tf.keras.layers.Dense(units=2, activation='sigmoid')(inputs)
# This is the non-trainable part (8 units with fixed weights)
layer_const = tf.keras.layers.Dense(units=8, activation='sigmoid', trainable=False)(inputs)
# Merge both parts
layer = tf.keras.layers.Concatenate()([layer_train, layer_const])
# Make model
model = tf.keras.Model(inputs=inputs, outputs=layer)
# ...
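If the frozen units need specific predefined values rather than whatever the initializer produced, one option is to set their weights explicitly after the model has been built. A minimal sketch continuing the code above (the constant values here are placeholders for your own):
import numpy as np
import tensorflow as tf

# Placeholder constants standing in for your predefined weights:
# the frozen Dense kernel has shape (input_dim, units) = (1, 8), the bias has shape (8,).
fixed_kernel = np.array([[0.5, -0.5, 1.0, -1.0, 0.25, -0.25, 2.0, -2.0]],
                        dtype=np.float32)
fixed_bias = np.zeros(8, dtype=np.float32)

# Find the frozen Dense layer in the model built above and install the values.
frozen_dense = next(l for l in model.layers
                    if isinstance(l, tf.keras.layers.Dense) and not l.trainable)
frozen_dense.set_weights([fixed_kernel, fixed_bias])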

how to make tensorflow ops non trainable

I'm building a stacked convolutional autoencoder with TensorFlow Core (no high-level API, pure TensorFlow). I want to add non-trainable layers between the encoder and decoder. Does anybody know how to add non-trainable layers to a TensorFlow graph? The ops that appear in the blue marked box of the attached TensorBoard graph picture are the ones I want to make non-trainable, or in other words, I do not want gradient computation on them.
TF Version: 1.15
I've tried the tf.stop_gradient() method, but it prevents the contribution of all the inputs before it.
You have two options:
When you define the weights variable with tf.Variable or tf.get_variable, pass trainable=False. This will stop the variable from being added to the trainable variables collection (accessible through tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)), which is used by default as the list of variables to train by the optimizer.
When you define the optimization step with minimize or compute_gradients, pass a var_list argument with the list of variables that you want to train. The optimizer will then ignore the trainable variables collection and will only affect the listed variables.
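A minimal TF1-style sketch of both options (variable names, shapes, and the loss are purely illustrative):
import tensorflow as tf  # TF 1.x graph mode

# Option 1: keep the variable out of the trainable collection.
frozen_w = tf.get_variable("frozen_w", shape=[10, 10], trainable=False)

# Option 2: tell the optimizer explicitly which variables to update.
w1 = tf.get_variable("w1", shape=[10, 10])
w2 = tf.get_variable("w2", shape=[10, 10])

loss = tf.reduce_sum(tf.matmul(tf.matmul(w1, frozen_w), w2))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss, var_list=[w1, w2])

# frozen_w does not appear in the trainable variables collection.
print(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES))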

Weights Not Trainable with `tf.contrib.layers.recompute_grad`

When I use tf.contrib.layers.recompute_grad on a layer, the trainable weights are eliminated. How do I save the weights that are recalculated on the backwards pass at each epoch?

What is the definition of a non-trainable parameter?

What is the definition of a non-trainable parameter in a model?
For example, while you are building your own model, it is 0 by default, but when you use something like an Inception model, it becomes something other than 0. What would be the reason behind that?
In Keras, non-trainable parameters (as shown in model.summary()) are the weights that are not updated during training with backpropagation.
There are mainly two types of non-trainable weights:
The ones that you have chosen to keep constant when training. This means that keras won't update these weights during training at all.
The ones that work like statistics in BatchNormalization layers. They're updated with mean and variance, but they're not "trained with backpropagation".
Weights are the values inside the network that perform the operations and can be adjusted to result in what we want. The backpropagation algorithm changes the weights towards a lower error at the end.
By default, all weights in a keras model are trainable.
When you create layers, each layer internally creates its own weights, and they're trainable. (The backpropagation algorithm will update these weights.)
When you make them untrainable, the algorithm will not update these weights anymore. This is useful, for instance, when you want a convolutional layer with a specific filter, such as a Sobel filter: you don't want the training to change this operation, so these weights/filters should be kept constant.
There are many other reasons why you might want to make weights untrainable.
Changing parameters:
For deciding whether weights are trainable or not, you take layers from the model and set trainable:
model.get_layer(layerName).trainable = False #or True
This must be done before compilation.
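As a minimal sketch of the Sobel-filter case mentioned above (the input size is arbitrary):
import numpy as np
import tensorflow as tf

# A fixed 3x3 Sobel-x kernel for a single-channel input, shaped as
# (kernel_h, kernel_w, in_channels, filters) for Conv2D.
sobel_x = np.array([[-1., 0., 1.],
                    [-2., 0., 2.],
                    [-1., 0., 1.]], dtype=np.float32).reshape(3, 3, 1, 1)

layer = tf.keras.layers.Conv2D(filters=1, kernel_size=3, padding='same',
                               use_bias=False, trainable=False)
layer.build(input_shape=(None, 28, 28, 1))  # create the kernel variable
layer.set_weights([sobel_x])                # install the fixed filter

# The kernel is counted as a non-trainable weight and is never updated by training.
print(len(layer.trainable_weights), len(layer.non_trainable_weights))  # 0 1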
There are some details that other answers do not cover.
In Keras, non-trainable parameters are the ones that are not trained using gradient descent. This is also controlled by the trainable parameter in each layer, for example:
from keras.layers import Dense, BatchNormalization
from keras.models import Sequential

model = Sequential()
model.add(Dense(10, trainable=False, input_shape=(100,)))
model.summary()
This prints zero trainable parameters, and 1010 non-trainable parameters.
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 10)                1010
=================================================================
Total params: 1,010
Trainable params: 0
Non-trainable params: 1,010
_________________________________________________________________
Now if you set the layer as trainable with model.layers[0].trainable = True, then it prints:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 10)                1010
=================================================================
Total params: 1,010
Trainable params: 1,010
Non-trainable params: 0
_________________________________________________________________
Now all parameters are trainable and there are zero non-trainable parameters. But there are also layers that have both trainable and non-trainable parameters; one example is the BatchNormalization layer, where the moving mean and variance of the activations are stored for use at test time. One example:
model.add(BatchNormalization())
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 10)                1010
_________________________________________________________________
batch_normalization_1 (Batch (None, 10)                40
=================================================================
Total params: 1,050
Trainable params: 1,030
Non-trainable params: 20
_________________________________________________________________
This specific BatchNormalization layer has 40 parameters in total: 20 trainable and 20 non-trainable. The 20 non-trainable parameters correspond to the moving mean and variance of the activations computed for use at test time; these parameters will never be trained with gradient descent and are not affected by the trainable flag.
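A small sketch of that last point (the feature size is arbitrary): the moving statistics stay non-trainable no matter how the trainable flag is set.
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
bn.build(input_shape=(None, 10))

for flag in (True, False):
    bn.trainable = flag
    # The moving statistics never move into trainable_weights.
    print(flag, len(bn.trainable_weights), len(bn.non_trainable_weights))
# True  -> 2 trainable (gamma, beta), 2 non-trainable (moving stats)
# False -> 0 trainable, 4 non-trainable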
Non-trainable parameters are quite a broad subject. A straightforward example is to consider the case of any specific NN model and its architecture.
Say we have already set up our network definition in Keras, and the architecture is something like 256->500->500->1. Based on this definition, we seem to have a regression model (one output) with two hidden layers (500 nodes each) and an input of size 256.
One non-trainable parameter of your model is, for example, the number of hidden layers itself (2). Others could be the number of nodes in each hidden layer (500 in this case), or even the number of nodes in each individual layer, giving you one parameter per layer plus the number of layers itself.
These parameters are "non-trainable" because you can't optimize their values with your training data. Training algorithms (like back-propagation) will optimize and update the weights of your network, which are the actual trainable parameters here (usually several thousand, depending on your connections). Your training data as it is can't help you determine those non-trainable parameters.
However, this does not mean that numberHiddenLayers is not trainable at all; it only means that in this model and its implementation we are unable to do so. We could make numberHiddenLayers trainable; the easiest way would be to define another ML algorithm that takes this model as input and trains it with several values of numberHiddenLayers. The best value is obtained with the model that outperforms the others, thus optimizing the numberHiddenLayers variable.
In other words, non-trainable parameters of a model are those that you will not be updating or optimizing during training, and that have to be defined a priori or passed as inputs.
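A hypothetical sketch of that outer-loop idea (the model builder, layer sizes, and the random placeholder data are all assumptions for illustration):
import numpy as np
import tensorflow as tf

def build_model(num_hidden_layers, units=500, input_dim=256):
    # Build the 256 -> 500 -> ... -> 1 regression model described above.
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(units, activation='relu', input_shape=(input_dim,)))
    for _ in range(num_hidden_layers - 1):
        model.add(tf.keras.layers.Dense(units, activation='relu'))
    model.add(tf.keras.layers.Dense(1))
    model.compile(optimizer='adam', loss='mae')
    return model

# Placeholder random data standing in for a real regression dataset.
x_train, y_train = np.random.rand(256, 256), np.random.rand(256, 1)
x_val, y_val = np.random.rand(64, 256), np.random.rand(64, 1)

# "Train" the non-trainable parameter numberHiddenLayers with an outer search:
# fit one model per candidate value and keep the best validation loss.
results = {}
for n in [1, 2, 3]:
    history = build_model(n).fit(x_train, y_train, epochs=3, verbose=0,
                                 validation_data=(x_val, y_val))
    results[n] = history.history['val_loss'][-1]
best_num_hidden_layers = min(results, key=results.get)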
It is clear that if you freeze any layer of the network, all parameters of that frozen layer become non-trainable. On the other hand, if you design your network from scratch, it might have some non-trainable parameters too. For instance, the BatchNormalization layer has 4 parameters, which are:
[gamma weights, beta weights, moving_mean, moving_variance]
The first two of them are trainable, but the last two are not. So the batch normalization layer is most probably the reason that your custom network has non-trainable parameters.