How much VRAM do I need for training a Keras model - tensorflow

Hello, I recently started my bachelor project, in which I need to train an LSTM model with the following structure:
inputs (InputLayer) [(None, None, 80)] 0
masking (Masking) (None, None, 80) 0
lstm (LSTM) (None, None, 100) 72400
outputs (TimeDistributed) (None, None, 39) 3939
=================================================================
Total params: 76,339
Trainable params: 76,339
Non-trainable params: 0
inputs = tf.keras.Input(shape=(None,nb_features), name = 'inputs')
x = tf.keras.layers.Masking(mask_value = data.MASK_VALUE)(inputs)
x = tf.keras.layers.LSTM(hidden_units,
                         return_sequences = True,
                         dropout = dropout_rate)(x)
dense = tf.keras.layers.Dense(nb_skills, activation = 'sigmoid')
outputs = tf.keras.layers.TimeDistributed(dense, name = 'outputs')(x)
Now my question: how much VRAM do I need for training? I have a 2080 Ti with 11 GB of VRAM in my workstation and I don't know if that is enough. Maybe there is a helpful site that calculates something like this. I tried to find one myself, but I don't think anything like that exists yet.
Thanks for your help

11 GB of VRAM should be enough. I have trained a 400K+ parameter model with 3 LSTM layers that easily fits in my 4 GB of VRAM. If you are using a GPU, you can also use TensorBoard profiling to view the memory profile of the model once you start training it. There you can see the peak VRAM usage on the GPU device.
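For a rough back-of-the-envelope check, the parameter counts in the summary can be reproduced by hand and turned into a memory estimate. The sketch below is plain Python. The helper names, the float32 assumption, the Adam-style optimizer (two extra values per weight), and the batch size and sequence length are my assumptions, not something stated in the question. The activation figure is only a lower bound, because the LSTM also keeps intermediate gate values for backpropagation and the framework adds its own overhead.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # weight matrix plus one bias per unit
    return input_dim * units + units


def weight_bytes(n_params, bytes_per_value=4, optimizer_slots=2):
    # weights + gradients + optimizer slots (assumed: Adam, 2 slots per weight)
    return n_params * bytes_per_value * (2 + optimizer_slots)


def activation_bytes(batch_size, timesteps, values_per_step, bytes_per_value=4):
    # layer outputs kept for backpropagation (a lower bound)
    return batch_size * timesteps * values_per_step * bytes_per_value


total_params = lstm_params(80, 100) + dense_params(100, 39)
print(total_params)  # 76339, matching the summary

# Hypothetical training setup: batch of 32 sequences, 1000 timesteps each.
estimate = weight_bytes(total_params) + activation_bytes(32, 1000, 80 + 100 + 39)
print(f"{estimate / 1024**2:.1f} MiB")
```

Under these assumptions the estimate is on the order of tens of MiB, far below 11 GB. Treat it as an order-of-magnitude figure and use the profiler for real numbers.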

Related

how to calculate the confidence of a softmax layer

I am working on a multi-class computer vision classification task using a CNN with FC layers stacked on top and a softmax activation. The problem: say I'm classifying animal categories. If I predict on an image of a rock, the model returns a high probability for the most similar animal category, because softmax always returns a probability distribution with values between 0 and 1 that sum to 1. What can I use to determine the confidence of my model's probability output, so I can say whether I can rely on these probabilities or not?
PS: I don't want to add a no_label class.
Is it possible, using the Keras functional API, to give the model two outputs, the pre-softmax output and the softmax output, without the weights being updated according to the linear (pre-softmax) output, since that would affect training?
Yes. You can do it like this
input = tf.keras.layers.Input((128,128,3))
x = tf.keras.layers.Conv2D(32,3)(input)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(128)(x)
non_softmax_output = tf.keras.layers.Dense(10)(x)
softmax_output = tf.keras.layers.Softmax()(non_softmax_output)
model = tf.keras.models.Model(inputs=input,outputs=[non_softmax_output,softmax_output])
model.summary()
>>>
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 128, 128, 3)] 0
conv2d_1 (Conv2D) (None, 126, 126, 32) 896
max_pooling2d_1 (MaxPooling (None, 63, 63, 32) 0
2D)
flatten_1 (Flatten) (None, 127008) 0
dense_23 (Dense) (None, 128) 16257152
dense_24 (Dense) (None, 10) 1290
softmax (Softmax) (None, 10) 0
=================================================================
Total params: 16,259,338
Trainable params: 16,259,338
Non-trainable params: 0
_________________________________________________________________
The easier alternative is to work with the predictions from the softmax layer alone. You don't gain much from the linear layer without the activation; its raw outputs by themselves do not mean much. You could instead define a function outside the model that changes the predictions based on some threshold value.
Assume you define only one output in the above model, the softmax layer. You can then define a function like this to get predictions based on a threshold value you choose:
import numpy as np

def modify_predict(test_images, threshold):
    predictions = model.predict(test_images)   # shape (n_samples, n_classes)
    max_values = np.max(predictions, axis=1)   # top softmax probability per sample
    labels = np.argmax(predictions, axis=1)    # index of the most likely class
    # You can use any indicator here instead of 999 for your no_label class
    new_predictions = np.where(max_values > threshold, labels, 999)
    return new_predictions
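To see what the thresholding does on concrete numbers, here is a minimal framework-free sketch of the same idea. The function names and the example logits are made up for illustration; 999 is the same placeholder as above.

```python
import math

NO_LABEL = 999  # placeholder for "not confident enough"


def softmax(logits):
    # subtract the maximum for numerical stability
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_threshold(logits, threshold):
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return best if probs[best] > threshold else NO_LABEL


print(predict_with_threshold([4.0, 0.5, 0.1], 0.8))  # 0: one class clearly dominates
print(predict_with_threshold([1.0, 0.9, 0.8], 0.8))  # 999: near-uniform, rejected
```

Keep in mind that a high softmax probability does not guarantee the input resembles the training data, so a threshold like this is a heuristic, not a real out-of-distribution detector.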
On the first part of your question: the only way to know how your model will behave on non-animal pictures is to have non-animal pictures in your data.
There are two options.
The first is to include non-animal pictures in the training set (and the dev and test sets), and to train the model to distinguish between animal and non-animal. You could either build a separate binary classification model to distinguish animal/non-animal (as already suggested in the comments), or integrate it into one model by adding a 'non-animal' class (although I recognise you indicate this last option is not something you want to do).
The second is to include non-animal pictures in the dev and test sets, but not in the training set. You can't then train the model to distinguish between animal and non-animal, but you can at least measure how it behaves on non-animal pictures, and perhaps create some sort of heuristic for selecting only some of your model's predictions. This seems like a worse option to me, even though it's generally accepted that dev and test sets can come from a different distribution to the training set. It's something one might do if only a small number of non-animal pictures were available, but that surely can't be the case here.
There is, for example, a large labelled image database available at https://www.image-net.org/index.php
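As a sketch of the second option, one could collect the model's top softmax probability for each dev-set image and pick the cut-off that best separates animal from non-animal pictures. The confidence values and function names below are invented for illustration; real values would come from running the model on the dev set.

```python
def error_rates(animal_conf, other_conf, threshold):
    """Share of animal images rejected and of non-animal images accepted."""
    false_reject = sum(c <= threshold for c in animal_conf) / len(animal_conf)
    false_accept = sum(c > threshold for c in other_conf) / len(other_conf)
    return false_reject, false_accept


def pick_threshold(animal_conf, other_conf, candidates):
    """Candidate threshold with the lowest sum of both error rates."""
    return min(candidates,
               key=lambda t: sum(error_rates(animal_conf, other_conf, t)))


# Made-up top-class probabilities on a dev set.
animal_conf = [0.97, 0.97, 0.91, 0.88, 0.62]
other_conf = [0.85, 0.55, 0.48, 0.40]

best = pick_threshold(animal_conf, other_conf, [0.5, 0.6, 0.7, 0.8, 0.9])
print(best)                                        # 0.6
print(error_rates(animal_conf, other_conf, best))  # (0.0, 0.25)
```

Summing the two error rates weights both mistakes equally; if wrongly accepting a rock is worse than rejecting an animal, weight the terms accordingly.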

Keras output names are incorrect when tf ops are used as outputs

I'm having trouble setting output names correctly in a Keras model.
The use case here is a Tensorflow Serving model, which names inputs and outputs based on the layer names.
Inputs are easy enough to name. But outputs, if they aren't instances of keras.Layer, don't seem to have their names properly set as the output names in the model.
See the following example:
import tensorflow as tf
import tensorflow.keras as keras
input_0 = tf.keras.Input(shape=(10,), name="my_input_0")
x = keras.layers.Dense(units=1)(input_0)
output_0 = tf.math.log(x, name="my_output_0")
output_1 = tf.math.exp(x, name="my_output_1")
inputs = {
    "my_input_0": input_0
}
outputs = {
    "my_output_0": output_0,
    "my_output_1": output_1
}
model = keras.Model(inputs, outputs)
model.summary()
The model summary has the correct name for the input layer, but does not have the correct names for either output layer, despite the fact that the name was specified both in the output dict keys, as well as the layer name itself.
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
my_input_0 (InputLayer) [(None, 10)] 0 []
dense_1 (Dense) (None, 1) 11 ['my_input_0[0][0]']
tf.math.log_1 (TFOpLambda) (None, 1) 0 ['dense_1[0][0]']
tf.math.exp_1 (TFOpLambda) (None, 1) 0 ['dense_1[0][0]']
==================================================================================================
Total params: 11
Trainable params: 11
Non-trainable params: 0
__________________________________________________________________________________________________
It's possible to work around this issue by wrapping the outputs in keras.Layer():
output_0 = keras.layers.Layer(name="my_output_0")(tf.math.log(x))
output_1 = keras.layers.Layer(name="my_output_1")(tf.math.exp(x))
But this adds extra layers to the model; I imagine the runtime cost is negligible, but it feels ugly and clogs up the model summary.
Is there a better way to accomplish this output naming? Remember, the core issue is that a Tensorflow Serving model sets the output names based on the output layer names, which are the same as those displayed in the summary.
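One solution would be to set the _name attribute of every layer you want to rename, i.e. model.layers[i]._name. Note that _name is a private attribute, so this may break between Keras versions.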
For example:
model.layers[2]._name="my_output_0"
model.summary()
This should give the output:

Change fully convolutional network input shape in TF 2.3 and tf.keras

I'm working with tensorflow 2.3 and tf.keras
I've trained a network on images with input shape (None,120,120,12). Actually, because of a coding error, I was also able to train the model with the input declared as (None,128,128,12) while feeding it (None,120,120,12) batches. TF just printed a warning and didn't care. This wasn't the behavior in previous versions. My network has only convolutional layers and, if the input size has enough powers of 2 for the network's depth, it produces an output image of the same shape as the input.
I've finally fully trained this model and I'd like to apply it to images of a different shape as well. Is there a proper way to change the input shape of my trained model? Or should I define a new model and then copy the weights layer by layer? Or should I just accept the warnings and ignore them, since it works anyway?
Ah, you again. I think your problem is basically simple. Once you have trained your model with a given input size, the input must have the same shape when you run the model. However, if you want to take advantage of the trained model and believe that the features it has learnt are not much different, you can apply transfer learning and, of course, retrain it. You don't have to copy weights; just freeze the model and train only the input and output layers. You can check this basic example with VGG:
import tensorflow
from tensorflow import keras
from tensorflow.keras import layers

base_model = tensorflow.keras.applications.VGG19(
    weights='imagenet',  # Load weights pre-trained on ImageNet.
    input_shape=(224, 224, 3),
    include_top=False)
base_model.trainable = False  # Freeze the pre-trained weights.
inputs = layers.Input(shape=(150, 150, 3))  # shape must be passed as a tuple
x = base_model(inputs, training=False)  # keep the base model in inference mode
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
model.summary()
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) [(None, 150, 150, 3)] 0
_________________________________________________________________
vgg19 (Model) multiple 20024384
_________________________________________________________________
global_average_pooling2d (Gl (None, 512) 0
_________________________________________________________________
dense (Dense) (None, 1) 513
=================================================================
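The size condition mentioned in the question ("enough powers of 2 considering the depth") can be checked before feeding an image of a new shape. This is a minimal sketch that assumes each downsampling step halves the spatial size and each upsampling step doubles it; the depth values and function names are hypothetical.

```python
def keeps_shape(size, n_downsamplings):
    """True if `size` is divisible by 2 ** n_downsamplings."""
    return size % (2 ** n_downsamplings) == 0


def next_valid_size(size, n_downsamplings):
    """Smallest size >= `size` that is divisible by 2 ** n_downsamplings."""
    factor = 2 ** n_downsamplings
    return -(-size // factor) * factor  # ceiling division


for depth in (3, 4):  # hypothetical numbers of downsampling steps
    print(depth, keeps_shape(120, depth), next_valid_size(120, depth))
# 3 True 120
# 4 False 128
```

Under that assumption, an image whose size fails the check would need padding or cropping to the next valid size before the network can return an output of matching shape.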

Tensorflow super simple model? 10 inputs, 1 output, so 11 trainable parameters

I am a little new to Tensorflow, I'm using TensorflowJS, but feel free to post your Python code.
What I am trying to achieve is the following:
I want to train a simple model of 10 inputs and 1 output.
I have 10 inputs of consistent dimensions [255,255].
The output should be of size [255,255] as well, and should add up the inputs according to some weights. So there will be 10 weights (+ bias); the output is simply a linear combination of the inputs.
I want to train these 10 weights so the result is as close as possible to a validation matrix of size [255,255]. I think absoluteDifference is the best loss function for this.
However, I have no idea how to build this trainable model in Tensorflow. So far this is what I've got:
const model = tf.sequential();
model.add(tf.layers.dense({inputShape: [255,255], units: 10, activation: 'relu'}));
/* Prepare the model for training: Specify the loss and the optimizer. */
model.compile({loss: 'absoluteDifference', optimizer: 'momentum'});
In Python it would be something like this:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(255, 255, 10)),  # 10 inputs of 255x255
    keras.layers.Dense(9, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')  # assuming it's binary classification, we use sigmoid
])
model.compile(optimizer='adam',
              # the last layer already applies sigmoid, so the loss receives probabilities, not logits
              loss=tf.losses.BinaryCrossentropy(from_logits=False))
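Quick note: TF 2.x has no loss named absolute_difference; that name only exists in TF 1.x (tf.losses.absolute_difference). The TF 2.x equivalent is the mean absolute error, tf.keras.losses.MeanAbsoluteError (or loss='mae').
You can go through a detailed example of it in the TF documentation.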
EDIT:
Model Summary
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_3 (Flatten) (None, 650250) 0
_________________________________________________________________
dense_5 (Dense) (None, 9) 5852259
_________________________________________________________________
dense_6 (Dense) (None, 1) 10
=================================================================
Total params: 5,852,269
Trainable params: 5,852,269
Non-trainable params: 0
_________________________________________________________________
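Note that the summary above has millions of parameters, not 11. What the question describes (one weight per input image plus a single bias, trained on the absolute difference) can be sketched without any framework. The toy 4x4 image size, the learning rate, the step count and all function names below are assumptions for illustration only.

```python
import random


def combine(images, weights, bias):
    """Pixel-wise weighted sum of equally sized 2-D images, plus a bias."""
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(w * img[r][c] for w, img in zip(weights, images)) + bias
             for c in range(cols)]
            for r in range(rows)]


def mean_absolute_error(pred, target):
    n = len(target) * len(target[0])
    return sum(abs(p - t)
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row)) / n


def subgradient_step(images, target, weights, bias, lr):
    """One subgradient-descent step on the mean absolute error."""
    pred = combine(images, weights, bias)
    rows, cols = len(target), len(target[0])
    n = rows * cols
    # d|e|/de is sign(e); booleans subtract to -1, 0 or 1
    sign = [[(pred[r][c] > target[r][c]) - (pred[r][c] < target[r][c])
             for c in range(cols)]
            for r in range(rows)]
    new_weights = [
        w - lr * sum(sign[r][c] * img[r][c]
                     for r in range(rows) for c in range(cols)) / n
        for w, img in zip(weights, images)
    ]
    new_bias = bias - lr * sum(map(sum, sign)) / n
    return new_weights, new_bias


# Toy data: 10 random 4x4 "images" and a target built from known weights.
random.seed(0)
images = [[[random.random() for _ in range(4)] for _ in range(4)]
          for _ in range(10)]
true_weights = [0.1 * (i + 1) for i in range(10)]
target = combine(images, true_weights, 0.5)

weights, bias = [0.0] * 10, 0.0
print(len(weights) + 1)  # 11 trainable parameters
for step in range(2000):
    weights, bias = subgradient_step(images, target, weights, bias, lr=0.01)
print(mean_absolute_error(combine(images, weights, bias), target))
```

In Keras terms this is the computation a single Dense(1) layer performs when applied to an input of shape (255, 255, 10): the layer acts on the last axis, giving 10 weights plus 1 bias, and can be trained with a mean-absolute-error loss.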

Sentiment classifier training with Keras

I am using Keras (with the TensorFlow backend) to classify sentiment in Amazon reviews.
It starts with an embedding layer (which uses GloVe), then LSTM layer and finally a Dense layer as output. Model summary below:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, None, 100) 2258700
_________________________________________________________________
lstm_1 (LSTM) (None, 16) 7488
_________________________________________________________________
dense_1 (Dense) (None, 5) 85
=================================================================
Total params: 2,266,273
Trainable params: 2,266,273
Non-trainable params: 0
_________________________________________________________________
Train on 454728 samples, validate on 113683 samples
When training, the train and eval accuracy are both about 74% and the loss (train and eval) is around 0.6.
I've tried changing the number of units in the LSTM layer, adding dropout, recurrent dropout and a regularizer, and using a GRU instead of the LSTM. The accuracy then increased a bit (~76%).
What else could I try in order to improve my results?
I have had better success with sentiment analysis using a Bidirectional LSTM. Stacking two layers vertically, i.e. two LSTMs together forming a deeper network, also helped. Also try increasing the number of LSTM units to around 128.