cannot fine-tune a Keras model with 4 VGG16 - tensorflow

I built a model with 4 VGG16 backbones (not including the top) and concatenated their 4 outputs into a dense layer, which is followed by a softmax layer, so the model has 4 inputs (4 images) and 1 output (4 classes).
I first do transfer learning by training only the dense layers and keeping the VGG16 layers frozen, and that works fine.
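For context, here is a minimal sketch of such an architecture and the freezing step (the input size, dense width, and layer names are illustrative assumptions, not taken from the question):
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

inputs, features = [], []
for i in range(4):
    inp = layers.Input(shape=(224, 224, 3), name='image_%d' % i)
    base = VGG16(weights='imagenet', include_top=False,
                 input_shape=(224, 224, 3))
    base.trainable = False  # frozen for the transfer-learning phase
    features.append(layers.Flatten()(base(inp)))
    inputs.append(inp)

x = layers.Dense(256, activation='relu')(layers.concatenate(features))
out = layers.Dense(4, activation='softmax')(x)
model = Model(inputs, out)
model.compile(optimizer='adam', loss='categorical_crossentropy')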
However, after unfreezing the VGG16 layers by setting layer.trainable = True, I get the following errors:
tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-xx-xx 23:12:28.501894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:0a:00.0
totalMemory: 11.93GiB freeMemory: 11.71GiB
2018-xx-xx 23:12:28.744990: I tensorflow/stream_executor/cuda/cuda_dnn.cc:444] could not convert BatchDescriptor {count: 0 feature_map_count: 512 spatial: 14 14 value_min: 0.000000 value_max: 0.000000 layout: BatchDepthYX} to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM
Then I followed the solution on this page and set os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'. The first message above is gone, but I still get the second error:
keras tensorflow/stream_executor/cuda/cuda_dnn.cc:444 could not convert BatchDescriptor to cudnn tensor descriptor: CUDNN_STATUS_BAD_PARAM
If I freeze the VGG16 layers again, the code works fine. In other words, those errors only occur when I set the VGG16 layers trainable.
I also built a model with only 1 VGG16, and that model works fine.
So, in summary, I only get those errors when I unfreeze the VGG16 layers in the model with 4 VGG16 branches.
Any ideas how to fix this?

It turns out that it has nothing to do with the number of VGG16 branches in the model. The problem is the batch size.
When I said the model with 1 VGG16 worked, that model used batch size 8. When I reduced the batch size below 4 (to 1, 2, or 3), the same errors happened.
Now I just use batch size 4 for the model with 4 VGG16 branches, and it works fine, although I still don't know why it fails when the batch size is smaller than 4 (it is probably related to the fact that I'm using 4 GPUs).
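In hindsight this matches the count: 0 in the cuDNN error above: data-parallel training splits each batch across replicas with integer division (keras.utils.multi_gpu_model gives every replica batch_size // gpus samples and the remainder to the last one), so a batch smaller than the GPU count leaves some replicas with zero samples. A quick illustration of that slicing arithmetic:
def slice_sizes(batch_size, num_gpus):
    # Mirrors multi_gpu_model's slicing: every replica but the last gets
    # batch_size // num_gpus samples; the last one gets the remainder.
    step = batch_size // num_gpus
    return [step] * (num_gpus - 1) + [batch_size - step * (num_gpus - 1)]

print(slice_sizes(8, 4))  # [2, 2, 2, 2] -> every replica sees data
print(slice_sizes(3, 4))  # [0, 0, 0, 3] -> empty slices reach cuDNN as count: 0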

Related

how to plot input and output shapes on top of each other using plot_model in keras

I want to plot my model using the keras.utils.plot_model function. My problem is that when I plot the model, the input and output shapes are not placed on top of each other; instead they are put alongside each other (like figure 1).
Here is the code to plot this model:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.utils import plot_model

model = tf.keras.models.Sequential()
model.add(layers.Embedding(100, 128, input_length=45,
                           input_shape=(45,), name='embed'))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.MaxPooling1D(5))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=False)
but I would like to have a model plot such as figure 2, which is the typical figure you can find on the internet and which I have created many times before.
I couldn't find any figsize or fontsize option in plot_model to try changing. I use a Google Colaboratory notebook.
Any help is much appreciated.
I also had the same issue, and I finally found this GitHub issue.
github
This problem seems to happen because we're using TensorFlow 2.8.0.
As mentioned in the link, one valid workaround is to switch to a different TensorFlow version, such as tf-nightly.
[tensorflow ver2.8.0]
import tensorflow as tf
tf.__version__
# 2.8.0
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, input_shape=[1], name="input_layer")
], name="model_1")
model.compile(...)
[tensorflow nightly]
!pip --quiet install tf-nightly  # try not to use tf ver2.8
import tensorflow as tf
tf.__version__
# 2.10.0-dev20220403
# just do the same thing as above
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, input_shape=[1], name="input_layer")
], name="model_1")
model.compile(...)
I hope you solve this problem.
It is easy, but using a Sequential model is more easily managed.
What are the embedding layers and dataset buffers? They are batches of input; you manage the combination or the number of batches.
(Drawing the graphs with MS Word or other drawing tools is faster; I use free office tools when studying.)
[ Codes ]:
import tensorflow as tf
from tensorflow.keras.utils import plot_model

model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(100,), dtype='int32', name='input'),
    tf.keras.layers.Embedding(output_dim=512, input_dim=100, input_length=100),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid', name='output'),
])
dot_img_file = 'F:\\temp\\Python\\img\\001.png'
tf.keras.utils.plot_model(model, to_file=dot_img_file, show_shapes=True)
# <IPython.core.display.Image object>
input('...')
[ Output ]:
F:\temp\Python>python test_tf_plotgraph.py
2022-03-28 14:21:26.043715: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-28 14:21:26.645113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4565 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
...
...

Tensorflow 2: Get the number of trainable parameters in a Model from Model Garden (Zoo)

After choosing and downloading a model from the TensorFlow 2 Detection Model Zoo, it can be loaded as follows:
import tensorflow as tf
model = tf.saved_model.load(f'./efficientdet_d0_coco17_tpu-32/saved_model/')
However, it looks like one cannot extract the number of trainable variables directly or indirectly from the model variable, according to this investigation.
Nevertheless, the model can continue to be trained with new data, as that is a typical use case for a pre-trained model, so there must be a way to get the number of trainable variables. But I don't know how.
I tried:
tf.trainable_variables
# AttributeError: module 'tensorflow' has no attribute 'trainable_variables'
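For what it's worth, tf.trainable_variables belonged to the TF1 global-graph API and was removed in TF2; in TF2 the variables live on the object itself. A minimal sketch of the TF2 idiom, shown on a stock Keras model because (per the investigation above) the detection-zoo SavedModel does not seem to expose them:
import numpy as np
import tensorflow as tf

# Any object exposing .trainable_variables works the same way; objects returned
# by tf.saved_model.load() only have it if the saved object tracked its variables.
model = tf.keras.applications.MobileNetV2()
n_trainable = sum(int(np.prod(v.shape)) for v in model.trainable_variables)
print(n_trainable)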
Environment:
Tensorflow 2.7.0 (implying CUDA 11.2, cuDNN 8.1).
Windows 10 x64
Python 3.9.7
NVIDIA GeForce MX150, Compute capability: 6.1

Tensorflow gpu error: Dst tensor not initialized

It is my first time training a model on a GPU. I am using TensorFlow and getting the error: InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run AssignVariableOp: Dst tensor is not initialized. [Op:AssignVariableOp]
I have tried solutions like reducing the batch size and using tf-nightly, but to no avail. I am using an Nvidia GeForce GTX 1080 (8 GB). I am trying to train an image classification model using a Keras Application (Xception).
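This error typically means the copy failed because the GPU ran out of memory. Besides reducing the batch size, a common mitigation is to stop TensorFlow from reserving (almost) all GPU memory up front; a hedged sketch using the TF2 configuration API:
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at startup;
# this must run before the GPUs are first used.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)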

tflite_convert ValueError Unknown layer BatchNorm

We are using
Tensorflow 1.14
Keras 2.1.2
GPU: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.86
for custom object detection using Mask-RCNN from this repo https://github.com/matterport/Mask_RCNN.
We trained a model successfully and it's detecting objects on our desktop. Now we want to generate a tflite model for mobile usage, where we are facing the error below:
ValueError: Unknown layer BatchNorm
Please note that we created the weights and the Keras model .h5 with different scripts.
We have tried the following code to convert the Keras model to tflite:
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model_file('Save-Model8.h5')
tfmodel = converter.convert()
open("model.tflite", "wb").write(tfmodel)

SageMaker fails when using Multi-GPU with keras.utils.multi_gpu_model

Running AWS SageMaker with a custom model, the TrainingJob fails with an Algorithm Error when using Keras with a TensorFlow backend in a multi-GPU configuration:
from keras.utils import multi_gpu_model

parallel_model = multi_gpu_model(model, gpus=K)
parallel_model.compile(loss='categorical_crossentropy',
                       optimizer='rmsprop')
parallel_model.fit(x, y, epochs=20, batch_size=256)
This simple parallel model loading fails. There is no further error or exception in the CloudWatch logs. This configuration works properly on a local machine with 2x NVIDIA GTX 1080 and the same Keras/TensorFlow backend.
According to the SageMaker documentation and tutorials, the multi_gpu_model utility works fine when the Keras backend is MXNet, but I did not find any mention of the case where the backend is TensorFlow with the same multi-GPU configuration.
[UPDATE]
I have updated the code with the suggested answer below, and I'm adding some logging from before the TrainingJob hangs.
This logging repeats twice:
2018-11-27 10:02:49.878414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-11-27 10:02:49.878462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-27 10:02:49.878471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2 3
2018-11-27 10:02:49.878477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N Y Y Y
2018-11-27 10:02:49.878481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: Y N Y Y
2018-11-27 10:02:49.878486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2: Y Y N Y
2018-11-27 10:02:49.878492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3: Y Y Y N
2018-11-27 10:02:49.879340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 14874 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1b.0, compute capability: 7.0)
2018-11-27 10:02:49.879486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:1 with 14874 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1c.0, compute capability: 7.0)
2018-11-27 10:02:49.879694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:2 with 14874 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1d.0, compute capability: 7.0)
2018-11-27 10:02:49.879872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:3 with 14874 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)
Before that, there is some logging info about each GPU, repeated 4 times:
2018-11-27 10:02:46.447639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 3 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:00:1e.0
totalMemory: 15.78GiB freeMemory: 15.37GiB
According to the logging, all 4 GPUs are visible and loaded in the TensorFlow Keras backend. After that, no application logging follows; the TrainingJob status is InProgress for a while, and then it becomes Failed with the same Algorithm Error.
Looking at the CloudWatch logs, I can see some metrics at work. Specifically, GPU memory utilization and CPU utilization are OK, while GPU utilization is 0%.
[UPDATE]
Due to a known Keras bug with saving a multi-GPU model, I'm using this override of the multi_gpu_model utility from keras.utils:
from keras.layers import Lambda, concatenate
from keras import Model
import tensorflow as tf

def multi_gpu_model(model, gpus):
    # source: https://github.com/keras-team/keras/issues/8123#issuecomment-354857044
    if isinstance(gpus, (list, tuple)):
        num_gpus = len(gpus)
        target_gpu_ids = gpus
    else:
        num_gpus = gpus
        target_gpu_ids = range(num_gpus)

    def get_slice(data, i, parts):
        shape = tf.shape(data)
        batch_size = shape[:1]
        input_shape = shape[1:]
        step = batch_size // parts
        if i == num_gpus - 1:
            size = batch_size - step * i
        else:
            size = step
        size = tf.concat([size, input_shape], axis=0)
        stride = tf.concat([step, input_shape * 0], axis=0)
        start = stride * i
        return tf.slice(data, start, size)

    all_outputs = []
    for i in range(len(model.outputs)):
        all_outputs.append([])

    # Place a copy of the model on each GPU,
    # each getting a slice of the inputs.
    for i, gpu_id in enumerate(target_gpu_ids):
        with tf.device('/gpu:%d' % gpu_id):
            with tf.name_scope('replica_%d' % gpu_id):
                inputs = []
                # Retrieve a slice of the input.
                for x in model.inputs:
                    input_shape = tuple(x.get_shape().as_list())[1:]
                    slice_i = Lambda(get_slice,
                                     output_shape=input_shape,
                                     arguments={'i': i,
                                                'parts': num_gpus})(x)
                    inputs.append(slice_i)

                # Apply model on slice
                # (creating a model replica on the target device).
                outputs = model(inputs)
                if not isinstance(outputs, list):
                    outputs = [outputs]

                # Save the outputs for merging back together later.
                for o in range(len(outputs)):
                    all_outputs[o].append(outputs[o])

    # Merge outputs on CPU.
    with tf.device('/cpu:0'):
        merged = []
        for name, outputs in zip(model.output_names, all_outputs):
            merged.append(concatenate(outputs, axis=0, name=name))
        return Model(model.inputs, merged)
This works fine locally on 2x NVIDIA GTX 1080 / Intel Xeon / Ubuntu 16.04. It fails on a SageMaker Training Job.
I have posted this issue on the AWS SageMaker forum in:
TrainingJob custom algorithm with Keras backend and multi GPU
SageMaker Fails when using Multi-GPU with keras.utils.multi_gpu_model
[UPDATE]
I have slightly modified the tf.Session code, adding some initializers:
with tf.Session() as session:
    K.set_session(session)
    session.run(tf.global_variables_initializer())
    session.run(tf.tables_initializer())
and now at least I can see from the instance metrics that one GPU (I assume device gpu:0) is used. Multi-GPU still does not work.
This might not be the best answer to your problem, but this is what I am using for a multi-GPU model with the TensorFlow backend. First I initialize using:
def setup_multi_gpus():
    """
    Setup multi GPU usage

    Example usage:
    model = Sequential()
    ...
    multi_model = multi_gpu_model(model, gpus=num_gpu)
    multi_model.fit()

    About memory usage:
    https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory
    """
    import tensorflow as tf
    from keras.utils.training_utils import multi_gpu_model
    from tensorflow.python.client import device_lib

    # IMPORTANT: Tells tf to not occupy a specific amount of memory
    from keras.backend.tensorflow_backend import set_session
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # dynamically grow the memory used on the GPU
    sess = tf.Session(config=config)
    set_session(sess)  # set this TensorFlow session as the default session for Keras

    # getting the number of GPUs
    def get_available_gpus():
        local_device_protos = device_lib.list_local_devices()
        return [x.name for x in local_device_protos if x.device_type == 'GPU']

    num_gpu = len(get_available_gpus())
    print('Amount of GPUs available: %s' % num_gpu)
    return num_gpu
Then I call:
# Setup multi GPU usage
num_gpu = setup_multi_gpus()
and create a model.
...
After which you're able to make it a multi-GPU model:
multi_model = multi_gpu_model(model, gpus=num_gpu)
multi_model.compile(...)
multi_model.fit(...)
The only thing here that differs from what you are doing is the way TensorFlow initializes the GPUs. I can't imagine it being the problem, but it might be worth trying out.
Good luck!
Edit: I noticed that sequence-to-sequence models are not able to work with multi-GPU. Is that the type of model you are trying to train?
I apologize for the slow response.
It seems there are a lot of threads running in parallel, and I want to link them together so that other individuals who have the same issue can see the progress and the ongoing discussion.
https://forums.aws.amazon.com/thread.jspa?messageID=881541
https://forums.aws.amazon.com/thread.jspa?messageID=881540
https://github.com/aws/sagemaker-python-sdk/issues/512
There are a few questions in regard to this.
What version of TensorFlow and Keras are you using?
I am not too sure what is causing this problem. Does your container have all of the needed dependencies, such as CUDA? https://www.tensorflow.org/install/gpu (a quick check is sketched below)
Were you able to train using a single GPU with Keras?
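As a quick sanity check for the dependency question, you can verify from inside the container that TensorFlow actually sees the GPUs; a minimal TF1-style sketch (the check is my suggestion, not from the thread):
from tensorflow.python.client import device_lib

# Lists the devices TensorFlow can see; an empty GPU list usually means
# CUDA/cuDNN are missing or mismatched inside the container.
gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == 'GPU']
print(gpus)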