I've been trying to train a model on AWS SageMaker because my computer is no longer powerful enough to train it in a reasonable amount of time. However, when I tried to load the model (after copy-pasting the code from my computer), I got an unexpected error.
After tinkering around for a little bit, I found that the very first Conv2D layer has a different output shape than it was on my computer.
Sagemaker output dimensions:
(None, 128, 498, 498)
Expected output dimensions:
(None, 498, 498, 128)
My code is below:
import tensorflow as tf
from tensorflow import keras

model = keras.models.Sequential()
model.add(keras.Input(shape=(500, 500, 3)))
model.add(keras.layers.Conv2D(filters=128, kernel_size=(3, 3), activation='relu'))
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0001),
              loss=keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.summary()
How can I fix this?
I came here because I had the same problem. I found the solution, but I am still really confused about it. I just want to mention that I use the same TensorFlow version (2.10) locally and on SageMaker, and exactly the same code on both.
If you go to https://keras.io/api/layers/convolution_layers/convolution2d/, it states:
"Output shape
4+D tensor with shape: batch_shape + (filters, new_rows, new_cols) if data_format='channels_first' or 4+D tensor with shape: batch_shape + (new_rows, new_cols, filters) if data_format='channels_last'. rows and cols values might have changed due to padding."
So I forced SageMaker's version to use `data_format='channels_last'`. Now both versions, the local one and the AWS one, are consistent.
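For reference, here is a minimal sketch of how you might pin the data format explicitly, either globally or on the layer itself; the shapes assume the (500, 500, 3) input from the question.
import tensorflow as tf
from tensorflow import keras

# Option 1: set the default globally so every layer uses channels_last
keras.backend.set_image_data_format('channels_last')

# Option 2: set it on the layer itself
model = keras.models.Sequential()
model.add(keras.Input(shape=(500, 500, 3)))
model.add(keras.layers.Conv2D(filters=128, kernel_size=(3, 3),
                              activation='relu',
                              data_format='channels_last'))
model.summary()  # the last axis should now be the 128 filters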
I'm using the following code to load an ImageNet pre-trained VGG19 model and fit it to my custom dataset.
import tensorflow as tf
from tensorflow import keras
from keras.applications.vgg19 import VGG19

# n_classes, scheduler, x_train and y_train are defined elsewhere
optim = tf.keras.optimizers.RMSprop(momentum=0.9)
vgg19 = VGG19(include_top=False, weights='imagenet', input_tensor=tf.keras.layers.Input(shape=(224, 224, 3)))
vgg19.trainable = False
# x = keras.layers.GlobalAveragePooling2D()(model_vgg19_pt.output)
x = keras.layers.Flatten()(vgg19.output)
output = keras.layers.Dense(n_classes, activation='softmax')(x)
model_vgg19_pt = keras.models.Model(inputs=[vgg19.input], outputs=[output])
model_vgg19_pt.compile(optimizer=optim,
                       loss='categorical_crossentropy', metrics=['categorical_accuracy'])
callback = tf.keras.callbacks.LearningRateScheduler(scheduler)
model_vgg19_pt.fit(x_train, y_train, batch_size=20,
                   epochs=50, callbacks=[callback])
On the model.fit() line, I get the following error:
KeyError: 'The optimizer cannot recognize variable dense_1/kernel:0. This usually means you are trying to call the optimizer to update different parts of the model separately. Please call optimizer.build(variables) with the full list of trainable variables before the training loop or use legacy optimizer `tf.keras.optimizers.legacy.{self.__class__.__name__}`.'
What does it mean and how can I fix it?
I get the same errors for keras.applications.inception_v3 too, when using the same implementation method.
Additionally, this was working in a Jupyter notebook on TensorFlow (CPU), but when running on a remote machine with tensorflow-gpu installed, I'm getting these errors.
This works fine with the SGD optimizer, but not with RMSprop. Why?
Additional
Using this:
model_vgg19_pt.compile(optimizer=tf.keras.optimizers.RMSprop(momentum=0.9),
                       loss='categorical_crossentropy', metrics=['categorical_accuracy'])
instead of the compile call used above works. But can somebody explain why?
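For what it's worth, the error message itself points at two workarounds besides recreating the optimizer inline. A minimal sketch, assuming the model_vgg19_pt built above and a TensorFlow version that ships the new optimizer (the momentum value is just carried over from the question):
# Option 1: recreate the optimizer at compile time (as in the snippet above).
model_vgg19_pt.compile(optimizer=tf.keras.optimizers.RMSprop(momentum=0.9),
                       loss='categorical_crossentropy',
                       metrics=['categorical_accuracy'])

# Option 2: build the existing optimizer against the full variable list first.
optim = tf.keras.optimizers.RMSprop(momentum=0.9)
optim.build(model_vgg19_pt.trainable_variables)

# Option 3: fall back to the legacy optimizer implementation.
legacy_optim = tf.keras.optimizers.legacy.RMSprop(momentum=0.9)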
Which version of TensorFlow GPU have you installed? TensorFlow 2.10 was the last TensorFlow release that supported GPU on native Windows. Please check the link to install TensorFlow, following all the hardware/software requirements for GPU support.
The scheduler function you pass to LearningRateScheduler in your callback is not defined anywhere before you compile and fit the model.
I was able to train the model after removing the callback from model.fit(). (Attaching the gist here for your reference.)
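If you want to keep the callback instead, here is a hedged example of the kind of scheduler function LearningRateScheduler expects; the decay schedule is illustrative, not taken from the question:
import tensorflow as tf

def scheduler(epoch, lr):
    # keep the initial learning rate for the first 10 epochs, then decay it
    if epoch < 10:
        return lr
    return lr * tf.math.exp(-0.1)

callback = tf.keras.callbacks.LearningRateScheduler(scheduler)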
I'm trying to resize some images to use them with Inception. I want to do it as a separate preprocessing step to speed things up later. Running tf.image.resize on all of the images at once crashes, as does a loop.
I'd like to do it in batches, but I don't know how to do that without making it part of my neural network model - I want to keep it outside as a separate preprocessing step.
I made this:
import tensorflow as tf

inception = tf.keras.applications.inception_v3.InceptionV3(include_top=True, input_shape=(299, 299, 3))
inception = tf.keras.Model([inception.input], [inception.layers[-2].output])
for layer in inception.layers:
    layer.trainable = False

resize_incept = tf.keras.Sequential([
    tf.keras.layers.Resizing(299, 299),
    inception])
resize_incept.compile()
So can I just call it on my images? But then how do I batch it? When I call it like resize_incept(images) it crashes (too big), but if I call resize_incept(images, batch_size = 25), it doesn't work (TypeError: call() got an unexpected keyword argument 'batch_size').
EDIT: I'm trying to figure out if I can use tf.data.dataset for this
EDIT2: I've put my data (which was an array of shape (batch, 32, 32, 3)) into a tf.data.Dataset so I can try this:
train_dataset = train_dataset.map(lambda x, y: (resize_incept(x), y))
But when I try it, it gives me this error:
ValueError: Input 0 is incompatible with layer model_1: expected shape=(None, 299, 299, 3), found shape=(299, 299, 3)
The problem seems to be that whatever comes out of the resize layer is somehow wrong for going into the inception layer (what I put in at first is (32, 32, 3), and there are no complaints about those dimensions). But the inception layer already has input_shape=(299, 299, 3), so I would think that's the shape it would take?
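A likely cause, and a minimal sketch of a fix, assuming the train_dataset and resize_incept defined above: Dataset.map() is applied per element, so the model receives (299, 299, 3) with no batch axis; batching the dataset before the map restores the (None, 299, 299, 3) shape the model expects.
train_dataset = train_dataset.batch(25)  # batch first, so map sees (25, 32, 32, 3)
train_dataset = train_dataset.map(lambda x, y: (resize_incept(x), y))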
If you want it as a separate preprocessing step, I would recommend using the image_dataset_from_directory() function included in Keras preprocessing.
tf.keras.preprocessing.image_dataset_from_directory(
    directory,
    labels="inferred",
    label_mode="int",
    class_names=None,
    color_mode="rgb",
    batch_size=32,
    image_size=(256, 256),
    shuffle=True,
    seed=None,
    validation_split=None,
    subset=None,
    interpolation="bilinear",
    follow_links=False,
    crop_to_aspect_ratio=False,
    **kwargs)
You can set the batch size and the desired image size, and choose whether to split the data or not. Also, be careful with crop_to_aspect_ratio as you may lose important features from the cropped areas.
If you want to know more about its parameters and see a code example, you could check the image_dataset_from_directory documentation.
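For instance, a hedged usage example, assuming a hypothetical data_dir directory containing one sub-folder per class, with the batch size and Inception input size taken from the question:
import tensorflow as tf

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "data_dir",                # hypothetical path, one sub-folder per class
    labels="inferred",
    label_mode="categorical",
    batch_size=25,
    image_size=(299, 299))     # images are resized on load, so no separate resize step is needed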
Good luck!
I'd like to construct a network in the Tensorflow V2 object detection API using 5-channel images. However, I am stuck on how to modify the weights of the first convolutional layer using the Tensorflow 2.2 framework.
I have downloaded the pre-trained RetinaNet from the V2 Model Zoo. I then tried the following to modify the weights in the first layer of the checkpoint and save them back:
import numpy as np
import tensorflow as tf

tf_path = tf.train.latest_checkpoint('./RetinaNet/checkpoint/')
init_vars = tf.train.list_variables(tf_path)
tf_vars = {}
for name, shape in init_vars:
    array = tf.train.load_variable(tf_path, name)
    try:
        if shape[2] == 3:  # look for a layer whose 3rd input dimension is 3, i.e. the 1st convolutional layer
            array = np.concatenate((array, array[:, :, :2, :]), axis=2)
            array = array.astype('float32')
            tf_vars[name] = tf.Variable(array)
        else:
            tf_vars[name] = tf.Variable(array)
    except:
        tf_vars[name] = tf.Variable(array)

saver = tf.compat.v1.train.Saver(var_list=tf_vars)
sess = tf.compat.v1.Session()
saver.save(sess, './RetinaNet/checkpoint/ckpt-0')
I loaded the model back in to make sure the 1st convolutional layer had been changed - all looks ok.
But when I go to train the model, I get the following error:
Model was constructed with shape (None, None, None, 3) for input Tensor("input_1:0", shape=(None, None, None, 3), dtype=float32), but it was called on an input with incompatible shape (64, 128, 128, 5)
Which leads me to believe my method of modifying the weights is not so "ok" after all. Would anyone have some tips on how to modify these weights correctly?
Thanks
This now works, but the solution is very hacky... it also means not training from the pretrained weights from the model zoo, so you need to comment out everything to do with the fine_tune_checkpoint in the config file.
Then, go to .\Lib\site-packages\official\vision\image_classification\efficientnet and change the number of input channels and number of classes in efficientnet_model.py and efficientnet_config.py.
I trained a simple MLP model using the new tf.keras version 2.2.4-tf. Here is what the model looks like:
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.models import Model

# `activation` is defined elsewhere
input_layer = Input(batch_shape=(138, 28))
first_layer = Dense(30, activation=activation, name="first_dense_layer_1")(input_layer)
first_layer = Dropout(0.1)(first_layer, training=True)
second_layer = Dense(15, activation=activation, name="second_dense_layer")(first_layer)
out = Dense(1, name='output_layer')(second_layer)
model = Model(input_layer, out)
I'm getting an error when I try to predict with prediction_result = model.predict(test_data, batch_size=138). The test_data has a shape of (69, 28), so it is smaller than the batch_size of 138. Here is the error; it seems like the issue comes from the first dropout layer:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [138,30] vs. [69,30]
[[node model/dropout/dropout/mul_1 (defined at ./mlp_new_tf.py:471) ]] [Op:__inference_distributed_function_1700]
The same code works with no issues in older versions of Keras (2.2.4) and TensorFlow (1.12.0). How can I fix the issue? I don't have more test data, so I can't change the test_data set to have more data points!
Since you are seeing the issue at prediction time, one way of getting around this would be to pad the test data to be a multiple of your batch size. It shouldn't slow down the prediction since the number of batches doesn't change. numpy.pad should do the trick.
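A minimal sketch of that padding, assuming the model and test_data from the question (69 samples, fixed batch size of 138); the padded rows are dropped again after predict:
import numpy as np

batch_size = 138
n = test_data.shape[0]                               # 69 in the question
pad_rows = (-n) % batch_size                         # rows needed to reach a multiple of 138
padded = np.pad(test_data, ((0, pad_rows), (0, 0)), mode='constant')
prediction_result = model.predict(padded, batch_size=batch_size)[:n]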
I've been trying to build a sequential model in Keras using the pooling layer tf.nn.fractional_max_pool. I know I could try making my own custom layer in Keras, but I'm trying to see if I can use the layer already in Tensorflow. For the following code snippet:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import ZeroPadding2D, Conv2D, PReLU, InputLayer

p_ratio = [1.0, 1.44, 1.44, 1.0]
model = Sequential()
model.add(ZeroPadding2D((2, 2), input_shape=(1, 48, 48)))
model.add(Conv2D(320, (3, 3), activation=PReLU()))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(320, (3, 3), activation=PReLU()))
model.add(InputLayer(input_tensor=tf.nn.fractional_max_pool(model.layers[3].output, p_ratio)))
I get this error. I've tried some other things with Input instead of InputLayer and also the Keras Functional API but so far no luck.
Got it to work. For future reference, this is how you would need to implement it. Since tf.nn.fractional_max_pool returns 3 tensors, you need to get the first one only:
model.add(InputLayer(input_tensor=tf.nn.fractional_max_pool(model.layers[3].output, p_ratio)[0]))
Or using a Lambda layer:
def frac_max_pool(x):
    return tf.nn.fractional_max_pool(x, p_ratio)[0]
With the model implementation being:
model.add(Lambda(frac_max_pool))
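Putting the Lambda variant together, a hedged sketch that reuses the p_ratio and layer stack from the question (untested; the channels-first-style input shape is carried over unchanged):
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import ZeroPadding2D, Conv2D, PReLU, Lambda

p_ratio = [1.0, 1.44, 1.44, 1.0]

def frac_max_pool(x):
    # fractional_max_pool returns (output, row_seq, col_seq); keep only the output tensor
    return tf.nn.fractional_max_pool(x, p_ratio)[0]

model = Sequential([
    ZeroPadding2D((2, 2), input_shape=(1, 48, 48)),
    Conv2D(320, (3, 3), activation=PReLU()),
    ZeroPadding2D((1, 1)),
    Conv2D(320, (3, 3), activation=PReLU()),
    Lambda(frac_max_pool),
])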