Dimension does not match when using `keras.Model.fit` in `BERT` of tensorflow - tensorflow

I follow the instruction of Fine-tuning BERT to build a model with my own dataset(It is kind of large, and greater than 20G), then take steps to re-cdoe my data and load them from tf_record files.
The training_dataset I create has the same signature as that in the instruction
training_dataset.element_spec
({'input_word_ids': TensorSpec(shape=(32, 1024), dtype=tf.int32, name=None),
'input_mask': TensorSpec(shape=(32, 1024), dtype=tf.int32, name=None),
'input_type_ids': TensorSpec(shape=(32, 1024), dtype=tf.int32, name=None)},
TensorSpec(shape=(32,), dtype=tf.int32, name=None))
where batch_size is 32, max_seq_length is 1024.
As the instruction suggestes,
The resulting tf.data.Datasets return (features, labels) pairs, as expected by keras.Model.fit
It semms that everything works as expected,(the instruction does not show how to use training_dataset though ) However, the following code
bert_classifier.fit(
x = training_dataset,
validation_data=test_dataset, # has the same signature just as training_dataset
batch_size=32,
epochs=epochs,
verbose=1,
)
encouters an error that seems weird to me,
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/captain/project/dataload/train.py", line 81, in <module>
verbose=1,
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
tmp_logs = self.train_function(iterator)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
result = self._call(*args, **kwds)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
*args, **kwds))
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
out = weak_wrapped_fn().__wrapped__(*args, **kwds)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:805 train_function *
return step_function(self, iterator)
/home/captain/.local/lib/python3.7/site-packages/official/nlp/keras_nlp/layers/position_embedding.py:88 call *
return tf.broadcast_to(position_embeddings, input_shape)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py:845 broadcast_to **
"BroadcastTo", input=input, shape=shape, name=name)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:750 _apply_op_helper
attrs=attr_protos, op_def=op_def)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:592 _create_op_internal
compute_device)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:3536 _create_op_internal
op_def=op_def)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:2016 __init__
control_input_ops, op_def)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1856 _create_c_op
raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 512 and 1024 for '{{node bert_classifier/bert_encoder_1/position_embedding/BroadcastTo}} =
BroadcastTo[T=DT_FLOAT, Tidx=DT_INT32](bert_classifier/bert_encoder_1/position_embedding/strided_slice_1, bert_classifier/bert_encoder_1/position_embedding/Shape)'
with input shapes: [512,768], [3] and with input tensors computed as partial shapes: input[1] = [32,1024,768].
There is nothing to do with 512, and I didn't use 512 thorough my code. So where is wrong with my code and how to fix that?

They created the bert_classifier based on bert_config_file loaded from bert_config.json
bert_classifier, bert_encoder = bert.bert_models.classifier_model(bert_config, num_labels=2)
bert_config.json
{
'attention_probs_dropout_prob': 0.1,
'hidden_act': 'gelu',
'hidden_dropout_prob': 0.1,
'hidden_size': 768,
'initializer_range': 0.02,
'intermediate_size': 3072,
'max_position_embeddings': 512,
'num_attention_heads': 12,
'num_hidden_layers': 12,
'type_vocab_size': 2,
'vocab_size': 30522
}
According to this config, hidden_size is 768 and max_position_embeddings is 512 so your input data used to feed to BERT model must be the same shape as described. It explains the reason why you are getting the shape-mismatched issue.
Therefore, to make it work, you have to change all lines for creating tensor inputs from 1024 to 512.

Related

InvalidArgumentError: Graph execution error: Node: 'categorical_crossentropy/remove_squeezable_dimensions/Squeeze'

I want to train U-Net for a semantic segmentation task using cross entropy as a loss function. I have RGB images and masks. The mask is one hot encoded. When I try to train the model, I got this error.
seed=123
batch_size= 32
n_classes=2
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
#Define a function to perform additional preprocessing after datagen.
#For example, scale images, convert masks to categorical, etc.
def preprocess_data(img, mask, num_class):
#Scale images
img = img / 255. #This can be done in ImageDataGenerator but showing it outside as an example
#Convert mask to one-hot
labelencoder = LabelEncoder()
n, h, w, c = mask.shape
print(mask.shape)
mask = mask.reshape(-1,1)
mask = labelencoder.fit_transform(mask)
mask = mask.reshape(n, h, w, c)
mask = to_categorical(mask, num_class)
print(mask.shape)
return (img, mask)
#Define the generator.
#We are not doing any rotation or zoom to make sure mask values are not interpolated.
#It is important to keep pixel values in mask as 0, 1, 2, 3, .....
from tensorflow.keras.preprocessing.image import ImageDataGenerator
def trainGenerator(train_img_path, train_mask_path, num_class):
img_data_gen_args = dict(horizontal_flip=True,
vertical_flip=True,
fill_mode='reflect')
image_datagen = ImageDataGenerator(**img_data_gen_args)
mask_datagen = ImageDataGenerator(**img_data_gen_args)
image_generator = image_datagen.flow_from_directory(
train_img_path,
class_mode = None,
target_size=(256,256),
batch_size = batch_size,
seed = seed)
mask_generator = mask_datagen.flow_from_directory(
train_mask_path,
class_mode = None,
target_size=(256,256),
batch_size = batch_size,
seed = seed)
train_generator = zip(image_generator, mask_generator)
for (img, mask) in train_generator:
img, mask = preprocess_data(img, mask, num_class)
yield (img, mask)`
model.compile(optimizer='adam', loss=tf.keras.losses.CategoricalCrossentropy(), metrics=['accuracy'])
model.fit(train_img_gen,
steps_per_epoch=steps_per_epoch,
epochs=3,
verbose=1,
validation_data=val_img_gen,
validation_steps=val_steps_per_epoch)
--------------------------------------------------------------------------- InvalidArgumentError Traceback (most recent call
last) Input In [131], in <cell line: 1>()
----> 1 history=model.fit(train_img_gen,
2 steps_per_epoch=steps_per_epoch,
3 epochs=3,
4 verbose=1,
5 validation_data=val_img_gen,
6 validation_steps=val_steps_per_epoch)
File
/opt/conda/lib/python3.9/site-packages/keras/utils/traceback_utils.py:67,
in filter_traceback..error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.traceback)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
File
/opt/conda/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:54,
in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
52 try:
53 ctx.ensure_initialized()
---> 54 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
55 inputs, attrs, num_outputs)
56 except core._NotOkStatusException as e:
57 if name is not None:
InvalidArgumentError: Graph execution error:
Detected at node
'categorical_crossentropy/remove_squeezable_dimensions/Squeeze'
defined at (most recent call last):
File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/opt/conda/lib/python3.9/site-packages/ipykernel_launcher.py", line
17, in
app.launch_new_instance()
File "/opt/conda/lib/python3.9/site-packages/traitlets/config/application.py",
line 972, in launch_instance
app.start()
File "/opt/conda/lib/python3.9/site-packages/ipykernel/kernelapp.py", line
712, in start
self.io_loop.start()
File "/opt/conda/lib/python3.9/site-packages/tornado/platform/asyncio.py",
line 199, in start
self.asyncio_loop.run_forever()
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 596, in run_forever
self._run_once()
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1890, in _run_once
handle._run()
File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/python3.9/site-packages/ipykernel/kernelbase.py", line
504, in dispatch_queue
await self.process_one()
File "/opt/conda/lib/python3.9/site-packages/ipykernel/kernelbase.py", line
493, in process_one
await dispatch(*args)
File "/opt/conda/lib/python3.9/site-packages/ipykernel/kernelbase.py", line
400, in dispatch_shell
await result
File "/opt/conda/lib/python3.9/site-packages/ipykernel/kernelbase.py", line
724, in execute_request
reply_content = await reply_content
File "/opt/conda/lib/python3.9/site-packages/ipykernel/ipkernel.py", line
383, in do_execute
res = shell.run_cell(
File "/opt/conda/lib/python3.9/site-packages/ipykernel/zmqshell.py", line
528, in run_cell
return super().run_cell(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/IPython/core/interactiveshell.py",
line 2881, in run_cell
result = self._run_cell(
File "/opt/conda/lib/python3.9/site-packages/IPython/core/interactiveshell.py",
line 2936, in _run_cell
return runner(coro)
File "/opt/conda/lib/python3.9/site-packages/IPython/core/async_helpers.py",
line 129, in pseudo_sync_runner
coro.send(None)
File "/opt/conda/lib/python3.9/site-packages/IPython/core/interactiveshell.py",
line 3135, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File "/opt/conda/lib/python3.9/site-packages/IPython/core/interactiveshell.py",
line 3338, in run_ast_nodes
if await self.run_code(code, result, async=asy):
File "/opt/conda/lib/python3.9/site-packages/IPython/core/interactiveshell.py",
line 3398, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/tmp/ipykernel_68/434058273.py", line 1, in <cell line: 1>
history=model.fit(train_img_gen,
File "/opt/conda/lib/python3.9/site-packages/keras/utils/traceback_utils.py",
line 64, in error_handler
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/keras/engine/training.py",
line 1384, in fit
tmp_logs = self.train_function(iterator)
File "/opt/conda/lib/python3.9/site-packages/keras/engine/training.py",
line 1021, in train_function
return step_function(self, iterator)
File "/opt/conda/lib/python3.9/site-packages/keras/engine/training.py",
line 1010, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/opt/conda/lib/python3.9/site-packages/keras/engine/training.py",
line 1000, in run_step
outputs = model.train_step(data)
File "/opt/conda/lib/python3.9/site-packages/keras/engine/training.py",
line 860, in train_step
loss = self.compute_loss(x, y, y_pred, sample_weight)
File "/opt/conda/lib/python3.9/site-packages/keras/engine/training.py",
line 918, in compute_loss
return self.compiled_loss(
File "/opt/conda/lib/python3.9/site-packages/keras/engine/compile_utils.py",
line 201, in call
loss_value = loss_obj(y_t, y_p, sample_weight=sw)
File "/opt/conda/lib/python3.9/site-packages/keras/losses.py", line 141, in call
losses = call_fn(y_true, y_pred)
File "/opt/conda/lib/python3.9/site-packages/keras/losses.py", line 242, in call
y_pred, y_true = losses_utils.squeeze_or_expand_dimensions(y_pred, y_true)
File "/opt/conda/lib/python3.9/site-packages/keras/utils/losses_utils.py",
line 187, in squeeze_or_expand_dimensions
y_true, y_pred = remove_squeezable_dimensions(
File "/opt/conda/lib/python3.9/site-packages/keras/utils/losses_utils.py",
line 130, in remove_squeezable_dimensions
labels = tf.squeeze(labels, [-1]) Node: 'categorical_crossentropy/remove_squeezable_dimensions/Squeeze' Can
not squeeze dim[4], expected a dimension of 1, got 2 [[{{node
categorical_crossentropy/remove_squeezable_dimensions/Squeeze}}]]
[Op:__inference_train_function_34035]

LSTM - RuntimeError: Too many failed attempts to build model

I am trying to use keras tuner to tune an LSTM neural network to detect if an article is a fake news or not, using a kaggle dataset.
However, I keep getting this error: RuntimeError: Too many failed attempts to build model
I have also tried to use the RandomSearch rather than the BayesianOptimization, but still getting the same type of error.
This is the code:
'''
def build_model(hp):
voc_size=5000
embedding_vector_features=40
model = Sequential([
Embedding(
voc_size,
embedding_vector_features,
input_length = sent_length
),
AlphaDropout(
rate = hp.Choice(
'dropout_1_rate',
values=[0.3, 0.5],
default=0.3
)
),
LSTM(
units = hp.Int(
'LSTM_1_units',
min_value=100,
max_value=300,
step=32,
default=128
),
activation = hp.Choice(
'LSTM_1_activation',
values=['relu', 'selu']
),
kernel_initializer='lecun_normal'
),
AlphaDropout(
rate = hp.Choice(
'dropout_2_rate',
values=[0.3, 0.5],
default=0.3
)
),
LSTM(
units = hp.Int(
'LSTM_2_units',
min_value=100,
max_value=300,
step=32,
default=128
),
activation = hp.Choice(
'LSTM_2_activation',
values=['relu', 'selu']
),
kernel_initializer='lecun_normal'
),
AlphaDropout(
rate = hp.Choice(
'dropout_3_rate',
values=[0.3, 0.5],
default=0.3
)
),
Dense(
units = 1,
activation = 'sigmoid',
kernel_initializer='lecun_normal'
)
])
model.compile(
optimizer = keras.optimizers.Nadam(
hp.Choice(
'learning_rate',
values=[1e-2, 1e-3]
)
),
loss = 'binary_crooentropy',
metric = ['accuracy']
)
return model
tuner_search = BayesianOptimization(build_model,
objective='val_accuracy',
max_trials=3,
seed=42,
directory='output',
project_name='Fake News Classifier'
)
'''
When I try to run this code I get the following error:
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
Invalid model 0/5
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py", line 104, in build
model = self.hypermodel.build(hp)
File "<ipython-input-18-fe84fe0afbca>", line 62, in build_model
kernel_initializer='lecun_normal'
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 144, in __init__
self.add(layer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 223, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/recurrent.py", line 660, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
input_list)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
inputs, input_masks, args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
self._maybe_build(inputs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2685, in _maybe_build
self.input_spec, inputs, self.name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/input_spec.py", line 223, in assert_input_compatibility
str(tuple(shape)))
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
Invalid model 1/5
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py", line 104, in build
model = self.hypermodel.build(hp)
File "<ipython-input-18-fe84fe0afbca>", line 62, in build_model
kernel_initializer='lecun_normal'
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 144, in __init__
self.add(layer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 223, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/recurrent.py", line 660, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
input_list)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
inputs, input_masks, args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
self._maybe_build(inputs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2685, in _maybe_build
self.input_spec, inputs, self.name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/input_spec.py", line 223, in assert_input_compatibility
str(tuple(shape)))
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
Invalid model 2/5
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py", line 104, in build
model = self.hypermodel.build(hp)
File "<ipython-input-18-fe84fe0afbca>", line 62, in build_model
kernel_initializer='lecun_normal'
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 144, in __init__
self.add(layer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 223, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/recurrent.py", line 660, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
input_list)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
inputs, input_masks, args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
self._maybe_build(inputs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2685, in _maybe_build
self.input_spec, inputs, self.name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/input_spec.py", line 223, in assert_input_compatibility
str(tuple(shape)))
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
Invalid model 3/5
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py", line 104, in build
model = self.hypermodel.build(hp)
File "<ipython-input-18-fe84fe0afbca>", line 62, in build_model
kernel_initializer='lecun_normal'
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 144, in __init__
self.add(layer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 223, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/recurrent.py", line 660, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
input_list)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
inputs, input_masks, args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
self._maybe_build(inputs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2685, in _maybe_build
self.input_spec, inputs, self.name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/input_spec.py", line 223, in assert_input_compatibility
str(tuple(shape)))
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
Invalid model 4/5
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py", line 104, in build
model = self.hypermodel.build(hp)
File "<ipython-input-18-fe84fe0afbca>", line 62, in build_model
kernel_initializer='lecun_normal'
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 144, in __init__
self.add(layer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 223, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/recurrent.py", line 660, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
input_list)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
inputs, input_masks, args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
self._maybe_build(inputs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2685, in _maybe_build
self.input_spec, inputs, self.name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/input_spec.py", line 223, in assert_input_compatibility
str(tuple(shape)))
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
Invalid model 5/5
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py", line 104, in build
model = self.hypermodel.build(hp)
File "<ipython-input-18-fe84fe0afbca>", line 62, in build_model
kernel_initializer='lecun_normal'
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 144, in __init__
self.add(layer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 223, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/recurrent.py", line 660, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
input_list)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
inputs, input_masks, args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
self._maybe_build(inputs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2685, in _maybe_build
self.input_spec, inputs, self.name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/input_spec.py", line 223, in assert_input_compatibility
str(tuple(shape)))
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py in build(self, hp)
103 with maybe_distribute(self.distribution_strategy):
--> 104 model = self.hypermodel.build(hp)
105 except:
19 frames
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py in build(self, hp)
111 if i == self._max_fail_streak:
112 raise RuntimeError(
--> 113 'Too many failed attempts to build model.')
114 continue
115
RuntimeError: Too many failed attempts to build model.
How can I solve the issue?
Actual error is ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
LSTM layer expects input shape inputs: A 3D tensor with shape [batch, timesteps, feature]
I could reproduce your issue
import tensorflow as tf
inputs = tf.random.normal([32, 8])
lstm = tf.keras.layers.LSTM(4)
output = lstm(inputs)
print(output.shape)
Output
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-160c5e8d5d9a> in <module>()
2 inputs = tf.random.normal([32, 8])
3 lstm = tf.keras.layers.LSTM(4)
----> 4 output = lstm(inputs)
5 print(output.shape)
2 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/input_spec.py in assert_input_compatibility(input_spec, inputs, layer_name)
217 'expected ndim=' + str(spec.ndim) + ', found ndim=' +
218 str(ndim) + '. Full shape received: ' +
--> 219 str(tuple(shape)))
220 if spec.max_ndim is not None:
221 ndim = x.shape.rank
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (32, 8)
Working code snippet
import tensorflow as tf
inputs = tf.random.normal([32, 10, 8])
lstm = tf.keras.layers.LSTM(4)
output = lstm(inputs)
print(output.shape)
Output
(32, 4)

ValueError: can't split axis of size 85 into pieces of size [2,2,1,20] keras training

I got this error at the start of the training, what does it mean? there is no clear indication of what causes the error/the source of the problem ..............................................................................................................................................................
Epoch 1/10
Traceback (most recent call last):
File "train.py", line 199, in <module>
app.run(main)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "train.py", line 194, in main
validation_data=val_dataset)
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 71, in _method_wrapper
return method(self, *args, **kwargs)
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 920, in fit
tmp_logs = train_function(iterator)
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 608, in __call__
result = self._call(*args, **kwds)
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 655, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 535, in _initialize
*args, **kwds))
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2447, in _get_concrete_function_internal_garbage_collected
graph_function, _, _ = self._maybe_define_function(args, kwargs)
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2775, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2665, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 981, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 446, in wrapped_fn
return weak_wrapped_fn().__wrapped__(*args, **kwds)
File "/root/.local/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 968, in wrapper
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:
/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:630 train_function *
return step_function(self, iterator)
/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:620 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
/root/.local/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:952 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/root/.local/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:2292 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/root/.local/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:2651 _call_for_each_replica
return fn(*args, **kwargs)
/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:613 run_step **
outputs = model.train_step(data)
/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:573 train_step
y, y_pred, sample_weight, regularization_losses=self.losses)
/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/compile_utils.py:204 __call__
loss_value = loss_obj(y_t, y_p, sample_weight=sw)
/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/losses.py:145 __call__
losses = self.call(y_true, y_pred)
/root/.local/lib/python3.6/site-packages/tensorflow/python/keras/losses.py:248 call
return self.fn(y_true, y_pred, **self._fn_kwargs)
/content/drive/My Drive/yolov3-tf2/yolov3_tf2/models.py:264 yolo_loss
y_pred, anchors, classes)
/content/drive/My Drive/yolov3-tf2/yolov3_tf2/models.py:155 yolo_boxes
pred, (2, 2, 1, classes), axis=-1)
/root/.local/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py:1978 split
value=value, size_splits=size_splits, axis=axis, num_split=num, name=name)
/root/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py:9881 split_v
num_split=num_split, name=name)
/root/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:744 _apply_op_helper
attrs=attr_protos, op_def=op_def)
/root/.local/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py:595 _create_op_internal
compute_device)
/root/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py:3470 _create_op_internal
op_def=op_def)
/root/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py:1960 __init__
control_input_ops, op_def)
/root/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py:1800 _create_c_op
raise ValueError(str(e))
ValueError: can't split axis of size 85 into pieces of size [2,2,1,20] for '{{node yolo_loss/split}} = SplitV[T=DT_FLOAT, Tlen=DT_INT32, num_split=4](model/layer_207_output_0/layer_206_lambda/Reshape, yolo_loss/Const, yolo_loss/split/split_dim)' with input shapes: [?,?,?,3,85], [4], [] and with computed input tensors: input[1] = <2 2 1 20>, input[2] = <-1>
this is due to not having same no of nodes: Here 85 cannot be convert or resize into 2*2*1*20 which is equal to 80
you can reshape or resize only node value are equal
In you case 85 is not equal to 2*2*1*20(80)

Tensorboard exception with summary.image of shape [-1, 125, 128, 1] of MFCCs

Following this guide, I'm converting a tensor [batch_size, 16000, 1] to an MFCC using the method described in the link:
def gen_spectrogram(wav, sr=16000):
# A 1024-point STFT with frames of 64 ms and 75% overlap.
stfts = tf.contrib.signal.stft(wav, frame_length=1024, frame_step=256, fft_length=1024)
spectrograms = tf.abs(stfts)
# Warp the linear scale spectrograms into the mel-scale.
num_spectrogram_bins = stfts.shape[-1].value
lower_edge_hertz, upper_edge_hertz, num_mel_bins = 80.0, 7600.0, 80
linear_to_mel_weight_matrix = tf.contrib.signal.linear_to_mel_weight_matrix(
num_mel_bins, num_spectrogram_bins,
sample_rate, lower_edge_hertz, upper_edge_hertz)
mel_spectrograms = tf.tensordot(spectrograms, linear_to_mel_weight_matrix, 1)
mel_spectrograms.set_shape(
spectrograms.shape[:-1].concatenate(
linear_to_mel_weight_matrix.shape[-1:]
)
)
# Compute a stabilized log to get log-magnitude mel-scale spectrograms.
log_mel_spectrograms = tf.log(mel_spectrograms + 1e-6)
# Compute MFCCs from log_mel_spectrograms and take the first 13.
return tf.contrib.signal.mfccs_from_log_mel_spectrograms(log_mel_spectrograms)[..., :13]
I then reshape the output of that to [batch_size, 125, 128, 1]. If I send that to a tf.layers.conv2d, things seem to work fine. However, if I try to tf.summary.image, I get the following error:
print(spec)
// => Tensor("spectrogram/Reshape:0", shape=(?, 125, 128, 1), dtype=float32)
tf.summary.image('spec', spec)
Caused by op u'spectrogram/stft/rfft', defined at:
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/Users/rsilveira/rnd/ml-engine/trainer/flatv1.py", line 103, in <module>
runner.run(model_fn)
File "trainer/runner.py", line 88, in run
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
File "/Library/Python/2.7/site-packages/tensorflow/python/estimator/training.py", line 432, in train_and_evaluate
executor.run_local()
File "/Library/Python/2.7/site-packages/tensorflow/python/estimator/training.py", line 611, in run_local
hooks=train_hooks)
File "/Library/Python/2.7/site-packages/tensorflow/python/estimator/estimator.py", line 302, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/Library/Python/2.7/site-packages/tensorflow/python/estimator/estimator.py", line 711, in _train_model
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/Library/Python/2.7/site-packages/tensorflow/python/estimator/estimator.py", line 694, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/Users/rsilveira/rnd/ml-engine/trainer/flatv1.py", line 53, in model_fn
spec = gen_spectrogram(x)
File "/Users/rsilveira/rnd/ml-engine/trainer/flatv1.py", line 22, in gen_spectrogram
step,
File "/Library/Python/2.7/site-packages/tensorflow/contrib/signal/python/ops/spectral_ops.py", line 91, in stft
return spectral_ops.rfft(framed_signals, [fft_length])
File "/Library/Python/2.7/site-packages/tensorflow/python/ops/spectral_ops.py", line 136, in _rfft
return fft_fn(input_tensor, fft_length, name)
File "/Library/Python/2.7/site-packages/tensorflow/python/ops/gen_spectral_ops.py", line 619, in rfft
"RFFT", input=input, fft_length=fft_length, name=name)
File "/Library/Python/2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/Library/Python/2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/Library/Python/2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Input dimension 4 must have length of at least 512 but got: 320
Not sure where to start troubleshooting this. What am I missing here?

Training model from logits and checkpoint

I am new in tensor flow and I am trying to train the mobile net_v1. To do that, I first created the tfrecords' file for multi-class from a txt file.( example : namefile label1 label2 ...)
import sys, os
import tensorflow as tf
import cv2
import numpy as np
import matplotlib.pyplot as plt
# function
def load_image(addr):
# read an image and resize to (224, 224)
# cv2 load images as BGR, convert it to RGB
img = cv2.imread(addr)
img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = img.astype(np.float32)
return img
def _int64_feature(value):
return tf.train.Feature(int64_list=tf.train.Int64List(value=[*value]))
def _bytes_feature(value):
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
def loadData(inputs):
addrs = []
labels = []
f = open(inputs, 'r')
data = [ln.split(' ') for ln in f ]
f.close()
print(data)
for i in range(0, len(data)):
addrs.append(data[i][0].rstrip())
l = []
for j in range(1,len(data[i])):
if(data[i][j].rstrip().isdigit() == True):
l.append(int(data[i][j].rstrip()))
print(l)
labels.append(l)
return addrs, labels
def CreateTrainFile(input_filename, train_filename,):
path = '/home/rd/Documents/RD2/Databases/Faces/'
# load file and label
train_addrs, train_labels = loadData(input_filename)
print(train_labels)
# open the TFRecords file
writer = tf.python_io.TFRecordWriter(train_filename)
for i in range(len(train_addrs)):
# print how many images are saved every 1000 images
if not i % 1000:
print('Train data: {}/{}'.format(i, len(train_addrs)))
sys.stdout.flush()
# Load the image
img = load_image(train_addrs[i])
label = train_labels[i]
print('label : ', _int64_feature(label))
# Create a feature
feature = {'train/label': _int64_feature(label),
'train/image': _bytes_feature(tf.compat.as_bytes(img.tostring()))}
# Create an example protocol buffer
example = tf.train.Example(features=tf.train.Features(feature=feature))
# Serialize to string and write on the file
writer.write(example.SerializeToString())
writer.close()
sys.stdout.flush()
# open the TFRecords file
def CreateValidationFile(val_filename):
writer = tf.python_io.TFRecordWriter(val_filename)
for i in range(len(val_addrs)):
# print how many images are saved every 1000 images
if not i % 1000:
print('Val data: {}/{}'.format(i, len(val_addrs)))
sys.stdout.flush()
# Load the image
img = load_image(val_addrs[i])
label = val_labels[i]
# Create a feature
feature = {'val/label': _int64_feature(label),
'val/image': _bytes_feature(tf.compat.as_bytes(img.tostring()))}
# Create an example protocol buffer
example = tf.train.Example(features=tf.train.Features(feature=feature))
# Serialize to string and write on the file
writer.write(example.SerializeToString())
writer.close()
sys.stdout.flush()
# open the TFRecords file
def CreateTestFile(test_filename):
writer = tf.python_io.TFRecordWriter(test_filename)
for i in range(len(test_addrs)):
# print how many images are saved every 1000 images
if not i % 1000:
print('Test data: {}/{}'.format(i, len(test_addrs)))
sys.stdout.flush()
# Load the image
img = load_image(test_addrs[i])
label = test_labels[i]
# Create a feature
feature = {'test/label': _int64_feature(label),
'test/image': _bytes_feature(tf.compat.as_bytes(img.tostring()))}
# Create an example protocol buffer
example = tf.train.Example(features=tf.train.Features(feature=feature))
# Serialize to string and write on the file
writer.write(example.SerializeToString())
writer.close()
sys.stdout.flush()
def ReadRecordFileTrain(data_path):
#data_path = 'train.tfrecords' # address to save the hdf5 file
with tf.Session() as sess:
feature = {'train/image': tf.FixedLenFeature([], tf.string),
'train/label': tf.FixedLenFeature([], tf.int64)}
# Create a list of filenames and pass it to a queue
filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
# Define a reader and read the next record
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
# Decode the record read by the reader
features = tf.parse_single_example(serialized_example, features=feature)
# Convert the image data from string back to the numbers
image = tf.decode_raw(features['train/image'], tf.float32)
# Cast label data into int32
label = tf.cast(features['train/label'], tf.int32)
# Reshape image data into the original shape
image = tf.reshape(image, [224, 224, 3])
# Any preprocessing here ...
# Creates batches by randomly shuffling tensors
images, labels = tf.train.shuffle_batch([image, label], batch_size=2, capacity=30, num_threads=1, min_after_dequeue=10)
return images, labels
def main():
train_filename = 'train.tfrecords' # address to save the TFRecords file
#test_filename = 'test.tfrecords' # address to save the TFRecords file
#val_filename = 'val.tfrecords' # address to save the TFRecords file
CreateTrainFile("data.txt", train_filename)
main()
and to read the tf records :
def ReadRecordFileTrain(data_path):
#data_path = 'train.tfrecords' # address to save the hdf5 file
with tf.Session() as sess:
feature = {'train/image': tf.FixedLenFeature([], tf.string),
'train/label': tf.FixedLenFeature([2], tf.int64)}
# Create a list of filenames and pass it to a queue
filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
# Define a reader and read the next record
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
# Decode the record read by the reader
features = tf.parse_single_example(serialized_example, features=feature)
# Convert the image data from string back to the numbers
image = tf.decode_raw(features['train/image'], tf.float32)
print('label1 :', features['train/label'] )
# Cast label data into int32
label = tf.cast(features['train/label'], tf.int32)
print('label load:', label)
# Reshape image data into the original shape
image = tf.reshape(image, [224, 224, 3])
# Any preprocessing here ...
# Creates batches by randomly shuffling tensors
images, labels = tf.train.batch([image, label], batch_size=2, capacity=30, num_threads=1)
return images, labels
I suppose it works but I am not sure ( I don't have any errors when I called these functions.)
Then, I load the model and its weight. Call the loss function and try to start the training, but it fails at this moment.
g = tf.Graph()
with g.as_default():
# size of the folder
inputs = tf.placeholder(tf.float32, [1, 224, 224, 3])
# load dataset
images, labels = ReadRecordFileTrain('train.tfrecords')
print('load dataset done')
print('labels = ', labels)
print('data = ', images)
print(tf.shape(labels))
# load network
network, end_points= mobilenet.mobilenet_v1(images, num_classes=2, depth_multiplier=0.25 )
print('load network done')
print('network : ', network)
variables_to_restore = slim.get_variables_to_restore(exclude=["MobilenetV1/Logits/Conv2d_1c_1x1"])
load_checkpoint = "modele_mobilenet_v1_025/mobilenet_v1_0.25_224.ckpt"
init_fn = slim.assign_from_checkpoint_fn(load_checkpoint, variables_to_restore)
print('custom network done')
# Specify the loss function:
tf.losses.softmax_cross_entropy(labels, network)
total_loss = tf.losses.get_total_loss()
#tf.scalar_summary('losses/total_loss', total_loss)
# Specify the optimization scheme:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=.001)
# create_train_op that ensures that when we evaluate it to get the loss,
# the update_ops are done and the gradient updates are computed.
train_tensor = slim.learning.create_train_op(total_loss, optimizer)
print('loss and optimizer chosen')
# Actually runs training.
save_checkpoint = 'model/modelcheck'
# start training
learning = slim.learning.train(train_tensor, save_checkpoint, init_fn=init_fn, number_of_steps=1000)
The error message :
label1 : Tensor("ParseSingleExample/Squeeze_train/label:0", shape=(2,), dtype=int64)
label load: Tensor("Cast:0", shape=(2,), dtype=int32)
load dataset done
labels = Tensor("batch:1", shape=(2, 2), dtype=int32)
data = Tensor("batch:0", shape=(2, 224, 224, 3), dtype=float32)
Tensor("Shape:0", shape=(2,), dtype=int32)
load network done
network : Tensor("MobilenetV1/Logits/SpatialSqueeze:0", shape=(2, 2), dtype=float32)
custom network done
loss and optimizer chosen
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1039, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1021, in _run_fn
status, run_metadata)
File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1,1,256,2] rhs shape= [1,1,256,1]
[[Node: save_1/Assign_109 = Assign[T=DT_FLOAT, _class=["loc:#MobilenetV1/Logits/Conv2d_1c_1x1/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](MobilenetV1/Logits/Conv2d_1c_1x1/weights, save_1/RestoreV2_109)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 106, in <module>
main()
File "test.py", line 103, in main
learning = slim.learning.train(train_tensor, save_checkpoint, init_fn=init_fn, number_of_steps=1000)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 725, in train
master, start_standard_services=False, config=session_config) as sess:
File "/usr/lib/python3.5/contextlib.py", line 59, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 960, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 788, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 949, in managed_session
start_standard_services=start_standard_services)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 706, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/session_manager.py", line 256, in prepare_session
config=config)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/session_manager.py", line 188, in _restore_checkpoint
saver.restore(sess, ckpt.model_checkpoint_path)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1457, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1,1,256,2] rhs shape= [1,1,256,1]
[[Node: save_1/Assign_109 = Assign[T=DT_FLOAT, _class=["loc:#MobilenetV1/Logits/Conv2d_1c_1x1/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](MobilenetV1/Logits/Conv2d_1c_1x1/weights, save_1/RestoreV2_109)]]
Caused by op 'save_1/Assign_109', defined at:
File "test.py", line 106, in <module>
main()
File "test.py", line 103, in main
learning = slim.learning.train(train_tensor, save_checkpoint, init_fn=init_fn, number_of_steps=1000)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 642, in train
saver = saver or tf_saver.Saver()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1056, in __init__
self.build()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1086, in build
restore_sequentially=self._restore_sequentially)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 691, in build
restore_sequentially, reshape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 419, in _AddRestoreOps
assign_ops.append(saveable.restore(tensors, shapes))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 155, in restore
self.op.get_shape().is_fully_defined())
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/state_ops.py", line 270, in assign
validate_shape=validate_shape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 47, in assign
use_locking=use_locking, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1,1,256,2] rhs shape= [1,1,256,1]
[[Node: save_1/Assign_109 = Assign[T=DT_FLOAT, _class=["loc:#MobilenetV1/Logits/Conv2d_1c_1x1/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](MobilenetV1/Logits/Conv2d_1c_1x1/weights, save_1/RestoreV2_109)]]
I don't understand where the problem comes from and how to solve it.
InvalidArgumentError: Assign requires shapes of both tensors to match.
lhs shape= [1,1,256,2] rhs shape= [1,1,256,1]
I used to get this error when my model saved in the model directory has conflicts with my current running model. Try deleting your model directory and start training again.
It seems to solve the error but now, when I want to execute it with a tf.Session it failed. I was wondering if the problem comes from my graph or am I doing something wrong in the tf.Session ?
def evaluation(logits, labels):
with tf.name_scope('Accuracy'):
# Operation comparing prediction with true label
correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(labels, 1))
# Operation calculating the accuracy of the predictions
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Summary operation for the accuracy
#tf.scalar_summary('train_accuracy', accuracy)
return accuracy
g = tf.Graph()
with g.as_default():
# size of the folder
inputs = tf.placeholder(tf.float32, [1, 224, 224, 3])
# load dataset
images, labels = ReadRecordFileTrain('train.tfrecords')
print('load dataset done')
print('labels = ', labels)
print('data = ', images)
print(tf.shape(labels))
# load network
network, end_points= mobilenet.mobilenet_v1(images, num_classes=2, depth_multiplier=0.25 )
print('load network done')
print('network : ', network)
variables_to_restore = slim.get_variables_to_restore(exclude=["MobilenetV1/Logits/Conv2d_1c_1x1"])
load_checkpoint = "modele_mobilenet_v1_025/mobilenet_v1_0.25_224.ckpt"
init_fn = slim.assign_from_checkpoint_fn(load_checkpoint, variables_to_restore)
print('custom network done')
# Specify the loss function:
tf.losses.softmax_cross_entropy(labels, network)
total_loss = tf.losses.get_total_loss()
#tf.scalar_summary('losses/total_loss', total_loss)
# Specify the optimization scheme:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=.001)
# create_train_op that ensures that when we evaluate it to get the loss,
# the update_ops are done and the gradient updates are computed.
train_tensor = slim.learning.create_train_op(total_loss, optimizer)
print('loss and optimizer chosen')
# Actually runs training.
save_checkpoint = 'model/modelcheck'
# start training
learning = slim.learning.train(train_tensor, save_checkpoint, init_fn=init_fn, number_of_steps=1000)
accuracy = evaluation(network, labels)
with tf.Session(graph=g) as sess:
sess.run(network)
print('network load')
sess.run(total_loss)
sess.run(accuracy)
sess.run(train_tensor)
sess.run(learning)
The error :
label1 : Tensor("ParseSingleExample/Squeeze_train/label:0", shape=(2,), dtype=int64)
label load: Tensor("Cast:0", shape=(2,), dtype=int32)
load dataset done
labels = Tensor("batch:1", shape=(4, 2), dtype=int32)
data = Tensor("batch:0", shape=(4, 224, 224, 3), dtype=float32)
Tensor("Shape:0", shape=(2,), dtype=int32)
load network done
network : Tensor("MobilenetV1/Logits/SpatialSqueeze:0", shape=(4, 2), dtype=float32)
custom network done
loss and optimizer chosen
end of graph
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1039, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1021, in _run_fn
status, run_metadata)
File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta
[[Node: MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta/read = Identity[T=DT_FLOAT, _class=["loc:#MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta"], _device="/job:localhost/replica:0/task:0/cpu:0"](MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 113, in <module>
main()
File "test.py", line 105, in main
sess.run(network)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta
[[Node: MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta/read = Identity[T=DT_FLOAT, _class=["loc:#MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta"], _device="/job:localhost/replica:0/task:0/cpu:0"](MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta)]]
Caused by op 'MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta/read', defined at:
File "test.py", line 113, in <module>
main()
File "test.py", line 67, in main
network, end_points= mobilenet.mobilenet_v1(images, num_classes=2, depth_multiplier=0.25 )
File "/home/rd/Documents/RD2/users/Ludovic/tensorflow_mobilenet/mobilenet_v1.py", line 301, in mobilenet_v1
conv_defs=conv_defs)
File "/home/rd/Documents/RD2/users/Ludovic/tensorflow_mobilenet/mobilenet_v1.py", line 228, in mobilenet_v1_base
scope=end_point)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1891, in separable_convolution2d
outputs = normalizer_fn(outputs, **normalizer_params)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 528, in batch_norm
outputs = layer.apply(inputs, training=is_training)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 320, in apply
return self.__call__(inputs, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 286, in __call__
self.build(input_shapes[0])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/normalization.py", line 125, in build
trainable=True)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 349, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1389, in wrapped_custom_getter
*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 275, in variable_getter
variable_getter=functools.partial(getter, **kwargs))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 228, in _add_variable
trainable=trainable and self.trainable)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1389, in wrapped_custom_getter
*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1334, in layer_variable_getter
return _model_variable_getter(getter, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1326, in _model_variable_getter
custom_getter=getter, use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 262, in model_variable
use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 217, in variable
use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1334, in layer_variable_getter
return _model_variable_getter(getter, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1326, in _model_variable_getter
custom_getter=getter, use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 262, in model_variable
use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 217, in variable
use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 714, in _get_single_variable
validate_shape=validate_shape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variables.py", line 197, in __init__
expected_shape=expected_shape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variables.py", line 316, in _init_from_args
self._snapshot = array_ops.identity(self._variable, name="read")
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1338, in identity
result = _op_def_lib.apply_op("Identity", input=input, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta
[[Node: MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta/read = Identity[T=DT_FLOAT, _class=["loc:#MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta"], _device="/job:localhost/replica:0/task:0/cpu:0"](MobilenetV1/Conv2d_3_depthwise/BatchNorm/beta)]]