Keras model.fit throws Segmentation Fault with error- libprotobuf FATAL CHECK failed: (value.size()) <= (kint32max) - tensorflow

I am trying to train a simple tensorflow model on emr cluster with around 9000 parameters. But When I try to train the model it throws following error. I tried increasing the memory and decreasing the batch size. But it didn't help.
libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/wire_format_lite.cc:504] CHECK failed: (value.size()) <= (kint32max):
Segmentation fault
Based on one of the suggestion from this, I reduced the dataset to half but it causes another error:
Traceback (most recent call last):
File "/home/hadoop/feed_reco/datalake2/Dropoutnet/dropoutnet_training_test.py", line 175, in <module>
train_tensorflow(data_dir, "/home/hadoop/temp_model/", learning_rate, dropout_rate, epochs)
File "/home/hadoop/feed_reco/datalake2/Dropoutnet/dropoutnet_training_test.py", line 165, in train_tensorflow
model.fit(dataset, epochs=epochs)
File "/home/hadoop/conda/envs/py-env/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
return method(self, *args, **kwargs)
File "/home/hadoop/conda/envs/py-env/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1098, in fit
tmp_logs = train_function(iterator)
File "/home/hadoop/conda/envs/py-env/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "/home/hadoop/conda/envs/py-env/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
return self._stateless_fn(*args, **kwds)
File "/home/hadoop/conda/envs/py-env/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/home/hadoop/conda/envs/py-env/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
cancellation_manager=cancellation_manager)
File "/home/hadoop/conda/envs/py-env/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/home/hadoop/conda/envs/py-env/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 550, in call
ctx=ctx)
File "/home/hadoop/conda/envs/py-env/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = -1 is not in [0, 6806177)
[[{{node embedding_lookup_1}}]]
[[StatefulPartitionedCall]]
[[IteratorGetNext]] [Op:__inference_train_function_715]
Function call stack:
train_function -> train_function -> train_function

Related

Is my PC not able to run CNN Deep-Q Learning (Keras) in Jupyer Notebook?

PC setup:
Asus H110m-E, Intel i3-6100, 4G RAM, no display card, Windows 10, Jupyter Notebook
When I try to run the following codes,
https://github.com/PacktPublishing/Deep-Reinforcement-Learning-with-Python/blob/master/09.%20%20Deep%20Q%20Network%20and%20its%20Variants/9.03.%20Playing%20Atari%20Games%20using%20DQN.ipynb
Hundreds of thousands of progress bars keep showing up from model.fit(verbose=0). After some time, file size of ipynb becomes quite big (in terms of MB) and ResourceExhaustedError pops up.
Is there any way to solve the problem?
ResourceExhaustedError Traceback (most recent call last)
Input In [19], in <cell line: 13>()
39 break
41 if len(dqn.replay_buffer) > batch_size:
---> 42 dqn.train(batch_size)
Input In [18], in DQN.train(self, batch_size)
47 Q_values = self.main_network.predict(state)
48 Q_values[0][action] = target_Q
---> 50 self.main_network.fit(state, Q_values, epochs=1, verbose=0)
File ~\anaconda3\lib\site-packages\keras\utils\traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\eager\execute.py:54, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
52 try:
53 ctx.ensure_initialized()
---> 54 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
55 inputs, attrs, num_outputs)
56 except core._NotOkStatusException as e:
57 if name is not None:
ResourceExhaustedError: Graph execution error:
Detected at node 'gradient_tape/sequential/conv2d_1/Conv2D/Conv2DBackpropFilter' defined at (most recent call last):
File "C:\Users\Xxx\anaconda3\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Xxx\anaconda3\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\Xxx\anaconda3\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "C:\Users\Xxx\anaconda3\lib\site-packages\traitlets\config\application.py", line 846, in launch_instance
app.start()
File "C:\Users\Xxx\anaconda3\lib\site-packages\ipykernel\kernelapp.py", line 677, in start
self.io_loop.start()
File "C:\Users\Xxx\anaconda3\lib\site-packages\tornado\platform\asyncio.py", line 199, in start
self.asyncio_loop.run_forever()
File "C:\Users\Xxx\anaconda3\lib\asyncio\base_events.py", line 601, in run_forever
self._run_once()
File "C:\Users\Xxx\anaconda3\lib\asyncio\base_events.py", line 1905, in _run_once
handle._run()
File "C:\Users\Xxx\anaconda3\lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "C:\Users\Xxx\anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 471, in dispatch_queue
await self.process_one()
File "C:\Users\Xxx\anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 460, in process_one
await dispatch(*args)
File "C:\Users\Xxx\anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 367, in dispatch_shell
await result
File "C:\Users\Xxx\anaconda3\lib\site-packages\ipykernel\kernelbase.py", line 662, in execute_request
reply_content = await reply_content
File "C:\Users\Xxx\anaconda3\lib\site-packages\ipykernel\ipkernel.py", line 360, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "C:\Users\Xxx\anaconda3\lib\site-packages\ipykernel\zmqshell.py", line 532, in run_cell
return super().run_cell(*args, **kwargs)
File "C:\Users\Xxx\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2863, in run_cell
result = self._run_cell(
File "C:\Users\Xxx\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2909, in _run_cell
return runner(coro)
File "C:\Users\Xxx\anaconda3\lib\site-packages\IPython\core\async_helpers.py", line 129, in _pseudo_sync_runner
coro.send(None)
File "C:\Users\Xxx\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3106, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File "C:\Users\Xxx\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3309, in run_ast_nodes
if await self.run_code(code, result, async_=asy):
File "C:\Users\Xxx\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3369, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "C:\Users\Xxx\AppData\Local\Temp\ipykernel_10728\2638643138.py", line 42, in <cell line: 13>
dqn.train(batch_size)
File "C:\Users\Xxx\AppData\Local\Temp\ipykernel_10728\203765593.py", line 50, in train
self.main_network.fit(state, Q_values, epochs=1, verbose=0)
File "C:\Users\Xxx\anaconda3\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "C:\Users\Xxx\anaconda3\lib\site-packages\keras\engine\training.py", line 1409, in fit
tmp_logs = self.train_function(iterator)
File "C:\Users\Xxx\anaconda3\lib\site-packages\keras\engine\training.py", line 1051, in train_function
return step_function(self, iterator)
File "C:\Users\Xxx\anaconda3\lib\site-packages\keras\engine\training.py", line 1040, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\Xxx\anaconda3\lib\site-packages\keras\engine\training.py", line 1030, in run_step
outputs = model.train_step(data)
File "C:\Users\Xxx\anaconda3\lib\site-packages\keras\engine\training.py", line 893, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "C:\Users\Xxx\anaconda3\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 537, in minimize
grads_and_vars = self._compute_gradients(
File "C:\Users\Xxx\anaconda3\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 590, in _compute_gradients
grads_and_vars = self._get_gradients(tape, loss, var_list, grad_loss)
File "C:\Users\Xxx\anaconda3\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 471, in _get_gradients
grads = tape.gradient(loss, var_list, grad_loss)
Node: 'gradient_tape/sequential/conv2d_1/Conv2D/Conv2DBackpropFilter'
OOM when allocating tensor with shape[82,110,512] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
[[{{node gradient_tape/sequential/conv2d_1/Conv2D/Conv2DBackpropFilter}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[Op:__inference_train_function_1420]

DNN library is not found. (Google Colab)

I'm working on a ML project using Google Colab and Tensorflow to train a CNN, starting from the EfficientNetV2M model.
It used to work just fine until two days ago, when starting the training:
train = model.fit(X, y, epochs=save_every_n_epochs, batch_size=16, verbose=1)
gave the following error:
UnimplementedError Traceback (most recent call last)
<ipython-input-5-1b2fb9765100> in <module>
70 print(f"Training the model for {save_every_n_epochs} epochs")
71
---> 72 train = model.fit(X, y, epochs=save_every_n_epochs, batch_size=16, verbose=1)
73 print("Model trained")
74
1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
53 ctx.ensure_initialized()
54 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 55 inputs, attrs, num_outputs)
56 except core._NotOkStatusException as e:
57 if name is not None:
UnimplementedError: Graph execution error:
Detected at node 'sequential/efficientnetv2-m/stem_conv/Conv2D' defined at (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/usr/local/lib/python3.7/dist-packages/traitlets/config/application.py", line 846, in launch_instance
app.start()
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelapp.py", line 612, in start
self.io_loop.start()
File "/usr/local/lib/python3.7/dist-packages/tornado/platform/asyncio.py", line 132, in start
self.asyncio_loop.run_forever()
File "/usr/lib/python3.7/asyncio/base_events.py", line 541, in run_forever
self._run_once()
File "/usr/lib/python3.7/asyncio/base_events.py", line 1786, in _run_once
handle._run()
File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/usr/local/lib/python3.7/dist-packages/tornado/ioloop.py", line 758, in _run_callback
ret = callback()
File "/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1233, in inner
self.run()
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 1147, in run
yielded = self.gen.send(value)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 365, in process_one
yield gen.maybe_future(dispatch(*args))
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 268, in dispatch_shell
yield gen.maybe_future(handler(stream, idents, msg))
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py", line 545, in execute_request
user_expressions, allow_stdin,
File "/usr/local/lib/python3.7/dist-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/ipkernel.py", line 306, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python3.7/dist-packages/ipykernel/zmqshell.py", line 536, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2855, in run_cell
raw_cell, store_history, silent, shell_futures)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 2881, in _run_cell
return runner(coro)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
coro.send(None)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3058, in run_cell_async
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3249, in run_ast_nodes
if (await self.run_code(code, result, async_=asy)):
File "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-1b2fb9765100>", line 72, in <module>
train = model.fit(X, y, epochs=save_every_n_epochs, batch_size=16, verbose=1)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1409, in fit
tmp_logs = self.train_function(iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function
return step_function(self, iterator)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step
outputs = model.train_step(data)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 889, in train_step
y_pred = self(x, training=True)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 490, in __call__
return super().__call__(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1014, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/sequential.py", line 374, in call
return super(Sequential, self).call(inputs, training=training, mask=mask)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 459, in call
inputs, training=training, mask=mask)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 596, in _run_internal_graph
outputs = node.layer(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 490, in __call__
return super().__call__(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1014, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 459, in call
inputs, training=training, mask=mask)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 596, in _run_internal_graph
outputs = node.layer(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1014, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/keras/layers/convolutional/base_conv.py", line 250, in call
outputs = self.convolution_op(inputs, self.kernel)
File "/usr/local/lib/python3.7/dist-packages/keras/layers/convolutional/base_conv.py", line 232, in convolution_op
name=self.__class__.__name__)
Node: 'sequential/efficientnetv2-m/stem_conv/Conv2D'
DNN library is not found.
[[{{node sequential/efficientnetv2-m/stem_conv/Conv2D}}]] [Op:__inference_train_function_45723]
I wasn't able to train any model since then, always getting this error, also when loading previously stored models.
I am able to train when i use an environment without GPU, but it is obviously too slow.
I've also tried to change the TF version as suggested in other topics, without any success.
Any suggestions?
Yes there's a similar question from a few hours ago. Apparently this is a problem related to the latest Tensoflow update introduced in Colab (Tensorflow 2.9.1).
As a quick fix you could downgrade Tensorflow. However only downgrading to tf 2.8, as suggested in the linked question wasn't enough to fix the problem in my case.
Try this:
!pip uninstall tensorflow-gpu
!pip install tensorflow-gpu==2.8
!apt install --allow-change-held-packages libcudnn8=8.1.0.77-1+cuda11.2
Also make sure to restart the runtime if it asks you to do so.

LSTM - RuntimeError: Too many failed attempts to build model

I am trying to use keras tuner to tune an LSTM neural network to detect if an article is a fake news or not, using a kaggle dataset.
However, I keep getting this error: RuntimeError: Too many failed attempts to build model
I have also tried to use the RandomSearch rather than the BayesianOptimization, but still getting the same type of error.
This is the code:
'''
def build_model(hp):
voc_size=5000
embedding_vector_features=40
model = Sequential([
Embedding(
voc_size,
embedding_vector_features,
input_length = sent_length
),
AlphaDropout(
rate = hp.Choice(
'dropout_1_rate',
values=[0.3, 0.5],
default=0.3
)
),
LSTM(
units = hp.Int(
'LSTM_1_units',
min_value=100,
max_value=300,
step=32,
default=128
),
activation = hp.Choice(
'LSTM_1_activation',
values=['relu', 'selu']
),
kernel_initializer='lecun_normal'
),
AlphaDropout(
rate = hp.Choice(
'dropout_2_rate',
values=[0.3, 0.5],
default=0.3
)
),
LSTM(
units = hp.Int(
'LSTM_2_units',
min_value=100,
max_value=300,
step=32,
default=128
),
activation = hp.Choice(
'LSTM_2_activation',
values=['relu', 'selu']
),
kernel_initializer='lecun_normal'
),
AlphaDropout(
rate = hp.Choice(
'dropout_3_rate',
values=[0.3, 0.5],
default=0.3
)
),
Dense(
units = 1,
activation = 'sigmoid',
kernel_initializer='lecun_normal'
)
])
model.compile(
optimizer = keras.optimizers.Nadam(
hp.Choice(
'learning_rate',
values=[1e-2, 1e-3]
)
),
loss = 'binary_crooentropy',
metric = ['accuracy']
)
return model
tuner_search = BayesianOptimization(build_model,
objective='val_accuracy',
max_trials=3,
seed=42,
directory='output',
project_name='Fake News Classifier'
)
'''
When I try to run this code I get the following error:
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
Invalid model 0/5
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py", line 104, in build
model = self.hypermodel.build(hp)
File "<ipython-input-18-fe84fe0afbca>", line 62, in build_model
kernel_initializer='lecun_normal'
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 144, in __init__
self.add(layer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 223, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/recurrent.py", line 660, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
input_list)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
inputs, input_masks, args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
self._maybe_build(inputs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2685, in _maybe_build
self.input_spec, inputs, self.name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/input_spec.py", line 223, in assert_input_compatibility
str(tuple(shape)))
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
Invalid model 1/5
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py", line 104, in build
model = self.hypermodel.build(hp)
File "<ipython-input-18-fe84fe0afbca>", line 62, in build_model
kernel_initializer='lecun_normal'
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 144, in __init__
self.add(layer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 223, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/recurrent.py", line 660, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
input_list)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
inputs, input_masks, args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
self._maybe_build(inputs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2685, in _maybe_build
self.input_spec, inputs, self.name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/input_spec.py", line 223, in assert_input_compatibility
str(tuple(shape)))
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
Invalid model 2/5
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py", line 104, in build
model = self.hypermodel.build(hp)
File "<ipython-input-18-fe84fe0afbca>", line 62, in build_model
kernel_initializer='lecun_normal'
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 144, in __init__
self.add(layer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 223, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/recurrent.py", line 660, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
input_list)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
inputs, input_masks, args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
self._maybe_build(inputs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2685, in _maybe_build
self.input_spec, inputs, self.name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/input_spec.py", line 223, in assert_input_compatibility
str(tuple(shape)))
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
Invalid model 3/5
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py", line 104, in build
model = self.hypermodel.build(hp)
File "<ipython-input-18-fe84fe0afbca>", line 62, in build_model
kernel_initializer='lecun_normal'
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 144, in __init__
self.add(layer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 223, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/recurrent.py", line 660, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
input_list)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
inputs, input_masks, args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
self._maybe_build(inputs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2685, in _maybe_build
self.input_spec, inputs, self.name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/input_spec.py", line 223, in assert_input_compatibility
str(tuple(shape)))
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
Invalid model 4/5
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
WARNING:tensorflow:Layer lstm_1 will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py", line 104, in build
model = self.hypermodel.build(hp)
File "<ipython-input-18-fe84fe0afbca>", line 62, in build_model
kernel_initializer='lecun_normal'
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 144, in __init__
self.add(layer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 223, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/recurrent.py", line 660, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
input_list)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
inputs, input_masks, args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
self._maybe_build(inputs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2685, in _maybe_build
self.input_spec, inputs, self.name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/input_spec.py", line 223, in assert_input_compatibility
str(tuple(shape)))
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
Invalid model 5/5
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py", line 104, in build
model = self.hypermodel.build(hp)
File "<ipython-input-18-fe84fe0afbca>", line 62, in build_model
kernel_initializer='lecun_normal'
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 144, in __init__
self.add(layer)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/training/tracking/base.py", line 517, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/sequential.py", line 223, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/recurrent.py", line 660, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 952, in __call__
input_list)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1091, in _functional_construction_call
inputs, input_masks, args, kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 822, in _keras_tensor_symbolic_call
return self._infer_output_signature(inputs, args, kwargs, input_masks)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 862, in _infer_output_signature
self._maybe_build(inputs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2685, in _maybe_build
self.input_spec, inputs, self.name)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/input_spec.py", line 223, in assert_input_compatibility
str(tuple(shape)))
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py in build(self, hp)
103 with maybe_distribute(self.distribution_strategy):
--> 104 model = self.hypermodel.build(hp)
105 except:
19 frames
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/kerastuner/engine/hypermodel.py in build(self, hp)
111 if i == self._max_fail_streak:
112 raise RuntimeError(
--> 113 'Too many failed attempts to build model.')
114 continue
115
RuntimeError: Too many failed attempts to build model.
How can I solve the issue?
Actual error is ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 128)
LSTM layer expects input shape inputs: A 3D tensor with shape [batch, timesteps, feature]
I could reproduce your issue
import tensorflow as tf
inputs = tf.random.normal([32, 8])
lstm = tf.keras.layers.LSTM(4)
output = lstm(inputs)
print(output.shape)
Output
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-160c5e8d5d9a> in <module>()
2 inputs = tf.random.normal([32, 8])
3 lstm = tf.keras.layers.LSTM(4)
----> 4 output = lstm(inputs)
5 print(output.shape)
2 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/input_spec.py in assert_input_compatibility(input_spec, inputs, layer_name)
217 'expected ndim=' + str(spec.ndim) + ', found ndim=' +
218 str(ndim) + '. Full shape received: ' +
--> 219 str(tuple(shape)))
220 if spec.max_ndim is not None:
221 ndim = x.shape.rank
ValueError: Input 0 of layer lstm_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (32, 8)
Working code snippet
import tensorflow as tf
inputs = tf.random.normal([32, 10, 8])
lstm = tf.keras.layers.LSTM(4)
output = lstm(inputs)
print(output.shape)
Output
(32, 4)

Dimension does not match when using `keras.Model.fit` in `BERT` of tensorflow

I follow the instruction of Fine-tuning BERT to build a model with my own dataset(It is kind of large, and greater than 20G), then take steps to re-cdoe my data and load them from tf_record files.
The training_dataset I create has the same signature as that in the instruction
training_dataset.element_spec
({'input_word_ids': TensorSpec(shape=(32, 1024), dtype=tf.int32, name=None),
'input_mask': TensorSpec(shape=(32, 1024), dtype=tf.int32, name=None),
'input_type_ids': TensorSpec(shape=(32, 1024), dtype=tf.int32, name=None)},
TensorSpec(shape=(32,), dtype=tf.int32, name=None))
where batch_size is 32, max_seq_length is 1024.
As the instruction suggestes,
The resulting tf.data.Datasets return (features, labels) pairs, as expected by keras.Model.fit
It semms that everything works as expected,(the instruction does not show how to use training_dataset though ) However, the following code
bert_classifier.fit(
x = training_dataset,
validation_data=test_dataset, # has the same signature just as training_dataset
batch_size=32,
epochs=epochs,
verbose=1,
)
encouters an error that seems weird to me,
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/captain/project/dataload/train.py", line 81, in <module>
verbose=1,
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
tmp_logs = self.train_function(iterator)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
result = self._call(*args, **kwds)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
*args, **kwds))
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
out = weak_wrapped_fn().__wrapped__(*args, **kwds)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:805 train_function *
return step_function(self, iterator)
/home/captain/.local/lib/python3.7/site-packages/official/nlp/keras_nlp/layers/position_embedding.py:88 call *
return tf.broadcast_to(position_embeddings, input_shape)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py:845 broadcast_to **
"BroadcastTo", input=input, shape=shape, name=name)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:750 _apply_op_helper
attrs=attr_protos, op_def=op_def)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:592 _create_op_internal
compute_device)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:3536 _create_op_internal
op_def=op_def)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:2016 __init__
control_input_ops, op_def)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1856 _create_c_op
raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 512 and 1024 for '{{node bert_classifier/bert_encoder_1/position_embedding/BroadcastTo}} =
BroadcastTo[T=DT_FLOAT, Tidx=DT_INT32](bert_classifier/bert_encoder_1/position_embedding/strided_slice_1, bert_classifier/bert_encoder_1/position_embedding/Shape)'
with input shapes: [512,768], [3] and with input tensors computed as partial shapes: input[1] = [32,1024,768].
There is nothing to do with 512, and I didn't use 512 thorough my code. So where is wrong with my code and how to fix that?
They created the bert_classifier based on bert_config_file loaded from bert_config.json
bert_classifier, bert_encoder = bert.bert_models.classifier_model(bert_config, num_labels=2)
bert_config.json
{
'attention_probs_dropout_prob': 0.1,
'hidden_act': 'gelu',
'hidden_dropout_prob': 0.1,
'hidden_size': 768,
'initializer_range': 0.02,
'intermediate_size': 3072,
'max_position_embeddings': 512,
'num_attention_heads': 12,
'num_hidden_layers': 12,
'type_vocab_size': 2,
'vocab_size': 30522
}
According to this config, hidden_size is 768 and max_position_embeddings is 512 so your input data used to feed to BERT model must be the same shape as described. It explains the reason why you are getting the shape-mismatched issue.
Therefore, to make it work, you have to change all lines for creating tensor inputs from 1024 to 512.

maximum number of labels in corss_entropy classification is 300?

I found that with sparse_softmax_cross_entropy_with_logits ou can only have at maximum 300 labels?
I found nothing about that.
What if I have more?
EDIT:
If I do not limit to 300 classes, I get evrytime the following trace:
2019-03-05 15:24:17.899610: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at sparse_xent_op.cc:90 : Invalid argument: Received a label value of 428 which is outside the valid range of [0, 300). Label values: 428 262
Traceback (most recent call last):
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
return fn(*args)
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 428 which is outside the valid range of [0, 300). Label values: 428 262
[[{{node QAModel/loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}} = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](QAModel/StartDist/SimpleSoftmaxLayer/Add, _arg_QAModel/Placeholder_4_0_5)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 236, in <module>
tf.app.run()
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "main.py", line 194, in main
qa_model.train(sess, train_context_path, train_qn_path, train_ans_path, dev_qn_path, dev_context_path, dev_ans_path)
File "C:\Users\\IB\QA-Models-Bidaf\code\qa_model.py", line 764, in train
loss, global_step, param_norm, grad_norm = self.run_train_iter(session, batch, summary_writer)
File "C:\Users\\IB\QA-Models-Bidaf\code\qa_model.py", line 359, in run_train_iter
[_, summaries, loss, global_step, param_norm, gradient_norm] = session.run(output_feed, input_feed)
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
run_metadata_ptr)
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
run_metadata)
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 428 which is outside the valid range of [0, 300). Label values: 428 262
[[node QAModel/loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at C:\Users\\IB\QA-Models-Bidaf\code\qa_model.py:318) = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](QAModel/StartDist/SimpleSoftmaxLayer/Add, _arg_QAModel/Placeholder_4_0_5)]]
Caused by op 'QAModel/loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits', defined at:
File "main.py", line 236, in <module>
tf.app.run()
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "main.py", line 165, in main
qa_model = QAModel(FLAGS, id2word, word2id, emb_matrix)
File "C:\Users\\IB\QA-Models-Bidaf\code\qa_model.py", line 64, in __init__
self.add_loss()
File "C:\Users\\IB\QA-Models-Bidaf\code\qa_model.py", line 318, in add_loss
loss= tf.nn.sparse_softmax_cross_entropy_with_logits(logits=self.logits, labels=self.ans_span) # loss_start has shape (batch_size)
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2049, in sparse_softmax_cross_entropy_with_logits
precise_logits, labels, name=name)
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 8063, in sparse_softmax_cross_entropy_with_logits
labels=labels, name=name)
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
op_def=op_def)
File "C:\Users\\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Received a label value of 428 which is outside the valid range of [0, 300). Label values: 428 262
[[node QAModel/loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at C:\Users\\IB\QA-Models-Bidaf\code\qa_model.py:318) = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](QAModel/StartDist/SimpleSoftmaxLayer/Add, _arg_QAModel/Placeholder_4_0_5)]]
Everytime I get this range upt to 300. Why?!
valid range of [0, 300)
When I took part in YouTube8M Kaggle competition it had about 5k classes and we used a loss provided by competition organizers, i.e. Google
Have a look https://github.com/mpekalski/Y8M/blob/master/video_level_code/losses.py