Tensorflow 2.0.0 MirroredStrategy NCCL problem

I'm trying to train with multiple GPUs using tf.distribute.MirroredStrategy().
After several attempts to apply it to my custom code, I kept getting an error about NcclAllReduce.
So I copied the MNIST tutorial that uses tf.distribute from the TensorFlow page, and running it produces the same error. The logs and my environment are below.
My environment:
sys.platform----------Windows 10
Python----------3.7.6
Numpy----------1.18.1
TensorFlow----------2.0.0
TF CUDA support-----------True
GPU----------2 GPU, both are Quadro GV100
INFO:tensorflow:batch_all_reduce: 8 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:batch_all_reduce: 8 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-12-a7ead7e91ea5> in <module>
19 num_batches = 0
20 for x in train_dist_dataset:
---> 21 total_loss += distributed_train_step(x)
22 num_batches += 1
23 train_loss = total_loss / num_batches
~\Anaconda3\envs\tf-MSTO-DL\lib\site-packages\tensorflow_core\python\eager\def_function.py in __call__(self, *args, **kwds)
455
456 tracing_count = self._get_tracing_count()
--> 457 result = self._call(*args, **kwds)
458 if tracing_count == self._get_tracing_count():
459 self._call_counter.called_without_tracing()
~\Anaconda3\envs\tf-MSTO-DL\lib\site-packages\tensorflow_core\python\eager\def_function.py in _call(self, *args, **kwds)
518 # Lifting succeeded, so variables are initialized and we can run the
519 # stateless function.
--> 520 return self._stateless_fn(*args, **kwds)
521 else:
522 canon_args, canon_kwds = \
~\Anaconda3\envs\tf-MSTO-DL\lib\site-packages\tensorflow_core\python\eager\function.py in __call__(self, *args, **kwargs)
1821 """Calls a graph function specialized to the inputs."""
1822 graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 1823 return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
1824
1825 #property
~\Anaconda3\envs\tf-MSTO-DL\lib\site-packages\tensorflow_core\python\eager\function.py in _filtered_call(self, args, kwargs)
1139 if isinstance(t, (ops.Tensor,
1140 resource_variable_ops.BaseResourceVariable))),
-> 1141 self.captured_inputs)
1142
1143 def _call_flat(self, args, captured_inputs, cancellation_manager=None):
~\Anaconda3\envs\tf-MSTO-DL\lib\site-packages\tensorflow_core\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1222 if executing_eagerly:
1223 flat_outputs = forward_function.call(
-> 1224 ctx, args, cancellation_manager=cancellation_manager)
1225 else:
1226 gradient_name = self._delayed_rewrite_functions.register()
~\Anaconda3\envs\tf-MSTO-DL\lib\site-packages\tensorflow_core\python\eager\function.py in call(self, ctx, args, cancellation_manager)
509 inputs=args,
510 attrs=("executor_type", executor_type, "config_proto", config),
--> 511 ctx=ctx)
512 else:
513 outputs = execute.execute_with_cancellation(
~\Anaconda3\envs\tf-MSTO-DL\lib\site-packages\tensorflow_core\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
65 else:
66 message = e.message
---> 67 six.raise_from(core._status_to_exception(e.code, message), None)
68 except TypeError as e:
69 keras_symbolic_tensors = [
~\Anaconda3\envs\tf-MSTO-DL\lib\site-packages\six.py in raise_from(value, from_value)
InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by {{node Adam/NcclAllReduce}}with these attrs: [reduction="sum", shared_name="c1", T=DT_FLOAT, num_devices=2]
Registered devices: [CPU, GPU]
Registered kernels:
<no registered kernels>
[[Adam/NcclAllReduce]] [Op:__inference_distributed_train_step_1755]

There are several options for cross_device_ops. It seems that
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())
produces an NCCL error depending on your architecture and configuration.
This next option was meant for the NVIDIA DGX-1 architecture and might underperform on other architectures:
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())
This one should work:
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice())
So it is advisable to try the different options.
See Tensorflow Multi-GPU - NCCL
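For reference, a minimal sketch (the toy Dense model is only a placeholder, not part of the original answer) of building the strategy with the NCCL-free option and checking how many replicas it sees:
import tensorflow as tf

# ReductionToOneDevice avoids NCCL entirely, so it is a safe fallback on Windows.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.ReductionToOneDevice())
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(784,))])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")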

NCCL drivers do not work on Windows. To my knowledge they only work on Linux. I have read that there might be an NCCL equivalent for Windows, but I have not been able to find one myself. If you want to stay on Windows you can try HierarchicalCopyAllReduce.
mirrored_strategy = tf.distribute.MirroredStrategy(
    devices=["/job:localhost/replica:0/task:0/device:GPU:0",
             "/job:localhost/replica:0/task:0/device:GPU:1"],
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())
with mirrored_strategy.scope():
    ...  # build and compile your model here

You can try:
strategy = tf.distribute.MultiWorkerMirroredStrategy()
For me (1 CPU - 3 GPU setup on Windows WSL2) it's faster than any of the other methods.
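A short sketch of that approach (hedged: on older TF 2.x releases, roughly before 2.6, the class lives under tf.distribute.experimental.MultiWorkerMirroredStrategy). With no TF_CONFIG set it falls back to the local machine and all visible GPUs:
import os
import tensorflow as tf

os.environ.pop("TF_CONFIG", None)  # single machine: no cluster spec needed
strategy = tf.distribute.MultiWorkerMirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)
with strategy.scope():
    ...  # build and compile your model here, as with MirroredStrategy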

I had this exact problem with TensorFlow 2.2.0 when using tf.distribute.MirroredStrategy and found that I was running the wrong version of CUDA - I had been running 10.0 and it should have been CUDA 10.1.
I think it is worth checking which version you are running and whether it is the right one.
For TensorFlow 2.0.0 you should be using CUDA 10.0. See here for compatible combinations of the two libraries.
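If it helps, recent TensorFlow releases (2.3 and later, as far as I know) can report the CUDA and cuDNN versions the installed wheel was built against, which makes this check quick:
import tensorflow as tf

print(tf.__version__)
build = tf.sysconfig.get_build_info()  # not available on very old releases
print(build.get("cuda_version"), build.get("cudnn_version"))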

Related

ResourceExhaustedError: OOM when allocating tensor with shape[64,64,224,224]

Got a ResourceExhaustedError while training a deep learning model on an NVIDIA GeForce RTX 3050 Ti Laptop GPU using TensorFlow, with memory_limit: 1721342363.
I am training on 2870 images and it works well on the CPU, but on the GPU it seems to be restricted by the memory limit. Have I turned on a memory limit on my GPU, or do I have no option but to use my CPU?
It took 70 minutes on my CPU, which is why I chose to run on my GPU.
But while training
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(training_set, validation_data=test_set, epochs=20, batch_size=32)
Got this error:
Epoch 1/20
1/45 [..............................] - ETA: 9:46 - loss: 1.8638 - accuracy: 0.1667
---------------------------------------------------------------------------
ResourceExhaustedError Traceback (most recent call last)
Cell In [4], line 3
1 #Compile the model
2 model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
----> 3 history = model.fit(training_set, validation_data=test_set, epochs=20, batch_size=32)
File ~\anaconda3\envs\tf-gpu1\lib\site-packages\keras\engine\training.py:1184, in Model.fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1177 with tf.profiler.experimental.Trace(
1178 'train',
1179 epoch_num=epoch,
1180 step_num=step,
1181 batch_size=batch_size,
1182 _r=1):
1183 callbacks.on_train_batch_begin(step)
-> 1184 tmp_logs = self.train_function(iterator)
1185 if data_handler.should_sync:
1186 context.async_wait()
File ~\anaconda3\envs\tf-gpu1\lib\site-packages\tensorflow\python\eager\def_function.py:885, in Function.__call__(self, *args, **kwds)
882 compiler = "xla" if self._jit_compile else "nonXla"
884 with OptionalXlaContext(self._jit_compile):
--> 885 result = self._call(*args, **kwds)
887 new_tracing_count = self.experimental_get_tracing_count()
888 without_tracing = (tracing_count == new_tracing_count)
File ~\anaconda3\envs\tf-gpu1\lib\site-packages\tensorflow\python\eager\def_function.py:917, in Function._call(self, *args, **kwds)
914 self._lock.release()
915 # In this case we have created variables on the first call, so we run the
916 # defunned version which is guaranteed to never create variables.
--> 917 return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
918 elif self._stateful_fn is not None:
919 # Release the lock early so that multiple threads can perform the call
920 # in parallel.
921 self._lock.release()
File ~\anaconda3\envs\tf-gpu1\lib\site-packages\tensorflow\python\eager\function.py:3039, in Function.__call__(self, *args, **kwargs)
3036 with self._lock:
3037 (graph_function,
3038 filtered_flat_args) = self._maybe_define_function(args, kwargs)
-> 3039 return graph_function._call_flat(
3040 filtered_flat_args, captured_inputs=graph_function.captured_inputs)
File ~\anaconda3\envs\tf-gpu1\lib\site-packages\tensorflow\python\eager\function.py:1963, in ConcreteFunction._call_flat(self, args, captured_inputs, cancellation_manager)
1959 possible_gradient_type = gradients_util.PossibleTapeGradientTypes(args)
1960 if (possible_gradient_type == gradients_util.POSSIBLE_GRADIENT_TYPES_NONE
1961 and executing_eagerly):
1962 # No tape is watching; skip to running the function.
-> 1963 return self._build_call_outputs(self._inference_function.call(
1964 ctx, args, cancellation_manager=cancellation_manager))
1965 forward_backward = self._select_forward_and_backward_functions(
1966 args,
1967 possible_gradient_type,
1968 executing_eagerly)
1969 forward_function, args_with_tangents = forward_backward.forward()
File ~\anaconda3\envs\tf-gpu1\lib\site-packages\tensorflow\python\eager\function.py:591, in _EagerDefinedFunction.call(self, ctx, args, cancellation_manager)
589 with _InterpolateFunctionError(self):
590 if cancellation_manager is None:
--> 591 outputs = execute.execute(
592 str(self.signature.name),
593 num_outputs=self._num_outputs,
594 inputs=args,
595 attrs=attrs,
596 ctx=ctx)
597 else:
598 outputs = execute.execute_with_cancellation(
599 str(self.signature.name),
600 num_outputs=self._num_outputs,
(...)
603 ctx=ctx,
604 cancellation_manager=cancellation_manager)
File ~\anaconda3\envs\tf-gpu1\lib\site-packages\tensorflow\python\eager\execute.py:59, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
57 try:
58 ctx.ensure_initialized()
---> 59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:
ResourceExhaustedError: OOM when allocating tensor with shape[64,64,224,224] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model/block1_conv2/Relu (defined at \AppData\Local\Temp\ipykernel_11956\3538519329.py:3) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[Op:__inference_train_function_1205]
Function call stack:
train_function
I might not be entirely right on the details, but from the GPU memory perspective the training process looks like this:
the network is loaded into the GPU memory
a batch is loaded into the GPU memory
trainable parameters are computed on the GPU
So there are several ways to reduce GPU memory usage:
Reduce the number of trainable parameters, so each training step requires less computation. You can do this either by reducing the size of the net or by making some of the later layers non-trainable if the net is pretrained (tf.keras.applications has a lot of networks pretrained on ImageNet).
To make layers of a Functional net (which is how the abovementioned nets are written) untrainable, switch their .trainable attribute to False.
Say, to freeze the first 100 layers you need to:
ind_to_freeze = 100
for layer in net.layers[:ind_to_freeze]:
    layer.trainable = False
Reduce the image shapes, so each batch requires less memory. Generally, this will lower the accuracy.
Reduce the batch size, so each batch requires less memory.
Also, I see you've loaded all 2870 images into (CPU) memory at once. I recommend using a generator that takes filenames and reads the images on the fly instead. You can read about building native tf.data.Datasets here
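A minimal tf.data sketch along those lines (hedged: filenames, labels, IMG_SIZE and BATCH_SIZE are placeholders you have to supply; tf.data.AUTOTUNE needs TF 2.4+, older releases use tf.data.experimental.AUTOTUNE):
import tensorflow as tf

IMG_SIZE = (224, 224)  # assumption: match this to your model's input size
BATCH_SIZE = 16        # hypothetical value; lower it if you still hit OOM

def load_image(path, label):
    # Read and decode one image from disk, so only a batch lives in memory at a time.
    img = tf.io.read_file(path)
    img = tf.io.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, IMG_SIZE) / 255.0
    return img, label

# `filenames` and `labels` are hypothetical Python lists built from your dataset.
ds = (tf.data.Dataset.from_tensor_slices((filenames, labels))
      .map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
      .batch(BATCH_SIZE)
      .prefetch(tf.data.AUTOTUNE))

# model.fit(ds, epochs=20)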

TensorFlow GradienTape crashes with InvalidArgumentError when working on sparse Tensors

I am encountering strange behavior when trying to evaluate the derivatives of a result obtained by sparse tensor operations. If I expand all sparse inputs to dense before operating on them, the following code works as expected (first part of the code below), but it crashes with an InvalidArgumentError when I do the same with sparse tensors. In addition, I get while_loop warnings, as shown below. Since the actual problem of course involves more operations and much bigger and more numerous tensors, I essentially have to collect the entries of c in sparse mode. Can anyone make (more) sense of this behavior?
import tensorflow as tf
import numpy as np

a = tf.SparseTensor(indices=[[0, 0], [1, 1]], values=np.array([1, 1], dtype=np.float32), dense_shape=(2, 2))
b = tf.SparseTensor(indices=[[0, 1], [1, 0]], values=np.array([-1, -1], dtype=np.float32), dense_shape=(2, 2))

# dense mode...
f1 = tf.Variable([1, 1], dtype=np.float32)
with tf.GradientTape() as gtape:
    c = tf.sparse.to_dense(a) * f1[0] + tf.sparse.to_dense(b) * f1[1]
print(gtape.jacobian(c, f1))  # ... works fine

# sparse mode...
f2 = tf.Variable([1, 1], dtype=np.float32)
with tf.GradientTape() as gtape:
    c = tf.sparse.add(a * f2[0], b * f2[1], 0)
    c = tf.sparse.to_dense(c)
print(gtape.jacobian(c, f2))  # ... InvalidArgumentError
#WARNING:tensorflow:Using a while_loop for converting SparseAddGrad
#WARNING:tensorflow:Using a while_loop for converting SparseTensorDenseAdd
#WARNING:tensorflow:Using a while_loop for converting SparseTensorDenseAdd
#---------------------------------------------------------------------------
#InvalidArgumentError Traceback (most recent call last)
#<ipython-input-10-d449761ef6b2> in <module>
# 12 c=tf.sparse.add(a*f2[0],b*f2[1],0)
# 13 c=tf.sparse.to_dense(c)
#---> 14 print(gtape.jacobian(c,f2)) #InvalidArgumentError
#c:\program files\python37\lib\site-packages\tensorflow\python\eager\backprop.py in jacobian(self, target, sources, unconnected_gradients, parallel_iterations, experimental_use_pfor)
# 1187 try:
# 1188 output = pfor_ops.pfor(loop_fn, target_size,
#-> 1189 parallel_iterations=parallel_iterations)
# 1190 except ValueError as err:
# 1191 six.reraise(
#c:\program files\python37\lib\site-packages\tensorflow\python\ops\parallel_for\control_flow_ops.py in pfor(loop_fn, iters, fallback_to_while_loop, parallel_iterations)
# 203 def_function.run_functions_eagerly(False)
# 204 f = def_function.function(f)
#--> 205 outputs = f()
# 206 if functions_run_eagerly is not None:
# 207 def_function.run_functions_eagerly(functions_run_eagerly)
#c:\program files\python37\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
# 826 tracing_count = self.experimental_get_tracing_count()
# 827 with trace.Trace(self._name) as tm:
#--> 828 result = self._call(*args, **kwds)
# 829 compiler = "xla" if self._experimental_compile else "nonXla"
# 830 new_tracing_count = self.experimental_get_tracing_count()
#c:\program files\python37\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
# 893 # If we did not create any variables the trace we have is good enough.
# 894 return self._concrete_stateful_fn._call_flat(
#--> 895 filtered_flat_args, self._concrete_stateful_fn.captured_inputs) # pylint: disable=protected-access
# 896
# 897 def fn_with_cond(inner_args, inner_kwds, inner_filtered_flat_args):
#c:\program files\python37\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
# 1917 # No tape is watching; skip to running the function.
# 1918 return self._build_call_outputs(self._inference_function.call(
#-> 1919 ctx, args, cancellation_manager=cancellation_manager))
# 1920 forward_backward = self._select_forward_and_backward_functions(
# 1921 args,
#c:\program files\python37\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
# 558 inputs=args,
# 559 attrs=attrs,
#--> 560 ctx=ctx)
# 561 else:
# 562 outputs = execute.execute_with_cancellation(
#c:\program files\python37\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
# 58 ctx.ensure_initialized()
# 59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
#---> 60 inputs, attrs, num_outputs)
# 61 except core._NotOkStatusException as e:
# 62 if name is not None:
#InvalidArgumentError: Only tensors with ranks between 1 and 5 are currently supported. Tensor rank: 0
# [[{{node gradient_tape/SparseTensorDenseAdd_1/pfor/while/body/_56/gradient_tape/SparseTensorDenseAdd_1/pfor/while/SparseTensorDenseAdd}}]] [Op:__inference_f_6235]
#Function call stack:
#f
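(Side note, not a verified fix: GradientTape.jacobian accepts experimental_use_pfor=False, and the while_loop warnings above come from the pfor conversion that jacobian attempts by default. Disabling it falls back to a plain while_loop and sometimes sidesteps such conversion errors.)
print(gtape.jacobian(c, f2, experimental_use_pfor=False))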

How to run tensorflow-gpu

I have a problem: my Jupyter notebook does not run on the GPU. I updated my driver (NVIDIA GTX 1660 Ti), installed CUDA 11, put the cuDNN files into the right folders and set the correct path in the environment variables.
After doing that, I added a new environment to Anaconda including a GPU kernel and installed tensorflow-gpu (version 2.4, because CUDA 11 needs version >= 2.4.0), as explained in this video.
After that I opened a Jupyter notebook with the new kernel. My code runs up to a certain step, but my GPU utilization in the task manager is below 1% and my RAM usage is at 60%-99%. So I think my code isn't running on the GPU. I ran a few tests:
import tensorflow.keras
import tensorflow as tf
print(tf.__version__)
print(tensorflow.keras.__version__)
print(tf.test.is_built_with_cuda())
print(tf.config.list_physical_devices('GPU'))
print(tf.test.is_gpu_available())
leads to (which I think is correct):
2.4.0
2.4.0
True
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
True
The next test is:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
Which leads to:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9334837591848971536
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 4837251481
locality {
bus_id: 1
links {
}
}
incarnation: 2660164806064353779
physical_device_desc: "device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5"
]
So there are both a CPU and a GPU available to that kernel, aren't there?
What can I do so that my neural network runs on the GPU and not on the CPU?
My code runs fine until I try to train the neural network. This is the code and the error that occurs:
model.fit([np.asarray(X_train).astype(np.float32), np.asarray(X_train_zusatz).astype(np.float32)],
          y_train, epochs=10, batch_size=10)
It is a concatenated neural network, if you are wondering about the input, and it works quite fine with normal tensorflow (not tensorflow-gpu). But training takes a very, very long time.
Epoch 1/10
---------------------------------------------------------------------------
ResourceExhaustedError Traceback (most recent call last)
<ipython-input-27-10813edc74c8> in <module>
3
4 model.fit([np.asarray(X_train).astype(np.float32), np.asarray(X_train_zusatz).astype(np.float32)],
----> 5 y_train, epochs=10, batch_size=10)#,
6 #validation_data=[[X_test, X_test_zusatz], y_test], class_weight=class_weight)
~\.conda\envs\tf-gpu\lib\site-pac
kages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1098 _r=1):
1099 callbacks.on_train_batch_begin(step)
-> 1100 tmp_logs = self.train_function(iterator)
1101 if data_handler.should_sync:
1102 context.async_wait()
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
826 tracing_count = self.experimental_get_tracing_count()
827 with trace.Trace(self._name) as tm:
--> 828 result = self._call(*args, **kwds)
829 compiler = "xla" if self._experimental_compile else "nonXla"
830 new_tracing_count = self.experimental_get_tracing_count()
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
886 # Lifting succeeded, so variables are initialized and we can run the
887 # stateless function.
--> 888 return self._stateless_fn(*args, **kwds)
889 else:
890 _, _, _, filtered_flat_args = \
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in __call__(self, *args, **kwargs)
2941 filtered_flat_args) = self._maybe_define_function(args, kwargs)
2942 return graph_function._call_flat(
-> 2943 filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
2944
2945 #property
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1917 # No tape is watching; skip to running the function.
1918 return self._build_call_outputs(self._inference_function.call(
-> 1919 ctx, args, cancellation_manager=cancellation_manager))
1920 forward_backward = self._select_forward_and_backward_functions(
1921 args,
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
558 inputs=args,
559 attrs=attrs,
--> 560 ctx=ctx)
561 else:
562 outputs = execute.execute_with_cancellation(
~\.conda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:
ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/lstm/while/body/_1/model/lstm/while/lstm_cell/split}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[gradient_tape/model/embedding/embedding_lookup/Reshape/_74]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[300,300] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/lstm/while/body/_1/model/lstm/while/lstm_cell/split}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_4691]
Function call stack:
train_function -> train_function
Why is this error occurring?
-Update-
This is what my "nvidia-smi" output looks like while training my model (after ca. 20 seconds of training).
Thank you and best regards,
Daniel

ResNet model in Tensorflow Federated

I tried to customize the model in the "Image classification" tutorial in TensorFlow Federated. (It originally used a sequential model.)
I use Keras ResNet50, but when it begins to train there is always an "Incompatible shapes" error.
Here is my code:
NUM_CLIENTS = 4
NUM_EPOCHS = 10
BATCH_SIZE = 2
SHUFFLE_BUFFER = 5

def create_compiled_keras_model():
    model = tf.keras.applications.resnet.ResNet50(
        include_top=False, weights='imagenet',
        input_tensor=tf.keras.layers.Input(shape=(100, 300, 3)), pooling=None)
    model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.02),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
    return model

def model_fn():
    keras_model = create_compiled_keras_model()
    return tff.learning.from_compiled_keras_model(keras_model, sample_batch)

iterative_process = tff.learning.build_federated_averaging_process(model_fn)
Error information: (error screenshot, not reproduced here)
I feel that the shape is incompatible because the epoch and client information were somehow missing. I would be very thankful if someone could give me a hint.
Updates:
The AssertionError happened during tff.learning.build_federated_averaging_process
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-164-dac26193d9d8> in <module>()
----> 1 iterative_process = tff.learning.build_federated_averaging_process(model_fn)
2
3 # iterative_process = build_federated_averaging_process(model_fn)
13 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_federated/python/learning/federated_averaging.py in build_federated_averaging_process(model_fn, server_optimizer_fn, client_weight_fn, stateful_delta_aggregate_fn, stateful_model_broadcast_fn)
165 return optimizer_utils.build_model_delta_optimizer_process(
166 model_fn, client_fed_avg, server_optimizer_fn,
--> 167 stateful_delta_aggregate_fn, stateful_model_broadcast_fn)
/usr/local/lib/python3.6/dist-packages/tensorflow_federated/python/learning/framework/optimizer_utils.py in build_model_delta_optimizer_process(model_fn, model_to_client_delta_fn, server_optimizer_fn, stateful_delta_aggregate_fn, stateful_model_broadcast_fn)
349 # still need this.
350 with tf.Graph().as_default():
--> 351 dummy_model_for_metadata = model_utils.enhance(model_fn())
352
353 # ===========================================================================
<ipython-input-159-b2763ace8e5b> in model_fn()
1 def model_fn():
2 keras_model = model
----> 3 return tff.learning.from_compiled_keras_model(keras_model, sample_batch)
/usr/local/lib/python3.6/dist-packages/tensorflow_federated/python/learning/keras_utils.py in from_compiled_keras_model(keras_model, dummy_batch)
211 # Model.test_on_batch() once before asking for metrics.
212 if isinstance(dummy_tensors, collections.Mapping):
--> 213 keras_model.test_on_batch(**dummy_tensors)
214 else:
215 keras_model.test_on_batch(*dummy_tensors)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py in test_on_batch(self, x, y, sample_weight, reset_metrics)
1007 sample_weight=sample_weight,
1008 reset_metrics=reset_metrics,
-> 1009 standalone=True)
1010 outputs = (
1011 outputs['total_loss'] + outputs['output_losses'] + outputs['metrics'])
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py in test_on_batch(model, x, y, sample_weight, reset_metrics, standalone)
503 y,
504 sample_weights=sample_weights,
--> 505 output_loss_metrics=model._output_loss_metrics)
506
507 if reset_metrics:
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py in __call__(self, *args, **kwds)
568 xla_context.Exit()
569 else:
--> 570 result = self._call(*args, **kwds)
571
572 if tracing_count == self._get_tracing_count():
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py in _call(self, *args, **kwds)
606 # In this case we have not created variables on the first call. So we can
607 # run the first trace but we should fail if variables are created.
--> 608 results = self._stateful_fn(*args, **kwds)
609 if self._created_variables:
610 raise ValueError("Creating variables on a non-first call to a function"
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py in __call__(self, *args, **kwargs)
2407 """Calls a graph function specialized to the inputs."""
2408 with self._lock:
-> 2409 graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
2410 return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
2411
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py in _maybe_define_function(self, args, kwargs)
2765
2766 self._function_cache.missed.add(call_context_key)
-> 2767 graph_function = self._create_graph_function(args, kwargs)
2768 self._function_cache.primary[cache_key] = graph_function
2769 return graph_function, args, kwargs
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
2655 arg_names=arg_names,
2656 override_flat_arg_shapes=override_flat_arg_shapes,
-> 2657 capture_by_value=self._capture_by_value),
2658 self._function_attributes,
2659 # Tell the ConcreteFunction to clean up its graph once it goes out of
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
979 _, original_func = tf_decorator.unwrap(python_func)
980
--> 981 func_outputs = python_func(*func_args, **func_kwargs)
982
983 # invariant: `func_outputs` contains only Tensors, CompositeTensors,
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py in wrapped_fn(*args, **kwds)
437 # __wrapped__ allows AutoGraph to swap in a converted function. We give
438 # the function a weak reference to itself to avoid a reference cycle.
--> 439 return weak_wrapped_fn().__wrapped__(*args, **kwds)
440 weak_wrapped_fn = weakref.ref(wrapped_fn)
441
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py in wrapper(*args, **kwargs)
966 except Exception as e: # pylint:disable=broad-except
967 if hasattr(e, "ag_error_metadata"):
--> 968 raise e.ag_error_metadata.to_exception(e)
969 else:
970 raise
AssertionError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_eager.py:345 test_on_batch *
with backend.eager_learning_phase_scope(0):
/usr/lib/python3.6/contextlib.py:81 __enter__
return next(self.gen)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py:425 eager_learning_phase_scope
assert ops.executing_eagerly_outside_functions()
AssertionError:
Ah, I believe this issue is coming from mismatched expectations on sample_batch. TFF passes sample_batch to Keras, which calls a forward pass with this sample batch to initialize various attributes of the keras model. sample_batch should be either a sample from the literal data you are going to be feeding the model as on the server side, or a batch of fake data which matches the shape and type of the data you will be passing in.
An example of the former can be found here (this uses tf.data.Dataset), and there are several examples of the latter in test code, like here.
From what I see of the definition of the model, likely the x element of your sample_batch should be an ndarray of shape [2, 100, 300, 3] (where 2 is for the batch size, but technically this can be any nonzero dimension), and the y element should also match the expected y structure in the data you are using.
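For instance, a fake batch along those lines could be built like this (a sketch; the y shape and dtype are assumptions that have to match your real labels):
import collections
import numpy as np

sample_batch = collections.OrderedDict(
    x=np.zeros([2, 100, 300, 3], dtype=np.float32),
    y=np.zeros([2, 1], dtype=np.int64))  # assumption: integer class labels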
I hope this helps, just ping back if there are any problems!
One thing to note, that may be helpful in thinking about TFF--TFF is building a syntax tree representing the distributed computation you are defining via build_federated_averaging_process. This error actually occurs during construction of this object. TFF must trace the computation you pass it in order to know what structure to generate, and this is what is raising here. Actual training of the model happens when you call next on the returned IterativeProcess.
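As a concrete illustration (a sketch; federated_train_data is assumed to be the list of client datasets from the tutorial), nothing is trained until the returned process is driven like this:
state = iterative_process.initialize()
for round_num in range(1, 11):
    state, metrics = iterative_process.next(state, federated_train_data)
    print('round {}, metrics={}'.format(round_num, metrics))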
I have the same problem:
when I execute this line
state, metrics = iterative_process.next(state, federated_train_data)
print('round 1, metrics={}'.format(metrics))
I get this error
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU
[[{{node StatefulPartitionedCall/StatefulPartitionedCall/sequential/vgg16/block1_pool/MaxPool}}]]
[[subcomputation/StatefulPartitionedCall_1/ReduceDataset]]
[[subcomputation/StatefulPartitionedCall_1/ReduceDataset/_140]]
(1) Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU
[[{{node StatefulPartitionedCall/StatefulPartitionedCall/sequential/vgg16/block1_pool/MaxPool}}]]
[[subcomputation/StatefulPartitionedCall_1/ReduceDataset]]
0 successful operations.
0 derived errors ignored.
Note that I am using VGG16.
Do you have any idea about this type of error?

Saving A Model That Contains Universal Sentence Encoder as its Embedding

I am trying to save a model which uses USE (Universal Sentence Encoder) from tf-hub as its embedding layer and has a few feed-forward layers stacked on top of it. The model seems to work fine, but I am facing a problem in saving and loading the model.
# presumed imports (not shown in the original post; the original may have used standalone keras instead of tf.keras)
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.python.framework.ops import disable_eager_execution
from tensorflow.keras.layers import Input, Lambda, Dense, Dot
from tensorflow.keras.models import Model

disable_eager_execution()
embed = hub.Module(module_url)  # module_url points to the USE module on TF Hub

def UniversalEmbedding(x):
    return embed(tf.squeeze(tf.cast(x, tf.string)))

input_text = Input(shape=[], dtype=tf.string)
response_text = Input(shape=[], dtype=tf.string)
text_embedding = Lambda(UniversalEmbedding, output_shape=(512,))(input_text)
response_embedding = Lambda(UniversalEmbedding, output_shape=(512,))(response_text)
response_embedding = Dense(512, activation='relu')(response_embedding)
response_embedding = Dense(512, activation='relu')(response_embedding)
score = Dot(axes=1, normalize=True)([text_embedding, response_embedding])
pred = Dense(2, activation='softmax')(score)
text_encoder = Model(inputs=[input_text], outputs=text_embedding)
response_encoder = Model(inputs=[response_text], outputs=response_embedding)
model = Model(inputs=[input_text, response_text], outputs=pred)
The code above is how I built my model (it's a dual-encoder model with USE as its encoder).
I had to disable eager execution because USE does not seem to work in an eager execution environment yet. If it does, or if there is a workaround for that, I'd really appreciate any help with this too :)
The model is trained and saved via the following code:
with tf.compat.v1.Session() as session:
    K.set_session(session)
    session.run(tf.compat.v1.global_variables_initializer())
    session.run(tf.compat.v1.tables_initializer())
    history = model.fit_generator(generator=train_neg_sample_generator,
                                  validation_data=val_neg_sample_generator, epochs=20,
                                  callbacks=[checkpointer, earlystopper], verbose=0)
The model loads with no error when the weights in the checkpoints (saved in hdf5 files) are loaded into the model defined in the code above. So the code below works fine, but only because the architecture 'model' is already defined above.
with tf.compat.v1.Session() as session:
    K.set_session(session)
    session.run(tf.compat.v1.global_variables_initializer())
    session.run(tf.compat.v1.tables_initializer())
    model.load_weights('./saved_models/weights.03-0.29.hdf5')
    tf.keras.models.save_model(model, 'test_model2.hdf5')
    predicts = model.predict([["how are you?", "how are you?", 'hi', 'my two favorites in one pic!'],
                              ["i'm fine", "what the heck", 'hi', 'same!']])
    print(predicts)
    print(np.argmax(predicts, axis=1))
Then I tried two things. First, I tried to save the architecture in JSON format, load the model architecture and then load the weights, but it did not work. Then I tried to save the whole model via keras.models.save_model, but it did not work either.
In both cases, it returned
AttributeError: module 'tensorflow' has no attribute 'placeholder'
How can I save/load the whole model? (If not at once, loading the architecture and weights separately is fine too.)
Here is the whole error log
AttributeError Traceback (most recent call last)
<ipython-input-31-47468f2533ad> in <module>()
1 from keras.models import load_model
2
----> 3 model2 = load_model('testest.h5')
13 frames
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py in load_wrapper(*args, **kwargs)
456 os.remove(tmp_filepath)
457 return res
--> 458 return load_function(*args, **kwargs)
459
460 return load_wrapper
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py in load_model(filepath, custom_objects, compile)
548 if H5Dict.is_supported_type(filepath):
549 with H5Dict(filepath, mode='r') as h5dict:
--> 550 model = _deserialize_model(h5dict, custom_objects, compile)
551 elif hasattr(filepath, 'write') and callable(filepath.write):
552 def load_function(h5file):
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py in _deserialize_model(h5dict, custom_objects, compile)
241 raise ValueError('No model found in config.')
242 model_config = json.loads(model_config.decode('utf-8'))
--> 243 model = model_from_config(model_config, custom_objects=custom_objects)
244 model_weights_group = h5dict['model_weights']
245
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py in model_from_config(config, custom_objects)
591 '`Sequential.from_config(config)`?')
592 from ..layers import deserialize
--> 593 return deserialize(config, custom_objects=custom_objects)
594
595
/usr/local/lib/python3.6/dist-packages/keras/layers/__init__.py in deserialize(config, custom_objects)
166 module_objects=globs,
167 custom_objects=custom_objects,
--> 168 printable_module_name='layer')
/usr/local/lib/python3.6/dist-packages/keras/utils/generic_utils.py in deserialize_keras_object(identifier, module_objects, custom_objects, printable_module_name)
145 config['config'],
146 custom_objects=dict(list(_GLOBAL_CUSTOM_OBJECTS.items()) +
--> 147 list(custom_objects.items())))
148 with CustomObjectScope(custom_objects):
149 return cls.from_config(config['config'])
/usr/local/lib/python3.6/dist-packages/keras/engine/network.py in from_config(cls, config, custom_objects)
1041 # First, we create all layers and enqueue nodes to be processed
1042 for layer_data in config['layers']:
-> 1043 process_layer(layer_data)
1044
1045 # Then we process nodes in order of layer depth.
/usr/local/lib/python3.6/dist-packages/keras/engine/network.py in process_layer(layer_data)
1027
1028 layer = deserialize_layer(layer_data,
-> 1029 custom_objects=custom_objects)
1030 created_layers[layer_name] = layer
1031
/usr/local/lib/python3.6/dist-packages/keras/layers/__init__.py in deserialize(config, custom_objects)
166 module_objects=globs,
167 custom_objects=custom_objects,
--> 168 printable_module_name='layer')
/usr/local/lib/python3.6/dist-packages/keras/utils/generic_utils.py in deserialize_keras_object(identifier, module_objects, custom_objects, printable_module_name)
147 list(custom_objects.items())))
148 with CustomObjectScope(custom_objects):
--> 149 return cls.from_config(config['config'])
150 else:
151 # Then `cls` may be a function returning a class.
/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py in from_config(cls, config)
1101 A layer instance.
1102 """
-> 1103 return cls(**config)
1104
1105 def count_params(self):
/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
89 warnings.warn('Update your `' + object_name + '` call to the ' +
90 'Keras 2 API: ' + signature, stacklevel=2)
---> 91 return func(*args, **kwargs)
92 wrapper._original_function = func
93 return wrapper
/usr/local/lib/python3.6/dist-packages/keras/engine/input_layer.py in __init__(self, input_shape, batch_size, batch_input_shape, dtype, input_tensor, sparse, name)
85 dtype=dtype,
86 sparse=self.sparse,
---> 87 name=self.name)
88 else:
89 self.is_placeholder = False
/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py in placeholder(shape, ndim, dtype, sparse, name)
539 x = tf.sparse_placeholder(dtype, shape=shape, name=name)
540 else:
--> 541 x = tf.placeholder(dtype, shape=shape, name=name)
542 x._keras_shape = shape
543 x._uses_learning_phase = False
AttributeError: module 'tensorflow' has no attribute 'placeholder'
Provide the whole error log, not just a part of it.
If it's really an error caused by saving, then what about model.save('model.h5')? That is, not using the function from tf.keras.models but calling a method of the Model class itself.
But why this code?
with tf.compat.v1.Session() as session:
    K.set_session(session)
    session.run(tf.compat.v1.global_variables_initializer())
    session.run(tf.compat.v1.tables_initializer())
I believe you could just call model.fit right away, and your tf is version 2, right? Why call compat.v1?
TensorFlow 2 doesn't have placeholders, so I think the problem may be about this.
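For clarity, the two calls being compared, plus the matching load (a sketch, not tested against the USE model in this question; custom Lambda helpers usually have to be passed back in via custom_objects):
model.save('model.h5')                         # method on the Model instance
tf.keras.models.save_model(model, 'model.h5')  # module-level function with the same effect
model2 = tf.keras.models.load_model('model.h5',
                                    custom_objects={'UniversalEmbedding': UniversalEmbedding})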
It worked fine with tensorflow version 1.15.
Looking forward to tf-hub becoming completely compatible with tensorflow 2.0 and keras...