tf.data input pipeline, dynamic shaped tensors and slicing: NotImplementedError: Cannot convert a symbolic Tensor (args_0:0) to a numpy array - tensorflow

I am trying to write an efficient data input pipeline using TensorFlow and tf.data.
I want to replicate the functionality of PIL.Image.Image.crop, where one can pass negative bounding-box values so that the crop is expanded with zeros.
For example, if I call PIL.Image.Image.crop([-10, 0, img_height, img_width]), the cropped image has 10 additional rows at the beginning, filled with zeros.
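For concreteness, a small snippet illustrating that zero-expansion (note that PIL's box order is (left, upper, right, lower), so a negative second coordinate adds rows at the top):

from PIL import Image
import numpy as np

img = Image.fromarray(np.full((4, 4), 255, dtype=np.uint8))
out = img.crop((0, -2, 4, 4))   # negative `upper` expands the crop upwards
print(out.size)                 # (4, 6): two extra rows
print(np.array(out)[:2])        # the new rows are zero-filled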
As far as I understand, using Python functions in the tf.data input pipeline can slow down the code significantly, so I try to write everything using TensorFlow functions. I also want to use prefetching, batching, shuffling, etc., which the API already provides.
My plan to implement the PIL crop using TensorFlow functions is to preallocate a (dynamically shaped) zero tensor and assign the cropped values into it using slicing.
This is the error I ran into: NotImplementedError: Cannot convert a symbolic Tensor (args_2:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported
Here is minimal code to replicate the issue:
import numpy as np
import tensorflow as tf

ds2 = tf.data.Dataset.from_tensor_slices(np.arange(10))

def preproc(number):
    o = tf.zeros((number, number))
    # ...

ds2.map(preproc)
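One workaround that reportedly sidesteps the NumPy call (the trace below shows tf.zeros probing np.prod(shape) on the symbolic dimension) is to build the shape as a tensor and use tf.fill instead; a minimal sketch, worth verifying against your TensorFlow/NumPy versions:

def preproc(number):
    # tf.fill takes its shape as a 1-D tensor, so no NumPy call is made
    # on the symbolic dimension `number`.
    return tf.fill(tf.stack([number, number]), 0.0)

ds2.map(preproc)

This particular NotImplementedError is also known to appear when NumPy >= 1.20 is paired with older TensorFlow releases, in which case pinning numpy < 1.20 is the commonly suggested fix.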
Question 1: How can I solve this issue?
Question 2: I am coming from a PyTorch background, and I am confused by the complexity of the whole tf.data pipeline. Why am I restricted to TensorFlow functions in order to use all the nice features?
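Aside: tf.data is not strictly limited to TensorFlow ops; arbitrary Python can run inside the pipeline via tf.py_function, at the cost of serialized Python execution outside graph optimizations. A minimal sketch:

def py_preproc(number):
    # Runs eagerly inside tf.py_function, so NumPy is available here.
    return np.zeros((int(number), int(number)), dtype=np.float32)

ds3 = tf.data.Dataset.from_tensor_slices(np.arange(1, 5))
ds3 = ds3.map(lambda n: tf.py_function(py_preproc, inp=[n], Tout=tf.float32))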
For reference, here is my complete code so far.
class MyPreprocessing:
    def __call__(self, *data):
        # Complete code omitted for simplicity.
        # This is called somewhere:
        self._crop(...)

    def _crop(self, img, center, body_size, res):
        tl = tf.cast(center - body_size/2, dtype=tf.int32)
        br = tf.cast(center + body_size/2, dtype=tf.int32)
        height, width = tf.shape(img)[0], tf.shape(img)[1]
        crop_tl = tf.stack([
            tf.cond(tl[0] < 0, lambda: tf.constant(0, dtype=tf.int32), lambda: tl[0]),
            tf.cond(tl[1] < 0, lambda: tf.constant(0, dtype=tf.int32), lambda: tl[1])])
        crop_br = tf.stack([
            tf.cond(br[0] > height, lambda: height, lambda: br[0]),
            tf.cond(br[1] > width, lambda: width, lambda: br[1])])
        crop = tf.image.crop_to_bounding_box(
            img,
            crop_tl[0],
            crop_tl[1],
            crop_br[0] - crop_tl[0],  # target height
            crop_br[1] - crop_tl[1])  # target width
        new_tl = crop_tl - tl
        new_br = crop_br - tl
        # Error:
        new_img = tf.zeros((body_size, body_size, tf.constant(3)), dtype=tf.float32)
        new_img[new_tl[0]:new_br[0], new_tl[1]:new_br[1]].assign(crop)
        return tf.image.resize(new_img, (res, res))
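A possible way around the failing assignment (a tf.Tensor does not support item assignment or .assign; only tf.Variable does) is to express the zero-expansion with tf.pad instead of preallocating. A hedged sketch of _crop along those lines, keeping the clipping logic from above and assuming a rank-3 (H, W, C) image:

def _crop(self, img, center, body_size, res):
    tl = tf.cast(center - body_size/2, dtype=tf.int32)
    br = tf.cast(center + body_size/2, dtype=tf.int32)
    height, width = tf.shape(img)[0], tf.shape(img)[1]
    # Clip the requested box to the actual image bounds.
    crop_tl = tf.maximum(tl, 0)
    crop_br = tf.minimum(br, tf.stack([height, width]))
    crop = img[crop_tl[0]:crop_br[0], crop_tl[1]:crop_br[1], :]
    # Zero-pad back out to the requested box instead of assigning into
    # a preallocated tensor.
    crop = tf.pad(crop, [[crop_tl[0] - tl[0], br[0] - crop_br[0]],
                         [crop_tl[1] - tl[1], br[1] - crop_br[1]],
                         [0, 0]])
    return tf.image.resize(tf.cast(crop, tf.float32), (res, res))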
Edit: Full Stacktrace
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-2-84d20c31b9c9> in <module>
8 # ...
9
---> 10 ds2.map(preproc)
~/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py in map(self, map_func, num_parallel_calls, deterministic)
1923 warnings.warn("The `deterministic` argument has no effect unless the "
1924 "`num_parallel_calls` argument is specified.")
-> 1925 return MapDataset(self, map_func, preserve_cardinality=True)
1926 else:
1927 return ParallelMapDataset(
~/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py in __init__(self, input_dataset, map_func, use_inter_op_parallelism, preserve_cardinality, use_legacy_function)
4481 self._use_inter_op_parallelism = use_inter_op_parallelism
4482 self._preserve_cardinality = preserve_cardinality
-> 4483 self._map_func = StructuredFunctionWrapper(
4484 map_func,
4485 self._transformation_name(),
~/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py in __init__(self, func, transformation_name, dataset, input_classes, input_shapes, input_types, input_structure, add_to_graph, use_legacy_function, defun_kwargs)
3710 resource_tracker = tracking.ResourceTracker()
3711 with tracking.resource_tracker_scope(resource_tracker):
-> 3712 self._function = fn_factory()
3713 # There is no graph to add in eager mode.
3714 add_to_graph &= not context.executing_eagerly()
~/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py in get_concrete_function(self, *args, **kwargs)
3132 or `tf.Tensor` or `tf.TensorSpec`.
3133 """
-> 3134 graph_function = self._get_concrete_function_garbage_collected(
3135 *args, **kwargs)
3136 graph_function._garbage_collector.release() # pylint: disable=protected-access
~/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _get_concrete_function_garbage_collected(self, *args, **kwargs)
3098 args, kwargs = None, None
3099 with self._lock:
-> 3100 graph_function, _ = self._maybe_define_function(args, kwargs)
3101 seen_names = set()
3102 captured = object_identity.ObjectIdentitySet(
~/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
3442
3443 self._function_cache.missed.add(call_context_key)
-> 3444 graph_function = self._create_graph_function(args, kwargs)
3445 self._function_cache.primary[cache_key] = graph_function
3446
~/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
3277 arg_names = base_arg_names + missing_arg_names
3278 graph_function = ConcreteFunction(
-> 3279 func_graph_module.func_graph_from_py_func(
3280 self._name,
3281 self._python_function,
~/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
997 _, original_func = tf_decorator.unwrap(python_func)
998
--> 999 func_outputs = python_func(*func_args, **func_kwargs)
1000
1001 # invariant: `func_outputs` contains only Tensors, CompositeTensors,
~/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py in wrapped_fn(*args)
3685 attributes=defun_kwargs)
3686 def wrapped_fn(*args): # pylint: disable=missing-docstring
-> 3687 ret = wrapper_helper(*args)
3688 ret = structure.to_tensor_list(self._output_structure, ret)
3689 return [ops.convert_to_tensor(t) for t in ret]
~/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py in wrapper_helper(*args)
3615 if not _should_unpack(nested_args):
3616 nested_args = (nested_args,)
-> 3617 ret = autograph.tf_convert(self._func, ag_ctx)(*nested_args)
3618 if _should_pack(ret):
3619 ret = tuple(ret)
~/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py in wrapper(*args, **kwargs)
693 except Exception as e: # pylint:disable=broad-except
694 if hasattr(e, 'ag_error_metadata'):
--> 695 raise e.ag_error_metadata.to_exception(e)
696 else:
697 raise
NotImplementedError: in user code:
<ipython-input-2-84d20c31b9c9>:7 preproc *
o = tf.zeros((number,number))
/home/benjs/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:206 wrapper **
return target(*args, **kwargs)
/home/benjs/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:2911 wrapped
tensor = fun(*args, **kwargs)
/home/benjs/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:2960 zeros
output = _constant_if_small(zero, shape, dtype, name)
/home/benjs/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:2896 _constant_if_small
if np.prod(shape) < 1000:
<__array_function__ internals>:5 prod
/home/benjs/Documents/projects/hpe/venv/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3030 prod
return _wrapreduction(a, np.multiply, 'prod', axis, dtype, out,
/home/benjs/Documents/projects/hpe/venv/lib/python3.8/site-packages/numpy/core/fromnumeric.py:87 _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/home/benjs/Documents/projects/hpe/venv/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:867 __array__
raise NotImplementedError(
NotImplementedError: Cannot convert a symbolic Tensor (args_0:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

Related

How to fix TypeError: x and y must have the same dtype, got tf.int32 != tf.float32 with tensorflow using PyGad?

I have defined a new layer to be put as an input layer using Keras. The code is:
class layer(tensorflow.keras.layers.Layer):
    def __init__(self):
        super(layer, self).__init__()
        H_init = tf.random_normal_initializer()
        self.H = tf.Variable(
            initial_value=H_init(shape=(1,), dtype="float32"),
            trainable=True,
        )
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(1,), dtype="float32"), trainable=True
        )
        n_init = tf.zeros_initializer()
        self.n = tf.Variable(
            initial_value=n_init(shape=(1,), dtype="float32"), trainable=True
        )

    def call(self, z):
        return self.H**2 / (1+(1/self.b)**(2/self.n)) * (1+((1+z)/self.b)**(2/self.n))
I intend to put this layer as input in my generation callback function, which is:

def callback_generation(ga_instance):
    print("Generation = {generation}".format(generation=ga_instance.generations_completed))
    print("Fitness = {fitness}".format(fitness=ga_instance.best_solution()[1]))

inputs = tensorflow.keras.Input(shape=(1,), name="inputs")
targets = tensorflow.keras.Input(shape=(1,), name="targets")
logits = tensorflow.keras.layers.Dense(15)(inputs)
predictions = layer(name="predictions")(logits, targets)

model = keras.Model(inputs=[inputs, targets], outputs=predictions)

data = {
    "inputs": z,
    "targets": H_z,
}

model = tensorflow.keras.Sequential(inputs=input_layer, outputs=output_layer)

weights_vector = tensorflow.pygad.kerasga.model_weights_as_vector(model=model)

keras_ga = pygad.kerasga.KerasGA(model=model, num_solutions=12)

model.summary()
This gives me an error reporting that the tf__call() method takes 2 positional arguments but 3 were given.
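The mismatch seems to be that the layer is invoked with two inputs, (logits, targets), while call(self, z) accepts only one (Keras passes the layer's input as a single positional argument). A hedged sketch of one way to keep both inputs, packing them into one argument:

class layer(tensorflow.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(layer, self).__init__(**kwargs)  # forwards name="predictions"
        # ... variables as above ...

    def call(self, inputs):
        z, targets = inputs  # both tensors arrive packed in a single argument
        return self.H**2 / (1+(1/self.b)**(2/self.n)) * (1+((1+z)/self.b)**(2/self.n))

predictions = layer(name="predictions")([logits, targets])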
This is the detailed error traceback:
TypeError                                 Traceback (most recent call last)
<ipython-input-101-e09c7d4f3228> in <module>
     20 targets = tensorflow.keras.Input(shape=(1,), name="targets")
     21 logits = tensorflow.keras.layers.Dense(15)(inputs)
---> 22 predictions = layer(name="predictions")(logits, targets)
     23
     24 model = keras.Model(inputs=[inputs, targets], outputs=predictions)
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
    950     if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
    951       return self._functional_construction_call(inputs, args, kwargs,
--> 952                                                 input_list)
    953
    954     # Maintains info about the `Layer.call` stack.
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in _functional_construction_call(self, inputs, args, kwargs, input_list)
   1089         # Check input assumptions set after layer building, e.g. input shape.
   1090         outputs = self._keras_tensor_symbolic_call(
-> 1091             inputs, input_masks, args, kwargs)
   1092
   1093         if outputs is None:
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in _keras_tensor_symbolic_call(self, inputs, input_masks, args, kwargs)
    820       return nest.map_structure(keras_tensor.KerasTensor, output_signature)
    821     else:
--> 822       return self._infer_output_signature(inputs, args, kwargs, input_masks)
    823
    824   def _infer_output_signature(self, inputs, args, kwargs, input_masks):
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py in _infer_output_signature(self, inputs, args, kwargs, input_masks)
    861           # TODO(kaftan): do we maybe_build here, or have we already done it?
    862           self._maybe_build(inputs)
--> 863           outputs = call_fn(inputs, *args, **kwargs)
    864
    865         self._handle_activity_regularization(inputs, outputs)
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py in wrapper(*args, **kwargs)
    668     except Exception as e:  # pylint:disable=broad-except
    669       if hasattr(e, 'ag_error_metadata'):
--> 670         raise e.ag_error_metadata.to_exception(e)
    671       else:
    672         raise
TypeError: in user code:

    TypeError: tf__call() takes 2 positional arguments but 3 were given

Tokenize dataset using map on tf.data.Dataset.from_tensor_slices(....)

Note: I am using the free TPU provided on Kaggle.
I want to tokenize the text using transformers such that I tokenize only the current batch during training, instead of first tokenizing the whole dataset and then creating batches from it, since that runs out of memory (OOM) and is also inefficient. Below is a basic overview of what I want:

tokenizer = transformers.RobertaTokenizerFast.from_pretrained('roberta-base')

def tokenize(text, labels):
    tokenized = tokenizer(text, padding=True, truncation=True, max_length=MAX_LEN)
    ids = tokenized['input_ids']
    mask = tokenized['attention_mask']
    return (ids, mask), labels

train_dataset = tf.data.Dataset.from_tensor_slices((text, train_label_chunk)).batch(BATCH_SIZE)
train_dataset = train_dataset.map(tokenize)
Below is the error it gives. I won't share the whole trace as the error is pretty clear
ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples)
which should be solvable by something like this:

for i in train_dataset:
    sample = i[0]
    break

sample.numpy()[0].decode()

which gives a proper string, but decoding every single tf.string is not feasible. Also, it gives an error anyway when I try this:
def tokenize(text, labels):
    text = text.numpy()
    tokenized = tokenizer(text, padding=True, truncation=True, max_length=MAX_LEN)
    ids = tokenized['input_ids']
    mask = tokenized['attention_mask']
    return (ids, mask), labels

Error:
AttributeError: in user code:
<ipython-input-37-857b904b7110>:2 tokenize *
text = text.numpy()
AttributeError: 'Tensor' object has no attribute 'numpy'
I am not sure why that is, but in any case this can't be done. The following GitHub trace can also be seen on the same topic here.
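For context, .numpy() is unavailable inside Dataset.map because the function is traced as a graph. Wrapping the tokenizer call in tf.py_function makes it run eagerly per batch; a hedged sketch (it assumes MAX_LEN is defined and int64 labels, and pads to max_length so the output shapes are static):

def tokenize_py(text, labels):
    # Runs eagerly, so .numpy() and the Python tokenizer work here.
    texts = [t.decode('utf-8') for t in text.numpy()]
    tokenized = tokenizer(texts, padding='max_length', truncation=True,
                          max_length=MAX_LEN)
    ids = tf.constant(tokenized['input_ids'], dtype=tf.int32)
    mask = tf.constant(tokenized['attention_mask'], dtype=tf.int32)
    return ids, mask, labels

def tokenize(text, labels):
    ids, mask, labels = tf.py_function(
        tokenize_py, inp=[text, labels], Tout=[tf.int32, tf.int32, tf.int64])
    return (ids, mask), labels

train_dataset = train_dataset.map(tokenize)

Note this runs the tokenizer under the Python GIL, so it can become the bottleneck; it is a sketch of the mechanism rather than a tuned solution.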
Below are some other things that I tried. First I created a new dataset class:

class TrainDataset():
    def __init__(self, text, label, batch_size):
        self.text = text
        self.label = label
        self.batch_size = batch_size

    def __len__(self):
        return len(self.text) // self.batch_size

    def __getitem__(self, idx):
        text = self.text[idx*self.batch_size:(idx+1)*self.batch_size]
        label = self.label[idx*self.batch_size:(idx+1)*self.batch_size]
        return text, label

ds = TrainDataset()
def train_loop(train_dataset):
    with strategy.scope():
        for step, (x, y) in enumerate(train_dataset):
            train_data = tokenizer(x, padding=True, truncation=True, max_length=MAX_LEN, return_tensors='tf')
            inputs = (train_data['input_ids'], train_data['attention_mask'])
            with tf.GradientTape() as tape:
                preds = model(inputs, training=True)
                loss_value = loss_fun(y, preds)
            grads = tape.gradient(loss_value, model.trainable_weights)
            optimizer.apply_gradients(zip(grads, model.trainable_weights))
            break

train_loop(ds)
which yields the following error:

ValueError: Please use `tf.keras.losses.Reduction.SUM` or `tf.keras.losses.Reduction.NONE` for loss reduction when losses are used with `tf.distribute.Strategy` outside of the built-in training loops. You can implement `tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE` using global batch size like:

with strategy.scope():
    loss_obj = tf.keras.losses.CategoricalCrossentropy(reduction=tf.keras.losses.Reduction.NONE)
    ....
    loss = tf.reduce_sum(loss_obj(labels, predictions)) * (1. / global_batch_size)

Please see https://www.tensorflow.org/tutorials/distribute/custom_training for more details.
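Following that message, the per-replica loss can be scaled by the global batch size; a hedged sketch (compute_loss and GLOBAL_BATCH_SIZE are illustrative names):

with strategy.scope():
    loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

    def compute_loss(labels, predictions):
        per_example_loss = loss_object(labels, predictions)
        # Average over the *global* batch, as the error message asks for.
        return tf.nn.compute_average_loss(per_example_loss,
                                          global_batch_size=GLOBAL_BATCH_SIZE)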
After which, I changed loss_fun to loss_object as below (I also changed the activation of the last layer to get the logits):
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
which gives the error below:
RuntimeError: `apply_gradients() cannot be called in cross-replica context. Use `tf.distribute.Strategy.run` to enter replica context.
At this point I wrote all the custom functions:

def train_step(inputs):
    x, y = inputs
    with tf.GradientTape() as tape:
        predictions = model(x, training=True)
        loss = compute_loss(y, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_accuracy.update_state(y, predictions)
    return loss

@tf.function
def distributed_train_step(dataset_inputs):
    x, y = dataset_inputs
    train_data = tokenizer(x, padding=True, truncation=True, max_length=MAX_LEN, return_tensors='tf')
    inputs = (train_data['input_ids'], train_data['attention_mask'])
    dataset_inputs = (inputs, y)
    per_replica_losses = strategy.run(train_step, args=(dataset_inputs,))
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses,
                           axis=None)

for epoch in range(2):
    # TRAIN LOOP
    total_loss = 0.0
    num_batches = 0
    for x in tqdm(ds):
        total_loss += distributed_train_step(x)
        num_batches += 1
    train_loss = total_loss / num_batches

    template = ("Epoch {}, Loss: {}, Accuracy: {}")
    print(template.format(epoch+1, train_loss, train_accuracy.result()*100))
    train_accuracy.reset_states()
which fortunately did run, but gave the error below:
StagingError Traceback (most recent call last)
<ipython-input-24-2cda132cf9fa> in <module>
4 num_batches = 0
5 for x in tqdm(ds):
----> 6 total_loss += distributed_train_step(x)
7 num_batches += 1
8 train_loss = total_loss / num_batches
/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
826 tracing_count = self.experimental_get_tracing_count()
827 with trace.Trace(self._name) as tm:
--> 828 result = self._call(*args, **kwds)
829 compiler = "xla" if self._experimental_compile else "nonXla"
830 new_tracing_count = self.experimental_get_tracing_count()
/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
860 # In this case we have not created variables on the first call. So we can
861 # run the first trace but we should fail if variables are created.
--> 862 results = self._stateful_fn(*args, **kwds)
863 if self._created_variables:
864 raise ValueError("Creating variables on a non-first call to a function"
/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py in __call__(self, *args, **kwargs)
2939 with self._lock:
2940 (graph_function,
-> 2941 filtered_flat_args) = self._maybe_define_function(args, kwargs)
2942 return graph_function._call_flat(
2943 filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
3359
3360 self._function_cache.missed.add(call_context_key)
-> 3361 graph_function = self._create_graph_function(args, kwargs)
3362 self._function_cache.primary[cache_key] = graph_function
3363
/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
3204 arg_names=arg_names,
3205 override_flat_arg_shapes=override_flat_arg_shapes,
-> 3206 capture_by_value=self._capture_by_value),
3207 self._function_attributes,
3208 function_spec=self.function_spec,
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
988 _, original_func = tf_decorator.unwrap(python_func)
989
--> 990 func_outputs = python_func(*func_args, **func_kwargs)
991
992 # invariant: `func_outputs` contains only Tensors, CompositeTensors,
/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
632 xla_context.Exit()
633 else:
--> 634 out = weak_wrapped_fn().__wrapped__(*args, **kwds)
635 return out
636
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
975 except Exception as e: # pylint:disable=broad-except
976 if hasattr(e, "ag_error_metadata"):
--> 977 raise e.ag_error_metadata.to_exception(e)
978 else:
979 raise
StagingError: in user code:
<ipython-input-19-9d8bdb5f7f7c>:4 distributed_train_step *
train_data = tokenizer(x, padding=True, truncation=True, max_length=MAX_LEN, return_tensors='tf')
/opt/conda/lib/python3.7/site-packages/transformers/tokenization_utils_base.py:2305 __call__ *
**kwargs,
/opt/conda/lib/python3.7/site-packages/transformers/tokenization_utils_base.py:2490 batch_encode_plus *
**kwargs,
/opt/conda/lib/python3.7/site-packages/transformers/models/gpt2/tokenization_gpt2_fast.py:163 _batch_encode_plus *
return super()._batch_encode_plus(*args, **kwargs)
/opt/conda/lib/python3.7/site-packages/transformers/tokenization_utils_fast.py:418 _batch_encode_plus *
for key in tokens_and_encodings[0][0].keys():
IndexError: list index out of range
The IndexError: list index out of range might be solvable, but the speed of training was really, really slow, so I think something is wrong.
At this point, any help would be highly appreciated.

Resize image and create tfexample for tensorflow 2 dataset results in error

I'm using TensorFlow 2.2 and trying to convert a model into TensorRT. I am following an example which works successfully for models that accept images as input. Unfortunately, I froze a model that accepts a TF Example as input instead of an image. Now, trying to create the tf dataset pipeline has become a nightmare.
My code is:
def get_dataset(images_dir,
                annotation_path,
                batch_size,
                input_size,
                dtype=tf.float32):
    image_ids = None
    coco = COCO(annotation_file=annotation_path)
    image_ids = coco.getImgIds()
    image_paths = []
    for image_id in image_ids:
        coco_img = coco.imgs[image_id]
        image_paths.append(os.path.join(images_dir, coco_img['file_name']))
    dataset = tf.data.Dataset.from_tensor_slices(image_paths)

    def conv_jpeg_to_tfexample_tensor(input_img_):
        feature_dict = {
            'image/encoded': dataset_util.bytes_feature(input_img_)
        }
        temp_var = tf.train.Features(feature=feature_dict)
        file_ex = tf.train.Example(features=temp_var).SerializeToString()
        return tf.convert_to_tensor(file_ex)

    def preprocess_fn(path):
        image = tf.io.read_file(path)
        if input_size is not None:
            image = tf.image.decode_jpeg(image, channels=3)
            image = tf.image.convert_image_dtype(image, tf.float32)
            image = tf.image.resize(image, size=(input_size, input_size))
            image = tf.cast(image, tf.uint8)
            image = tf.image.encode_jpeg(image)  #.numpy()
        return image

    dataset = dataset.map(map_func=preprocess_fn, num_parallel_calls=3)
    dataset = dataset.map(map_func=conv_jpeg_to_tfexample_tensor, num_parallel_calls=3)
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat(count=1)
    return dataset, image_ids
This results in an error with this usage:

dataset, image_ids = get_dataset(
    images_dir=args.data_dir,
    annotation_path=args.annotation_path,
    batch_size=args.batch_size,
    input_size=args.input_size)
Error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-152-193739a79a8a> in <module>
5 batch_size=args.batch_size,
----> 6 input_size=args.input_size)
<ipython-input-151-1d1f15019758> in get_dataset(images_dir, annotation_path, batch_size, input_size, dtype)
76 dataset = dataset.map(map_func=preprocess_fn, num_parallel_calls=3)
---> 77 dataset = dataset.map(map_func=conv_jpeg_to_tfexample_tensor, num_parallel_calls=3)
78 dataset = dataset.batch(batch_size)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py in map(self, map_func, num_parallel_calls, deterministic)
1626 num_parallel_calls,
1627 deterministic,
-> 1628 preserve_cardinality=True)
1629
1630 def flat_map(self, map_func):
/opt/conda/lib/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py in __init__(self, input_dataset, map_func, num_parallel_calls, deterministic, use_inter_op_parallelism, preserve_cardinality, use_legacy_function)
4018 self._transformation_name(),
4019 dataset=input_dataset,
-> 4020 use_legacy_function=use_legacy_function)
4021 if deterministic is None:
4022 self._deterministic = "default"
/opt/conda/lib/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py in __init__(self, func, transformation_name, dataset, input_classes, input_shapes, input_types, input_structure, add_to_graph, use_legacy_function, defun_kwargs)
3219 with tracking.resource_tracker_scope(resource_tracker):
3220 # TODO(b/141462134): Switch to using garbage collection.
-> 3221 self._function = wrapper_fn.get_concrete_function()
3222
3223 if add_to_graph:
/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py in get_concrete_function(self, *args, **kwargs)
2530 """
2531 graph_function = self._get_concrete_function_garbage_collected(
-> 2532 *args, **kwargs)
2533 graph_function._garbage_collector.release() # pylint: disable=protected-access
2534 return graph_function
/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _get_concrete_function_garbage_collected(self, *args, **kwargs)
2494 args, kwargs = None, None
2495 with self._lock:
-> 2496 graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
2497 if self.input_signature:
2498 args = self.input_signature
/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
2775
2776 self._function_cache.missed.add(call_context_key)
-> 2777 graph_function = self._create_graph_function(args, kwargs)
2778 self._function_cache.primary[cache_key] = graph_function
2779 return graph_function, args, kwargs
/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
2665 arg_names=arg_names,
2666 override_flat_arg_shapes=override_flat_arg_shapes,
-> 2667 capture_by_value=self._capture_by_value),
2668 self._function_attributes,
2669 # Tell the ConcreteFunction to clean up its graph once it goes out of
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
979 _, original_func = tf_decorator.unwrap(python_func)
980
--> 981 func_outputs = python_func(*func_args, **func_kwargs)
982
983 # invariant: `func_outputs` contains only Tensors, CompositeTensors,
/opt/conda/lib/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py in wrapper_fn(*args)
3212 attributes=defun_kwargs)
3213 def wrapper_fn(*args): # pylint: disable=missing-docstring
-> 3214 ret = _wrapper_helper(*args)
3215 ret = structure.to_tensor_list(self._output_structure, ret)
3216 return [ops.convert_to_tensor(t) for t in ret]
/opt/conda/lib/python3.7/site-packages/tensorflow/python/data/ops/dataset_ops.py in _wrapper_helper(*args)
3154 nested_args = (nested_args,)
3155
-> 3156 ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
3157 # If `func` returns a list of tensors, `nest.flatten()` and
3158 # `ops.convert_to_tensor()` would conspire to attempt to stack
/opt/conda/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py in wrapper(*args, **kwargs)
263 except Exception as e: # pylint:disable=broad-except
264 if hasattr(e, 'ag_error_metadata'):
--> 265 raise e.ag_error_metadata.to_exception(e)
266 else:
267 raise
TypeError: in user code:
<ipython-input-143-1d1f15019758>:53 conv_jpeg_to_tfexample_tensor *
feature_dict = {
/opt/conda/lib/python3.7/site-packages/object_detection/utils/dataset_util.py:34 bytes_feature *
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
TypeError: <tf.Tensor 'args_0:0' shape=() dtype=string> has type Tensor, but expected one of: bytes
I was able to reproduce the error you are facing using a simple bird image.
Code to recreate the error -
%tensorflow_version 2.x
import tensorflow as tf
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array, array_to_img
from matplotlib import pyplot as plt
import numpy as np
from object_detection.utils import dataset_util

def load_file_and_process(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.central_crop(image, np.random.uniform(0.50, 1.00))
    image = tf.cast(image, tf.uint8)
    image = tf.image.encode_jpeg(image)
    return image

train_dataset = tf.data.Dataset.list_files('/content/bird.jpg')
train_dataset = train_dataset.map(load_file_and_process)

def conv_jpeg_to_tfexample_tensor(input_img_):
    feature_dict = {
        'image/encoded': dataset_util.bytes_feature(input_img_)
    }
    temp_var = tf.train.Features(feature=feature_dict)
    file_ex = tf.train.Example(features=temp_var).SerializeToString()
    return tf.convert_to_tensor(file_ex)

train_dataset = train_dataset.map(map_func=conv_jpeg_to_tfexample_tensor, num_parallel_calls=3)
Output -
<MapDataset shapes: (), types: tf.string>
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-44-89ae6292ad21> in <module>()
28 return tf.convert_to_tensor(file_ex)
29
---> 30 train_dataset = train_dataset.map(map_func=conv_jpeg_to_tfexample_tensor, num_parallel_calls=3)
10 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py in wrapper(*args, **kwargs)
256 except Exception as e: # pylint:disable=broad-except
257 if hasattr(e, 'ag_error_metadata'):
--> 258 raise e.ag_error_metadata.to_exception(e)
259 else:
260 raise
TypeError: in user code:
<ipython-input-44-89ae6292ad21>:23 conv_jpeg_to_tfexample_tensor *
feature_dict = {
/usr/local/lib/python3.6/dist-packages/object_detection/utils/dataset_util.py:30 bytes_feature *
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
TypeError: <tf.Tensor 'args_0:0' shape=() dtype=string> has type Tensor, but expected one of: bytes
I would recommend referring to this tutorial, which explains an end-to-end example of how to read and write image data using TFRecords.
Following the tutorial, I have written the TFRecord for a single image.
Code -

# This is an example, just using the bird image.
image_string = open('/content/bird.jpg', 'rb').read()

def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    if isinstance(value, type(tf.constant(0))):
        value = value.numpy()  # BytesList won't unpack a string from an EagerTensor.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    """Returns an int64_list from a bool / enum / int / uint."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

# Create a dictionary with features that may be relevant.
def image_example(image_string):
    image_shape = tf.image.decode_jpeg(image_string).shape
    feature = {
        'height': _int64_feature(image_shape[0]),
        'width': _int64_feature(image_shape[1]),
        'depth': _int64_feature(image_shape[2]),
        'image_raw': _bytes_feature(image_string),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

for line in str(image_example(image_string)).split('\n')[:15]:
    print(line)

record_file = 'images.tfrecords'
with tf.io.TFRecordWriter(record_file) as writer:
    image_string = open('/content/bird.jpg', 'rb').read()
    tf_example = image_example(image_string)
    writer.write(tf_example.SerializeToString())
Output -
features {
feature {
key: "depth"
value {
int64_list {
value: 3
}
}
}
feature {
key: "height"
value {
int64_list {
value: 426
}
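To complete the round trip, a short reading sketch in the same spirit as the tutorial, assuming images.tfrecords was written as above:

raw_dataset = tf.data.TFRecordDataset('images.tfrecords')

feature_description = {
    'height': tf.io.FixedLenFeature([], tf.int64),
    'width': tf.io.FixedLenFeature([], tf.int64),
    'depth': tf.io.FixedLenFeature([], tf.int64),
    'image_raw': tf.io.FixedLenFeature([], tf.string),
}

def _parse_image_function(example_proto):
    # Parse one serialized Example using the schema above.
    return tf.io.parse_single_example(example_proto, feature_description)

parsed_dataset = raw_dataset.map(_parse_image_function)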

TF Dataset. Error = tensorflow:Error reported to Coordinator: No gradients provided for any variable: Tensorflow 2.2.0

I get the error INFO:tensorflow:Error reported to Coordinator: No gradients provided for any variable when I run the code below.
You'll need the following to run the code:
TensorFlow 2.2.0
efficientnet
keras_bert (https://pypi.org/project/keras-bert/)
numpy
pandas
You'll also need to download the pre-trained BERT model weights from https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-12_H-768_A-12.zip
The code uses the TensorFlow dataset API to generate data on the fly.
import tensorflow
print('TensorFlow version =', tensorflow.__version__)
AUTO = tensorflow.data.experimental.AUTOTUNE

import efficientnet.tfkeras as efn
from efficientnet.tfkeras import preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout, Input, Embedding, LSTM, Add
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.image import img_to_array as img_to_array
from tensorflow.keras.preprocessing.image import load_img as load_img
from tensorflow.keras.optimizers import SGD
import codecs
from keras_bert import load_trained_model_from_checkpoint
import ast
import pandas as pd
from keras_bert import Tokenizer
import numpy as np
import os
import random  # needed by doaugmentation below

TF_KERAS = 1

pretrained_path = '../Data/BERT/uncased_L-12_H-768_A-12'
config_path = os.path.join(pretrained_path, 'bert_config.json')
checkpoint_path = os.path.join(pretrained_path, 'bert_model.ckpt')
vocab_path = os.path.join(pretrained_path, 'vocab.txt')

SEQ_LEN = 128

token_dict = {}
with codecs.open(vocab_path, 'r', 'utf8') as reader:
    for line in reader:
        token = line.strip()
        token_dict[token] = len(token_dict)

tokenizer = Tokenizer(token_dict)

EPOCHS = 5
NUM_CLASSES = 10

def get_model(base_model, bert_model, NUM_CLASSES, emdedding_size=768):
    # add a global spatial average pooling layer
    x = base_model.output
    x = Dropout(0.05)(x)
    x = GlobalAveragePooling2D()(x)
    x = Dense(1024, activation='relu')(x)
    x = Dense(emdedding_size, activation='relu')(x)
    # sequence model
    dense = bert_model.get_layer('NSP-Dense').output
    # decoder model
    decoder1 = Add()([x, dense])
    decoder2 = Dense(emdedding_size, activation='relu')(decoder1)
    output = Dense(NUM_CLASSES, activation='softmax', name='output')(decoder2)
    # tie it together
    model = Model(inputs={'input_1': base_model.input,
                          'Input-Token': bert_model.inputs[0],
                          'Input-Segment': bert_model.inputs[1]},
                  outputs={'output': output})
    return model

gpus = tensorflow.config.list_physical_devices('GPU'); print(gpus)
if len(gpus) == 1: strategy = tensorflow.distribute.OneDeviceStrategy(device="/gpu:0")
else: strategy = tensorflow.distribute.MirroredStrategy()

max_length = 20
DIM = 224

with strategy.scope():
    base_model = efn.EfficientNetB4(weights='imagenet', include_top=False, input_shape=(DIM, DIM, 3))  # or weights='noisy-student'
    bert_model = load_trained_model_from_checkpoint(
        config_path,
        checkpoint_path,
        training=True,
        trainable=True,
        seq_len=SEQ_LEN,
    )
    model = get_model(base_model, bert_model, NUM_CLASSES)
    model.compile(optimizer=SGD(lr=.00001, momentum=0.9), loss='categorical_crossentropy', metrics=['categorical_accuracy'])
def doaugmentation(img, rand_num=None):
    if rand_num is None:
        rand_num = random.randint(0, 2)

    if rand_num == 0:
        return img
    elif rand_num == 1:  # brightness
        return tensorflow.image.random_brightness(img, 0.4, seed=1)
    else:
        return img
def get_dataset(csv_path, mode, batch_size, data_path, debug=False):
    if debug: print('[+] Inside the data function')
    df = pd.read_csv(csv_path)
    if debug: print('[+] Read the csv file; shape=', df.shape)
    image_paths = df.apply(lambda x: os.path.join(data_path, x['image_name']), axis=1).tolist()
    if debug: print('[+] Image paths received')
    descriptions = df['text'].apply(lambda x: x.lower()).tolist()
    if debug: print('[+] Descriptions lower cased')

    if mode != 'test':  ## output
        if debug: print('[+] Mode= {}'.format(mode))
        output = df['output'].apply(lambda x: ast.literal_eval(x)).tolist()
        if debug: print('[+] All Ids received')
        dataset = tensorflow.data.Dataset.from_tensor_slices((image_paths, descriptions, output))
        if debug: print('[+] Tensor to slices done')
        dataset = dataset.shuffle(len(df))
        if debug: print('[+] Dataset shuffled')
    else:
        dataset = tensorflow.data.Dataset.from_tensor_slices((image_paths, descriptions, [None]*len(image_paths)))

    dataset = dataset.batch(batch_size)
    if debug: print('[+] Batch generated')
    dataset = dataset.map(lambda img_path, description, output: tensorflow.py_function(process_data,
                          [img_path, description, output],
                          [tensorflow.float32, tensorflow.float32, tensorflow.float32, tensorflow.int32]), num_parallel_calls=AUTO)
    if debug: print('[+] Final Map done')
    dataset = dataset.map(split, num_parallel_calls=AUTO)
    if debug: print('[+] Prefetching now...')
    dataset = dataset.prefetch(AUTO)
    return dataset
def split(image, description, description_like, output):
    return {'input_2': image, 'Input-Token': description, 'Input-Segment': description_like, 'output': output}

def process_data(img_paths, descriptions, output):
    global DIM
    images = [process_image(img_path, DIM) for img_path in img_paths.numpy()]
    desription, desription_like = [process_text(description)[0] for description in descriptions], [process_text(description)[1] for description in descriptions]
    if output[0].numpy().any() == None:
        return images, desription, desription_like
    return images, desription, desription_like, output

def process_image(img_path, im_size):
    image_string = tensorflow.io.read_file(img_path)
    image = tensorflow.image.decode_jpeg(image_string, channels=3)
    image = tensorflow.image.convert_image_dtype(image, tensorflow.float32)
    image = tensorflow.image.resize(image, [im_size, im_size])
    return image

def process_text(text):
    global tokenizer, SEQ_LEN
    desription = tokenizer.encode(tensorflow.compat.as_str_any(text.numpy()), max_len=SEQ_LEN)[0]
    desription_like = np.zeros_like(desription)
    return desription, desription_like
batch_size = 1
dataset_train = get_dataset(file_name, 'train', batch_size, dir_path, True)
dataset_val = get_dataset(file_name, 'val', batch_size, dir_path, True)

H = model.fit(x=dataset_train,
              validation_data=dataset_val,
              verbose=1,
              epochs=1)
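A hedged aside on the py_function step above: tf.py_function outputs lose their static shapes, which Keras relies on when building the model graph against named inputs. A common companion is a set_shape pass before the split map; a sketch using the shapes implied by the code above (the label dimension is left open as an assumption):

def fix_shapes(image, description, description_like, output):
    # Restore static shapes dropped by tf.py_function.
    image.set_shape([None, DIM, DIM, 3])
    description.set_shape([None, SEQ_LEN])
    description_like.set_shape([None, SEQ_LEN])
    output.set_shape([None, None])
    return image, description, description_like, output

# dataset = dataset.map(fix_shapes, num_parallel_calls=AUTO)  # before mapping `split`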
You'll also need this file to be in the same directory
text,image_name,output
Honeywell MN Series Portable Air Conditioner with Dehumidifier & Fan for Rooms Up To 450 Sq. Ft.,picture1.jpeg,"[1, 0, 0, 0, 0, 0, 0]"
"TCL 10,000 BTU White Window Air Conditioner with Wi-Fi",picture2.png,"[1, 0, 0, 0, 0, 0, 0]"
Honeywell MN Series Portable Air Conditioner with Dehumidifier & Fan for Rooms Up To 450 Sq. Ft.,picture1.jpeg,"[1, 0, 0, 0, 0, 0, 0]"
"TCL 10,000 BTU White Window Air Conditioner with Wi-Fi",picture2.png,"[1, 0, 0, 0, 0, 0, 0]"
Honeywell MN Series Portable Air Conditioner with Dehumidifier & Fan for Rooms Up To 450 Sq. Ft.,picture1.jpeg,"[1, 0, 0, 0, 0, 0, 0]"
"TCL 10,000 BTU White Window Air Conditioner with Wi-Fi",picture2.png,"[1, 0, 0, 0, 0, 0, 0]"
Honeywell MN Series Portable Air Conditioner with Dehumidifier & Fan for Rooms Up To 450 Sq. Ft.,picture1.jpeg,"[1, 0, 0, 0, 0, 0, 0]"
"TCL 10,000 BTU White Window Air Conditioner with Wi-Fi",picture2.png,"[1, 0, 0, 0, 0, 0, 0]"
and these pictures (images not reproduced here)
I get the following error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-31-f9c0025d2202> in <module>
2 validation_data=dataset_val,
3 verbose=1,
----> 4 epochs=1)
~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\training.py in _method_wrapper(self, *args, **kwargs)
64 def _method_wrapper(self, *args, **kwargs):
65 if not self._in_multi_worker_mode(): # pylint: disable=protected-access
---> 66 return method(self, *args, **kwargs)
67
68 # Running inside `run_distribute_coordinator` already.
~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
846 batch_size=batch_size):
847 callbacks.on_train_batch_begin(step)
--> 848 tmp_logs = train_function(iterator)
849 # Catch OutOfRangeError for Datasets of unknown size.
850 # This blocks until the batch has finished executing.
~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
578 xla_context.Exit()
579 else:
--> 580 result = self._call(*args, **kwds)
581
582 if tracing_count == self._get_tracing_count():
~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
625 # This is the first call of __call__, so we have to initialize.
626 initializers = []
--> 627 self._initialize(args, kwds, add_initializers_to=initializers)
628 finally:
629 # At this point we know that the initialization is complete (or less
~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\eager\def_function.py in _initialize(self, args, kwds, add_initializers_to)
504 self._concrete_stateful_fn = (
505 self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
--> 506 *args, **kwds))
507
508 def invalid_creator_scope(*unused_args, **unused_kwds):
~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\eager\function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
2444 args, kwargs = None, None
2445 with self._lock:
-> 2446 graph_function, _, _ = self._maybe_define_function(args, kwargs)
2447 return graph_function
2448
~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\eager\function.py in _maybe_define_function(self, args, kwargs)
2775
2776 self._function_cache.missed.add(call_context_key)
-> 2777 graph_function = self._create_graph_function(args, kwargs)
2778 self._function_cache.primary[cache_key] = graph_function
2779 return graph_function, args, kwargs
~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\eager\function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
2665 arg_names=arg_names,
2666 override_flat_arg_shapes=override_flat_arg_shapes,
-> 2667 capture_by_value=self._capture_by_value),
2668 self._function_attributes,
2669 # Tell the ConcreteFunction to clean up its graph once it goes out of
~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
979 _, original_func = tf_decorator.unwrap(python_func)
980
--> 981 func_outputs = python_func(*func_args, **func_kwargs)
982
983 # invariant: `func_outputs` contains only Tensors, CompositeTensors,
~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\eager\def_function.py in wrapped_fn(*args, **kwds)
439 # __wrapped__ allows AutoGraph to swap in a converted function. We give
440 # the function a weak reference to itself to avoid a reference cycle.
--> 441 return weak_wrapped_fn().__wrapped__(*args, **kwds)
442 weak_wrapped_fn = weakref.ref(wrapped_fn)
443
~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\func_graph.py in wrapper(*args, **kwargs)
966 except Exception as e: # pylint:disable=broad-except
967 if hasattr(e, "ag_error_metadata"):
--> 968 raise e.ag_error_metadata.to_exception(e)
969 else:
970 raise
ValueError: in user code:
C:\Users\i24009\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\training.py:571 train_function *
outputs = self.distribute_strategy.run(
C:\Users\i24009\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\distribute\distribute_lib.py:951 run **
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
C:\Users\i24009\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\distribute\distribute_lib.py:2290 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
C:\Users\i24009\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\distribute\mirrored_strategy.py:770 _call_for_each_replica
fn, args, kwargs)
C:\Users\i24009\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\distribute\mirrored_strategy.py:201 _call_for_each_replica
coord.join(threads)
C:\Users\i24009\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\coordinator.py:389 join
six.reraise(*self._exc_info_to_raise)
C:\Users\i24009\Anaconda3\envs\py36TF2x1\lib\site-packages\six.py:703 reraise
raise value
C:\Users\i24009\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\training\coordinator.py:297 stop_on_exception
yield
C:\Users\i24009\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\distribute\mirrored_strategy.py:998 run
self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
C:\Users\i24009\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\training.py:541 train_step **
self.trainable_variables)
C:\Users\i24009\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\training.py:1804 _minimize
trainable_variables))
C:\Users\i24009\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:521 _aggregate_gradients
filtered_grads_and_vars = _filter_grads(grads_and_vars)
C:\Users\i24009\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:1219 _filter_grads
([v.name for _, v in grads_and_vars],))
Full error here https://pastebin.com/c25x2uxu
I'd like to seek the community's guidance on the following:
What am I doing wrong in the above code?
Is there a better way to do this, given that I am dealing with millions of training/validation images? (I looked at the data generator from TF dataset but haven't used it yet.)
Any answers or suggestions would be highly appreciated.

ResNet model in Tensorflow Federated

I tried to customize the model in the "Image classification" tutorial in TensorFlow Federated (it originally used a sequential model).
I use Keras ResNet50, but when it begins to train, there is always an "Incompatible shapes" error.
Here is my code:

NUM_CLIENTS = 4
NUM_EPOCHS = 10
BATCH_SIZE = 2
SHUFFLE_BUFFER = 5

def create_compiled_keras_model():
    model = tf.keras.applications.resnet.ResNet50(
        include_top=False, weights='imagenet',
        input_tensor=tf.keras.layers.Input(shape=(100, 300, 3)), pooling=None)
    model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.02),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
    return model

def model_fn():
    keras_model = create_compiled_keras_model()
    return tff.learning.from_compiled_keras_model(keras_model, sample_batch)

iterative_process = tff.learning.build_federated_averaging_process(model_fn)
Error information:
(screenshot of the "Incompatible shapes" error; image not reproduced)
I feel that the shapes are incompatible because the epoch and client information was somehow missing. I would be very thankful if someone could give me a hint.
Updates:
The AssertionError happened during tff.learning.build_federated_averaging_process:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-164-dac26193d9d8> in <module>()
----> 1 iterative_process = tff.learning.build_federated_averaging_process(model_fn)
2
3 # iterative_process = build_federated_averaging_process(model_fn)
13 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_federated/python/learning/federated_averaging.py in build_federated_averaging_process(model_fn, server_optimizer_fn, client_weight_fn, stateful_delta_aggregate_fn, stateful_model_broadcast_fn)
165 return optimizer_utils.build_model_delta_optimizer_process(
166 model_fn, client_fed_avg, server_optimizer_fn,
--> 167 stateful_delta_aggregate_fn, stateful_model_broadcast_fn)
/usr/local/lib/python3.6/dist-packages/tensorflow_federated/python/learning/framework/optimizer_utils.py in build_model_delta_optimizer_process(model_fn, model_to_client_delta_fn, server_optimizer_fn, stateful_delta_aggregate_fn, stateful_model_broadcast_fn)
349 # still need this.
350 with tf.Graph().as_default():
--> 351 dummy_model_for_metadata = model_utils.enhance(model_fn())
352
353 # ===========================================================================
<ipython-input-159-b2763ace8e5b> in model_fn()
1 def model_fn():
2 keras_model = model
----> 3 return tff.learning.from_compiled_keras_model(keras_model, sample_batch)
/usr/local/lib/python3.6/dist-packages/tensorflow_federated/python/learning/keras_utils.py in from_compiled_keras_model(keras_model, dummy_batch)
211 # Model.test_on_batch() once before asking for metrics.
212 if isinstance(dummy_tensors, collections.Mapping):
--> 213 keras_model.test_on_batch(**dummy_tensors)
214 else:
215 keras_model.test_on_batch(*dummy_tensors)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py in test_on_batch(self, x, y, sample_weight, reset_metrics)
1007 sample_weight=sample_weight,
1008 reset_metrics=reset_metrics,
-> 1009 standalone=True)
1010 outputs = (
1011 outputs['total_loss'] + outputs['output_losses'] + outputs['metrics'])
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py in test_on_batch(model, x, y, sample_weight, reset_metrics, standalone)
503 y,
504 sample_weights=sample_weights,
--> 505 output_loss_metrics=model._output_loss_metrics)
506
507 if reset_metrics:
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py in __call__(self, *args, **kwds)
568 xla_context.Exit()
569 else:
--> 570 result = self._call(*args, **kwds)
571
572 if tracing_count == self._get_tracing_count():
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py in _call(self, *args, **kwds)
606 # In this case we have not created variables on the first call. So we can
607 # run the first trace but we should fail if variables are created.
--> 608 results = self._stateful_fn(*args, **kwds)
609 if self._created_variables:
610 raise ValueError("Creating variables on a non-first call to a function"
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py in __call__(self, *args, **kwargs)
2407 """Calls a graph function specialized to the inputs."""
2408 with self._lock:
-> 2409 graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
2410 return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
2411
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py in _maybe_define_function(self, args, kwargs)
2765
2766 self._function_cache.missed.add(call_context_key)
-> 2767 graph_function = self._create_graph_function(args, kwargs)
2768 self._function_cache.primary[cache_key] = graph_function
2769 return graph_function, args, kwargs
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
2655 arg_names=arg_names,
2656 override_flat_arg_shapes=override_flat_arg_shapes,
-> 2657 capture_by_value=self._capture_by_value),
2658 self._function_attributes,
2659 # Tell the ConcreteFunction to clean up its graph once it goes out of
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
979 _, original_func = tf_decorator.unwrap(python_func)
980
--> 981 func_outputs = python_func(*func_args, **func_kwargs)
982
983 # invariant: `func_outputs` contains only Tensors, CompositeTensors,
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py in wrapped_fn(*args, **kwds)
437 # __wrapped__ allows AutoGraph to swap in a converted function. We give
438 # the function a weak reference to itself to avoid a reference cycle.
--> 439 return weak_wrapped_fn().__wrapped__(*args, **kwds)
440 weak_wrapped_fn = weakref.ref(wrapped_fn)
441
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py in wrapper(*args, **kwargs)
966 except Exception as e: # pylint:disable=broad-except
967 if hasattr(e, "ag_error_metadata"):
--> 968 raise e.ag_error_metadata.to_exception(e)
969 else:
970 raise
AssertionError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_eager.py:345 test_on_batch *
with backend.eager_learning_phase_scope(0):
/usr/lib/python3.6/contextlib.py:81 __enter__
return next(self.gen)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py:425 eager_learning_phase_scope
assert ops.executing_eagerly_outside_functions()
AssertionError:
Ah, I believe this issue is coming from mismatched expectations on sample_batch. TFF passes sample_batch to Keras, which calls a forward pass with this sample batch to initialize various attributes of the keras model. sample_batch should be either a sample from the literal data you are going to be feeding the model as on the server side, or a batch of fake data which matches the shape and type of the data you will be passing in.
An example of the former can be found here (this uses tf.data.Dataset), and there are several examples of the latter in test code, like here.
From what I see of the definition of the model, likely the x element of your sample_batch should be an ndarray of shape [2, 100, 300, 3] (where 2 is for the batch size, but technically this can be any nonzero dimension), and the y element should also match the expected y structure in the data you are using.
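A hedged sketch of such a fake batch (the y shape and dtype here are assumptions to adapt to your labels):

import collections
import numpy as np

sample_batch = collections.OrderedDict(
    x=np.zeros([2, 100, 300, 3], dtype=np.float32),
    y=np.zeros([2], dtype=np.int64),  # adjust to your label structure
)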
I hope this helps, just ping back if there are any problems!
One thing to note, that may be helpful in thinking about TFF--TFF is building a syntax tree representing the distributed computation you are defining via build_federated_averaging_process. This error actually occurs during construction of this object. TFF must trace the computation you pass it in order to know what structure to generate, and this is what is raising here. Actual training of the model happens when you call next on the returned IterativeProcess.
I have the same problem. If I execute this line:

state, metrics = iterative_process.next(state, federated_train_data)
print('round 1, metrics={}'.format(metrics))

I get this error:
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU
[[{{node StatefulPartitionedCall/StatefulPartitionedCall/sequential/vgg16/block1_pool/MaxPool}}]]
[[subcomputation/StatefulPartitionedCall_1/ReduceDataset]]
[[subcomputation/StatefulPartitionedCall_1/ReduceDataset/_140]]
(1) Invalid argument: Default MaxPoolingOp only supports NHWC on device type CPU
[[{{node StatefulPartitionedCall/StatefulPartitionedCall/sequential/vgg16/block1_pool/MaxPool}}]]
[[subcomputation/StatefulPartitionedCall_1/ReduceDataset]]
0 successful operations.
0 derived errors ignored.
Note that I am using VGG16.
Do you have any idea about this type of error?
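For what it's worth, this error usually means image batches reach the model in channels-first (NCHW) layout, while the default CPU MaxPool kernel only supports channels-last (NHWC). A minimal sketch of the usual transpose, with illustrative names:

import tensorflow as tf

images_nchw = tf.zeros([2, 3, 224, 224])                    # batch in NCHW layout
images_nhwc = tf.transpose(images_nchw, perm=[0, 2, 3, 1])  # -> [2, 224, 224, 3]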