Number of images generated using data augmentation with object detection - tensorflow

I've tried to search for the answer in the documentation, the code and here, but I had no luck.
I'd like to know the final number of images generated by data augmentation when using the object detection API in TensorFlow.
For the sake of clarity, here is an example: let's say that I have a dataset with 2 classes, each of them with 50 images originally. Then I apply this config:
data_augmentation_options {
  ssd_random_crop {
  }
}
data_augmentation_options {
  random_rgb_to_gray {
  }
}
data_augmentation_options {
  random_distort_color {
  }
}
data_augmentation_options {
  ssd_random_crop_pad_fixed_aspect_ratio {
  }
}
How can I know the final number of images generated to train my model? (if there is a way). BTW, I'm using to train my model.
Thanks in advance.

In file, it can be seen in function augment_input_fn that all data augmentation options are passed to preprocessor.preprocess method.
The details are all in file, specifically in function preprocess:
for option in preprocess_options:
  func, params = option
  if func not in func_arg_map:
    raise ValueError('The function %s does not exist in func_arg_map' %
                     (func.__name__))
  arg_names = func_arg_map[func]
  for a in arg_names:
    if a is not None and a not in tensor_dict:
      raise ValueError('The function %s requires argument %s' %
                       (func.__name__, a))
  def get_arg(key):
    return tensor_dict[key] if key is not None else None
  args = [get_arg(a) for a in arg_names]
  if (preprocess_vars_cache is not None and
      'preprocess_vars_cache' in inspect.getargspec(func).args):
    params['preprocess_vars_cache'] = preprocess_vars_cache
  results = func(*args, **params)
  if not isinstance(results, (list, tuple)):
    results = (results,)
  # Removes None args since the return values will not contain those.
  arg_names = [arg_name for arg_name in arg_names if arg_name is not None]
  for res, arg_name in zip(results, arg_names):
    tensor_dict[arg_name] = res
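Note that in the code above, arg_names holds the tensor_dict keys that each function consumes (the image, the boxes, and so on), and the last loop writes every result back under the same key. Each augmentation option therefore replaces the image stored in tensor_dict, and the next option reads that replaced image rather than the original.
We can also see that each augmentation option returns exactly one image per input image.
So as a result, in your case, with four options and 100 original images, tensor_dict never holds extra copies: no additional images are stored, and each training step draws one randomly augmented version of an original image. The number of distinct augmented images seen during training therefore depends on the number of training steps, not on a fixed multiple of the dataset size.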
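One way to check how the loop handles successive options is a toy re-creation in plain Python. The sketch below is not the Object Detection API code: `toy_preprocess`, `random_flip` and `random_brighten` are hypothetical stand-ins, and it only mirrors the final write-back step `tensor_dict[arg_name] = res`. Under that assumption, the dictionary still holds a single image per key after all options have run.

```python
import random

def toy_preprocess(tensor_dict, options, rng):
    """Toy stand-in for the loop above: run each option and write the
    result back under the same key it was read from."""
    for func, key in options:
        tensor_dict[key] = func(tensor_dict[key], rng)
    return tensor_dict

def random_flip(image, rng):
    # Hypothetical augmentation: reverse the pixel list half of the time.
    return image[::-1] if rng.random() < 0.5 else image

def random_brighten(image, rng):
    # Hypothetical augmentation: add a random offset to every pixel.
    delta = rng.randint(0, 10)
    return [p + delta for p in image]

rng = random.Random(0)
sample = {"image": [1, 2, 3]}
out = toy_preprocess(dict(sample),
                     [(random_flip, "image"), (random_brighten, "image")],
                     rng)
# Still one entry under "image", however many options ran.
print(len(out), out["image"])
```

Calling `toy_preprocess` again on the same sample gives a different random variant, which is how on-the-fly augmentation produces variety without enlarging the stored dataset.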


Tensorflow v2.10 mutate output of signature function to be a map of label to results

I'm trying to save my model so that when called from tf-serving the output is:
"results": [
    { "label1": x.xxxxx, "label2": x.xxxxx },
    { "label1": x.xxxxx, "label2": x.xxxxx }
]
where label1 and label2 are my labels and x.xxxxx is the probability of that label.
This is what I'm trying:
class TFModel(tf.Module):
    def __init__(self, model: tf.keras.Model) -> None:
        self.labels = ['label1', 'label2']
        self.model = model
    @tf.function(input_signature=[tf.TensorSpec(shape=(1, ), dtype=tf.string)])
    def prediction(self, pagetext: str):
        return { 'results': tf.constant([{k: v for dct in [{self.labels[c]: f"{x:.5f}"} for (c,x) in enumerate(results[i])] for k, v in dct.items()}
            for i in range(len(results.numpy()))])}
# and then save it:
tf_model_wrapper = TFModel(classifier_model)
Side Note: Apparently in TensorFlow v2.0, if signatures is omitted, it should scan the object for the first @tf.function (according to this), but in reality that doesn't seem to work. Instead, the model saves successfully with no errors, but the @tf.function is not called and the default output is returned.
The error I get from the above is:
ValueError: Got a non-Tensor value <tf.Operation 'PartitionedCall' type=PartitionedCall> for key 'output_0' in the output of the function __inference_prediction_125493 used to generate the SavedModel signature 'serving_default'. Outputs for functions used as signatures must be a single Tensor, a sequence of Tensors, or a dictionary from string to Tensor.
I wrapped the result in tf.constant above because of this error, thinking it might be a quick fix, but I think it's me just being naive and not understanding Tensors properly.
I tried a bunch of other things before learning that all outputs must be return values.
How can I change the output to be as I want it to be?
You can think of a Tensor as a multidimensional array, i.e., a structure with a fixed size and number of dimensions whose elements all share the same type. Your return value is a map from a string to a list of dictionaries. A list of dictionaries cannot be converted to a tensor, because there is no guarantee that the number of dimensions and their sizes are constant, nor that every element has the same type.
You could instead return the raw output of your network, which should be a tensor and do your post processing outside of tensorflow-serving.
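As a rough illustration of that post-processing step, here is a minimal client-side sketch in plain Python. It assumes the served model returns one row of probabilities per input, in the same order as the labels; `LABELS`, `to_label_maps` and the sample `raw` values are hypothetical names and data, not part of any TensorFlow API.

```python
import json

# Assumed label order, matching the columns of the model's raw output.
LABELS = ["label1", "label2"]

def to_label_maps(raw_predictions, labels=LABELS, digits=5):
    """Turn rows of probabilities into one {label: probability} dict per row."""
    return [
        {label: round(float(p), digits) for label, p in zip(labels, row)}
        for row in raw_predictions
    ]

# Hypothetical raw output received from the serving endpoint.
raw = [[0.9, 0.1], [0.25, 0.75]]
print(json.dumps({"results": to_label_maps(raw)}))
```

Keeping this mapping on the client side means the saved signature only has to return a plain float tensor.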
If you really want to do something like in your question, you can use a Tensor of strings instead, with code like this:
labels = tf.constant(['label1', 'label2'])
# if your batch size is dynamic, you can use tf.shape on your results variable to find it at runtime
batch_size = 32
# assuming your model returns something with the shape (N,2)
results = tf.random.uniform((batch_size,2))
res_as_str = tf.strings.as_string(results, precision=5)
return {
    "results": tf.stack(
        [tf.tile(labels[None, :], [batch_size, 1]), res_as_str], axis=-1
    )
}
The output will be a dictionary mapping the value "results" to a Tensor of dimensions (Batch, number of labels, 2), the last dimension containing the label name and its corresponding value.
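If the caller still wants the label-to-value map, the (Batch, number of labels, 2) structure can be unpacked after the response arrives. The following is a small sketch in plain Python, assuming the response has already been converted to nested lists of [label, value] pairs whose entries may be bytes or str; `decode_pairs` is a hypothetical helper name.

```python
def decode_pairs(results):
    """Convert nested [label, value] string pairs into one dict per batch item."""
    def as_text(x):
        # String tensors often come back as bytes; accept both forms.
        return x.decode("utf-8") if isinstance(x, bytes) else x

    return [
        {as_text(label): float(as_text(value)) for label, value in item}
        for item in results
    ]

# Hypothetical response content for a batch of two inputs.
batch = [
    [[b"label1", b"0.12345"], [b"label2", b"0.87655"]],
    [["label1", "0.50000"], ["label2", "0.50000"]],
]
decoded = decode_pairs(batch)
print(decoded)
```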

LSTM and GRU vs SimpleRNN: "Type inference failed."

I've created a pretty simple sequential model, but my data is a bit inconvenient (each sample is a sequence of a different length). That's OK, as each data item is relatively significant, so it works well to train with each sequence as its own batch. Got that all working.
The model looks like:
Input(shape=(None, 42*3)) # I have a very preliminary dataset of 8 sequences of ~5000 frames holding 42 x/y/z floats.
SimpleRNN(61, return_sequences=True)
That's the whole thing. When I train for 100 epochs everything goes smoothly, maybe 45 seconds per epoch on my GTX 980ti.
When I try swapping out the SimpleRNN for a GRU or LSTM, however - which should be drop-in replacements in this context (if this is wrong, PLEASE correct me!), I start getting a weird error:
2022-07-27 21:18:15.989066: W tensorflow/core/common_runtime/] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1:
args {
type_id: TFT_PRODUCT
args {
type_id: TFT_TENSOR
args {
is neither a subtype nor a supertype of the combined inputs preceding it:
args {
type_id: TFT_PRODUCT
args {
type_id: TFT_TENSOR
args {
type_id: TFT_FLOAT
while inferring type of node 'cond_40/output/_19'
Additionally, the training happens MUCH faster - roughly 4-5s for the first epoch, then 1s per epoch afterward. That speedup leads me to suspect "something is wrong here".
My question:
Am I safe to ignore this error/warning?
If not, what's wrong, and how do I resolve it?
Side question:
Are GRUs/LSTMs really that much faster to train, or is something wonky going on? I DO see that for the GRU and LSTM it's "Loaded cuDNN" which I think means it's CUDA-accelerated, but I don't see that anywhere for the SimpleRNN, so perhaps that's the difference?
EDIT: I was asked to include my data format, so here's the generator:
import glob
import numpy as np
import tensorflow as tf
from tensorflow import keras

class MyBatchGenerator(keras.utils.Sequence):
    'Generates data for Keras'
    def __init__(self, shuffle=True):
        allDataPaths = list(sorted(glob.glob('PATH TO NPZ FILES SAVED EARLIER')))
        X = []
        Y = []
        for dp in allDataPaths:
            data = np.load(dp, allow_pickle=True)
            x = data['handData']
            x = x.reshape(x.shape[0], -1)
            y = np.array(data['keyData']).astype(float)
            y = y.reshape(y.shape[0], -1).astype(float)
            # Collect one (frames, features) array per file
            X.append(x)
            Y.append(y)
        maxLen = None
        self.X = tf.keras.preprocessing.sequence.pad_sequences(
            X, padding="post", value=-1.0, dtype='float', maxlen = maxLen
        )
        self.Y = tf.keras.preprocessing.sequence.pad_sequences(
            Y, padding="post", value=-1.0, dtype='float', maxlen = maxLen
        )
        self.shuffle = shuffle
    def __len__(self):
        'Denotes the number of batches per epoch'
        return len(self.Y)
    def __getitem__(self, index):
        return self.__data_generation(index)
    def on_epoch_end(self):
        'Shuffles indexes after each epoch'
        self.indexes = np.arange(len(self.Y))
        if self.shuffle == True:
            np.random.shuffle(self.indexes)
    def __data_generation(self, index):
        # Each sequence is returned as its own batch of size 1
        return self.X[index][np.newaxis], self.Y[index][np.newaxis]
I can only answer your side question because I ran into the same thing about 3 days ago.
If you check out the Keras documentation for the three layers, you will see that SimpleRNN has no cuDNN-accelerated implementation, whereas GRU and LSTM do. I was a bit confused about that myself, but I'm not complaining. In my experience, SimpleRNNs give worse results and take longer to train, while the other two give better results and run much faster on my GPU.

Tensorflow parse_single_example returns all dataset

I'm creating a basic LinearClassifier in Tensorflow, but it seems that my input function returns the whole dataset at the first iteration, instead of just one example & its label.
My TFRecord has the following structure (obtained with print( tf.train.Example.FromString(example.SerializeToString())) )
features {
feature {
key: "attackType"
value {
int64_list {
value: 0
value: 0
feature {
key: "dst_ip_addr"
value {
bytes_list {
value: "EXT_SERVER"
It seems the TFRecord file is well formatted. However, when I try to parse it with the following snippet:
def input_fn_train(repeat=10, batch_size=32):
    """Reads dataset from tfrecord, apply parser with map"""
    # Import MNIST data
    dataset =[processed_bucket+processed_key])
    # Map the parser over dataset, and batch results by up to batch_size
    dataset =
    dataset = dataset.repeat(repeat)
    dataset = dataset.batch(batch_size)
    return dataset
def _decode(serialized_ex):
    features = {
        'src_ip_addr': tf.FixedLenFeature(src_ip_size,tf.string),
        'src_pt': tf.FixedLenFeature(src_pt_size,tf.int64),
        'dst_ip_addr': tf.FixedLenFeature(dst_ip_size,tf.string),
        'dst_pt': tf.FixedLenFeature(dst_pt_size,tf.int64),
        'proto': tf.FixedLenFeature(proto_size,tf.string),
        'packets': tf.FixedLenFeature(packets_size,tf.int64),
        'subnet': tf.FixedLenFeature(subnet_size,tf.int64),
        'attackType': tf.FixedLenFeature(attack_type_size,tf.int64)
    }
    parsed_features = tf.parse_single_example(serialized_ex, features)
    label = parsed_features.pop('attackType')
    return parsed_features, label
sess = tf.Session()
it = input_fn_train().make_one_shot_iterator()
It shows that it.get_next() returns
({'dst_ip_addr': array([[b'OPENSTACK_NET', b'EXT_SERVER',...
This is incorrect since it yields an array of arrays! The result should be
Any thoughts? I've been trying to change the shape parameter of FixedLenFeature, with no success.
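OK, it seems the dataset.batch call caused this behavior: it groups up to batch_size parsed examples into one element, so it.get_next() returns arrays with a leading batch dimension rather than a single example. I removed it, and it works fine now!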
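To see why batching makes each feature look like an "array of arrays", here is a plain-Python analogy (not tf.data itself). The `batch` generator and the sample `examples` are hypothetical; the sketch only assumes that batching stacks the same feature from several consecutive examples into one container.

```python
from itertools import islice

def batch(examples, batch_size):
    """Group parsed (features, label) pairs, stacking each feature into a list."""
    it = iter(examples)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        # One list per feature key, holding that feature from every example.
        features = {k: [f[k] for f, _ in chunk] for k in chunk[0][0]}
        labels = [label for _, label in chunk]
        yield features, labels

# Hypothetical parsed examples, each with a length-1 feature.
examples = [
    ({"dst_ip_addr": [b"OPENSTACK_NET"]}, [0]),
    ({"dst_ip_addr": [b"EXT_SERVER"]}, [0]),
    ({"dst_ip_addr": [b"EXT_SERVER"]}, [1]),
]
first_features, first_labels = next(batch(examples, batch_size=2))
print(first_features)
```

The first element holds two examples, not the whole dataset; with a real batch size of 32 the output simply looks much larger than a single record.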

How to perform data augmentation in Tensorflow Estimator's input_fn

Using Tensorflow's Estimator API, at what point in the pipeline should I perform the data augmentation?
According to this official Tensorflow guide, one place to perform the data augmentation is in the input_fn:
def parse_fn(example):
    "Parse TFExample records and perform simple data augmentation."
    example_fmt = {
        "image": tf.FixedLenFeature((), tf.string, ""),
        "label": tf.FixedLenFeature((), tf.int64, -1)
    }
    parsed = tf.parse_single_example(example, example_fmt)
    image = tf.image.decode_image(parsed["image"])
    # augments image using slice, reshape, resize_bilinear
    # |
    # |
    # |
    # v
    image = _augment_helper(image)
    return image, parsed["label"]
def input_fn():
    files ="/path/to/dataset/train-*.tfrecord")
    dataset = files.interleave(
    dataset =
    # ...
    return dataset
My question
If I perform data augmentation inside input_fn, does parse_fn return a single example, or a batch including the original input image plus all of the augmented variants? If it should only return a single [augmented] example, how do I ensure that all images in the dataset are used in their un-augmented form, as well as all variants?
If you use iterators on your dataset, your _augment_helper function will be called with each iteration of the dataset across each block of data fed in ( as you are calling the parse_fn in )
Change your code to
ds_iter = dataset.make_one_shot_iterator()
ds_iter = ds_iter.get_next()
return ds_iter
I've tested this with a simple augmentation function
def _augment_helper(image):
image = tf.image.random_brightness(image,255.0, 1)
image = tf.clip_by_value(image, 0.0, 255.0)
return image
Change 255.0 to whatever the maximum value is in your dataset. I used 255.0 because my example dataset had 8-bit pixel values.
parse_fn returns a single example each time it is called; if you then apply the .batch() operation, the dataset returns batches of parsed images.
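For the part of the question about using both the un-augmented images and their variants, one possible approach is to chain the original stream with an augmented copy of it. The sketch below is a plain-Python analogy under that assumption; `augment` and `one_epoch` are hypothetical names, and in tf.data the analogous step would presumably be concatenating the plain dataset with a mapped one.

```python
import random
from itertools import chain

def augment(image, rng):
    # Hypothetical augmentation: shift brightness by a random amount,
    # then clip to the 8-bit range [0, 255].
    delta = rng.uniform(-50.0, 50.0)
    return [min(255.0, max(0.0, p + delta)) for p in image]

def one_epoch(images, rng, include_originals=True):
    """Yield the originals (optionally), then one fresh augmented copy of each."""
    augmented = (augment(img, rng) for img in images)
    return chain(images, augmented) if include_originals else augmented

images = [[0.0, 128.0, 255.0], [10.0, 20.0, 30.0]]
rng = random.Random(42)
epoch = list(one_epoch(images, rng))
print(len(epoch))
```

Because the augmentation is random, running another epoch yields different variants, so the originals are seen every epoch while the augmented copies keep changing.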

TensorFlow - how to import data with multiple labels

I'm trying to create a model in TensorFlow which predicts the ideal item for a user by predicting a vector of numbers.
I have created a dataset in Spark and saved it as a TFRecord using the Spark TensorFlow connector.
In the dataset, I have several hundred features and 20 labels in each row. For easier manipulation, I have given every column a prefix, 'feature_' or 'label_'.
Now I'm trying to write an input function for TensorFlow, but I can't figure out how to parse the data.
So far I have written this:
def dataset_input_fn():
    path = ['data.tfrecord']
    dataset =
    def parser(record):
        example = tf.train.Example()
        # TODO: no idea what to do here
        # features = parsed["features"]
        # label = parsed["label"]
        # return features, label
    dataset =
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.batch(32)
    dataset = dataset.repeat(100)
    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()
    return features, labels
How can I split the Example into a feature set and a label set? I have tried to split the Example into two parts, but there is no way to even access it. The only way I have managed to access it is by printing the example out, which gives me something like this.
features {
feature {
key: "feature_wishlist_hour"
value {
int64_list {
value: 0
feature {
key: "label_emb_1"
value {
float_list {
value: 0.4
feature {
key: "label_emb_2"
value {
float_list {
value: 0.8
Your parser function should mirror how you constructed the example proto. In your case it should be something similar to:
# example proto decode
def parser(example_proto):
    keys_to_features = {'feature_wishlist_hour': tf.FixedLenFeature((), tf.int64),
                        'label_emb_1': tf.FixedLenFeature((), tf.float32),
                        'label_emb_2': tf.FixedLenFeature((), tf.float32)}
    parsed_features = tf.parse_single_example(example_proto, keys_to_features)
    return parsed_features['feature_wishlist_hour'], (parsed_features['label_emb_1'], parsed_features['label_emb_2'])
EDIT: From the comments it seems you are encoding each of the features as key, value pair, which is not right. Check this answer: Numpy to TFrecords: Is there a more simple way to handle batch inputs from tfrecords? on how to write it in a proper way.
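With several hundred 'feature_' columns and 20 'label_' columns, listing every key by hand in the return statement gets tedious. A minimal sketch of splitting an already-parsed dictionary by prefix is shown below, in plain Python; `split_features_labels` is a hypothetical helper, and it assumes the parsed example behaves like an ordinary dict keyed by column name.

```python
def split_features_labels(parsed, feature_prefix="feature_", label_prefix="label_"):
    """Split a parsed example dict into (features, labels) by column-name prefix."""
    features = {k: v for k, v in parsed.items() if k.startswith(feature_prefix)}
    # Sort label keys so the label vector has a stable order across examples.
    label_keys = sorted(k for k in parsed if k.startswith(label_prefix))
    labels = [parsed[k] for k in label_keys]
    return features, labels

# Hypothetical parsed example using the column names from the question.
parsed = {"feature_wishlist_hour": 0, "label_emb_1": 0.4, "label_emb_2": 0.8}
features, labels = split_features_labels(parsed)
print(features, labels)
```

Note that the sort is lexicographic, so a key such as label_emb_10 would come before label_emb_2; zero-padded names or a custom sort key would be needed if the numeric order matters.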