I want to write TensorFlow TFRecord files in this specific format:
'image': Image(shape=(None, None, 3), dtype=tf.uint8),
'image/filename': Text(shape=(), dtype=tf.string),
'image/id': tf.int64,
'objects': Sequence({
    'area': tf.int64,
    'bbox': BBoxFeature(shape=(4,), dtype=tf.float32),
    'id': tf.int64,
    'is_crowd': tf.bool,
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=80),
}),
Do you know how to create TFRecords just like that?
Thanks a lot,
Allan F.
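
For readers with the same question, here is a minimal sketch of writing one record with plain tf.train.Example and reading it back. It rests on assumptions that are not in the post: the '/'-joined key names (such as 'objects/bbox') mirror how I understand nested Sequence features to be flattened, the file name, ids, boxes and labels are made-up sample values, and the image is a blank 8x8 JPEG. tf.train.Example only stores bytes, float and int64 lists, so is_crowd is written as 0/1 integers. A dataset that tfds itself can load would likely also need its metadata files, which this sketch does not produce.

```python
import os
import tempfile

import tensorflow as tf


def _bytes(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))


def _int64(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))


def _float(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))


# Hypothetical sample: a blank 8x8 JPEG with two annotated objects.
image_bytes = tf.io.encode_jpeg(tf.zeros([8, 8, 3], tf.uint8)).numpy()
example = tf.train.Example(features=tf.train.Features(feature={
    "image": _bytes([image_bytes]),
    "image/filename": _bytes([b"000001.jpg"]),
    "image/id": _int64([1]),
    # Sequence fields are stored as flat lists; bbox holds 4 floats per object.
    "objects/area": _int64([120, 64]),
    "objects/bbox": _float([0.1, 0.1, 0.5, 0.5, 0.2, 0.3, 0.6, 0.9]),
    "objects/id": _int64([10, 11]),
    "objects/is_crowd": _int64([0, 0]),
    "objects/label": _int64([3, 17]),
}))

path = os.path.join(tempfile.mkdtemp(), "sample.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(example.SerializeToString())

# Read the record back; variable-length fields use VarLenFeature.
spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "image/id": tf.io.FixedLenFeature([], tf.int64),
    "objects/bbox": tf.io.VarLenFeature(tf.float32),
    "objects/label": tf.io.VarLenFeature(tf.int64),
}
for record in tf.data.TFRecordDataset(path):
    parsed = tf.io.parse_single_example(record, spec)
    boxes = tf.reshape(tf.sparse.to_dense(parsed["objects/bbox"]), [-1, 4])
    labels = tf.sparse.to_dense(parsed["objects/label"])
```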


I'm currently working on fiber tip tracking in endoscopic video.
For this purpose I have two models:
a classifier that tells whether the image contains the fiber (is_visible)
a regressor that predicts the fiber tip position (x, y)
I am using ResNet18 pretrained on ImageNet for this purpose and it works great. However, I'm experiencing performance issues,
so I decided to combine the two models into a single one using a multi-output approach.
So far I haven't been able to get it to work.
TensorFlow version: 2.10.1
My dataset is stored in HDF5 format. Each sample has:
an image (224, 224, 3)
a uint8 visibility flag
two floats for the fiber tip position (x, y)
I am loading this dataset using a custom generator as follows:
output_types = (tf.float32, tf.uint8, tf.float32)
output_shapes = (
    tf.TensorShape((None, image_height, image_width, number_of_channels)),  # image
    tf.TensorShape((None, 1)),  # is_visible
    tf.TensorShape((None, 1, 1, 2)),  # x, y
)
train_dataset = tf.data.Dataset.from_generator(
    generator, output_types=output_types, output_shapes=output_shapes,
)
My model is defined as follows:
model = ResNet18(input_shape=(224, 224, 3), weights="imagenet", include_top=False)
inputLayer = model.input
innerLayer = tf.keras.layers.Flatten()(model.output)
is_visible = tf.keras.layers.Dense(1, activation="sigmoid", name="is_visible")(innerLayer)
position = tf.keras.layers.Dense(2)(innerLayer)
position = tf.keras.layers.Reshape((1, 1, 2), name="position")(position)
model = tf.keras.Model(inputs=[inputLayer], outputs=[is_visible, position])
adam = tf.keras.optimizers.Adam(1e-4)
model.compile(
    optimizer=adam,
    loss={"is_visible": "binary_crossentropy", "position": "mean_squared_error"},
    loss_weights={"is_visible": 1.0, "position": 1.0},
    metrics={"is_visible": "accuracy", "position": "mean_squared_error"},
)
The dataset works fine and I can loop through each batch, but when it comes to training
I get the following error:
ValueError: Can not squeeze dim[3], expected a dimension of 1, got 2 for '{{node mean_squared_error/weighted_loss/Squeeze}} = SqueezeT=DT_FLOAT, squeeze_dims=[-1]' with input shapes: [?,1,1,2].
I tried to change the dataset format like so:
output_types = (tf.float32, tf.uint8, tf.float32, tf.float32)
output_shapes = (
    tf.TensorShape((None, image_height, image_width, number_of_channels)),  # image
    tf.TensorShape((None, 1)),  # is_visible
    tf.TensorShape((None, 1)),  # x
    tf.TensorShape((None, 1)),  # y
)
But this leads to another error:
ValueError: Data is expected to be in format x, (x,), (x, y), or (x, y, sample_weight), found: (<tf.Tensor 'IteratorGetNext:0' shape=(None, 224, 224, 3) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(None, 1) dtype=uint8>, <tf.Tensor 'IteratorGetNext:2' shape=(None, 1) dtype=float32>, <tf.Tensor 'IteratorGetNext:3' shape=(None, 1) dtype=float32>)
I tried to wrap is_visible and (x, y) returned from train_dataset into a dictionary like so:
yield image_batch, {"is_visible": is_visible_batch, "position": position_batch}
Also tried these options:
yield image_batch, (is_visible_batch, position_batch)
yield image_batch, [is_visible_batch, position_batch]
But that didn't help.
Can anyone tell me what I am doing wrong? I am totally stuck.
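
One possible reading of the first error: Keras documents dataset elements as (inputs, targets) or (inputs, targets, sample_weights), so a flat 3-tuple would have its third item, the position batch, treated as sample weights. If that is the cause, the declared dataset structure needs to nest both targets inside the second element, not only the yield statement. The following is a minimal sketch under stated assumptions, not a verified fix for the original setup: the random generator stands in for the HDF5 one, a tiny pooling backbone stands in for ResNet18, the visibility flag is cast to float32, the position target is flattened to shape (None, 2), and output_signature is used in place of output_types/output_shapes.

```python
import numpy as np
import tensorflow as tf


def generator():
    # Hypothetical stand-in for the HDF5-backed generator: two random batches of 4.
    for _ in range(2):
        images = np.random.rand(4, 224, 224, 3).astype(np.float32)
        is_visible = np.random.randint(0, 2, size=(4, 1)).astype(np.float32)
        position = np.random.rand(4, 2).astype(np.float32)
        # Both targets live inside the second element, so each element is (x, y).
        yield images, {"is_visible": is_visible, "position": position}


# The declared structure mirrors the nesting of what the generator yields.
output_signature = (
    tf.TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32),
    {
        "is_visible": tf.TensorSpec(shape=(None, 1), dtype=tf.float32),
        "position": tf.TensorSpec(shape=(None, 2), dtype=tf.float32),
    },
)
train_dataset = tf.data.Dataset.from_generator(
    generator, output_signature=output_signature)

# Tiny stand-in backbone (not ResNet18), only to exercise the two heads.
inputs = tf.keras.Input(shape=(224, 224, 3))
features = tf.keras.layers.GlobalAveragePooling2D()(inputs)
is_visible = tf.keras.layers.Dense(1, activation="sigmoid", name="is_visible")(features)
position = tf.keras.layers.Dense(2, name="position")(features)
model = tf.keras.Model(inputs, {"is_visible": is_visible, "position": position})

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss={"is_visible": "binary_crossentropy", "position": "mean_squared_error"},
)
history = model.fit(train_dataset, epochs=1, verbose=0)
```

Keeping the position head at shape (2,) avoids the Reshape((1, 1, 2)) layer, so predictions and targets share the same rank.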

Recently I was trying to train a model on the Wider-Face dataset. I found that it is prebuilt into tfds. However, I am having difficulty parsing it. Its feature map has the following form:
'faces': Sequence({
    'bbox': BBoxFeature(shape=(4,), dtype=tf.float32),
    'blur': tf.uint8,
    'expression': tf.bool,
    'illumination': tf.bool,
    'invalid': tf.bool,
    'occlusion': tf.uint8,
    'pose': tf.bool,
}),
'image': Image(shape=(None, None, 3), dtype=tf.uint8),
'image/filename': Text(shape=(), dtype=tf.string),})
So I passed the following nested dictionary to
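feature_description = {'faces': {
    'bbox': tf.io.FixedLenFeature([], dtype=tf.float32),
    'blur': tf.io.FixedLenFeature([], dtype=tf.uint8),
    'expression': tf.io.FixedLenFeature([], dtype=tf.bool),
    'illumination': tf.io.FixedLenFeature([], dtype=tf.bool),
    'invalid': tf.io.FixedLenFeature([], dtype=tf.bool),
    'occlusion': tf.io.FixedLenFeature([], dtype=tf.uint8),
    'pose': tf.io.FixedLenFeature([], dtype=tf.bool),
    },
    'image': tf.io.FixedLenFeature([], dtype=tf.uint8),
    'image/filename': tf.io.FixedLenFeature([], dtype=tf.string),}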
But it gives me a value error of ValueError: Unsupported dict. Later I also learnt that Sequence does not support features which are of type
Please let me know how I can parse this type of TFRecord. I didn't find much documentation on how to use the object detection datasets that are built into TensorFlow, so links with examples would also be helpful.

I have a dataset of triplet images that I'm reading from tfrecords, that I've converted to a dataset using the following code
def parse_dataset(record):
def convert_raw_to_image_tensor(raw):
raw =
image_shape = tf.stack([299, 299, 3])
decoded = tf.io.decode_image(raw, channels=3,
dtype=tf.uint8, expand_animations=False)
decoded = tf.cast(decoded, tf.float32)
decoded = tf.reshape(decoded, image_shape)
decoded = tf.math.divide(decoded, 255.)
return decoded
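features = {
'n': tf.io.FixedLenFeature([], tf.string),
'p': tf.io.FixedLenFeature([], tf.string),
'q': tf.io.FixedLenFeature([], tf.string)
}
sample = tf.io.parse_single_example(record, features)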
neg_image = sample['n']
pos_image = sample['p']
query_image = sample['q']
neg_decoded = convert_raw_to_image_tensor(neg_image)
pos_decoded = convert_raw_to_image_tensor(pos_image)
query_decoded = convert_raw_to_image_tensor(query_image)
return (neg_decoded, pos_decoded, query_decoded)
record_dataset =, num_parallel_reads=4)
record_dataset =
The shape of this resulting dataset is
<MapDataset shapes: ((299, 299, 3), (299, 299, 3), (299, 299, 3)), types: (tf.float32, tf.float32, tf.float32)>
which I think means that each entry contains 3 images (which I confirmed by iterating through the dataset and printing the 1st, 2nd, and 3rd elements). I want to flatten this, so I get a dataset that doesn't contain any tuples but just a flat list of images. I've tried using flat_map but that just converts the images to (299, 3) and I've tried iterating through the dataset, appending each image to a list, then calling convert_to_tensor_slices but that's really inefficient.
I've read this question but it didn't seem to help.
Btw this is the flat_map code I tried
record_dataset = record_dataset.flat_map(lambda *x: tf.data.Dataset.from_tensor_slices(x))
and the resulting dataset has this shape
<FlatMapDataset shapes: ((299, 3), (299, 3), (299, 3)), types: (tf.float32, tf.float32, tf.float32)>
I think you are just unpacking the tuple wrongly.
This ought to do it:
def flatten(*x):
    return tf.data.Dataset.from_tensor_slices([i for i in x])
flattened = record_dataset.flat_map(flatten)
so that:
for i in flattened:
    print(i.shape)
(299, 299, 3)
(299, 299, 3)
(299, 299, 3)
(299, 299, 3)
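
The flattening approach above can be tried on a small stand-in dataset. This is a sketch under assumptions: the two triplets of 4x4x3 tensors filled with 0, 1 and 2 are made up so the ordering is easy to see, and tf.stack is used to build the (3, H, W, C) tensor explicitly before slicing it.

```python
import tensorflow as tf

# Hypothetical stand-in for the parsed TFRecords: 2 triplets of 4x4x3 "images".
triplets = tf.data.Dataset.from_tensor_slices((
    tf.zeros([2, 4, 4, 3]),        # plays the role of the negative image
    tf.ones([2, 4, 4, 3]),         # positive image
    tf.fill([2, 4, 4, 3], 2.0),    # query image
))


def flatten(*x):
    # Stack the three images into one (3, 4, 4, 3) tensor, then slice along
    # the new leading axis so each image becomes its own dataset element.
    return tf.data.Dataset.from_tensor_slices(tf.stack(x))


flattened = triplets.flat_map(flatten)

shapes = [tuple(image.shape) for image in flattened]
first_values = [float(image[0, 0, 0]) for image in flattened]
```

Each triplet contributes three consecutive elements, so the flat dataset keeps the negative, positive, query order within every group of three.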

I am trying to use the tf.data API to read a TFRecord file.
import tensorflow as tf
from PIL import Image
import numpy as np
import os
def train_input_fn():
filenames = ["mytrain.tfrecords"]
dataset = tf.data.TFRecordDataset(filenames)
def parser(record):
keys_to_features = {
"image_data": tf.FixedLenFeature((), tf.string, default_value=""),
"date_time": tf.FixedLenFeature((), tf.int64, default_value=""),
"label": tf.FixedLenFeature((), tf.int64,
default_value=tf.zeros([], dtype=tf.int64)),
}
parsed = tf.parse_single_example(record, keys_to_features)
image = tf.decode_jpeg(parsed["image_data"])
image = tf.reshape(image, [128, 128, 3])
label = tf.cast(parsed["label"], tf.int32)
return {"image_data": image, "date_time": parsed["date_time"]}, label
dataset = dataset.map(parser)
dataset = dataset.shuffle(buffer_size=10000)
dataset = dataset.batch(32)
dataset = dataset.repeat(1)
iterator = dataset.make_one_shot_iterator()
features, labels = iterator.get_next()
return features, labels
output = train_input_fn()
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord = coord)
for i in range(230):
image, label = sess.run(output)
img = Image.fromarray(image, 'RGB') + '_''Label_'+str(l)+'.jpg')
print(image, label)
Traceback (most recent call last):
File "E:/Tensorflow/Wenshan_Cai_Nanoletters/", line 34, in
output = train_input_fn()
File "E:/Tensorflow/Wenshan_Cai_Nanoletters/", line 25, in train_input_fn
TypeError: Expected int64, got '' of type 'str' instead.
Note the line "TypeError: Expected int64, got '' of type 'str' instead" in your error log: it points to a bug in your code.
The bug
In the following line:
"date_time": tf.FixedLenFeature((), tf.int64, default_value=""),
the default value for a tf.int64 feature is specified as the string "".
A fix
If your expected default is 0, change the line to:
"date_time": tf.FixedLenFeature((), tf.int64, default_value=0),
Hope that helps.
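
To see the integer default in action, here is a small sketch written against the TF 2.x names (tf.io.FixedLenFeature, tf.io.parse_single_example) instead of the TF 1.x ones used in the question. The example record and its label value of 7 are made up, and "date_time" is left out on purpose so the default is what gets returned.

```python
import tensorflow as tf

# Hypothetical record that contains "label" but deliberately omits "date_time".
example = tf.train.Example(features=tf.train.Features(feature={
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[7])),
}))
serialized = example.SerializeToString()

keys_to_features = {
    # The default must match the declared dtype: an integer for tf.int64.
    "date_time": tf.io.FixedLenFeature((), tf.int64, default_value=0),
    "label": tf.io.FixedLenFeature((), tf.int64, default_value=0),
}
parsed = tf.io.parse_single_example(serialized, keys_to_features)

date_time = int(parsed["date_time"])  # falls back to the default
label = int(parsed["label"])          # read from the record
```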

I use Python 2.7.13 and TensorFlow 1.3.0 on CPU.
I want to use DenseNet for a regression problem. My data contains 60000 JPEG images with 37 float labels for each image.
I saved my data into tfrecords files by:
def Read_Labels(label_path):
labels_csv = pd.read_csv(label_path)
labels = np.array(labels_csv)
return labels[:,1:]
def load_image(addr):
# read an image and resize to (224, 224)
img = cv2.imread(addr)
img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = img.astype(np.float32)
return img
def Shuffle_images_with_labels(shuffle_data, photo_filenames, labels):
if shuffle_data:
c = list(zip(photo_filenames, labels))
addrs, labels = zip(*c)
return addrs, labels
def image_to_tfexample_mine(image_data, image_format, height, width, label):
    return tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': bytes_feature(image_data),
        'image/format': bytes_feature(image_format),
        'image/class/label': _float_feature(label),
        'image/height': int64_feature(height),
        'image/width': int64_feature(width),
    }))
def _convert_dataset(split_name, filenames, labels, dataset_dir):
assert split_name in ['train', 'validation']
num_per_shard = int(math.ceil(len(filenames) / float(_NUM_SHARDS)))
with tf.Graph().as_default():
for shard_id in range(_NUM_SHARDS):
output_filename = _get_dataset_filename(dataset_path, split_name, shard_id)
with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
start_ndx = shard_id * num_per_shard
end_ndx = min((shard_id+1) * num_per_shard, len(filenames))
for i in range(start_ndx, end_ndx):
sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
i+1, len(filenames), shard_id))
img = load_image(filenames[i])
image_data = tf.compat.as_bytes(img.tostring())
label = labels[i]
example = image_to_tfexample_mine(image_data, image_format, height, width, label)
# Serialize to string and write on the file
tfrecord_writer.write(example.SerializeToString())
def run(dataset_dir):
labels = Read_Labels(dataset_dir + '/training_labels.csv')
photo_filenames = _get_filenames_and_classes(dataset_dir + '/images_training')
shuffle_data = True
photo_filenames, labels = Shuffle_images_with_labels(
shuffle_data,photo_filenames, labels)
training_filenames = photo_filenames[_NUM_VALIDATION:]
training_labels = labels[_NUM_VALIDATION:]
validation_filenames = photo_filenames[:_NUM_VALIDATION]
validation_labels = labels[:_NUM_VALIDATION]
_convert_dataset('train', training_filenames, training_labels, dataset_path)
_convert_dataset('validation', validation_filenames, validation_labels, dataset_path)
print('\nFinished converting the Flowers dataset!')
And I decode it by:
with tf.Session() as sess:
    feature = {
        'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
        'image/format': tf.FixedLenFeature((), tf.string, default_value='jpeg'),
        'image/class/label': tf.FixedLenFeature(
            [37,], tf.float32, default_value=tf.zeros([37,], dtype=tf.float32)),
    }
    filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(serialized_example, features=feature)
    # The images were stored as raw float32 pixel buffers, so decode them as such
    image = tf.decode_raw(features['image/encoded'], tf.float32)
    label = tf.cast(features['image/class/label'], tf.float32)
    image = tf.reshape(image, [224, 224, 3])
    images, labels = tf.train.shuffle_batch([image, label], batch_size=10, capacity=30, num_threads=1, min_after_dequeue=10)
    init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
    sess.run(init_op)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for batch_index in range(6):
        img, lbl = sess.run([images, labels])
        img = img.astype(np.uint8)
        for j in range(6):
            plt.subplot(2, 3, j+1)
            plt.imshow(img[j, ...])
It's all fine up to this point. But when I use the code below for decoding the TFRecord files:
reader = tf.TFRecordReader
keys_to_features = {
    'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
    'image/format': tf.FixedLenFeature((), tf.string, default_value='raw'),
    'image/class/label': tf.FixedLenFeature(
        [37,], tf.float32, default_value=tf.zeros([37,], dtype=tf.float32)),
}
items_to_handlers = {
    'image': slim.tfexample_decoder.Image('image/encoded'),
    'label': slim.tfexample_decoder.Tensor('image/class/label'),
}
decoder = slim.tfexample_decoder.TFExampleDecoder(
    keys_to_features, items_to_handlers)
I get the following error.
INFO:tensorflow:Error reported to Coordinator: , assertion failed: [Unable to decode bytes as JPEG, PNG, GIF, or BMP]
[[Node: case/If_0/decode_image/cond_jpeg/cond_png/cond_gif/Assert_1/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](case/If_0/decode_image/cond_jpeg/cond_png/cond_gif/is_bmp, case/If_0/decode_image/cond_jpeg/cond_png/cond_gif/Assert_1/Assert/data_0)]]
INFO:tensorflow:Caught OutOfRangeError. Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
To use DenseNet for my problem, I need to fix this error first.
Could anybody please help me with this problem? This code works perfectly for datasets such as flowers, MNIST and CIFAR10, but does not work for my data.
Thanks to pudae, the problem is solved. I needed to read the encoded image file directly:
image_data = tf.gfile.FastGFile(filenames[i], 'rb').read()
instead of loading the image and serializing the decoded pixels:
img = load_image(filenames[i])
image_data = tf.compat.as_bytes(img.tostring())
That works perfectly now.
According to the error, I think the problem is that you use an image decoder for array data (decoded data), because you saved decoded data when creating the TFRecords. As you may have noticed, when you are not using slim, you use tf.decode_raw to decode the data. But when you use slim, the 'image/format': tf.FixedLenFeature((), tf.string, default_value='raw') entry is not used, and by default slim will use an image decoder.
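
The difference between a raw pixel buffer and encoded image bytes can be shown in isolation. This is a sketch under assumptions: it uses the TF 2.x tf.io names instead of slim or the TF 1.x calls from the thread, and an 8x8 black image made up for the purpose. Encoded JPEG bytes go through an image decoder, while a raw float32 buffer only round-trips through decode_raw plus a reshape.

```python
import numpy as np
import tensorflow as tf

pixels = np.zeros((8, 8, 3), dtype=np.uint8)

# Two different byte strings for the same image:
raw_bytes = pixels.astype(np.float32).tobytes()   # decoded pixel buffer
jpeg_bytes = tf.io.encode_jpeg(pixels).numpy()    # encoded file contents

# Encoded bytes are understood by the image decoder.
decoded = tf.io.decode_image(jpeg_bytes, channels=3)

# The raw buffer has no image header, so it needs decode_raw and a known shape.
restored = tf.reshape(tf.io.decode_raw(raw_bytes, tf.float32), [8, 8, 3])

# Feeding the raw buffer to the image decoder fails, as in the reported error.
try:
    tf.io.decode_image(raw_bytes)
    raw_is_decodable = True
except tf.errors.InvalidArgumentError:
    raw_is_decodable = False
```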
I believe you use the code in slim/data,
where format_key = 'image/format' is what you need. So, like this:
keys_to_features = {
    'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
    'image/format': tf.FixedLenFeature((), tf.string, default_value='raw'),
    'image/class/label': tf.FixedLenFeature(
        [1], tf.int64, default_value=tf.zeros([1], dtype=tf.int64)),
}
items_to_handlers = {
    'image': tfexample_decoder.Image(
        image_key = 'image/encoded',
        format_key = 'image/format',
    ),
    'label': tfexample_decoder.Tensor('image/class/label'),
}
decoder = tfexample_decoder.TFExampleDecoder(
    keys_to_features, items_to_handlers)
But I am not sure this will solve your problem completely, because I can't reproduce your work on my machine.
Maybe there is a problem with your image itself as follows: