How can I wrap tf.io.parse_single_example with tf.py_function? - tensorflow

First, I was wondering whether I should wrap tf.io.parse_single_example with tf.py_function when reading TFRecord data in dataset.map:
N = config.get_num_listings_per_search()
features = {
    'qf': tf.io.FixedLenFeature([len(config.get_query_continuous_features())], tf.float32),
    'qi': tf.io.FixedLenFeature([len(config.get_query_categorical_features())], tf.int64),
}
def _parse_function(example_proto):
    parsed_features = tf.io.parse_single_example(example_proto, features)
    return parsed_features['qf'], parsed_features['qi']
dataset = tf.data.TFRecordDataset(training_files)
dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
dataset = dataset.shuffle(buffer_size=1000000)
dataset = dataset.map(_parse_function, num_parallel_calls=tf.data.experimental.AUTOTUNE)
dataset = dataset.batch(config.get_batch_size())
I ask because the tf.data guide mentions that:
For performance reasons, we encourage you to use TensorFlow operations for preprocessing your data whenever possible. However, it is sometimes useful to call external Python libraries when parsing your input data. You can use the tf.py_function() operation in a Dataset.map() transformation.
I tried to wrap with
parsed_features = tf.py_function(tf.io.parse_single_example, (example_proto, features),
(tf.float32, tf.int64))
However, running the code gave me the following error:
TypeError: Tensors in list passed to 'input' of 'EagerPyFunc' Op have types [string, <NOT CONVERTIBLE TO TENSOR>] that are invalid.
It seems to me that tf.py_function(tf.io.parse_single_example(example_proto, features)) is not supported because example_proto is of type tf.string?
The primary reason I might want to do this is because the current input data pipeline is slow. Will I get some performance improvement if I wrap tf.io.parse_single_example with tf.py_function?
The code above was run with tensorflow-gpu==2.0.
Thank you!

tf.py_function is meant to wrap external Python libraries like PIL or scipy, not TensorFlow operations like tf.io.parse_single_example. Adding tf.py_function here will probably make performance worse by forcing TensorFlow to call into Python instead of doing the parsing in C++.
The TFRecord guide gives an example of using tf.io.parse_single_example:
raw_image_dataset = tf.data.TFRecordDataset('images.tfrecords')
# Create a dictionary describing the features.
image_feature_description = {
    'height': tf.io.FixedLenFeature([], tf.int64),
    'width': tf.io.FixedLenFeature([], tf.int64),
    'depth': tf.io.FixedLenFeature([], tf.int64),
    'label': tf.io.FixedLenFeature([], tf.int64),
    'image_raw': tf.io.FixedLenFeature([], tf.string),
}
def _parse_image_function(example_proto):
    # Parse the input tf.Example proto using the dictionary above.
    return tf.io.parse_single_example(example_proto, image_feature_description)
parsed_image_dataset = raw_image_dataset.map(_parse_image_function)
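As a rough illustration of keeping the parsing inside TensorFlow ops, the sketch below writes a few synthetic records to a temporary file and parses them two ways: per record with tf.io.parse_single_example, and per batch with tf.io.parse_example after batching. The feature sizes (3 floats, 2 ints), the file name and the record count are made up for the example and stand in for the config-driven values in the question. Whether batching before parsing is actually faster depends on the pipeline, so it is worth measuring before adopting it.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical feature sizes; the question derives these from its config object.
features = {
    'qf': tf.io.FixedLenFeature([3], tf.float32),
    'qi': tf.io.FixedLenFeature([2], tf.int64),
}

def make_example(i):
    # Build one synthetic tf.train.Example with the two features above.
    return tf.train.Example(features=tf.train.Features(feature={
        'qf': tf.train.Feature(float_list=tf.train.FloatList(value=[float(i)] * 3)),
        'qi': tf.train.Feature(int64_list=tf.train.Int64List(value=[i, i + 1])),
    }))

# Write 8 records to a temporary TFRecord file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'demo.tfrecord')
with tf.io.TFRecordWriter(path) as writer:
    for i in range(8):
        writer.write(make_example(i).SerializeToString())

def parse_single(example_proto):
    # Parses one serialized example; no tf.py_function involved.
    parsed = tf.io.parse_single_example(example_proto, features)
    return parsed['qf'], parsed['qi']

def parse_batch(example_protos):
    # Parses a whole batch of serialized examples in one op.
    parsed = tf.io.parse_example(example_protos, features)
    return parsed['qf'], parsed['qi']

per_record = tf.data.TFRecordDataset(path).map(parse_single).batch(4)
per_batch = tf.data.TFRecordDataset(path).batch(4).map(parse_batch)

qf_a, qi_a = next(iter(per_record))
qf_b, qi_b = next(iter(per_batch))
```

Both pipelines yield the same tensors here; the second one simply calls the parser once per batch instead of once per record.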

Related

How to parse an in-house TFRecords dataset when loading it using ImportExampleGen

Following the official tutorial, this is how I should load a TFRecord dataset:
raw_image_dataset = tf.data.TFRecordDataset('images.tfrecords')
# Create a dictionary describing the features.
image_feature_description = {
    'height': tf.io.FixedLenFeature([], tf.int64),
    'width': tf.io.FixedLenFeature([], tf.int64),
    'depth': tf.io.FixedLenFeature([], tf.int64),
    'label': tf.io.FixedLenFeature([], tf.int64),
    'image_raw': tf.io.FixedLenFeature([], tf.string),
}
def _parse_image_function(example_proto):
    # Parse the input tf.train.Example proto using the dictionary above.
    return tf.io.parse_single_example(example_proto, image_feature_description)
parsed_image_dataset = raw_image_dataset.map(_parse_image_function)
parsed_image_dataset
The _parse_image_function is where I get the chance to set the type and shape of my loaded tensors.
But then, when I'm loading the same file using ImportExampleGen, I don't see how I can inject my parse function into the mix!
context = InteractiveContext()
example_gen = tfx.components.ImportExampleGen(input_base=_dataset_folder)
context.run(example_gen, enable_cache=True)
Does anyone know what's going to happen to my parse logic when I'm using the ImportExampleGen class instead of loading my dataset directly using TFRecordDataset class?

How to parse a tfds.features.Sequence object ? It is not compatible with tf.io.FixedLenSequenceFeature

Recently I was trying to train a model on the Wider-Face dataset. I found that it is prebuilt into tfds (https://www.tensorflow.org/datasets/catalog/wider_face). However, I am having difficulty parsing it. Its feature map has the following form:
FeaturesDict({
'faces': Sequence({
'bbox': BBoxFeature(shape=(4,), dtype=tf.float32),
'blur': tf.uint8,
'expression': tf.bool,
'illumination': tf.bool,
'invalid': tf.bool,
'occlusion': tf.uint8,
'pose': tf.bool,
}),
'image': Image(shape=(None, None, 3), dtype=tf.uint8),
'image/filename': Text(shape=(), dtype=tf.string),})
So I passed the following nested dictionary to tf.io.parse_single_example
feature_description = {'faces': {
'bbox': tf.io.FixedLenFeature([], dtype=tf.float32),
'blur': tf.io.FixedLenFeature([], dtype=tf.uint8),
'expression': tf.io.FixedLenFeature([], dtype=tf.bool),
'illumination': tf.io.FixedLenFeature([], dtype=tf.bool),
'invalid': tf.io.FixedLenFeature([], dtype=tf.bool),
'occlusion': tf.io.FixedLenFeature([], dtype=tf.uint8),
'pose': tf.io.FixedLenFeature([], dtype=tf.bool),
},
'image': tf.io.FixedLenFeature([], dtype=tf.uint8),
'image/filename': tf.io.FixedLenFeature([], dtype=tf.string),}
But it gives me ValueError: Unsupported dict. Later I also learnt that Sequence does not support features which are of type tf.io.FixedLenSequenceFeature.
Please let me know how I can parse this type of TFRecord. I didn't find much documentation on how to use the object detection datasets that are built into TensorFlow, so links with examples would also be helpful.
Thanks

How to convert a TFRecordDataset item to an image?

I have an issue with converting TFRecords back to images:
def _parse_test_image_function(img):
    image_feature_description = {
        'image/file_name': tf.io.FixedLenFeature([], tf.string),
        'image/encoded_image': tf.io.FixedLenFeature([], tf.string),
    }
    return tf.io.parse_single_example(img, image_feature_description)
test_dataset = tf.data.TFRecordDataset(temp_path)
test_dataset = test_dataset.map(_parse_test_image_function)
print(tf.__version__)
images = test_dataset.take(1)
print(images)
2.5.0
<TakeDataset shapes: {image/encoded_image: (), image/file_name: ()}, types: {image/encoded_image: tf.string, image/file_name: tf.string}>
The fields in image_feature_description are correct.
I also saw this question:
Converting TFRecords back into JPEG Images
But it is not very helpful for me because some of the functions used in its answers are outdated.
You can get the image as numpy array by using the below code.
import numpy as np
import PIL.Image as Image
gold_fish=Image.open('/content/gold.jpeg')
gold_fish=np.array(gold_fish)
Thank You.
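To go from the parsed record in the question to pixel data, one option is to decode the encoded bytes with a TensorFlow image op. The sketch below is only an illustration under assumptions: it builds its own one-record file, and it assumes that 'image/encoded_image' holds PNG or JPEG bytes, which the question does not state. The 8x6 black image and the file name demo.png are invented for the example.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Build a one-record TFRecord file so the sketch is self-contained.
# Assumption: the encoded bytes are PNG (JPEG would work the same way with decode_image).
pixels = np.zeros((8, 6, 3), dtype=np.uint8)
encoded = tf.io.encode_png(pixels).numpy()
example = tf.train.Example(features=tf.train.Features(feature={
    'image/file_name': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'demo.png'])),
    'image/encoded_image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded])),
}))
temp_path = os.path.join(tempfile.mkdtemp(), 'demo.tfrecord')
with tf.io.TFRecordWriter(temp_path) as writer:
    writer.write(example.SerializeToString())

image_feature_description = {
    'image/file_name': tf.io.FixedLenFeature([], tf.string),
    'image/encoded_image': tf.io.FixedLenFeature([], tf.string),
}

def _parse_test_image_function(serialized):
    return tf.io.parse_single_example(serialized, image_feature_description)

test_dataset = tf.data.TFRecordDataset(temp_path).map(_parse_test_image_function)

# Iterate eagerly and decode the bytes of each record into a uint8 array.
for record in test_dataset.take(1):
    image = tf.io.decode_image(record['image/encoded_image'], channels=3).numpy()
    file_name = record['image/file_name'].numpy().decode('utf-8')
```

The resulting NumPy array can then be passed to PIL.Image.fromarray or written to disk, if an image file is what is needed.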

iteratorFromStringHandle device placement cpu/gpu conflict

When restoring a metagraph from disk, TensorFlow complains that it is attempting to create an iterator on the GPU from a handle defined on the CPU.
I'm trying to create a graph that uses tf.Data pipelines with a placeholder string to define the iterator (so that I can swap datasets). I can successfully create a graph which seemingly works on the GPU. However, after I restore the graph from disk, I get an error when trying to bind the dataset handle to the iterator (I think):
"Attempted create an iterator on device "...GPU:0" from handle defined on device "CPU:0"
[[{{node IteratorFromStringHandleV2}} = IteratorFromStringHandleV2output_shapes=[....], output_types=[...], _device="...GPU:0"]]
I've tried explicitly controlling placement by wrapping code in with tf.device("/GPU:0"): blocks, specifically around where I create the dataset iterator, but that produces a different error:
"Cannot assign a device for operation TensorSliceDataset: Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available"
I found a similar problem here,
When use Dataset API, got device placement error with tensorflow >= 1.11
I'm using tf-1.12 (and I cannot use a higher version, unfortunately).
# this is the code which creates the graph
import tensorflow as tf
import numpy as np
def _bytestring_feature(byteStringList):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=byteStringList))
def _int64_feature(intList):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=intList))
def _float_feature(intList):
    return tf.train.Feature(float_list=tf.train.FloatList(value=intList))
def toTFrecord(tfrec_filewriter, img, label):
    feature={
        'image': _bytestring_feature([img.tostring()]),
        'class': _int64_feature([label])
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))
# generate data and save it to disk:
print('generating data')
nPartitions=5  # number of file partitions
for p in range(nPartitions):
    filename='./tfrec_'+'{:02d}-{}.tfrec'.format(p,nPartitions)
    with tf.python_io.TFRecordWriter(filename) as outFile:
        # generate some data for this partition
        for i in range(10):
            example=toTFrecord(outFile, (p*100+i)*np.ones((32,32), np.float32), (p*100+i))
            outFile.write(example.SerializeToString())
print('...complete')
# make the network
handle=tf.placeholder(tf.string, shape=[], name='handle')
with tf.device("/GPU:0"):
    iter=tf.data.Iterator.from_string_handle(handle, (tf.float32, tf.int64), (tf.TensorShape([tf.Dimension(None), tf.Dimension(32), tf.Dimension(32)]), tf.TensorShape([tf.Dimension(None)])))
    img,label=iter.get_next()
    network=tf.layers.conv2d(inputs=tf.reshape(img, [-1, tf.shape(img)[1], tf.shape(img)[2], 1]), filters=4, kernel_size=[3,3], dilation_rate=[1,1], padding='same', activation=None, name='networkConv')
with tf.Session(config=tf.ConfigProto(log_device_placement=True, allow_soft_placement=False)) as sess:
    sess.run(tf.global_variables_initializer())
    saver=tf.train.Saver(keep_checkpoint_every_n_hours=0.5, max_to_keep=1000)
    tf.add_to_collection('network', network)
    tf.add_to_collection('handle', handle)
    saver.save(sess, './demoSession')
#......
# and this is a separate process which restores the graph for training:
import tensorflow as tf
import numpy as np
import glob
def readTFrecord(example):
    features={
        'image': tf.io.FixedLenFeature([], tf.string),
        'class': tf.io.FixedLenFeature([], tf.int64)
    }
    example=tf.parse_example(example, features)
    return tf.reshape(tf.decode_raw(example['image'], tf.float32), [-1, 32, 32]), example['class']
filenames=glob.glob('./tfrec*.tfrec')
ds=tf.data.TFRecordDataset(filenames)
ds=ds.shuffle(5000).batch(4).prefetch(4).map(readTFrecord, num_parallel_calls=2)
with tf.Session(config=tf.ConfigProto(log_device_placement=True, allow_soft_placement=False)) as sess:
    new_saver=tf.train.import_meta_graph('demoSession.meta', clear_devices=False)
    new_saver.restore(sess, 'demoSession')
    network=tf.get_collection('network')[0]
    handle=tf.get_collection('handle')[0]
    #with tf.device("/GPU:0"):
    dsIterator=ds.make_initializable_iterator()
    dsHandle=sess.run(dsIterator.string_handle())
    sess.run(dsIterator.initializer)
    out=sess.run(network, feed_dict={handle:dsHandle})
    print(out.shape)
I expect it to work, Mr Bond. Unfortunately, it says that it cannot
tensorflow.python.framework.errors_impl.InvalidArgumentError: Attempted create an iterator on device "/job:localhost/replica:0/task:0/device:GPU:0" from handle defined on device "/job:localhost/replica:0/task:0/device:CPU:0"
[[{{node IteratorFromStringHandleV2}} = IteratorFromStringHandleV2output_shapes=[[?,32,32], [?]], output_types=[DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
It looks like it might be that I need to add
iter=tf.data.Iterator.from_string_handle(...)
saveable_obj = tf.contrib.data.make_saveable_from_iterator(iter)
...
tf.add_to_collection(tf.GraphKeys.SAVEABLE_OBJECTS, saveable_obj)
my initial test seems to work :-D
edit: actually, it progresses past the error I describe above, but it raises another error when I try to create a new save state, so I suspect it's not the actual answer =/

TensorFlow - Decoding images in a given shape from TF records

I am trying to read images from TFrecords file. The images vary in shapes. After reading, I want to preserve their shape which is why I pass the height, width and depth parameters appropriately. But the code just doesn't print anything after the set_shape command. I initialized the session in the main function. Is there a way to get the values of height,w,d tensors so that I can pass it to set_shape? How do I fix this? Any suggestions are welcome. Thanks in advance.
def read_and_decode(sess,filename_queue):
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        # Defaults are not specified since both keys are required.
        features={
            'height': tf.FixedLenFeature([], tf.int64),
            'width': tf.FixedLenFeature([], tf.int64),
            'depth': tf.FixedLenFeature([], tf.int64),
            'image_raw': tf.FixedLenFeature([], tf.string),
            'label': tf.FixedLenFeature([], tf.int64),
        })
    # Convert from a scalar string tensor (whose single string has
    # length mnist.IMAGE_PIXELS) to a uint8 tensor with shape
    # [mnist.IMAGE_PIXELS].
    image = tf.decode_raw(features['image_raw'], tf.uint8)
    image.set_shape([sess.run(features['height']),sess.run(features['width']),sess.run(features['depth'])])
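A possible way around this, sketched here under assumptions, is to skip set_shape altogether: set_shape needs static Python values, while tf.reshape accepts the parsed height, width and depth tensors directly. The sess.run calls above probably block because the input queue has not been started at that point, though that is a guess from the code shown. The sketch uses the TF 2.x tf.data API rather than the queue-based reader in the question, and the two tiny images, their sizes and the file name are invented for the example.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

def make_example(img, label):
    # Store the raw pixel bytes together with the shape needed to restore them.
    h, w, d = img.shape
    def int64(v):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[v]))
    return tf.train.Example(features=tf.train.Features(feature={
        'height': int64(h),
        'width': int64(w),
        'depth': int64(d),
        'label': int64(label),
        'image_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img.tobytes()])),
    }))

# Two hypothetical images with different shapes.
images = [np.ones((4, 5, 1), np.uint8), np.ones((2, 3, 1), np.uint8)]
path = os.path.join(tempfile.mkdtemp(), 'demo.tfrecords')
with tf.io.TFRecordWriter(path) as writer:
    for label, img in enumerate(images):
        writer.write(make_example(img, label).SerializeToString())

feature_description = {
    'height': tf.io.FixedLenFeature([], tf.int64),
    'width': tf.io.FixedLenFeature([], tf.int64),
    'depth': tf.io.FixedLenFeature([], tf.int64),
    'image_raw': tf.io.FixedLenFeature([], tf.string),
    'label': tf.io.FixedLenFeature([], tf.int64),
}

def read_and_decode(serialized_example):
    features = tf.io.parse_single_example(serialized_example, feature_description)
    flat = tf.io.decode_raw(features['image_raw'], tf.uint8)
    # Build the target shape from tensors; no session or static values needed.
    shape = tf.stack([features['height'], features['width'], features['depth']])
    image = tf.reshape(flat, shape)
    return image, features['label']

dataset = tf.data.TFRecordDataset(path).map(read_and_decode)
shapes = [image.numpy().shape for image, _ in dataset]
```

Because the images differ in shape, they cannot be combined with a plain batch(); padded_batch or resizing to a common size would be needed for that.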
Here's my main function:
def main(argv=None):
    with tf.Graph().as_default():
        sess=tf.Session()
        pdb.set_trace()
        load_inputs(sess,FLAGS.batch_size)
and the load_inputs() function, which calls read_and_decode():
def load_inputs(sess,batch_size):
    filenames='/home/dp1248/ICR_TF/original_iam_test_words.tfrecords'
    # Create a queue that produces the filenames to read.
    filename_queue = tf.train.string_input_producer([filenames],num_epochs=1)
    # Even when reading in multiple threads, share the filename
    # queue.
    image, label = read_and_decode(sess,filename_queue)