Does the Tensorflow Dataset API include Queue functionality? [duplicate] - tensorflow

TL;DR: how do I ensure that data is loaded in a multi-threaded manner when using the Dataset API in TensorFlow 1.4?
Previously I did something like this with my images on disk:
filename_queue = tf.train.string_input_producer(filenames)
image_reader = tf.WholeFileReader()
_, image_file = image_reader.read(filename_queue)
imsize = 120
image = tf.image.decode_jpeg(image_file, channels=3)
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
image_r = tf.image.resize_images(image, [imsize, imsize])
images = tf.train.shuffle_batch([image_r],
                                batch_size=20,
                                num_threads=30,
                                capacity=200,
                                min_after_dequeue=0)
This ensured that multiple threads (num_threads=30 above) were getting data ready for the next learning iterations.
Now with the Dataset API I do something like:
dataset = tf.data.Dataset.from_tensor_slices((filenames, filenames_up, filenames_blacked))
dataset = dataset.map(parse_upscaler_corrector_batch)
After this I create an iterator:
sess = tf.Session()
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
sess.run(iterator.initializer)
value = sess.run(next_element)
The variable value is then passed on for further processing. So how do I ensure that data is being prepared in a multi-threaded manner here? And where can I read about the Dataset API and multi-threaded data reading?

So it appears that the way to achieve this is as follows:
dataset = dataset.map(parse_upscaler_corrector_batch, num_parallel_calls=12).prefetch(32).batch(self.ex_config.batch_size)
If you change num_parallel_calls=12 to another value, you can see both the network/HDD load and the CPU load spike or drop accordingly.
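In newer TensorFlow versions (1.13+), you can also let the runtime choose the parallelism level instead of hard-coding it. A sketch of the same pipeline under that assumption (batch_size stands in for self.ex_config.batch_size):

AUTOTUNE = tf.data.experimental.AUTOTUNE  # lets tf.data tune parallelism at runtime
dataset = (dataset
           .map(parse_upscaler_corrector_batch, num_parallel_calls=AUTOTUNE)
           .prefetch(AUTOTUNE)
           .batch(batch_size))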

Related

How do I use all cores of my CPU in reinforcement learning with TF Agents?

I work with an RL algorithm: I'm using TensorFlow and TF-Agents and training a DQN. My problem is that only one core of the CPU is used when computing the 10 episodes in the environment for data collection.
My training function looks like this:
def train_step(self, n_steps):
    env_steps = tf_metrics.EnvironmentSteps()
    # num_episodes = tf_metrics.NumberOfEpisodes()
    rew = TFSumOfRewards()
    action_hist = tf_metrics.ChosenActionHistogram(
        name='ChosenActionHistogram', dtype=tf.int32, buffer_size=1000
    )

    # Add the replay buffer and metrics to the observers
    replay_observer = [self.replay_buffer.add_batch]
    train_metrics = [env_steps, rew]

    self.replay_buffer.clear()
    driver = dynamic_episode_driver.DynamicEpisodeDriver(
        self.train_env, self.collect_policy,
        observers=replay_observer + train_metrics,
        num_episodes=self.collect_episodes)
    final_time_step, policy_state = driver.run()
    print('Number of Steps: ', env_steps.result().numpy())

    for train_metric in train_metrics:
        train_metric.tf_summaries(train_step=self.global_step, step_metrics=train_metrics)

    # Convert the replay buffer to a tf.data.Dataset.
    # The dataset generates trajectories with shape [Bx2x...].
    AUTOTUNE = tf.data.experimental.AUTOTUNE
    dataset = self.replay_buffer.as_dataset(
        num_parallel_calls=AUTOTUNE,
        sample_batch_size=self.batch_size,
        num_steps=(self.train_sequence_length + 1)).prefetch(AUTOTUNE)
    iterator = iter(dataset)

    train_loss = None
    for _ in range(n_steps):
        # Sample a batch of data from the buffer and update the agent's network.
        experience, unused_info = next(iterator)
        train_loss = self.agent.train(experience)

def train_agent(self, n_epoch):
    for i in range(n_epoch):
        self.train_step(int(self.replay_buffer.num_frames().numpy() / self.batch_size))
        if self.IsAutoStoreCheckpoint:
            self.store_check_point()
As written above, num_episodes = 10, so it would make sense to compute those 10 episodes in parallel before the network is trained.
If I set num_parallel_calls to e.g. 10, nothing changes. What do I have to do to use all cores of my CPU (a Ryzen 9 5950X with 16 cores)?
Thanks!
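One way to get genuinely parallel data collection is TF-Agents' ParallelPyEnvironment, which steps several copies of the environment in separate processes. A sketch, not from the original thread: it assumes your environment is a pure-Python PyEnvironment and that make_py_env is a zero-argument factory you provide.

from tf_agents.environments import parallel_py_environment
from tf_agents.environments import tf_py_environment

NUM_PARALLEL_ENVS = 16  # e.g. one environment per physical core

# Each list entry is a constructor; ParallelPyEnvironment runs one process per entry.
parallel_env = parallel_py_environment.ParallelPyEnvironment(
    [make_py_env] * NUM_PARALLEL_ENVS)
train_env = tf_py_environment.TFPyEnvironment(parallel_env)

Note that num_parallel_calls on the replay-buffer dataset only parallelizes sampling for the training step; it has no effect on how the environment episodes themselves are stepped.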

How to perform data augmentation in Tensorflow Estimator's input_fn

Using Tensorflow's Estimator API, at what point in the pipeline should I perform the data augmentation?
According to this official Tensorflow guide, one place to perform the data augmentation is in the input_fn:
def parse_fn(example):
    "Parse TFExample records and perform simple data augmentation."
    example_fmt = {
        "image": tf.FixedLenFeature((), tf.string, ""),
        "label": tf.FixedLenFeature((), tf.int64, -1)
    }
    parsed = tf.parse_single_example(example, example_fmt)
    image = tf.image.decode_image(parsed["image"])
    # augments image using slice, reshape, resize_bilinear
    image = _augment_helper(image)
    return image, parsed["label"]

def input_fn():
    files = tf.data.Dataset.list_files("/path/to/dataset/train-*.tfrecord")
    dataset = files.interleave(tf.data.TFRecordDataset)
    dataset = dataset.map(map_func=parse_fn)
    # ...
    return dataset
My question
If I perform data augmentation inside input_fn, does parse_fn return a single example or a batch that includes the original input image plus all of the augmented variants? If it should only return a single [augmented] example, how do I ensure that all images in the dataset are used in their un-augmented form, as well as all variants?
If you use iterators on your dataset, your _augment_helper function will be called on each iteration of the dataset, for each block of data fed in (because you are calling parse_fn in dataset.map).
Change your code to
ds_iter = dataset.make_one_shot_iterator()
ds_iter = ds_iter.get_next()
return ds_iter
I've tested this with a simple augmentation function
def _augment_helper(image):
    print(image.shape)
    image = tf.image.random_brightness(image, 255.0, 1)
    image = tf.clip_by_value(image, 0.0, 255.0)
    return image
Change 255.0 to whatever the maximum value is in your dataset; I used 255.0 because my example's dataset was in 8-bit pixel values.
parse_fn will return a single example for every call you make to it; if you then use the .batch() operation, you will get back a batch of parsed images.
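As for also keeping the un-augmented originals: one option (a sketch, assuming parse_fn is changed to return the raw image, and that the augmentation preserves the image shape) is to emit both the original and an augmented copy from a flat_map:

def emit_original_and_augmented(image, label):
    # Build a two-element dataset per input example; flat_map then
    # flattens these back into a single stream of examples.
    images = tf.stack([image, _augment_helper(image)])
    labels = tf.stack([label, label])
    return tf.data.Dataset.from_tensor_slices((images, labels))

dataset = dataset.map(map_func=parse_fn)  # parse only, no augmentation here
dataset = dataset.flat_map(emit_original_and_augmented)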

Why am I getting shape errors when trying to pass a batch from the Tensorflow Dataset API to my session operations?

I am dealing with an issue in my conversion over to the Dataset API, and I guess I just don't have enough experience with it yet to know how to handle the situation below. We currently perform image augmentation using queueing and batching. I was tasked with checking out the new Dataset API and converting our existing implementation to use it rather than queues.
What we would like to do is get a reference to all the paths and handle all operations from just that reference. As you can see in the dataset initialization, I have mapped parse_fn onto the dataset itself, which then reads the file and extracts the initial values from the filenames. However, when I then call the iterator's next_batch method and pass those values to get_summary, I get an error about shape. I have been trying a number of things, which just keeps changing the error, so I felt I should ask whether anyone on SO can see that I am going about this all wrong and should be taking a different route. Does anything jump out as clearly wrong in my use of the Dataset API?
Should I not be calling the ops this way any longer? In the majority of the examples I saw, they would get the batch, pass the variables to the op, capture the result in a variable, and pass that to sess.run; however, I haven't yet found a way of doing that with our setup that doesn't error, so this is the approach I took instead (but it's still erroring). I'll keep trying to trace down the problem and will post here if I find anything, but if anyone sees something, please advise. Thanks!
Current Error:
... in get_summary
    summary, acc = sess.run([self._summary_op, self._accuracy], feed_dict=feed_dict)
ValueError: Cannot feed value of shape (32,) for Tensor 'ph_input_labels:0', which has shape '(?, 1)'
Below is the block where the get_summary method is called and the error is raised:
def perform_train():
    if __name__ == '__main__':
        # Get all our image paths
        filenames = data_layer_train.get_image_paths()
        next_batch, iterator = preproc_image_fn(filenames=filenames)

        with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
            with sess.graph.as_default():
                # Set the random seed for tensorflow
                tf.set_random_seed(cfg.RNG_SEED)
                classifier_network = c_common.create_model(len(products_to_class_dict), is_training=True)
                optimizer, global_step_var = c_common.create_optimizer(classifier_network)

                sess.run(tf.local_variables_initializer())
                sess.run(tf.global_variables_initializer())

                # Init tables and dataset iterator
                sess.run(tf.tables_initializer())
                sess.run(iterator.initializer)

                cur_epoch = 0
                blobs = None
                try:
                    epoch_size = data_layer_train.get_steps_per_epoch()
                    num_steps = num_epochs * epoch_size
                    for step in range(num_steps):
                        timer_summary.tic()
                        if blobs is None:
                            # Now populate from our training dataset
                            blobs = sess.run(next_batch)
                        # *************** Below is where it is erroring *****************
                        summary_train, acc = classifier_network.get_summary(
                            sess, blobs["images"], blobs["labels"], blobs["weights"])
    ...
I believe the error is in preproc_image_fn:
def preproc_image_fn(filenames, images=None, labels=None, image_paths=None, cells=None, weights=None):
    def _parse_fn(filename, label, weight):
        augment_instance = False
        paths = []
        selected_cells = []
        if vals.FIRST_ITER:
            # Perform our check of the path to see if _data_augmentation is within it.
            # If so, set augment_instance to True and replace the substring with an empty string.
            new_filename = tf.regex_replace(filename, "_data_augmentation", "")
            contains = tf.equal(tf.size(tf.string_split([filename], "")),
                                tf.size(tf.string_split([new_filename])))
            filename = new_filename
            if contains is True:
                augment_instance = True

        core_file = tf.string_split([filename], '\\').values[-1]
        product_id = tf.string_split([core_file], ".").values[0]
        label = search_tf_table_for_entry(product_id)
        weight = data_layer_train.get_weights(product_id)

        image_string = tf.read_file(filename)
        img = tf.image.decode_image(image_string, channels=data_layer_train._channels)
        img.set_shape([None, None, None])
        img = tf.image.resize_images(img, [data_layer_train._target_height, data_layer_train._target_width])

        # Previously I was returning the below, but I was getting an error from the op
        # when assigning feed_dict stating that it didn't like the dictionary:
        # retval = dict(zip([filename], [img])), label, weight
        retval = img, label, weight
        return retval

    num_files = len(filenames)
    filenames = tf.constant(filenames)

    # *********** Setup dataset below ************
    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels, weights))
    dataset = dataset.map(_parse_fn)
    dataset = dataset.repeat()
    dataset = dataset.batch(32)

    iterator = dataset.make_initializable_iterator()
    batch_features, batch_labels, batch_weights = iterator.get_next()
    return {'images': batch_features, 'labels': batch_labels, 'weights': batch_weights}, iterator
def search_tf_table_for_entry(self, product_id):
    '''Looks up keys in the table and outputs the values. Will return -1 if not found.'''
    if product_id is not None:
        return self._products_to_class_table.lookup(product_id)
    else:
        if not self._real_eval:
            logger().info("class not found in training {} ".format(product_id))
        return -1
Where I create the model and have the placeholders used previously:
...
def create_model(self):
    weights_regularizer = tf.contrib.layers.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY)
    biases_regularizer = weights_regularizer

    # Input data.
    self._input_images = tf.placeholder(
        tf.float32, shape=(None, self._image_height, self._image_width, self._num_channels),
        name="ph_input_images")
    self._input_labels = tf.placeholder(tf.int64, shape=(None, 1), name="ph_input_labels")
    self._input_weights = tf.placeholder(tf.float32, shape=(None, 1), name="ph_input_weights")
    self._is_training = tf.placeholder(tf.bool, name='ph_is_training')
    self._keep_prob = tf.placeholder(tf.float32, name="ph_keep_prob")

    self._accuracy = tf.reduce_mean(tf.cast(self._correct_prediction, tf.float32))
    ...
    self.create_summaries()

def create_summaries(self):
    val_summaries = []
    with tf.device("/cpu:0"):
        for var in self._act_summaries:
            self._add_act_summary(var)
        for var in self._train_summaries:
            self._add_train_summary(var)
    self._summary_op = tf.summary.merge_all()
    self._summary_op_val = tf.summary.merge(val_summaries)

def get_summary(self, sess, images, labels, weights):
    feed_dict = {self._input_images: images, self._input_labels: labels,
                 self._input_weights: weights, self._is_training: False}
    summary, acc = sess.run([self._summary_op, self._accuracy], feed_dict=feed_dict)
    return summary, acc
Since the error says:
Cannot feed value of shape (32,) for Tensor 'ph_input_labels:0', which has shape '(?, 1)'
my guess is that your labels in get_summary have the shape [32]. Can you just reshape them to (32, 1)? Or perhaps reshape the label earlier, in _parse_fn?
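For concreteness, a minimal sketch of that second suggestion inside _parse_fn (assuming label and weight are scalars at that point):

# Give label and weight an explicit trailing dimension so that
# dataset.batch(32) yields (32, 1) instead of (32,), matching the
# (?, 1) placeholders.
label = tf.reshape(label, [1])
weight = tf.reshape(weight, [1])
retval = img, label, weight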

tensorflow error - you must feed a value for placeholder tensor 'in'

I'm trying to implement queues for my TensorFlow prediction but get the following error:
you must feed a value for placeholder tensor 'in' with dtype float and shape [1024,1024,3]
The program works fine if I use feed_dict; I'm trying to replace feed_dict with queues.
The program basically takes a list of positions and passes the image np array to the input tensor.
for each in positions:
    y, x = each
    images = img[y:y+1024, x:x+1024, :]
    a = images.astype('float32')

q = tf.FIFOQueue(capacity=200, dtypes=dtypes)
enqueue_op = q.enqueue(a)
qr = tf.train.QueueRunner(q, [enqueue_op] * 1)
tf.train.add_queue_runner(qr)
data = q.dequeue()
graph = load_graph('/home/graph/frozen_graph.pb')
with tf.Session(graph=graph, config=tf.ConfigProto(log_device_placement=True)) as sess:
    p_boxes = graph.get_tensor_by_name("cat:0")
    p_confs = graph.get_tensor_by_name("sha:0")
    y = [p_confs, p_boxes]
    x = graph.get_tensor_by_name("in:0")

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord, sess=sess)
    confs, boxes = sess.run(y)
    coord.request_stop()
    coord.join(threads)
How can I make sure the input data that I pushed into the queue is recognized while running the graph in the session?
In my original run I call:
confs, boxes = sess.run([p_confs, p_boxes], feed_dict=feed_dict_testing)
I'd suggest not using queues for this problem, and switching to the new tf.data API. In particular tf.data.Dataset.from_generator() makes it easier to feed in data from a Python function. You can rewrite your code to be much simpler, as follows:
def generator():
    for y, x in positions:
        images = img[y:y+1024, x:x+1024, :]
        yield images.astype('float32')

dataset = tf.data.Dataset.from_generator(
    generator, tf.float32, [1024, 1024, img.shape[2]])  # img is H x W x C, so channels are at index 2
# Add any extra transformations in here, like `dataset.batch()` or
# `dataset.repeat()`.
# ...
iterator = dataset.make_one_shot_iterator()
data = iterator.get_next()
Note that in your program, there's no connection between the data tensor and the graph you loaded in load_graph() (at least, assuming that load_graph() doesn't grab data from the global state!). You will probably need to use tf.import_graph_def() and the input_map argument to associate data with one of the tensors in your frozen graph (possibly "in:0"?) to complete the task.
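A minimal sketch of that wiring (assuming the frozen graph's input tensor really is named "in:0"):

# Load the frozen GraphDef from disk.
graph_def = tf.GraphDef()
with tf.gfile.GFile('/home/graph/frozen_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    iterator = dataset.make_one_shot_iterator()
    data = iterator.get_next()
    # Splice `data` into the imported graph in place of the 'in:0' placeholder;
    # name='' keeps the original tensor names ("cat:0", "sha:0") intact.
    tf.import_graph_def(graph_def, input_map={'in:0': data}, name='')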

Writing tfrecords with images and multilabels for classification

I want to perform a multi-label classification with TensorFlow.
I have about 95000 images and for each image there is a corresponding label vector. For every image there are 7 labels. These 7 labels are represented as a tensor with size 7. Each image has the shape of (299,299,3).
How can I now write the image together with the corresponding label vector/tensor to the .tfrecords file?
My current code/approach:
def get_decode_and_resize_image(image_id):
    image_queue = tf.train.string_input_producer(['../../original-data/' + image_id + ".jpg"])
    image_reader = tf.WholeFileReader()
    image_key, image_value = image_reader.read(image_queue)
    image = tf.image.decode_jpeg(image_value, channels=3)
    resized_image = tf.image.resize_images(image, 299, 299, align_corners=False)
    return resized_image

init_op = tf.initialize_all_variables()
with tf.Session() as sess:
    # Start populating the filename queue.
    sess.run(init_op)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    # Get all labels and image ids
    csv = pd.read_csv('../../filteredLabelsToPhotos.csv')

    # Create a writer for writing to the .tfrecords file
    writer = tf.python_io.TFRecordWriter("tfrecords/data.tfrecords")

    for index, row in csv.iterrows():
        # the labels
        image_id = row['photo_id']
        lunch = tf.to_float(row["lunch"])
        dinner = tf.to_float(row["dinner"])
        reservations = tf.to_float(row["TK"])
        outdoor = tf.to_float(row["OS"])
        waiter = tf.to_float(row["WS"])
        classy = tf.to_float(row["c"])
        gfk = tf.to_float(row["GFK"])
        labels_list = [lunch, dinner, reservations, outdoor, waiter, classy, gfk]
        labels_tensor = tf.convert_to_tensor(labels_list)

        # Get the corresponding image
        image_file = get_decode_and_resize_image(image_id=image_id)

        # here: how do I now create a TFExample and write it to the .tfrecords file?

    coord.request_stop()
    coord.join(threads)
And after I've created the .tfrecords file, can I then read it from my TensorFlow training code and batch the data automatically?
To expand on Alexandre's answer, you can do something like this:
# Set this up before your for-loop; you'll use this repeatedly
tfrecords_filename = 'myfile.tfrecords'
writer = tf.python_io.TFRecordWriter(tfrecords_filename)

# Then within your for-loop, you can write like so:
for ...:
    # here: create the TFExample and write it to the .tfrecords file
    example = tf.train.Example(features=tf.train.Features(feature={
        'image_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_file])),
        # the other features, labels you wish to include go here too
    }))
    writer.write(example.SerializeToString())

# Then finally, don't forget to close the writer.
writer.close()
This assumes you have already converted the image into a byte array in the image_file variable.
I adapted this from this very helpful post that goes into detail on serialising images, which may be helpful to you if my assumption above is false.
To create a tf.train.Example, simply do example = tf.train.Example(). You can then manipulate it using the normal protocol buffers Python API.
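As for reading the file back for training: yes, a .tfrecords file can be consumed and batched automatically with tf.data. A sketch, assuming the Example was written with an 'image_raw' bytes feature and a 'labels' float list of length 7 (these feature names are illustrative, not from the answer above):

def _parse_record(serialized):
    features = tf.parse_single_example(serialized, {
        'image_raw': tf.FixedLenFeature([], tf.string),
        'labels': tf.FixedLenFeature([7], tf.float32),
    })
    # Recover the image bytes and restore the (299, 299, 3) shape.
    image = tf.decode_raw(features['image_raw'], tf.uint8)
    image = tf.reshape(image, [299, 299, 3])
    return image, features['labels']

dataset = tf.data.TFRecordDataset('tfrecords/data.tfrecords')
dataset = dataset.map(_parse_record).shuffle(1000).batch(32)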