TensorFlow: Read batch features to an array

I am using tf.contrib.learn.read_batch_features (https://www.tensorflow.org/versions/master/api_docs/python/contrib.learn/input_processing#read_batch_features) to read in Example protos as part of my input function, which returns a dict of Tensor objects. After training my model, calling predict on my Estimator returns one batch of predictions as an array, which I would like to compare to the known values.
I try to obtain the known values by calling tf.Session().run(labels), where labels is a Tensor of known values, returned from the input function. However, at this point, my program hangs. I suspect it is stuck in an infinite loop reading labels from the disk, rather than just reading one batch as I would like.
Is this the correct way to obtain one batch of values in the labels Tensor?
Edit: I have tried to start the queue runners. Is the following correct?
_, labels = eval_input_fn()
with tf.Session().as_default():
    tf.local_variables_initializer()
    tf.train.start_queue_runners()
    label_values = labels.eval()
    print(label_values)

The whole setup you need is:
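import tensorflow as tf  # TensorFlow 1.x queue-based input API

_, labels = eval_input_fn()
with tf.Session() as sess:
    # The initializers must actually be run, not just created. If num_epochs
    # is set, the epoch counter lives in a local variable.
    sess.run([
        tf.local_variables_initializer(),
        tf.global_variables_initializer()
    ])
    coord = tf.train.Coordinator()
    # Without running queue runners nothing fills the queue, so the dequeue
    # behind `labels` blocks forever (the "hang").
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            print(sess.run(labels))  # each run dequeues a fresh batch
    except tf.errors.OutOfRangeError as error:
        coord.request_stop(error)
    finally:
        coord.request_stop()
        coord.join(threads)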
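Why the original call hangs can be illustrated without TensorFlow. The sketch below is a plain-Python analogy, not TensorFlow code: a standard-library queue.Queue stands in for the input queue, worker threads stand in for the queue runners, and a threading.Event plays the role of tf.train.Coordinator. The names start_producers, consume_all and dequeue_without_producers are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel a producer enqueues once its input has run dry


def start_producers(source, out_q, num_threads, stop_event):
    """Analogy for tf.train.start_queue_runners: threads that fill out_q."""
    lock = threading.Lock()
    it = iter(source)

    def worker():
        while not stop_event.is_set():
            with lock:                      # the shared reader is not thread-safe
                item = next(it, _DONE)
            out_q.put(item)                 # unbounded here; TF queues have a capacity
            if item is _DONE:
                return

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    return threads


def consume_all(source, num_threads=2):
    """Dequeue until every producer reports exhaustion (like OutOfRangeError)."""
    out_q = queue.Queue()
    stop = threading.Event()                # plays the role of tf.train.Coordinator
    threads = start_producers(source, out_q, num_threads, stop)
    results, finished = [], 0
    try:
        while not stop.is_set():
            item = out_q.get()              # blocks until a producer enqueues
            if item is _DONE:
                finished += 1
                if finished == num_threads:
                    break                   # analogous to catching OutOfRangeError
                continue
            results.append(item)
    finally:
        stop.set()                          # coord.request_stop()
        for t in threads:                   # coord.join(threads)
            t.join()
    return results


def dequeue_without_producers(timeout=0.1):
    """Without producers, get() would block forever; a timeout exposes the hang."""
    empty_q = queue.Queue()
    try:
        empty_q.get(timeout=timeout)
        return "got data"
    except queue.Empty:
        return "would hang"
```

In TensorFlow the same structure shows up as start_queue_runners, a loop guarded by coord.should_stop(), and request_stop()/join() in the finally block.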

Related

How to use test data on a saved model with the queue approach (without feed_dict) in TensorFlow?

I am new to TensorFlow. I have built a convnet for MNIST image classification as follows: I am using queues to read images (PNG) from disk, batch them and pass them to the train op (I am quite comfortable with this now). Everything works fine during training, and I evaluate my accuracy op every certain number of steps while training.
I am saving the model with a Saver object and can see the meta and checkpoint files being written to disk.
Now the real challenge is to restore the model once it has finished training and use it for predictions on new images.
One of the first steps in my graph (for training) is the one below, which takes x_image (images from the train queue): h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
As I am not using the feed dictionary approach, I cannot just restore the accuracy op using the saver and pass in the new data. I have to define a queue for the test data and rebuild the graph (exactly as before) with the x_image reference changed to point to the test data queue.
How can I now restore the weights learned during training and use them with this new graph to simply run my predict/accuracy op?
I tried to follow the cifar10.py tutorial (https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10.py) but got lost in the eval code.
Also, if I add a dummy constant to my training graph and then try to retrieve its value, I am able to retrieve it.
Can anyone please help? Thanks.
OK, so I have found the answer.
The original challenge was to toggle between train and test data during the training and validation phases when using queues.
Since queues are part of the graph structure, we can't simply modify them.
I found an article that uses tf.case to toggle between the train and test queues, but I wasn't able to use shuffle_batch with it.
The real task at hand was to save the model after training and use the saved model to predict in production.
So here is the flow:
Training
1. Create a method that builds your graph (it takes the image tensor as input).
2. Build a training graph by passing in training image batches.
3. Perform training and save the model with a Saver object.
Evaluation
1. Reconstruct the same graph with test image batches.
2. In the session, use the Saver object to restore the weights. (Note that you don't need to specify which variables to restore; by default it restores all saveable variables.)
3. Don't run the global variable initializer at this point.
4. Run your predict op (from the newly constructed graph).
Also make sure you switch off dropout during evaluation, as it would otherwise keep changing the output for the same input.
Below is the pseudocode (create_graph is your own graph-building method from step 1):
train_op, y_predict, accuracy = create_graph(train_input, train_label)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    model_saver = tf.train.Saver()
    for i in range(2000):
        if i % 100 == 0:
            train_accuracy = sess.run(accuracy)
            print("step %d, training accuracy %f" % (i, train_accuracy))
        sess.run(train_op)
    print(sess.run(accuracy))
    model_saver.save(sess, 'model/simple_model', global_step=100)
    coord.request_stop()
    coord.join(threads)
For evaluation:
# If this runs in the same script as training, start from a fresh graph so the
# rebuilt variables get the same names as in the checkpoint; test_input and
# test_label must then be created after this call.
tf.reset_default_graph()
_, y_predict, accuracy = create_graph(test_input, test_label)
saver = tf.train.Saver()
with tf.Session() as sess:
    # restore() assigns every saved variable, so no initializer is run here
    saver.restore(sess, tf.train.latest_checkpoint("./model/"))
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    label_predict = sess.run(y_predict)
    coord.request_stop()
    coord.join(threads)

How to make a network that relies on tf.train.shuffle_batch ready for production

I've created a TFRecords file, which I read via tf.TFRecordReader, and it has worked great for training the network. However, I'm not sure how to dynamically reduce the batch size for production, nor how to feed and override some variables when loading the graph with tf.train.import_meta_graph.
reader = tf.TFRecordReader()
data = tf.train.shuffle_batch(...)
# batch_size 100
IS_TRAINING = tf.placeholder(tf.bool, shape=(), name="is_training")
# tried constant, variable and placeholder with no luck
custom_data = tf.Variable(...)
_data = tf.cond(
IS_TRAINING,
lambda: data,
lambda: custom_data,
name="condition"
)
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord, sess=sess)
# network graph
coord.request_stop()
coord.join(threads)
sess.close()
I've tried importing the trained graph with tf.train.import_meta_graph and using feed_dict to override IS_TRAINING, so that the graph uses data that I also feed via feed_dict. But nothing has worked so far.
e.g.
sess.run([variable], feed_dict={IS_TRAINING:False, custom_data:data})
If you're loading your data manually instead of from a TFRecords file, you'll want to eliminate the use of QueueRunner to load samples (it is only needed when you load samples with a TensorFlow record reader) and instead feed the data in sess.run([ops], feed_dict={data: my_custom_data}).
Change your model so it no longer uses shuffle_batch; this should not affect your ability to load the checkpoint:
data = tf.placeholder(tf.float32, shape=(input_shape), name="data")
It would be easier just to change the model than to use the conditional assignment you've shown. But if you want to use the conditional statement, custom_data should be a placeholder, not a variable. Also note that tf.cond still executes ops created outside its branch functions, so the shuffle_batch dequeue that produces data would run even when IS_TRAINING is False.
If you want your code to work in both cases, I would use a Python if statement at the point where you define the graph, not at graph runtime.

Rationale behind the evaluation in TensorFlow's tutorial code cifar10_eval.py

In TF's official tutorial code 'cifar10', there is this evaluation snippet:
def evaluate():
    with tf.Graph().as_default() as g:
        # Get images and labels for CIFAR-10.
        eval_data = FLAGS.eval_data == 'test'
        images, labels = cifar10.inputs(eval_data=eval_data)
        # Build a Graph that computes the logits predictions from the
        # inference model.
        logits = cifar10.inference(images)
        # Calculate predictions.
        top_k_op = tf.nn.in_top_k(logits, labels, 1)
        # Restore the moving average version of the learned variables for eval.
        variable_averages = tf.train.ExponentialMovingAverage(
            cifar10.MOVING_AVERAGE_DECAY)
        variables_to_restore = variable_averages.variables_to_restore()
        saver = tf.train.Saver(variables_to_restore)
        # Build the summary operation based on the TF collection of Summaries.
        summary_op = tf.summary.merge_all()
        summary_writer = tf.summary.FileWriter(FLAGS.eval_dir, g)
        while True:
            eval_once(saver, summary_writer, top_k_op, summary_op)
            if FLAGS.run_once:
                break
            time.sleep(FLAGS.eval_interval_secs)
At runtime, it evaluates one batch of test samples and prints out 'precision' in the console every eval_interval_secs. My questions are:
1. Each time eval_once() is executed, one batch of samples (128) is dequeued from the data queue, so why don't I see the evaluation stop after enough batches, i.e. 10000/128 + 1 = 79 batches? I thought it should stop after 79 batches.
2. Are the batches from the first 79 samplings mutually exclusive? I'd assume so but want to double-check.
3. If each batch is indeed dequeued from the data queue, what are the samples after 79 samplings? Some random sampling from the entire, duplicated data queue again?
4. Since in_top_k() takes unnormalized logits and outputs a tensor of booleans, it hides the internal softmax + thresholding. Is there a TF op for such an explicit computation? Ideally, it would be useful to be able to tune the threshold and see different classification results.
Please help.
Thanks!
You can see the following line in the inputs() function of cifar10_input.py:
filename_queue = tf.train.string_input_producer(filenames)
More about tf.train.string_input_producer:
string_input_producer(
    string_tensor,
    num_epochs=None,
    shuffle=True,
    seed=None,
    capacity=32,
    shared_name=None,
    name=None,
    cancel_op=None
)
num_epochs : produces each string from string_tensor num_epochs times before generating an OutOfRange error. If not specified, string_input_producer can cycle through the strings in string_tensor an unlimited number of times.
Regarding question 1: in our case, num_epochs is not specified. That's why it does not stop after a few batches; it can run an unlimited number of times.
Regarding question 2: by default, the shuffle option is set to True in tf.train.string_input_producer, so the list of filenames is shuffled and then cycled through again and again (it is reshuffled on each pass).
Therefore, within one pass over the data the batches are mutually exclusive. You can print the filenames to see this.
Regarding question 3: as explained for question 1, they are repeated samples (not new random data).
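The epoch semantics described above can be simulated in plain Python. This is an illustrative model, not TensorFlow code: string_producer, dequeue and the local OutOfRangeError class are hypothetical stand-ins that only mimic the documented behaviour of num_epochs and shuffle (each string appears once per epoch, and after num_epochs epochs the producer raises). Treating each item as one record, it shows why batches within one pass do not overlap and why num_epochs=None never stops.

```python
import random


class OutOfRangeError(Exception):
    """Stand-in for tf.errors.OutOfRangeError in this simulation."""


def string_producer(filenames, num_epochs=None, shuffle=True, seed=None):
    """Hypothetical pure-Python model of string_input_producer's output order."""
    rng = random.Random(seed)
    epoch = 0
    # num_epochs=None means "cycle forever", as in the cifar10 eval pipeline.
    while num_epochs is None or epoch < num_epochs:
        order = list(filenames)
        if shuffle:
            rng.shuffle(order)              # a fresh permutation for every epoch
        for name in order:
            yield name                      # each name appears once per epoch
        epoch += 1


def dequeue(producer):
    """Return the next filename or raise OutOfRangeError when exhausted."""
    try:
        return next(producer)
    except StopIteration:
        raise OutOfRangeError("input producer is exhausted") from None
```

With num_epochs=2 and three names, the first six dequeues are two permutations of the names and the seventh raises; with num_epochs=None, dequeue never raises.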
Regarding question 4: you could avoid tf.nn.in_top_k altogether. Use tf.nn.softmax and tf.greater_equal to obtain a boolean tensor that marks the softmax values above a specific threshold.
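What tf.nn.softmax followed by tf.greater_equal computes can be checked with a small plain-Python sketch (standard library only; softmax, above_threshold and top1_correct are hypothetical helper names, not TensorFlow functions). It also shows that in_top_k with k=1 only compares logits: softmax is monotonic, so the top-1 class is the same before and after normalisation.

```python
import math


def softmax(logits):
    """Normalise one row of logits into probabilities."""
    m = max(logits)                         # subtracting the max avoids overflow
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def above_threshold(logits, threshold):
    """Like tf.greater_equal(tf.nn.softmax(logits), threshold) for one row."""
    return [p >= threshold for p in softmax(logits)]


def top1_correct(logits, label):
    """What in_top_k(logits, labels, 1) answers for one row (ties ignored)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return best == label
```

Raising the threshold marks fewer classes as positive, and a row may end up with no class above it at all, which in_top_k cannot express.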
I hope this helps. Please comment if there is any misunderstanding.

Using multiple input pipelines in TensorFlow

I know how to use an input pipeline to read data from files:
input = ... # Read from file
loss = network(input) # build a network
train_op = ... # Using SGD or other algorithms to train the network.
But how can I switch between multiple input pipelines? Say, if I want to train a network for 1000 batches on the training set from the training pipeline, then validate it on a validation set from another pipeline, then keep training, then validate, then train, ..., and so forth.
It's easy to implement this with feed_dict. I also know how to use checkpoints to achieve this, just like in the cifar-10 example, but it's kind of cumbersome: I need to dump the model to disk and then read it back from disk.
Can I just switch between two input pipelines (one for training data, one for validation data) to achieve this? For example, read 1000 batches from the training data queue, then a few batches from the validation data queue, and so forth. If it is possible, how do I do it?
Not sure if this is exactly what you are looking for, but I am doing training and validation in the same code in two separate loops. My code reads numeric and string data from .CSV files, not images. I am reading from two separate CSV files, one for training and one for validation. I'm sure you can generalize it to read from two 'sets' of files, rather than just single files, as the code is there.
Here are the code snippets in case it helps. Note that this code first reads everything as string and then converts the necessary cells into floats, just given my own requirements. If your data is purely numeric, you should just set the defaults to floats and all should be easier. Also, there are a couple of lines in there that drop Weights and Biases into a CSV file AND serialize them into the TF checkpoint file, depending on which way you'd prefer.
# first define the defaults:
rDefaults = [['a'] for row in range((TD+TS+TL))]

# this function reads line-by-line from CSV and separates cells into chunks:
def read_from_csv(filename_queue):
    reader = tf.TextLineReader(skip_header_lines=False)
    _, csv_row = reader.read(filename_queue)
    data = tf.decode_csv(csv_row, record_defaults=rDefaults)
    dateLbl = tf.slice(data, [0], [TD])
    features = tf.string_to_number(tf.slice(data, [TD], [TS]), tf.float32)
    label = tf.string_to_number(tf.slice(data, [TD+TS], [TL]), tf.float32)
    return dateLbl, features, label

# this function loads the above lines and spits them out as batches of N:
def input_pipeline(fName, batch_size, num_epochs=None):
    filename_queue = tf.train.string_input_producer(
        [fName],
        num_epochs=num_epochs,
        shuffle=True)
    dateLbl, features, label = read_from_csv(filename_queue)
    min_after_dequeue = 10000
    capacity = min_after_dequeue + 3 * batch_size  # max of how much to load into memory
    dateLbl_batch, feature_batch, label_batch = tf.train.shuffle_batch(
        [dateLbl, features, label],
        batch_size=batch_size,
        capacity=capacity,
        min_after_dequeue=min_after_dequeue)
    return dateLbl_batch, feature_batch, label_batch

# These are the TRAINING features, labels, and meta-data to be loaded from the train file:
dateLbl, features, labels = input_pipeline(fileNameTrain, batch_size, try_epochs)
# These are the TESTING features, labels, and meta-data to be loaded from the test file:
dateLblTest, featuresTest, labelsTest = input_pipeline(fileNameTest, batch_size, 1)  # 1 epoch here regardless of training
# then you define the model, start the session, blah blah
# (num_epochs is kept in a local variable, so also run tf.local_variables_initializer())
# fire up the queue:
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
# This is the TRAINING loop:
try:
    while not coord.should_stop():
        dateLbl_batch, feature_batch, label_batch = sess.run([dateLbl, features, labels])
        _, acc, summary = sess.run([train_step, accuracyTrain, merged_summary_op],
                                   feed_dict={x: feature_batch, y_: label_batch,
                                              keep_prob: dropout,
                                              learning_rate: lRate})
except tf.errors.OutOfRangeError:  # (so done reading the file(s))
    # by the way, this dumps weights and biases into a CSV file, since you asked for that
    np.savetxt(fPath + fIndex + '_weights.csv', sess.run(W))
    # and this serializes weight and biases into the TF-formatted protobuf:
    # tf.train.Saver({'varW': W, 'varB': b}).save(sess, fileNameCheck)
finally:
    coord.request_stop()
    coord.join(threads)

# now re-start the runners for the testing file:
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
acCumTest, i = 0.0, 0
try:
    while not coord.should_stop():
        # so now this line reads features, labels, and meta-data, but this time from the testing file:
        dateLbl_batch, feature_batch, label_batch = sess.run([dateLblTest, featuresTest, labelsTest])
        # compute the accuracy in NumPy instead of adding new TF ops on every iteration
        guessY = np.argmax(sess.run(y, {x: feature_batch, keep_prob: 1}), 1)
        trueY = np.argmax(label_batch, 1)
        accuracy = round(float(np.mean(guessY == trueY)), 2)
        acCumTest += accuracy
        i += 1
except tf.errors.OutOfRangeError:
    acCumTest /= i
finally:
    coord.request_stop()
    coord.join(threads)
This may differ from what you are trying to do in the sense that it first completes the training loop and THEN restarts the queues for the testing loop. I'm not sure how you'd do this if you want to go back and forth, but you can experiment with the two functions defined above by passing them the relevant file names (or lists) interchangeably.
Also, I'm not sure if restarting the queues after training is the best way to go, but it works for me. I would love to see a better example out there, as most TF examples use built-in wrappers around the MNIST dataset to do the training in one go...

tf.train.string_input_producer behavior in a loop

The following snippet is taken from the TensorFlow 0.12 API documentation:
def input_pipeline(filenames, batch_size, num_epochs=None):
    filename_queue = tf.train.string_input_producer(
        filenames, num_epochs=num_epochs, shuffle=True)
    example, label = read_my_file_format(filename_queue)
    # min_after_dequeue defines how big a buffer we will randomly sample
    # from -- bigger means better shuffling but slower start up and more
    # memory used.
    # capacity must be larger than min_after_dequeue and the amount larger
    # determines the maximum we will prefetch. Recommendation:
    # min_after_dequeue + (num_threads + a small safety margin) * batch_size
    min_after_dequeue = 10000
    capacity = min_after_dequeue + 3 * batch_size
    example_batch, label_batch = tf.train.shuffle_batch(
        [example, label], batch_size=batch_size, capacity=capacity,
        min_after_dequeue=min_after_dequeue)
    return example_batch, label_batch
The question I have might be very basic for a regular TensorFlow user, but I am an absolute beginner. The question is the following:
tf.train.string_input_producer creates a queue for holding the filenames. As input_pipeline() is called over and over again during training, how is it ensured that the same queue is used every time? I guess this is important since, if different calls to input_pipeline() create a new queue, there does not seem to be a way to ensure that different images are picked every time and that the epoch counter and shuffling are properly maintained.
The input_pipeline function only creates the part of a (usually larger) graph that is responsible for producing batches of data. If you were to call input_pipeline twice - for whatever reason - you would be creating two different queues indeed.
In general, the function tf.train.string_input_producer actually creates a queue node (or operation) in the currently active graph (which is the default graph unless you specify something different). read_my_file_format then reads from that queue and sequentially produces single "example" tensors, while tf.train.shuffle_batch then batches these into bundles of length batch_size each.
However, the output of tf.train.shuffle_batch, two Tensors here that are returned from the input_pipeline function, only really takes on a (new) value when it is evaluated under a session. If you evaluate these tensors multiple times, they will contain different values - taken, through read_my_file_format, from files listed in the input queue.
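Both points (one call builds one queue; every evaluation pulls a new batch) can be mimicked with a Python generator. This is only an analogy: make_batch_stream is a hypothetical plain-Python function, not the TensorFlow input_pipeline, and it cycles through the data in order (no shuffling) so the output is predictable.

```python
import itertools


def make_batch_stream(examples, batch_size):
    """Each call creates a new, independent stream (like a new queue in the graph)."""
    stream = itertools.cycle(examples)      # like num_epochs=None: wrap around forever
    while True:
        # each next() yields a fresh batch, as each sess.run() dequeues one
        yield [next(stream) for _ in range(batch_size)]
```

Pulling repeatedly from the same stream object gives consecutive batches; calling make_batch_stream a second time starts a separate stream with its own position, much as a second call in TensorFlow adds a second queue with its own epoch counter.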
Think of it like so:
X_batch, Y_batch = input_pipeline(..., batch_size=100)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.start_queue_runners()
    # get the first 100 examples and labels
    X1, Y1 = sess.run((X_batch, Y_batch))
    # get the next 100 examples and labels
    X2, Y2 = sess.run((X_batch, Y_batch))
    # etc.
The boilerplate code to get it running is a bit more complex, e.g. because the queue runners actually need to be started and stopped, and because the queues throw a tf.errors.OutOfRangeError when they run dry.
A more complete example could look like this:
with tf.Graph().as_default() as graph:
    X_batch, Y_batch = input_pipeline(..., batch_size=100)
    prediction = inference(X_batch)
    optimizer, loss = optimize(prediction, Y_batch)
coord = tf.train.Coordinator()
with tf.Session(graph=graph) as sess:
    init = tf.group(tf.local_variables_initializer(),
                    tf.global_variables_initializer())
    sess.run(init)
    # start the queue runners
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            # now you're really indirectly querying the
            # queue; each iteration will see a new batch of
            # at most 100 values.
            # (don't rebind `loss`: it must stay a tensor for the next run)
            _, loss_value = sess.run((optimizer, loss))
            # you might also want to do something with
            # the network's output - again, this would
            # use a fresh batch of inputs
            some_predicted_values = sess.run(prediction)
    except tf.errors.OutOfRangeError:
        print('Training stopped, input queue is empty.')
    finally:
        # stop the queue(s)
        coord.request_stop()
        coord.join(threads)
For a deeper understanding, you might want to look at the Reading data documentation.