Training from CSV file - use every train example per epoch - tensorflow

I have a CSV file with 200,000 training samples that I would like to train my network with.
I'm using an InputProducer and DecodeCSV to get the data. I then run all the data through shuffle_batch, where I set batch_size=50, min_after_dequeue=10000 and capacity=min_after_dequeue + 3 * batch_size.
I then run a loop and call sess.run() repeatedly.
My question: I now want to run this for several epochs, and in each epoch I would like to exhaust the entire training set. I don't think the current setup does this. How would I go about doing that?
I'm not even sure I fully understand the inner workings of shuffle_batch and its parameters yet.
Thank you in advance.

The queue should block at the end of the epoch. When that happens, you will know that you have exhausted the training set. More information in this related question: Tensor Flow shuffle_batch() blocks at end of epoch
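For illustration, here is a minimal sketch of that pattern, assuming the TF 1.x queue-based pipeline described in the question; the file name, CSV column defaults, and epoch count are placeholders. Passing num_epochs to the input producer makes the pipeline raise tf.errors.OutOfRangeError once every example has been served that many times, so the loop knows when the epochs are exhausted:

import tensorflow as tf

batch_size = 50
min_after_dequeue = 10000
num_epochs = 10  # placeholder number of epochs

# The producer raises OutOfRangeError after serving the file num_epochs times.
filename_queue = tf.train.string_input_producer(
    ["train.csv"], num_epochs=num_epochs)  # "train.csv" is a placeholder

reader = tf.TextLineReader()
_, line = reader.read(filename_queue)
# Column defaults are placeholders; adapt them to your CSV schema.
feature, label = tf.decode_csv(line, record_defaults=[[0.0], [0]])

features_batch, labels_batch = tf.train.shuffle_batch(
    [feature, label],
    batch_size=batch_size,
    capacity=min_after_dequeue + 3 * batch_size,
    min_after_dequeue=min_after_dequeue)

with tf.Session() as sess:
    # local_variables_initializer is needed for the producer's epoch counter.
    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            sess.run([features_batch, labels_batch])  # replace with your train_op
    except tf.errors.OutOfRangeError:
        print("Done: all epochs exhausted.")
    finally:
        coord.request_stop()
        coord.join(threads)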

Related

TensorFlow image classification colab sheet from training material: newbie questions

Apologies if my questions are relatively simple, but I have only recently started approaching TensorFlow, with the aim of learning new skills.
In the example, there are several things I can't get:
In the explore-data section, the dataset sizes come back as 60k/10k respectively for train and test.
Where is the train/test split size declared?
Packages like SkLearn allow this to be specified as a percentage when invoking the split methods.
In the train-the-model part, when the 5 epochs are trained, the number 1875 appears below.
- What is that?
- I was expecting the training to run over the 60k items, but even multiplying 1875 by 5 doesn't reach 10k.
The dataset is loaded using the tensorflow_datasets API.
The source itself defines the split of 60K (train) and 10K (test):
https://www.tensorflow.org/datasets/catalog/fashion_mnist
An epoch is a complete pass over all the training samples. The training is done in batches; in the example you refer to, a batch size of 32 is used, so completing one epoch takes 1875 batches (60000 / 32).
Hope this helps.
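For a quick sanity check of those numbers, a small sketch using tf.keras.datasets (an alternative to the tensorflow_datasets loader used in the colab):

import tensorflow as tf

# Fashion-MNIST ships with a fixed 60k/10k train/test split.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
print(len(x_train), len(x_test))  # 60000 10000

# model.fit defaults to batch_size=32, so one epoch is
# 60000 / 32 = 1875 batches -- the number shown in the progress bar.
print(len(x_train) // 32)  # 1875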

How to control how frequently data is being recorded in Tensorboard?

So, I created a TensorBoard callback, but I'm training for thousands of epochs, and when I view TensorBoard it is too sluggish because of the sheer amount of data to be loaded and plotted, basically millions of data points. That's because it is writing everything that happens at the batch level. I only want results at the end of each epoch, not each batch. Can I control that?
Additionally, it records by default: loss, validation loss, and plenty of other things that I didn't ask for. How can I control what is being recorded?
Try update_freq=10000.
This will make it write an update every 10,000 samples.
Or use update_freq='epoch', which will write an update after each epoch ends.
https://github.com/keras-team/keras/blob/master/keras/callbacks.py#L997
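As a minimal sketch (the log directory and the commented-out fit call are placeholders), the callback could be configured so that scalars are written once per epoch rather than once per batch:

import tensorflow as tf

tensorboard_cb = tf.keras.callbacks.TensorBoard(
    log_dir="logs",        # placeholder directory
    update_freq="epoch",   # or an integer (samples or batches, depending on the Keras version)
    histogram_freq=0,      # skip weight histograms to reduce what is written
    write_graph=False,
    write_images=False)

# model.fit(x_train, y_train, epochs=1000, callbacks=[tensorboard_cb])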

Training Estimators for less than one epoch using the Dataset API?

I am trying to train a model on a large dataset. I would like to run the evaluation step multiple times before one epoch of training has been completed. Looking at the implementation of the Dataset API with Estimators, it looks like every time I restart the training after the evaluation step, the Estimator creates a fresh dataset from scratch, and the training never completes over the full data.
I have written an input function very similar to the one provided on the tensorflow website.
def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Repeat and batch the examples.
    dataset = dataset.repeat(1).batch(batch_size)
    # Return the read end of the pipeline.
    return dataset
I then use tf.estimator.Estimator.train to call my input function, like this:
classifier.train(input_fn=lambda: train_input_fn(features, labels, batch_size),
                 steps=n_steps)
where n_steps is a number smaller than the total number of steps needed to complete one epoch.
I then call an evaluation function like this.
classifier.evaluate(input_fn=lambda: eval_input_fn())
I want to run both steps in a loop.
Every time the loop reaches the training step, it re-initializes the dataset in train_input_fn, so training is only ever applied to the first n_steps worth of training data.
If you want to evaluate multiple times during training, you can check InMemoryEvaluatorHook.
You can refer to this discussion about train_and_evaluate and InMemoryEvaluatorHook for more details on how best to use them.
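A sketch of how the hook could be wired up, assuming a TF version where it lives under tf.estimator.experimental (older releases had it in tf.contrib.estimator); classifier, train_input_fn, eval_input_fn, features, labels, and batch_size are the objects from the question, and every_n_iter is an arbitrary choice:

import tensorflow as tf

evaluator = tf.estimator.experimental.InMemoryEvaluatorHook(
    estimator=classifier,
    input_fn=lambda: eval_input_fn(),
    every_n_iter=1000)  # run evaluation every 1000 training steps (arbitrary)

# A single train call now walks through as much data as the input_fn provides,
# while evaluation happens periodically via the hook.
classifier.train(
    input_fn=lambda: train_input_fn(features, labels, batch_size),
    hooks=[evaluator])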

Tensorflow: How to get the accuracy/prediction for the whole test dataset, not for each batch?

I am trying to use Tensorboard to visualize my testing procedure. My goal is that, when every epoch completes, I test the network's accuracy on the whole test dataset and store this accuracy result in a summary file, so that I can visualize it in Tensorboard.
Tensorflow has summary_op to do this; however, all the existing examples seem to only work for one batch when running sess.run(summary_op). I need to calculate the accuracy for the whole test dataset. How can I do that?
Is there any example to do it? Any help will be appreciated.
You could calculate it by:
Batching your test dataset in case it is too large, e.g. into n_test_batches, and starting with a buffer like buffer_accuracies = 0.0
Adding the batch accuracies into the buffer variable buffer_accuracies
Finally, when you have processed the whole test dataset, dividing buffer_accuracies by the total number of test batches
Now you would have test_accuracy = buffer_accuracies / n_test_batches as a regular Python variable.
Now we can create a summary for that Python variable as follows:
test_accuracy_summary = tf.Summary()
test_accuracy_summary.value.add(tag="test_accuracy", simple_value=test_accuracy)
Finally, write that into your tensorflow FileWriter, e.g.
test_writer.add_summary(test_accuracy_summary, iteration_step)
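Putting the pieces together, a minimal sketch of a per-epoch evaluation helper; accuracy_op, the feed dicts, test_writer, and epoch are placeholders for whatever your own graph and input pipeline provide, and the plain average is only exact if all test batches have the same size:

import tensorflow as tf

def log_test_accuracy(sess, accuracy_op, test_feed_dicts, test_writer, epoch):
    # Accumulate per-batch accuracies over the whole test set.
    buffer_accuracies = 0.0
    n_test_batches = 0
    for feed_dict in test_feed_dicts:
        buffer_accuracies += sess.run(accuracy_op, feed_dict=feed_dict)
        n_test_batches += 1
    test_accuracy = buffer_accuracies / n_test_batches

    # Wrap the plain Python float in a summary and write it once per epoch.
    summary = tf.Summary()
    summary.value.add(tag="test_accuracy", simple_value=test_accuracy)
    test_writer.add_summary(summary, epoch)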

Running training op multiple times using tf.contrib.data.Dataset?

I'm playing around with the tf.contrib.data.Dataset API. Basically, I have an input tensor generated by dataset.batch(BATCH_SIZE).get_next() and I can currently do something like
for _ in range(100):
    sess.run(train_op)
to repeat the train step 100 times. I don't need to feed anything, since all the inputs come from dataset. Doing this in a loop seems like a waste, so is there a way to tell TF to repeat the run step 100 times without having to fall back into Python code between every iteration?
I saw some similar questions about preventing the CPU-GPU transfer between iterations by feeding persistent tensors residing on the GPU, but that's a different problem.