How to get both loss and model output at once, on a batch of data in Keras?

I'm using Keras w/ Tensorflow backend to train a NN.
I'm using train_on_batch for training, which returns the loss on the given batch. How do I also get the output classification on that batch? (I'd like to do some visualisations of the output.)
To do that I currently make another call to predict to get the model output, but that's redundant, since train_on_batch has already passed the input batch forward.
In Caffe, when an image is fed forward, the intermediate layer outputs stay stored in net.blobs, but in Keras/Tensorflow it seems that if we want to get an intermediate output we have to rerun the computational graph for each intermediate output we want to access on CPU, as described here. Is there a way to access many/all intermediate layers' outputs without rerunning the graph for each?
I don't mind having a tensorflow-specific workaround.

If you use the functional API, this is pretty straightforward: define a Model whose outputs include every tensor you want back, and a single forward pass returns all of them.
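A minimal sketch of that idea (the toy architecture and x_batch are assumptions, not the asker's model): a second Model shares the trained layers but exposes both the hidden activation and the prediction, so one pass yields everything without rerunning the graph per layer.

import tensorflow as tf

# Hypothetical toy model; substitute your own architecture.
inputs = tf.keras.Input(shape=(10,))
hidden = tf.keras.layers.Dense(64, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(3, activation="softmax")(hidden)
model = tf.keras.Model(inputs, outputs)

# A second Model over the same layers that returns both tensors at once.
activation_model = tf.keras.Model(inputs=model.input,
                                  outputs=[hidden, outputs])
hidden_act, preds = activation_model.predict(x_batch)  # x_batch assumed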

In addition to @MohamedEzz's answer, you can create a custom callback which performs the operations you require during the training process. Callbacks have methods that run your code at on_epoch_begin, on_epoch_end, on_train_end and so on.
This way you can capture per-batch results as training proceeds.
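A short sketch of such a callback (the class name is made up; model, x_train, and y_train are assumed to exist):

import tensorflow as tf

class BatchLossLogger(tf.keras.callbacks.Callback):
    # Hypothetical callback that reports the loss after every batch.
    def on_train_batch_end(self, batch, logs=None):
        # logs carries the metrics Keras computed for this batch
        print(f"batch {batch}: loss = {logs['loss']:.4f}")

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x_train, y_train, callbacks=[BatchLossLogger()])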

Related

Does it make sense to use Tensorflow Dataset over a Keras DataGenerator?

I am training a model using tf.keras and I have many small .npy files with single observations in a folder on local disk. I have built a DataGenerator(keras.utils.Sequence) class and it works correctly, although I get a warning:
'tensorflow:multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.'
I have found out that I can simply create something like this:
ds = tf.data.Dataset.from_generator(
    DataGenerator, args=[...],
    output_types=(tf.float16, tf.uint8),
    output_shapes=([None, 256, 256, 3], [None, 256, 256, 1]),
)
and then my Keras DataGenerator would act as a single-file reader, with the TF Dataset as the interface that creates batches. My question is: does this make any sense? Would it be safer? Would it read the next batch while the previous batch is training, when using a simple model.fit?
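For reference, a hedged sketch of the prefetching behaviour being asked about: with the shapes above the generator already yields whole batches, so chaining prefetch() onto the dataset asks tf.data to prepare the next batch while the model trains on the current one (tf.data.AUTOTUNE, available in recent TF versions, picks the buffer size).

ds = ds.prefetch(tf.data.AUTOTUNE)  # overlap data preparation with training
model.fit(ds, epochs=10)            # epochs value is illustrative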

How is teacher-forcing implemented for the Transformer training?

In this part of Tensorflow's tutorial here, they mention that they are training with teacher forcing. To my knowledge, teacher forcing involves feeding the target output into the model so that it converges faster. The real target is tar_real, and as far as I can see, it is only used to calculate loss and accuracy. So how does this code implement teacher forcing?
Thanks in advance.
Each train_step takes in inp and tar objects from the dataset in the training loop. Teacher forcing is indeed used, since the correct example from the dataset is always used as input during training (as opposed to the "incorrect" output from the previous training step):
- tar is split into tar_inp and tar_real (offset by one position)
- inp and tar_inp are used as input to the model
- the model produces an output, which is compared with tar_real to calculate the loss
- the model output is discarded (not used any further)
- repeat for the next step
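The offset split itself, as it appears in the tutorial's train_step (tar is assumed to have shape [batch, seq_len]):

tar_inp = tar[:, :-1]   # ground truth fed to the decoder as its input
tar_real = tar[:, 1:]   # the same sequence shifted left, used only for the loss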
Teacher forcing is a procedure ... in which during training the model receives the ground truth output y(t) as input at time t+1.
Page 372, Deep Learning, 2016.
Source: https://github.com/tensorflow/tensorflow/issues/30852#issuecomment-513528114

Purpose of batch channel in tensorflow model on forward pass of 1 input

So far I have trained a couple of different models in TensorFlow (with Keras), and I see that getting the batch_size right seems to be important not just for training speed but also for the resulting accuracy of the model.
What confuses me is the case where a model has an actual batch channel as the first dimension of the input (and of the output as well). If my batch size is 32 but I'm always feeding in 1 sample at run-time, where does the batch channel apply? How could I utilise the vast majority of it if I'm inherently only using 1/batch_size of it in a forward pass?
If you are curious the model I am researching, it is this one:
https://github.com/pierluigiferrari/ssd_keras/blob/master/models/keras_ssd300.py
see:

# Output shape of predictions: (batch, n_boxes_total, n_classes + 4 + 8)
predictions = Concatenate(axis=2, name='predictions')([mbox_conf_softmax, mbox_loc, mbox_priorbox])
The tensors have run through numerous other layers whose constants and weights were likewise pretrained with [batch_size]. To me it seems like inputs at different batch indices would have to yield different results. Maybe I just need something incredibly obvious pointed out to me.
It would seem that after training you must rebuild the model with a batch size of 1 and then transfer the weights from the training model to the new model for evaluation. The alternative is performing batch_size predictions at once (which of course is not always feasible per application). If there are alternatives (or if I read wrong), please feel free to add an answer.
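A minimal sketch of that weight-transfer approach (build_model is a hypothetical helper that takes the batch size as an argument; the data variables are assumed to exist):

train_model = build_model(batch_size=32)
train_model.fit(x_train, y_train)

infer_model = build_model(batch_size=1)            # same architecture, batch of 1
infer_model.set_weights(train_model.get_weights())
pred = infer_model.predict(x_single[None, ...])    # add the batch dimension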

How to smoothly produce Tensorflow auc summaries for training and test sets?

Tensorflow describes writing file summaries to visualize graph execution.
I envision three stages:
- training on the data (with optimization)
- measuring accuracy on the training set (no optimization)
- measuring accuracy on the test set (no optimization!)
I'd like all stages in the same script, as in the evaluate function of the wide_and_deep tutorial, but with the low-level API. I'd like three different graphs for stats like loss or AUC, one for each stage.
Suppose I use one session, and in each stage I define an AUC summary op:
# define auc
auc, auc_op = tf.metrics.auc(labels, predictions)
# summary scalar to track it
tf.summary.scalar("auc", auc_op, family=family_name)
# merge all summaries for evaluation and later writing
summary_op = tf.summary.merge_all()
...
summary_writer.add_summary(summary, step_num)
There are three graphs, but the first graph has all three runs on it, and the second graph has the last two runs. What's worse, each stage starts from the previous state. This makes sense, because all the variables from the previous stages are still around.
I could use a different session for each stage, but that would throw away the model as well.
What is the smooth way to handle this?
I'd like to just clear some of the summary variables. I've tried re-initializing some variables, looked at related questions, read about name scope and variable scope, tried not to re-use variables for AUC, read about variables and sharing, and looked into pruning nodes (though I don't understand it). I have not made it work yet.
I am using the low-level API. I saw something like this in the high-level API in _eval_metric_ops, but I don't understand how they 'clear' the different stages. With name_scope?
Do I have to save and load the model into a new session just for this, or is there some clean way to graph each summary separately?
The metric ops will be local variables, so you could run tf.local_variables_initializer() in your Session, which will reset all of your metrics. You could also look through the local variables collection for those with "auc" in the name if you wanted to be a bit more discerning. The high-level way to do this would be to use an Estimator, which will manage metrics for you.
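A sketch of that reset in the TF1 style the question uses (sess is assumed to be the active Session):

# Reset every metric's accumulator variables between stages
sess.run(tf.local_variables_initializer())

# Or reset only the AUC-related locals, as suggested above
auc_vars = [v for v in tf.local_variables() if "auc" in v.name]
sess.run(tf.variables_initializer(auc_vars))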

Tensorflow input pipeline

I have an input pipeline where samples are generated on the fly. I use Keras with a custom ImageDataGenerator and a corresponding Iterator to get samples into memory.
Under the assumption that Keras in my setup uses feed_dict (and whether that assumption holds is itself part of my question), I am thinking of speeding things up by switching to raw TensorFlow + Dataset.from_generator().
Here I see that the suggested solution for input pipelines that generate data on the fly in the most recent TensorFlow is to use Dataset.from_generator().
Questions:
Does Keras with the TensorFlow backend use the feed_dict method?
If I switch to raw TensorFlow + Dataset.from_generator(my_sample_generator), will that cut the feed_dict memory-copy overhead and buy me performance?
During the predict (evaluation) phase, apart from batch_x and batch_y, my generator also outputs an opaque index vector that corresponds to the sample ids in batch_x. Does that mean I'm stuck with the feed_dict approach for the predict phase because I need that extra batch_z output from the iterator?
The new tf.contrib.data.Dataset.from_generator() can potentially speed up your input pipeline by overlapping the data preparation with training. However, you will tend to get the best performance by switching over to TensorFlow ops in your input pipeline wherever possible.
To answer your specific questions:
The Keras TensorFlow backend uses tf.placeholder() to represent compiled function inputs, and feed_dict to pass arguments to a function.
With the recent optimizations to tf.py_func() and feed_dict copy overhead, I suspect the amount of time spent in memcpy() will be the same. However, you can more easily use Dataset.from_generator() with Dataset.prefetch() to overlap the training on one batch with preprocessing on the next batch (see the sketch after this list).
It sounds like you can define a separate iterator for the prediction phase. The tf.estimator.Estimator class does something similar by instantiating different "input functions" with different signatures for training and evaluation, then building a separate graph for each role.
Alternatively, you could add a dummy output to your training iterator (for the batch_z values) and switch between training and evaluation iterators using a "feedable iterator".
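A sketch of the prefetching variant from the second point (TF1-era API to match the answer; my_sample_generator and the dtypes are assumptions):

ds = tf.data.Dataset.from_generator(
    my_sample_generator,
    output_types=(tf.float32, tf.float32))
ds = ds.batch(32).prefetch(1)            # prepare the next batch during training
iterator = ds.make_one_shot_iterator()
batch_x, batch_y = iterator.get_next()   # plain tensors; no feed_dict needed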