Summary for a specific branch - tensorflow

I have a TensorFlow graph with a complicated loss function for training, but a simpler one for evaluation (they share ancestors). Essentially:
train_op = ...  # needs more things in feed_dict etc.
acc = ...       # just needs one value for a placeholder
To better understand what's going on, I added summaries. But when calling
merged = tf.summary.merge_all()
and then
(summ, acc) = session.run([merged, acc_eval], feed_dict={..})
TensorFlow complains that values for placeholders are missing.

As far as I understand your question, to summarize a specific TensorFlow operation you should run its summary op directly.
For example:
# define accuracy ops
correct_prediction = tf.equal(tf.argmax(Y, axis=1), tf.argmax(Y_labels, axis=1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, dtype=tf.float32))
# summary_accuracy is the Summary protocol buffer you need to run
# (instead of merge_all()) if you want to summarize specific ops
summary_accuracy = tf.summary.scalar('testing_accuracy', accuracy)
# initialize variables and define the writer
sess.run(tf.global_variables_initializer())
test_writer = tf.summary.FileWriter('log/test', sess.graph)
(summ, acc) = sess.run([summary_accuracy, accuracy], feed_dict={..})
test_writer.add_summary(summ)
Also, you can use tf.summary.merge() to merge only a chosen subset of summaries (see the TensorFlow documentation).
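For the original question, a minimal sketch of that approach (eval_input and eval_value are hypothetical stand-ins for your own placeholder and data): merge just the evaluation summaries, so running them never pulls in training-only placeholders.
accuracy_summary = tf.summary.scalar('accuracy', acc_eval)
eval_merged = tf.summary.merge([accuracy_summary])  # instead of tf.summary.merge_all()
summ, acc = sess.run([eval_merged, acc_eval], feed_dict={eval_input: eval_value})
test_writer.add_summary(summ, global_step)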
Hope this helps!

A Tensorflow training agnostic to Eager and Graph modes

I spend some of my time coding novel (I wish) RNN cells in Tensorflow.
To prototype, I use eager mode (easier to debug).
In order to train, I migrate the code to a graph (runs faster).
I am looking for wrapper code or an example that can run the forward pass and training in a way that is, as much as possible, agnostic to the mode it runs in, eager or graph. I have in mind a set of functions/classes into which a particular neural network, optimizer, and dataset can be plugged, such that they run in both modes with minimal changes between the two. Ideally, it would also be compatible with many types of networks, optimizers, and data.
I am quite sure that many had this idea.
I wonder if something like this is feasible given the current eager/graph integration in TF.
Yes, I have been wondering the same. In the TensorFlow documentation you can see:
The same code written for eager execution will also build a graph during graph execution. Do this by simply running the same code in a new Python session where eager execution is not enabled.
But this is hard to achieve, mostly because working with graphs means dealing with placeholders, which cannot be used in eager mode. I tried to get rid of placeholders using object-oriented layers and the Dataset API. This is the closest I could get to totally compatible code:
m = 128  # num_examples
n = 5    # num_features
epochs = 2
batch_size = 32
steps_per_epoch = m // batch_size

dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random_uniform([m, n], dtype=tf.float32),
     tf.random_uniform([m, 1], dtype=tf.float32)))
dataset = dataset.repeat(epochs)
dataset = dataset.batch(batch_size)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_dim=n),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1)
])
def train_eagerly(model, dataset):
    optimizer = tf.train.AdamOptimizer()
    iterator = dataset.make_one_shot_iterator()
    print('Training eagerly...')
    for epoch in range(epochs):
        print('Epoch', epoch)
        progbar = tf.keras.utils.Progbar(target=steps_per_epoch, stateful_metrics='loss')
        for step in range(steps_per_epoch):
            with tf.GradientTape() as tape:
                features, labels = iterator.get_next()
                predictions = model(features, training=True)
                loss_value = tf.losses.mean_squared_error(labels, predictions)
            grads = tape.gradient(loss_value, model.variables)
            optimizer.apply_gradients(zip(grads, model.variables))
            progbar.add(1, values=[('loss', loss_value.numpy())])
def train_graph(model, dataset):
    optimizer = tf.train.AdamOptimizer()
    iterator = dataset.make_initializable_iterator()
    next_element = iterator.get_next()  # build the fetch op once, outside the loop
    print('Training graph...')
    with tf.Session() as sess:
        sess.run(iterator.initializer)
        sess.run(tf.global_variables_initializer())
        for epoch in range(epochs):
            print('Epoch', epoch)
            progbar = tf.keras.utils.Progbar(target=steps_per_epoch, stateful_metrics='loss')
            for step in range(steps_per_epoch):
                with tf.GradientTape() as tape:
                    features, labels = sess.run(next_element)
                    predictions = model(features, training=True)
                    loss_value = tf.losses.mean_squared_error(labels, predictions)
                grads = tape.gradient(loss_value, model.variables)
                train = optimizer.apply_gradients(zip(grads, model.variables))
                if epoch == 0 and step == 0:
                    # Adam's slot variables only exist after the first apply_gradients
                    sess.run(tf.variables_initializer(optimizer.variables()))
                _, loss = sess.run([train, loss_value])
                progbar.add(1, values=[('loss', loss)])
As you can see, the main difference is that I use a one-shot iterator during eager training, while during graph training I fetch batches and run operations within a session. I tried to do the same using optimizer.minimize instead of applying the gradients myself, but I could not come up with code that worked in both eager and graph modes.
Also, I'm sure this becomes much harder to do with less simple models, like the one you are working with.
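For what it's worth, the closest I got to a single shared training step is a sketch like the following (no guarantees beyond this toy setup): tf.GradientTape also records ops during graph construction, so one function can serve both modes, and only the way its results are consumed differs.
def train_step(model, optimizer, features, labels):
    # In eager mode this executes immediately; in graph mode, calling it
    # once with iterator.get_next() tensors builds the training ops.
    with tf.GradientTape() as tape:
        predictions = model(features, training=True)
        loss_value = tf.losses.mean_squared_error(labels, predictions)
    grads = tape.gradient(loss_value, model.variables)
    train = optimizer.apply_gradients(zip(grads, model.variables))
    return loss_value, train

# Eager: call train_step per batch with tensors from a one-shot iterator.
# Graph: loss_t, train_t = train_step(model, optimizer, *iterator.get_next()),
# then run tf.global_variables_initializer() (so the optimizer's slot
# variables exist) and sess.run([train_t, loss_t]) in the loop.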

Using two tensorflow models where one is for inference and another is for training

I'm new to TensorFlow and I'm trying to combine two models in one graph because I need one model's inference result to modify the other model's loss function. I wrote the code and it runs without errors, but I'm not sure whether I wrote it correctly, so I'm writing this thread.
In the code, I loaded the two graphs like this:
with tf.variable_scope("modelA"):
    new_saver = tf.train.import_meta_graph('modelA-1000.meta')
    new_saver.restore(sess, tf.train.latest_checkpoint('./'))
with tf.variable_scope("modelB"):
    new_saver = tf.train.import_meta_graph('modelB-1000.meta')
    new_saver.restore(sess, tf.train.latest_checkpoint('./'))
and I used modelA's result to modify modelB's loss function as follows:
output_A = tf.get_default_graph().get_tensor_by_name("modelA_output:0")
output_B = tf.get_default_graph().get_tensor_by_name("modelB_output:0")
loss = tf.reduce_mean(-tf.reduce_sum(output_A * tf.log(output_B), reduction_indices=[1]))
Then, for training, I included only modelB's variables, since I want model A to be used for inference only:
model_vars = tf.trainable_variables()
var_B = [var for var in model_vars if 'modelB' in var.name]
gradient = tf.gradients(loss, var_B)
trainer = tf.train.GradientDescentOptimizer(0.1)
train_step = trainer.apply_gradients(zip(gradient, var_B))
... declare session and prepare batch for training ...
for i in range(10000):
    loss_ = train_step.run(loss, feed_dict={x: batch[0]})
I ran it and the code runs, but the loss does not decrease. What did I do wrong? Thanks for reading!
I am not sure how this code runs. train_step is an Operation, and the Operation.run() method takes a feed_dict and an optional session, so I don't know how train_step.run(loss, feed_dict={x: batch[0]}) can work. Generally, you would do something like this:
with tf.Session() as sess:
    _, _loss = sess.run([train_step, loss], feed_dict=...)
As a side note, if you still have the code that produced modelA and modelB in the first place, it is better (i.e. less brittle) to rerun that code to recreate the graph. Once the graph is created, you can restore the variable values from your checkpoint using a Saver. This avoids doing brittle extractions like:
output_A = tf.get_default_graph().get_tensor_by_name("modelA_output:0")
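For example, a minimal sketch of that approach (build_model_A and build_model_B are hypothetical stand-ins for your original model-construction code, and the checkpoint paths are assumed):
with tf.variable_scope("modelA"):
    output_A = build_model_A(x)  # hypothetical: your original graph-building code
with tf.variable_scope("modelB"):
    output_B = build_model_B(x)

# One Saver per model, restricted to that model's variables.
vars_A = [v for v in tf.global_variables() if v.name.startswith("modelA/")]
vars_B = [v for v in tf.global_variables() if v.name.startswith("modelB/")]
saver_A = tf.train.Saver(var_list=vars_A)
saver_B = tf.train.Saver(var_list=vars_B)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver_A.restore(sess, 'modelA-1000')
    saver_B.restore(sess, 'modelB-1000')
    # If the checkpoints were written without the scope prefixes, pass
    # var_list as a {name_in_checkpoint: variable} dict instead.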

If the batch_size equals 1 in tf.layers.batch_normalization(), will it work correctly?

Hi everyone. I am using TensorFlow 1.4 to train a U-Net-like model for my purposes. Due to the constraints of my hardware, the batch_size can only be set to 1 when training, otherwise there is an OOM error.
Here comes my question. In this case, with batch_size equal to 1, will tf.layers.batch_normalization() work correctly (regarding the moving average, moving variance, gamma, and beta)? Will such a small batch_size make it unstable?
In my work, I set training=True when training and training=False when testing. When training, I use:
logits = mymodel.inference()
loss = tf.losses.mean_squared_error(labels, logits)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
...
saver = tf.train.Saver(tf.global_variables())
with tf.Session() as sess:
    sess.run(tf.group(tf.global_variables_initializer(),
                      tf.local_variables_initializer()))
    sess.run(train_op)
    ...
    saver.save(sess, save_path, global_step)
When testing, I use:
logits = model.inference()
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, checkpoint)
    sess.run(tf.local_variables_initializer())
    results = sess.run(logits)
Could anyone tell me whether I am using this incorrectly? And how much influence does a batch_size of 1 have on tf.layers.batch_normalization()?
Any help will be appreciated! Thanks in advance.
Yes, tf.layers.batch_normalization() works with batches of single elements. Batch normalization over such batches is actually called instance normalization (i.e. normalization of a single instance).
@Maxim made a great post about instance normalization if you want to know more. You can also find more theory on the web and in the literature, e.g. Instance Normalization: The Missing Ingredient for Fast Stylization.
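To see what this means concretely, here is a small sketch of the statistics involved (not the library implementation): for an NHWC convolutional activation with batch size 1, batch norm takes its moments over the batch and spatial axes, which reduces to normalizing a single image's spatial positions per channel, i.e. instance normalization.
x = tf.random_normal([1, 32, 32, 64])  # a "batch" of one image
# Batch norm's moments for conv layers: over batch + spatial axes.
mean, variance = tf.nn.moments(x, axes=[0, 1, 2], keep_dims=True)
x_hat = (x - mean) / tf.sqrt(variance + 1e-3)  # per-channel normalization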

How to average summaries over multiple batches?

Assuming I have a bunch of summaries defined like:
loss = ...
tf.scalar_summary("loss", loss)
# ...
summaries = tf.merge_all_summaries()
I can evaluate the summaries tensor every few steps on the training data and pass the result to a SummaryWriter.
The result will be noisy summaries, because they're only computed on one batch.
However, I would like to compute the summaries on the entire validation dataset.
Of course, I can't pass the validation dataset as a single batch, because it would be too big.
So, I'll get summary outputs for each validation batch.
Is there a way to average those summaries so that it appears as if the summaries have been computed on the entire validation set?
Do the averaging of your measure in Python and create a new Summary object for each mean. Here is what I do:
accuracies = []
# Calculate your measure over as many batches as you need
for batch in validation_set:
    # accuracy_op / input_batch stand for your own measure op and input placeholder
    accuracies.append(sess.run(accuracy_op, feed_dict={input_batch: batch}))
# Take the mean of your measure
accuracy = np.mean(accuracies)

# Create a new Summary object with your measure
summary = tf.Summary()
summary.value.add(tag="%sAccuracy" % prefix, simple_value=accuracy)

# Add it to the TensorBoard summary writer
# Make sure to specify a step parameter to get nice graphs over time
summary_writer.add_summary(summary, global_step)
I would avoid calculating the average outside the graph.
You can use tf.train.ExponentialMovingAverage:
ema = tf.train.ExponentialMovingAverage(decay=my_decay_value, zero_debias=True)
maintain_ema_op = ema.apply(your_losses_list)

# Create an op that will update the moving averages after each training step.
with tf.control_dependencies([your_original_train_op]):
    train_op = tf.group(maintain_ema_op)
Then, use:
sess.run(train_op)
That will run your_original_train_op and then maintain_ema_op, because the moving-average update was defined under a control dependency on the original train op.
In order to get your exponential moving averages, use:
moving_average = ema.average(an_item_from_your_losses_list_above)
And retrieve its value using:
value = sess.run(moving_average)
This calculates the moving average within your calculation graph.
I think it's always better to let tensorflow do the calculations.
Have a look at the streaming metrics. They have an update function to feed the information of your current batch and a function to get the averaged summary.
It's going to look somewhat like this:
accuracy = ...
streaming_accuracy, streaming_accuracy_update = tf.contrib.metrics.streaming_mean(accuracy)
streaming_accuracy_scalar = tf.summary.scalar('streaming_accuracy', streaming_accuracy)

# set up your session etc.

for i in iterations:
    for b in batches:
        sess.run([streaming_accuracy_update], feed_dict={...})
    streaming_summ = sess.run(streaming_accuracy_scalar)
    writer.add_summary(streaming_summ, i)
Also see the tensorflow documentation: https://www.tensorflow.org/versions/master/api_guides/python/contrib.metrics
and this question:
How to accumulate summary statistics in tensorflow
You can store the current sum and recalculate the average after each batch, like:
loss_sum = tf.Variable(0.)
inc_op = tf.assign_add(loss_sum, loss)
clear_op = tf.assign(loss_sum, 0.)
average = loss_sum / batches
tf.scalar_summary("average_loss", average)

sess.run(clear_op)
for i in range(batches):
    sess.run([loss, inc_op])
sess.run(average)  # run your summary op here as well to record the average
For future reference, the TensorFlow metrics API now supports this by default. For example, take a look at tf.metrics.mean_squared_error:
For estimation of the metric over a stream of data, the function creates an update_op operation that updates these variables and returns the mean_squared_error. Internally, a squared_error operation computes the element-wise square of the difference between predictions and labels. Then update_op increments total with the reduced sum of the product of weights and squared_error, and it increments count with the reduced sum of weights.
These total and count variables are added to the set of metric variables, so in practice what you would do is something like:
x_batch = tf.placeholder(...)
y_batch = tf.placeholder(...)
model_output = ...
mse, mse_update = tf.metrics.mean_squared_error(y_batch, model_output)
# This operation resets the metric internal variables to zero
metrics_init = tf.variables_initializer(
    tf.get_default_graph().get_collection(tf.GraphKeys.METRIC_VARIABLES))

with tf.Session() as sess:
    # Train...
    # On evaluation step
    sess.run(metrics_init)
    for x_eval_batch, y_eval_batch in ...:
        mse = sess.run(mse_update, feed_dict={x_batch: x_eval_batch, y_batch: y_eval_batch})
    print('Evaluation MSE:', mse)
I found one solution myself. I think it's kind of hacky and I hope there is a more elegant solution.
During setup:
valid_loss_placeholder = tf.placeholder(dtype=tf.float32, shape=[])
valid_loss_summary = tf.scalar_summary("valid loss", valid_loss_placeholder)
Or, for TensorFlow versions after 0.12 (where tf.scalar_summary was renamed to tf.summary.scalar):
valid_loss_placeholder = tf.placeholder(dtype=tf.float32, shape=[])
valid_loss_summary = tf.summary.scalar("valid loss", valid_loss_placeholder)
Within the training loop:
# Compute valid loss in python by doing sess.run() for each batch
# and averaging
valid_loss = ...
summary = sess.run(valid_loss_summary, {valid_loss_placeholder: valid_loss})
summary_writer.add_summary(summary, step)
As of August 2018, streaming metrics have been deprecated. However, somewhat unintuitively, all tf.metrics are streaming. So use, for example, tf.metrics.accuracy.
However, if you want the accuracy (or another metric) over only a subset of batches, you can use an exponential moving average, as in the answer by @MZHm, or reset any of the tf.metrics by following this very informative blog post.
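A minimal sketch of that reset pattern (assuming the metric is created under its default name scope, 'accuracy'):
labels = tf.placeholder(tf.int64, [None])
predictions = tf.placeholder(tf.int64, [None])
acc, acc_update = tf.metrics.accuracy(labels, predictions)
# The running total/count are local variables; re-initializing them
# resets the accumulated metric between evaluation runs.
reset_op = tf.variables_initializer(
    tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES, scope='accuracy'))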
For quite some time I had only been saving the summary once per epoch. I never knew that TensorFlow's summary would then only save the summary for the last batch run.
Shocked, I looked into this problem. This is the solution I came up with (using the dataset API):
loss = ...
train_op = ...

loss_metric, loss_metric_update = tf.metrics.mean(loss)
tf.summary.scalar('loss', loss_metric)

merged = tf.summary.merge_all()
train_writer = tf.summary.FileWriter(os.path.join(res_dir, 'train'))
test_writer = tf.summary.FileWriter(os.path.join(res_dir, 'test'))

init_local = tf.initializers.local_variables()
init_global = tf.initializers.global_variables()
sess.run(init_global)

def train_run(epoch):
    sess.run([dataset.train_init_op, init_local])  # train_init_op switches to the training data
    for i in range(dataset.num_train_batches):     # the number of batches to run for the training set
        sess.run([train_op, loss_metric_update])
    summary, cur_loss = sess.run([merged, loss_metric])
    train_writer.add_summary(summary, epoch)
    return cur_loss

def test_run(epoch):
    sess.run([dataset.test_init_op, init_local])   # test_init_op switches to the test data
    for i in range(dataset.num_test_batches):      # the number of batches to run for the test set
        sess.run(loss_metric_update)
    summary, cur_loss = sess.run([merged, loss_metric])
    test_writer.add_summary(summary, epoch)
    return cur_loss

for epoch in range(epochs):
    train_loss = train_run(epoch + 1)
    test_loss = test_run(epoch + 1)
    print("Epoch: {0:3}, loss: (train: {1:10.10f}, test: {2:10.10f})".format(epoch + 1, train_loss, test_loss))
For the summary I'm just wrapping the tensor I'm interested in into tf.metrics.mean(). For each batch run I call the metric's update operation. At the end of every epoch the metric tensor returns the correct mean of all batch results.
Don't forget to initialize the local variables every time you switch between training and test data; otherwise your train and test metrics will be near identical.
I had the same problem when I realized I had to iterate over my validation data because memory was running short and OOM errors were flooding in.
As several of these answers say, tf.metrics has this built in, but I'm not using tf.metrics in my project. So, inspired by that, I made this:
import tensorflow as tf
import numpy as np

def batch_persistent_mean(tensor):
    # Make a variable that keeps track of the sum
    accumulator = tf.Variable(initial_value=tf.zeros_like(tensor), dtype=tf.float32)
    # Keep count of batches in accumulator (needed to estimate mean)
    batch_nums = tf.Variable(initial_value=tf.zeros_like(tensor), dtype=tf.float32)
    # Make an operation for accumulating, increasing batch count
    accumulate_op = tf.assign_add(accumulator, tensor)
    step_batch = tf.assign_add(batch_nums, tf.ones_like(tensor))
    update_op = tf.group(step_batch, accumulate_op)
    eps = 1e-5
    output_tensor = accumulator / (tf.nn.relu(batch_nums - eps) + eps)
    # In regards to the tf.nn.relu, it's a hacky zero guard:
    # if batch_nums is zero then return eps, else it'll be batch_nums
    # Make an operation to reset
    flush_op = tf.group(tf.assign(accumulator, tf.zeros_like(accumulator)),
                        tf.assign(batch_nums, tf.zeros_like(batch_nums)))
    return output_tensor, update_op, flush_op

# Make a variable that we want to accumulate
X = tf.Variable(0., dtype=tf.float32)
# Make our persistent mean operations
Xbar, upd, flush = batch_persistent_mean(X)
Now you send Xbar to your summary, e.g. tf.summary.scalar("mean_of_x", Xbar); where you did sess.run(X) before, you'll do sess.run(upd), and between epochs you'd do sess.run(flush).
Testing behaviour:
### INSERT ABOVE CODE CHUNK IN S.O. ANSWER HERE ###
with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    # Calculate the mean of 0 + 1 + ... + 19
    for i in range(20):
        sess.run(upd, {X: i})
    print(sess.run(Xbar), "=", np.mean(np.arange(20)))
    for i in range(40):
        sess.run(upd, {X: i})
    # Now Xbar is the mean of (0 + ... + 19) and (0 + ... + 39) combined:
    print(sess.run(Xbar), "=", np.mean(np.concatenate([np.arange(20), np.arange(40)])))
    # Now flush it
    sess.run(flush)
    print("flushed. Xbar =", sess.run(Xbar))
    for i in range(40):
        sess.run(upd, {X: i})
    print(sess.run(Xbar), "=", np.mean(np.arange(40)))

Loss functions in tensorflow (with an if - else)

I am trying different loss functions in TensorFlow.
The loss function I want is a kind of epsilon-insensitive function (defined componentwise):
if |yData - yModel| < epsilon:
    loss = 0
else:
    loss = |yData - yModel|
I tried this solution:
yData = tf.placeholder("float", [None, numberOutputs])
yModel = model(...
epsilon = 0.2
epsilonTensor = epsilon * tf.ones_like(yData)
loss = tf.maximum(tf.abs(yData - yModel) - epsilonTensor, tf.zeros_like(yData))
optimizer = tf.train.GradientDescentOptimizer(0.25)
train = optimizer.minimize(loss)
I also used
optimizer = tf.train.MomentumOptimizer(0.001, 0.9)
I do not find any error in the implementation. However, it does not converge, while loss = tf.square(yData - yModel) converges, and loss = tf.maximum(tf.square(yData - yModel) - epsilonTensor, tf.zeros_like(yData)) also converges.
So I also tried something simpler, loss = tf.abs(yData - yModel), and it also does not converge. Am I making some mistake, or is this a problem with the non-differentiability of abs at zero, or something else? What is happening with the abs function?
When your loss is something like Loss(x) = abs(x - y), the solution is an unstable fixed point of SGD: start your minimization at a point arbitrarily close to the solution, and the next step will increase the loss.
Having a stable fixed point is a requirement for the convergence of an iterative procedure like SGD. In practice this means your optimization will move towards a local minimum, but after getting close enough, it will jump around the solution with steps proportional to the learning rate. Here's a toy TensorFlow program that illustrates the problem:
import tensorflow as tf
from matplotlib import pyplot

x = tf.Variable(0.)
loss_op = tf.abs(x - 1.05)
opt = tf.train.GradientDescentOptimizer(0.1)
train_op = opt.minimize(loss_op)
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
xvals = []
for i in range(20):
    unused, loss, xval = sess.run([train_op, loss_op, x])
    xvals.append(xval)
pyplot.plot(xvals)
Some solutions to the problem:
1. Use a more robust solver, such as the proximal gradient method.
2. Use a more SGD-friendly loss function, such as the Huber loss (a sketch follows this list).
3. Use a learning rate schedule to gradually decrease the learning rate.
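For (2), a minimal sketch on the same toy problem, assuming a TensorFlow version that provides tf.losses.huber_loss: the Huber loss is quadratic within delta of the target, so the gradient shrinks as x approaches the solution and plain SGD can settle instead of oscillating.
x = tf.Variable(0.)
# Quadratic near the target, linear (abs-like) farther away.
loss_op = tf.losses.huber_loss(labels=tf.constant(1.05), predictions=x, delta=0.5)
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss_op)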
Here's a way to implement (3) on the toy problem above:
x = tf.Variable(0.)
loss_op = tf.abs(x - 1.05)
step = tf.Variable(0)
learning_rate = tf.train.exponential_decay(
    0.2,   # Base learning rate.
    step,  # Current index into the dataset.
    1,     # Decay step.
    0.9    # Decay rate.
)
opt = tf.train.GradientDescentOptimizer(learning_rate)
train_op = opt.minimize(loss_op, global_step=step)
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
xvals = []
for i in range(40):
    unused, loss, xval = sess.run([train_op, loss_op, x])
    xvals.append(xval)
pyplot.plot(xvals)