Computing exact moving average over multiple batches in tensorflow

During training, I would like to write the average loss over the last N mini-batches to SummaryWriter as a way of smoothing the very noisy batch loss. It's easy to compute this in python and print it, but I would like to add this to a summary so that I can see it in tensorboard. Here's an overly simplified example of what I'm doing now.
losses = []
for i in range(10000):
    _, loss = session.run([train_op, loss_op])
    losses.append(loss)
    if i % 100 == 0:
        # How to produce a scalar_summary here?
        print(sum(losses) / len(losses))
        losses = []
I'm aware that I could use ExponentialMovingAverage with a decay of 1.0, but I would still need some way to reset this every N batches. Really, if all I care about is visualizing loss in tensorboard, the reset probably isn't necessary, but I'm still curious how one would go about aggregating across batches for other reasons (e.g. computing total accuracy over a test dataset that is too big to run in a single batch).

You can manually construct the Summary object, like this:
from tensorflow.core.framework import summary_pb2

def make_summary(name, val):
    return summary_pb2.Summary(value=[summary_pb2.Summary.Value(tag=name,
                                                                simple_value=val)])

summary_writer.add_summary(make_summary('myvalue', myvalue), step)

Passing data from python to a graph function like tf.scalar_summary can be done using a placeholder and feed_dict.
average_pl = tf.placeholder(tf.float32)
average_summary = tf.summary.scalar("average_loss", average_pl)
writer = tf.summary.FileWriter("/tmp/mnist_logs", sess.graph_def)

losses = []
for i in range(10000):
    _, loss = sess.run([train_op, loss_op])
    losses.append(loss)
    if i % 100 == 0:
        # Compute the average loss in Python and feed it to the placeholder
        feed = {average_pl: sum(losses) / len(losses)}
        summary_str = sess.run(average_summary, feed_dict=feed)
        writer.add_summary(summary_str, i)
        losses = []
I haven't tried it, and this was hastily adapted from the visualizing data how-to, but I expect something like this would work.

Related

Training runs out of memory as RAM consumption keeps growing

I am not sure when this started, and I have to believe it happened at some point between today and a few months ago, but it seems that RAM (CPU) consumption grows over time during epochs.
self.model.fit(
    train_data,
    initial_epoch=self.status.valid_last.epoch,
    epochs=train_config.epochs,
    steps_per_epoch=train_config.steps_per_epoch,
    callbacks=self._get_experiment_callbacks(),
    validation_data=valid_data,
    validation_steps=train_config.validation_steps,
)
The only thing out of the ordinary here might be the callbacks I am passing, but there's actually nothing special about them. One is a TensorBoard (TB) callback and the other is a custom Metric which does not do much except plot the learning rate and other general metrics to TB.
def _get_experiment_callbacks(self) -> List[tf.keras.callbacks.Callback]:
    tensorboard_cb = tf.keras.callbacks.TensorBoard(
        log_dir=os.path.join(out_dir, "logs"),
        update_freq="epoch",
        profile_batch=profile_batch,
        write_images=True,
    )
    # Not interested in whatever is plotted in those
    tensorboard_cb.on_epoch_end = lambda *args: ...
    tensorboard_cb.on_test_end = lambda *args: ...
    return [
        tensorboard_cb,
        Metrics(tensorboard_cb, update_freq=100),
    ]
This leaves us with the last suspect, which is the valid_data itself. This is essentially just a list of protobuf files (shards) which I am loading like so:
def load_shards(
    decode_example_fn: Callable,
    shard_fps: List[str],
    training: bool,
    buffer_size: int = None  # 50 * 1000 ** 2,
) -> tf.data.Dataset:
    if not len(shard_fps) > 0:
        raise ValueError("Argument shard_fps must be a list to shards but is empty.")

    def make_dense_(example):
        for k, v in example.items():
            if isinstance(v, tf.SparseTensor):
                example[k] = tf.sparse.to_dense(v)
        return example

    def load_records_(filenames):
        record_dataset = tf.data.TFRecordDataset(filenames, buffer_size=buffer_size)
        record_dataset = record_dataset.map(decode_example_fn)
        record_dataset = record_dataset.map(make_dense_)
        return record_dataset

    if not training:
        shard_fps = sorted(shard_fps)

    dataset = tf.data.Dataset.from_tensor_slices(tf.constant(shard_fps))
    options = tf.data.Options()
    options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.DATA
    dataset = dataset.with_options(options)

    if training:
        dataset = dataset.interleave(load_records_, num_parallel_calls=tf.data.AUTOTUNE, deterministic=False)
    else:
        dataset = dataset.apply(load_records_)

    dataset = dataset.prefetch(tf.data.AUTOTUNE)
    return dataset
and from then on there is just preprocessing and transformation mapping on the inputs, so I would not expect any memory leak at this point.
Still, I am observing a continuous increase in memory consumption over time. The screenshot below shows the consumption after a restart.
At first we use ~28 GB of RAM. After 100 steps there is a sharp increase to ~33 GB, and from there it seems to stabilize at around 38 GB. The next big jump, at 216k steps, comes from an evaluation. From there it just keeps growing.
From the looks of it, the memory usage stabilizes and the big jump only occurs after each epoch (1 epoch = 6000 steps).
There could be any number of things wrong. TensorBoard could possibly not be reusing the same graph, but instead keeps adding graphs, which leads to OOM. I don't use TensorBoard myself because I remember this happening to me a few years back. It's also possible that using model.fit is the problem and that you're reloading your data at every epoch. You could try writing the training loop yourself, something like:
for epoch in tf.range(epochs):
    batch_train_loss = []
    batch_train_acc = []
    for batch, (X, Y) in train_dataset.enumerate():
        train_loss = train_fn(X, Y, model, loss, optimizer, metric, batch)  # do the actual training
        train_acc = metric.result().numpy()  # get the training accuracy
        batch_train_loss.append(train_loss)  # save the training loss above
        batch_train_acc.append(train_acc)  # save the training accuracy above
        metric.reset_states()  # reset the metric after every batch
where the train_fn is:
def get_apply_train_fn():
    @tf.function
    def train_function(X, Y, model, loss, optimizer, metric, step):
        with tf.GradientTape() as tape:
            predictions = model(X, training=True)
            loss_value = loss(Y, predictions)
        gradients = tape.gradient(loss_value, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        metric.update_state(Y, predictions)
        return loss_value
    return train_function

train_fn = get_apply_train_fn()
Now, this is a stupidly complicated way of writing model.fit, but it does work.
Another way in which I've had to combat OOM on the GPU side is to use Python's multiprocessing, but this was in a context where I was doing 10-fold cross-validation and the training would crash with OOM after 7 or 8 folds.
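A minimal sketch of that multiprocessing workaround, under the assumption of a user-defined train_single_fold function (the name is hypothetical): each fold runs in its own process, so GPU and host memory held by TensorFlow is released when the child process exits.
import multiprocessing as mp

def run_fold(fold_idx, results):
    # Hypothetical user-defined function that builds the model and data for
    # this fold, trains, and returns a score. Everything allocated here,
    # including GPU memory held by TensorFlow, is freed when the child exits.
    results[fold_idx] = train_single_fold(fold_idx)

if __name__ == "__main__":
    mp.set_start_method("spawn")  # safer with CUDA than the default fork
    manager = mp.Manager()
    results = manager.dict()
    for fold_idx in range(10):
        p = mp.Process(target=run_fold, args=(fold_idx, results))
        p.start()
        p.join()  # run folds sequentially; memory is reclaimed between folds
    print(dict(results))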
Alternatively, you could try turning eager execution on or off with
tf.config.run_functions_eagerly(False) # or True

How can I implement iterative incremental training for xgboost? Is it worth doing?

I am currently using XGBoost version 1.3.1. There is a custom Docker image built from the training scripts, and SageMaker is used to run training. The training data is also present in S3. I have recently been facing the issue that the input data (data frame) is larger than what the instance can support (and there is no larger instance available), and hence I am hitting OOM.
I would like to know if there is a way to resolve this big-data issue. Is it possible to load data iteratively and train using the xgb_model option? If so, how?
Thanks in advance
I don't know about SageMaker, but to train XGBoost incrementally in batches of rows, I do the following. For your case, you will have to see whether this helps make the data fit.
First, I split the dataframe into X and y, then convert them to NumPy arrays, e.g.:
X = pd.read_csv(final_ds)
y = X.pop('target')
X = X.values # convert to numpy array
y = y.values # convert to numpy array
Then do the usual split into X_train, X_valid, y_train, y_valid.
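A minimal sketch of that split, assuming scikit-learn's train_test_split (the 80/20 ratio is just an example):
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for validation / early stopping
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)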
Determine the desired batch size, for example by picking one of the divisors of the training set size:
size = len(X_train)
print()
print(f'Size of X_train: {size}')
print()
for i in range(1, size):
    if size % i == 0:
        print(f'{i}', end=' ')  # choose from these
(The smaller the batch size, the slower the training.)
Split data into batches
batch_size = <your selected batch size>
col_size = <the number of columns of X_train>
X_train_batched = X_train.reshape(-1, batch_size, col_size)
y_train_batched = y_train.reshape(-1, batch_size)
Then use the xgb_model parameter in XGBoost's fit(). This tells fit() to resume from the previously trained model.
e.g.
param = <your xgb parameters>
model_xgbc = XGBClassifier(**param, use_label_encoder=False)

# For incremental training, use the xgb_model parameter in fit().
# Run the first fit() without xgb_model, and run subsequent fits with
# xgb_model set to the model object from the previous training.

# Fit Model
for i, (X_batch, y_batch) in enumerate(zip(X_train_batched, y_train_batched)):
    print(f'Step: {i}', end=' ')
    if i == 0:
        model_xgbc.fit(X_batch, y_batch, eval_set=[(X_valid, y_valid)],
                       verbose=False, eval_metric=['logloss'],
                       early_stopping_rounds=400)
    else:
        model_xgbc.fit(X_batch, y_batch, eval_set=[(X_valid, y_valid)],
                       verbose=False, eval_metric=['logloss'],
                       early_stopping_rounds=400,
                       xgb_model=model_xgbc)

    preds = model_xgbc.predict(X_valid)
    rmse = metrics.mean_squared_error(y_valid, preds, squared=False)
    print(rmse)

PyTorch: how to get the gradient of the loss function twice

Here is what I'm trying to implement:
We calculate loss based on F(X), as usual. But we also define "adversarial loss" which is a loss based on F(X + e). e is defined as dF(X)/dX multiplied by some constant. Both loss and adversarial loss are backpropagated for the total loss.
In tensorflow, this part (getting dF(X)/dX) can be coded like below:
grad, = tf.gradients( loss, X )
grad = tf.stop_gradient(grad)
e = constant * grad
Below is my pytorch code:
class DocReaderModel(object):
    def __init__(self, embedding=None, state_dict=None):
        self.train_loss = AverageMeter()
        self.embedding = embedding
        self.network = DNetwork(opt, embedding)
        self.optimizer = optim.SGD(parameters)

    def adversarial_loss(self, batch, loss, embedding, y):
        self.optimizer.zero_grad()
        loss.backward(retain_graph=True)
        grad = embedding.grad
        grad.detach_()
        perturb = F.normalize(grad, p=2) * 0.5
        self.optimizer.zero_grad()
        adv_embedding = embedding + perturb
        network_temp = DNetwork(self.opt, adv_embedding)  # This is how to get F(X)
        network_temp.training = False
        network_temp.cuda()
        start, end, _ = network_temp(batch)  # This is how to get F(X)
        del network_temp  # I even deleted this instance.
        return F.cross_entropy(start, y[0]) + F.cross_entropy(end, y[1])

    def update(self, batch):
        self.network.train()
        start, end, pred = self.network(batch)
        loss = F.cross_entropy(start, y[0]) + F.cross_entropy(end, y[1])
        loss_adv = self.adversarial_loss(batch, loss, self.network.lexicon_encoder.embedding.weight, y)
        loss_total = loss + loss_adv
        self.optimizer.zero_grad()
        loss_total.backward()
        self.optimizer.step()
I have a few questions:
1) I substituted tf.stop_gradient with grad.detach_(). Is this correct?
2) I was getting "RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.", so I added retain_graph=True to loss.backward. That specific error went away.
However, now I'm getting a memory error after a few epochs (RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCStorage.cu:58). I suspect I'm unnecessarily retaining the graph.
Can someone let me know PyTorch's best practice on this? Any hint, or even a short comment, would be highly appreciated.
I think you are trying to implement a generative adversarial network (GAN), but from the code I can't quite follow what you are trying to achieve, as there are a few missing pieces for a GAN to work. I can see there's a discriminator network module, DNetwork, but the generator network module is missing.
If I had to guess, when you say 'loss function twice', I assume you mean you have one loss function for the discriminator net and another for the generator net. If that's the case, let me share how I would implement a basic GAN model.
As an example, let's take a look at this Wasserstein GAN Jupyter notebook
I'll skip the less important bits and zoom into the important ones here:
First, import PyTorch libraries and set up
# Set up batch size, image size, and size of noise vector:
bs, sz, nz = 64, 64, 100 # nz is the size of the latent z vector for creating some random noise later
Build a discriminator module
class DCGAN_D(nn.Module):
    def __init__(self):
        ... truncated, the usual neural nets stuffs, layers, etc ...
    def forward(self, input):
        ... truncated, the usual neural nets stuffs, layers, etc ...
Build a generator module
class DCGAN_G(nn.Module):
    def __init__(self):
        ... truncated, the usual neural nets stuffs, layers, etc ...
    def forward(self, input):
        ... truncated, the usual neural nets stuffs, layers, etc ...
Put them all together
netG = DCGAN_G().cuda()
netD = DCGAN_D().cuda()
Optimizer needs to be told what variables to optimize. A module automatically keeps track of its variables.
optimizerD = optim.RMSprop(netD.parameters(), lr = 1e-4)
optimizerG = optim.RMSprop(netG.parameters(), lr = 1e-4)
One forward step and one backward step for Discriminator
Here, the network can calculate gradients during the backward pass, depending on the input to this function. So, in my case, I have three types of losses: generator loss, discriminator real-image loss, and discriminator fake-image loss. I can get the gradient of the loss function three times, for three different net passes.
def step_D(input, init_grad):
    # input can be the generator's generated image data or an input image from the dataset
    err = netD(input)
    err.backward(init_grad)  # backward pass through the net to calculate gradients
    return err  # loss
Control trainable parameters [IMPORTANT]
Trainable parameters in the model are those that require gradients.
def make_trainable(net, val):
    for p in net.parameters():
        p.requires_grad = val  # note: this is later set to False for netD during the netG update in the train loop
In TensorFlow, this part can be coded like below:
grad = tf.gradients(loss, X)
grad = tf.stop_gradient(grad)
So, I think this answers your first question: "I substituted tf.stop_gradient with grad.detach_(). Is this correct?"
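As a minimal sketch of that equivalence (the tensor names here are only for illustration): detaching a tensor excludes it from gradient tracking, which is the PyTorch analogue of tf.stop_gradient.
import torch

x = torch.randn(3, requires_grad=True)
loss = (x ** 2).sum()
loss.backward()

# Build the perturbation from a detached copy of the gradient so that no
# gradient flows back through it later; detach() (or the in-place detach_())
# plays the role of tf.stop_gradient here.
grad = x.grad.detach()
perturb = 0.5 * torch.nn.functional.normalize(grad, p=2, dim=0)
assert not perturb.requires_grad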
Train loop
You can see here how the three different loss functions are being called.
def train(niter, first=True):
    for epoch in range(niter):
        # Make iterable from PyTorch DataLoader
        data_iter = iter(dataloader)
        i = 0
        while i < n:
            ###########################
            # (1) Update D network
            ###########################
            make_trainable(netD, True)
            # train the discriminator d_iters times
            d_iters = 100
            j = 0
            while j < d_iters and i < n:
                j += 1
                i += 1
                # clamp parameters to a cube
                for p in netD.parameters():
                    p.data.clamp_(-0.01, 0.01)
                data = next(data_iter)

                ##### train with real #####
                real_cpu, _ = data
                real_cpu = real_cpu.cuda()
                real = Variable(data[0].cuda())
                netD.zero_grad()
                # Real image discriminator loss
                errD_real = step_D(real, one)

                ##### train with fake #####
                fake = netG(create_noise(real.size()[0]))
                input.data.resize_(real.size()).copy_(fake.data)
                # Fake image discriminator loss
                errD_fake = step_D(input, mone)
                # Discriminator loss
                errD = errD_real - errD_fake
                optimizerD.step()

            ###########################
            # (2) Update G network
            ###########################
            make_trainable(netD, False)
            netG.zero_grad()
            # Generator loss
            errG = step_D(netG(create_noise(bs)), one)
            optimizerG.step()

            print('[%d/%d][%d/%d] Loss_D: %f Loss_G: %f Loss_D_real: %f Loss_D_fake %f'
                  % (epoch, niter, i, n,
                     errD.data[0], errG.data[0], errD_real.data[0], errD_fake.data[0]))
"I was getting "RuntimeError: Trying to backward through the graph a second time..."
PyTorch has this behaviour: to reduce GPU memory usage, during the .backward() call all the intermediary results (saved activations, etc.) are deleted when they are no longer needed. Therefore, if you try to call .backward() again, the intermediary results don't exist and the backward pass cannot be performed (and you get the error you see).
It depends on what you are trying to do. You can call .backward(retain_graph=True) to make a backward pass that will not delete intermediary results, and so you will be able to call .backward() again. All but the last call to backward should have the retain_graph=True option.
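A minimal sketch of that rule (two backward passes over the same graph, with illustrative tensor names):
import torch

x = torch.randn(4, requires_grad=True)
y = (x * 2).sum()

# The first backward keeps the intermediary buffers alive ...
y.backward(retain_graph=True)
# ... so a second backward through the same graph is allowed.
# Without retain_graph=True above, this call would raise the
# "Trying to backward through the graph a second time" error.
y.backward()
print(x.grad)  # gradients from both passes accumulate (a tensor of 4s here)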
Can someone let me know pytorch's best practice on this
As you can see from the PyTorch code above, and from the way things are done in PyTorch, which tries to stay Pythonic, you can get a sense of PyTorch's best practice.
If you want to work with higher-order derivatives (i.e. a derivative of a derivative) take a look at the create_graph option of backward.
For example:
loss = get_loss()
loss.backward(create_graph=True)
loss_grad_penalty = loss + loss.grad
loss_grad_penalty.backward()
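Tying this back to the original question, here is a minimal, hedged sketch of the adversarial-perturbation pattern using torch.autograd.grad; the model, the data, and the 0.5 scaling constant are placeholders, not the asker's actual code:
import torch
import torch.nn.functional as F

model = torch.nn.Linear(10, 3)        # stand-in for the real network
X = torch.randn(8, 10)
y = torch.randint(0, 3, (8,))

emb = X.clone().requires_grad_(True)  # stand-in for the embedding input
loss = F.cross_entropy(model(emb), y)

# dF(X)/dX without touching .grad buffers; detach() plays the role of tf.stop_gradient
grad, = torch.autograd.grad(loss, emb, retain_graph=True)
perturb = 0.5 * F.normalize(grad.detach(), p=2, dim=1)

adv_loss = F.cross_entropy(model(emb + perturb), y)
total = loss + adv_loss
total.backward()                      # a single backward over both terms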

Tensorflow step size incredibly small to prevent errors?

I'm trying to do a simple linear regression problem using Gradient Descent with Tensorflow, but unless I set my step size really, really small, the weight and bias balloon and overflow almost immediately. Here's my code:
import numpy as np
import tensorflow as tf

# Read the data
COLUMNS = ["url", "title_length", "article_length", "keywords", "shares"]
data = np.genfromtxt("OnlineNewsPopularitySample3.csv", delimiter=',', names=COLUMNS)

# We're looking for shares based on article_length
article_length = tf.placeholder("float")
shares = tf.placeholder("float")

# Set up the variables we're going to use
initial_m = 1.0
initial_b = 1.0
w = tf.Variable([initial_m, initial_b], name="w")

predicted_shares = tf.multiply(w[0], article_length) + w[1]
error = tf.square(predicted_shares - shares)

# This is as big as I can make it; any larger, and I have problems.
step_size = .000000025
optimizer = tf.train.GradientDescentOptimizer(step_size).minimize(error)
model = tf.global_variables_initializer()

with tf.Session() as session:
    # First initialize all the variables
    session.run(model)
    # Now we're going to run the optimizer
    for i in range(100000):
        session.run(optimizer, feed_dict={article_length: data['article_length'], shares: data['shares']})
        if i % 100 == 0:
            print(session.run(w))
    # Once it's done, we need to get the value of w so we can display it.
    w_value = session.run(w)
    print("Predicted model: {a:.3f}x + {b:.3f}".format(a=w_value[0], b=w_value[1]))
So basically, when I run this, the outputs become "NaN" almost immediately. Any ideas?
Thanks in advance!
A very low learning rate means very small updates to the weights. In your case, even a relatively small learning rate is blowing up your weights because the weight updates (dE/dW) are very large, and the update is a function of the output error. If the labels are large values, your squared error will be huge at the start, since the predictions will be quite low. Try scaling the outputs to avoid this problem.
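A minimal sketch of that scaling, assuming the same data arrays as in the question: standardize the feature and the target before feeding them, so the loss and gradients stay in a reasonable range and a normal learning rate works.
import numpy as np

# data['article_length'] and data['shares'] as loaded in the question
x_raw = data['article_length'].astype(np.float64)
y_raw = data['shares'].astype(np.float64)

# Standardize to zero mean, unit variance
x_scaled = (x_raw - x_raw.mean()) / x_raw.std()
y_scaled = (y_raw - y_raw.mean()) / y_raw.std()

# Feed the scaled values instead; a step size around 0.01-0.1 is then usually workable
feed = {article_length: x_scaled, shares: y_scaled}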

How to average summaries over multiple batches?

Assuming I have a bunch of summaries defined like:
loss = ...
tf.scalar_summary("loss", loss)
# ...
summaries = tf.merge_all_summaries()
I can evaluate the summaries tensor every few steps on the training data and pass the result to a SummaryWriter.
The result will be noisy summaries, because they're only computed on one batch.
However, I would like to compute the summaries on the entire validation dataset.
Of course, I can't pass the validation dataset as a single batch, because it would be too big.
So, I'll get summary outputs for each validation batch.
Is there a way to average those summaries so that it appears as if the summaries have been computed on the entire validation set?
Do the averaging of your measure in Python and create a new Summary object for each mean. Here is what I do:
accuracies = []

# Calculate your measure over as many batches as you need
for batch in validation_set:
    accuracies.append(sess.run([training_op]))

# Take the mean of your measure
accuracy = np.mean(accuracies)

# Create a new Summary object with your measure
summary = tf.Summary()
summary.value.add(tag="%sAccuracy" % prefix, simple_value=accuracy)

# Add it to the TensorBoard summary writer
# Make sure to specify a step parameter to get nice graphs over time
summary_writer.add_summary(summary, global_step)
I would avoid calculating the average outside the graph.
You can use tf.train.ExponentialMovingAverage:
ema = tf.train.ExponentialMovingAverage(decay=my_decay_value, zero_debias=True)
maintain_ema_op = ema.apply(your_losses_list)
# Create an op that will update the moving averages after each training step.
with tf.control_dependencies([your_original_train_op]):
    train_op = tf.group(maintain_ema_op)
Then, use:
sess.run(train_op)
That will call maintain_ema_op because it is defined as a control dependency.
In order to get your exponential moving averages, use:
moving_average = ema.average(an_item_from_your_losses_list_above)
And retrieve its value using:
value = sess.run(moving_average)
This calculates the moving average within your calculation graph.
I think it's always better to let tensorflow do the calculations.
Have a look at the streaming metrics. They have an update function to feed the information of your current batch and a function to get the averaged summary.
It's going to look somewhat like this:
accuracy = ...
streaming_accuracy, streaming_accuracy_update = tf.contrib.metrics.streaming_mean(accuracy)
streaming_accuracy_scalar = tf.summary.scalar('streaming_accuracy', streaming_accuracy)

# set up your session etc.

for i in iterations:
    for b in batches:
        sess.run([streaming_accuracy_update], feed_dict={...})
    streaming_summ = sess.run(streaming_accuracy_scalar)
    writer.add_summary(streaming_summ, i)
Also see the tensorflow documentation: https://www.tensorflow.org/versions/master/api_guides/python/contrib.metrics
and this question:
How to accumulate summary statistics in tensorflow
You can store the current sum and recalculate the average after each batch, like:
loss_sum = tf.Variable(0.)
inc_op = tf.assign_add(loss_sum, loss)
clear_op = tf.assign(loss_sum, 0.)
average = loss_sum / batches
tf.scalar_summary("average_loss", average)

sess.run(clear_op)
for i in range(batches):
    sess.run([loss, inc_op])
sess.run(average)
For future reference, the TensorFlow metrics API now supports this by default. For example, take a look at tf.mean_squared_error:
For estimation of the metric over a stream of data, the function creates an update_op operation that updates these variables and returns the mean_squared_error. Internally, a squared_error operation computes the element-wise square of the difference between predictions and labels. Then update_op increments total with the reduced sum of the product of weights and squared_error, and it increments count with the reduced sum of weights.
These total and count variables are added to the set of metric variables, so in practice what you would do is something like:
x_batch = tf.placeholder(...)
y_batch = tf.placeholder(...)
model_output = ...
mse, mse_update = tf.metrics.mean_squared_error(y_batch, model_output)
# This operation resets the metric internal variables to zero
metrics_init = tf.variables_initializer(
    tf.get_default_graph().get_collection(tf.GraphKeys.METRIC_VARIABLES))

with tf.Session() as sess:
    # Train...
    # On evaluation step
    sess.run(metrics_init)
    for x_eval_batch, y_eval_batch in ...:
        mse = sess.run(mse_update, feed_dict={x_batch: x_eval_batch, y_batch: y_eval_batch})
    print('Evaluation MSE:', mse)
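If you also want this value to show up in TensorBoard, which is what the surrounding question is about, you could additionally wrap the metric tensor in a scalar summary. A minimal sketch, reusing the mse tensor from the snippet above; the writer path and global_step here are hypothetical:
# Defined next to the metric, before the session starts
mse_summary_op = tf.summary.scalar('eval_mse', mse)
writer = tf.summary.FileWriter('/tmp/eval_logs')  # hypothetical log directory

# After the evaluation loop above has processed all batches:
writer.add_summary(sess.run(mse_summary_op), global_step)  # global_step: your step counter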
I found one solution myself. I think it's kind of hacky and I hope there is a more elegant solution.
During setup:
valid_loss_placeholder = tf.placeholder(dtype=tf.float32, shape=[])
valid_loss_summary = tf.scalar_summary("valid loss", valid_loss_placeholder)
Or for TensorFlow versions after 0.12 (tf.scalar_summary was renamed):
valid_loss_placeholder = tf.placeholder(dtype=tf.float32, shape=[])
valid_loss_summary = tf.summary.scalar("valid loss", valid_loss_placeholder)
Within training loop:
# Compute valid loss in python by doing sess.run() for each batch
# and averaging
valid_loss = ...
summary = sess.run(valid_loss_summary, {valid_loss_placeholder: valid_loss})
summary_writer.add_summary(summary, step)
As of August 2018, streaming metrics have been deprecated. However, unintuitively, all tf.metrics are streaming. So, use tf.metrics.accuracy.
However, if you want accuracy (or another metric) over only a subset of batches, then you can use an exponential moving average, as in the answer by @MZHm, or reset any of the tf.metrics by following this very informative blog post.
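A minimal sketch of that reset pattern, scoping the metric so its internal counters can be re-initialized between evaluation runs (the scope name and placeholders are just illustrative):
import tensorflow as tf

labels = tf.placeholder(tf.int64, [None])
predictions = tf.placeholder(tf.int64, [None])

with tf.variable_scope("eval_metrics") as scope:
    acc, acc_update = tf.metrics.accuracy(labels, predictions)
    # Collect only this metric's internal counters (total / count are local variables)
    metric_vars = tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES, scope=scope.name)
    metrics_reset = tf.variables_initializer(metric_vars)

# Per evaluation run:
#   sess.run(metrics_reset)                       # zero the counters
#   for x, y in eval_batches:
#       sess.run(acc_update, feed_dict={labels: y, predictions: predict(x)})
#   final_acc = sess.run(acc)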
For quite some time I had only been saving the summary once per epoch. I never knew that TensorFlow's summary would then only contain the values from the last batch that was run.
Shocked, I looked into this problem. This is the solution I came up with (using the dataset API):
loss = ...
train_op = ...

loss_metric, loss_metric_update = tf.metrics.mean(loss)
tf.summary.scalar('loss', loss_metric)

merged = tf.summary.merge_all()
train_writer = tf.summary.FileWriter(os.path.join(res_dir, 'train'))
test_writer = tf.summary.FileWriter(os.path.join(res_dir, 'test'))

init_local = tf.initializers.local_variables()
init_global = tf.initializers.global_variables()
sess.run(init_global)

def train_run(epoch):
    sess.run([dataset.train_init_op, init_local])  # train_init_op is the operation that switches to training data
    for i in range(dataset.num_train_batches):  # num_train_batches is the number of batches that should be run for the training set
        sess.run([train_op, loss_metric_update])
    summary, cur_loss = sess.run([merged, loss_metric])
    train_writer.add_summary(summary, epoch)
    return cur_loss

def test_run(epoch):
    sess.run([dataset.test_init_op, init_local])  # test_init_op is the operation that switches to test data
    for i in range(dataset.num_test_batches):  # num_test_batches is the number of batches that should be run for the test set
        sess.run(loss_metric_update)
    summary, cur_loss = sess.run([merged, loss_metric])
    test_writer.add_summary(summary, epoch)
    return cur_loss

for epoch in range(epochs):
    train_loss = train_run(epoch + 1)
    test_loss = test_run(epoch + 1)
    print("Epoch: {0:3}, loss: (train: {1:10.10f}, test: {2:10.10f})".format(epoch + 1, train_loss, test_loss))
For the summary I'm just wrapping the tensor I'm interested in into tf.metrics.mean(). For each batch run I call the metrics update operation. At the end of every epoch the metrics tensor will return the correct mean of all batch results.
Don't forget to initialize local variables every time you switch between training and test data. Otherwise your train and test metrics will be near identical.
I had the same problem when I realized I had to iterate over my validation data, as memory got cramped and OOM errors were flooding in.
As several of these answers say, tf.metrics has this built in, but I'm not using tf.metrics in my project. So, inspired by that, I made this:
import tensorflow as tf
import numpy as np

def batch_persistent_mean(tensor):
    # Make a variable that keeps track of the sum
    accumulator = tf.Variable(initial_value=tf.zeros_like(tensor), dtype=tf.float32)
    # Keep count of batches in accumulator (needed to estimate mean)
    batch_nums = tf.Variable(initial_value=tf.zeros_like(tensor), dtype=tf.float32)
    # Make an operation for accumulating, increasing batch count
    accumulate_op = tf.assign_add(accumulator, tensor)
    step_batch = tf.assign_add(batch_nums, 1)
    update_op = tf.group([step_batch, accumulate_op])
    eps = 1e-5
    output_tensor = accumulator / (tf.nn.relu(batch_nums - eps) + eps)
    # The tf.nn.relu is a hacky zero guard:
    # if batch_nums is zero it returns eps, otherwise it returns batch_nums
    # Make an operation to reset
    flush_op = tf.group([tf.assign(accumulator, 0), tf.assign(batch_nums, 0)])
    return output_tensor, update_op, flush_op

# Make a variable that we want to accumulate
X = tf.Variable(0., dtype=tf.float32)
# Make our persistent mean operations
Xbar, upd, flush = batch_persistent_mean(X)
Now you send Xbar to your summary e.g. tf.scalar_summary("mean_of_x", Xbar), and where you'd do sess.run(X) before, you'll do sess.run(upd). And between epochs you'd do sess.run(flush).
Testing behaviour:
### INSERT ABOVE CODE CHUNK IN S.O. ANSWER HERE ###
with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    # Calculate the mean of 0 + 1 + ... + 19
    for i in range(20):
        sess.run(upd, {X: i})
    print(sess.run(Xbar), "=", np.mean(np.arange(20)))

    for i in range(40):
        sess.run(upd, {X: i})
    # Now Xbar is the mean of (0 + 1 + ... + 19 + 0 + 1 + ... + 39):
    print(sess.run(Xbar), "=", np.mean(np.concatenate([np.arange(20), np.arange(40)])))

    # Now flush it
    sess.run(flush)
    print("flushed. Xbar=", sess.run(Xbar))
    for i in range(40):
        sess.run(upd, {X: i})
    print(sess.run(Xbar), "=", np.mean(np.arange(40)))