Can you save the inferences/predictions in a list and apply the Adam optimizer to them afterwards? - tensorflow

I am new to TensorFlow, and in my current project I can't immediately calculate the loss after each prediction/inference, but rather only every 2 or 3 predictions. So I was thinking of saving the tensors of each prediction in a list and running them through the optimizer afterwards.
I am new to TensorFlow and not very familiar with it, so if there is no way to do this, other ways to tackle the problem are welcome.
Thanks in advance for your help!

Why can't you calculate the loss immediately?
If I understand your question right, your situation is very similar to an in-graph distributed model: let each GPU/server compute a batch, then gather all the inferences and losses, compute their average, and then update the variables.
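A minimal TF1-style sketch of that idea (all names and shapes here are hypothetical): keep the symbolic prediction tensors from several forward passes, combine them into one loss, and let a single Adam step backpropagate through all of them.

import tensorflow as tf

x1 = tf.placeholder(tf.float32, [None, 10])   # inputs for prediction 1
x2 = tf.placeholder(tf.float32, [None, 10])   # inputs for prediction 2
y = tf.placeholder(tf.float32, [None, 1])     # target, known only later

w = tf.Variable(tf.random_normal([10, 1]))
pred1 = tf.matmul(x1, w)                      # first inference
pred2 = tf.matmul(x2, w)                      # second inference

# Build the loss from the prediction *tensors* (not their evaluated values),
# so gradients flow back through every prediction when Adam runs.
loss = tf.reduce_mean(tf.square(pred1 - y)) + tf.reduce_mean(tf.square(pred2 - y))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)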

Is there any way in TensorFlow to iterate over all the training errors during training?

I am trying to design a new loss function that iterates over all the training errors for each data point in a training batch and calculates the new loss based on the magnitude of the different errors.
Is there any way to achieve this? When you design the loss function, error.shape[0] is None, so the traditional ways of iterating over the errors cannot be used.
error = Ypred - Ytrue, and its shape[0] is None, so I don't know how to iterate over the errors. I need to know the errors during training and compare their magnitude with one specific value to count how many errors are larger than it, and then calculate the loss based on that.
In short, I want to calculate the mean of the errors larger than 0.5 and the mean of the errors smaller than 0.5 in a batch, respectively, and then use their sum as the loss function.
Is there any way to achieve this?
Larger_Error = Error[Error > 0.5] can work, i.e. boolean-mask indexing handles the None batch dimension.
It is quite a silly problem, but I'll still keep it up for deep learning starters just like me :-)
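A minimal sketch of that masking idea as a Keras-style loss function (the guard against empty masks is my addition, not from the original answer):

import tensorflow as tf

def split_mean_loss(y_true, y_pred):
    error = tf.abs(y_pred - y_true)               # batch dimension may be None
    large = tf.boolean_mask(error, error > 0.5)   # errors larger than 0.5
    small = tf.boolean_mask(error, error <= 0.5)  # errors at most 0.5
    # An empty mask would make reduce_mean return NaN, so guard with tf.cond.
    mean_large = tf.cond(tf.size(large) > 0,
                         lambda: tf.reduce_mean(large),
                         lambda: tf.constant(0.0))
    mean_small = tf.cond(tf.size(small) > 0,
                         lambda: tf.reduce_mean(small),
                         lambda: tf.constant(0.0))
    return mean_large + mean_small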

Binary classification of every time series step based on past and future values

I'm currently facing a Machine Learning problem and I've reached a point where I need some help to proceed.
I have various time series of positional (x, y, z) data tracked by sensors. I've also derived some additional features. For example, I rasterized the whole 3D space and calculated cell_x, cell_y, and cell_z for every time step. The time series themselves have variable lengths.
My goal is to build a model which classifies every time step with the label 0 or 1 (binary classification based on past and future values). For this I have a lot of training time series where the labels are already set.
One thing which could be very problematic is that there are very few 1 labels in the data (for example, only 3 of 800 samples are labeled with 1).
It would be great if someone could point me in the right direction, because there are too many possible problems:
Wrong hyperparameters
Incorrect model
Too few 1 labels, but I think that's not a big problem because I only need the model to suggest the right time steps. So I would only use the peaks of the output.
Too little or bad training data
Bad features
I appreciate any help and tips.
Your model seems very strange. Why use only 2 units in the LSTM layer? Also, your problem is a binary classification. In this case you should use only one neuron in your output layer (try inserting an additional dense layer between the LSTM layer and the output, and try dropout layers in between).
Binary crossentropy does not make much sense with 2 output neurons if you don't have a multi-label problem. But if you switch to one output neuron, it's the right choice. You then also need sigmoid as the activation function.
As a last piece of advice: try class weights.
http://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html
This can make a huge difference if your labels are unbalanced.
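Putting those suggestions together, a minimal sketch (layer sizes, n_features, and the label array are placeholders, not from the original post): one sigmoid unit per time step, binary crossentropy, dropout, and scikit-learn class weights for the imbalance.

import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow import keras

n_features = 6                                   # placeholder: your feature count

# Class weights from the imbalanced labels (flattened across all time steps).
y_flat = np.array([0] * 797 + [1] * 3)           # stand-in for your label array
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y_flat)

model = keras.Sequential([
    keras.layers.LSTM(32, return_sequences=True,
                      input_shape=(None, n_features)),    # variable-length series
    keras.layers.Dropout(0.2),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1, activation='sigmoid'),          # one unit per time step
])
# With per-time-step labels, Keras applies class weights via per-step
# sample weights rather than the class_weight argument.
model.compile(optimizer='adam', loss='binary_crossentropy',
              sample_weight_mode='temporal')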
You can create the model using TensorFlow's BasicLSTMCell; the shape of your data fits BasicLSTMCell. You can find the documentation for BasicLSTMCell here, and for creating the model, this documentation contains code that will help you build a BasicLSTMCell model. Hope this helps, cheers.

difference between losses/clone_0/softmax_cross_entropy_loss and losses/clone_0/aux_loss/value

What is the difference between losses/clone_0/softmax_cross_entropy_loss and losses/clone_0/aux_loss/value in inception-v4?
Currently, I'm training a large-scale model using tf-slim and the inception-v4 network on 4 GPUs (--num_clones=4), but these two charts are completely different. After 190K steps with batch-size=128, I get the charts shown in the "Losses" image.
As you can see in the image, the total loss and aux_loss have a similar trend, but softmax_cross_entropy_loss behaves completely differently!
Which one of these losses describes the training procedure better?
You should use the first, main one. You can read about the auxiliary head and its loss here: Does the Inception Model have two softmax outputs?
This aux_loss is defined here: https://github.com/tensorflow/models/blob/4bd29ac0ba1004d7393b7d029b05257dffd5cbe6/inception/inception/inception_model.py#L135
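For intuition, a rough sketch of how an auxiliary classifier loss is typically combined with the main softmax loss in Inception-style training (the placeholder tensors are stand-ins; the 0.4 weight follows the Inception papers, and the exact value in tf-slim may differ):

import tensorflow as tf

# Placeholder tensors standing in for your labels and the two heads' logits.
onehot_labels = tf.placeholder(tf.float32, [None, 1001])
main_logits = tf.placeholder(tf.float32, [None, 1001])
aux_logits = tf.placeholder(tf.float32, [None, 1001])

# Main head: full-weight softmax cross-entropy.
main_loss = tf.losses.softmax_cross_entropy(onehot_labels, main_logits)
# Auxiliary head: same loss, down-weighted so it mainly regularizes training.
aux_loss = tf.losses.softmax_cross_entropy(onehot_labels, aux_logits, weights=0.4)
# The total loss (what the optimizer minimizes) sums both, plus regularizers.
total_loss = tf.losses.get_total_loss()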

What is the average log-perplexity in seq2seq modules in tensorflow?

The output of the following TensorFlow function should give the average log-perplexity. I went through the source code, but I don't understand how they calculate that loss.
tf.contrib.legacy_seq2seq.sequence_loss(logits, targets, weights, average_across_timesteps=True, average_across_batch=True, softmax_loss_function=None, name=None)
I went through the TensorFlow implementation. Though perplexity has some broader meaning, here in this function perplexity means
two to the power of your total cross-entropy loss.
Please refer to the first answer of this question.
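A rough NumPy sketch of what sequence_loss computes when both averaging flags are set (based on my reading of the legacy_seq2seq source; an approximation, not the library code): per-step cross-entropy scaled by the weights, averaged across time steps and then across the batch.

import numpy as np

def sequence_loss(logits, targets, weights):
    # logits: list of [batch, vocab] arrays, one entry per time step
    # targets: list of [batch] int arrays; weights: list of [batch] floats
    weighted_xent = []
    for logit, target, weight in zip(logits, targets, weights):
        # softmax + cross-entropy for each example at this time step
        probs = np.exp(logit - logit.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        xent = -np.log(probs[np.arange(len(target)), target])
        weighted_xent.append(xent * weight)
    # average_across_timesteps: divide by the total weight per example
    per_example = np.sum(weighted_xent, axis=0) / (np.sum(weights, axis=0) + 1e-12)
    # average_across_batch: mean over the batch
    return per_example.mean()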

Unaggregated gradients / gradients per example in tensorflow

Given a simple mini-batch gradient descent problem on MNIST in TensorFlow (like in this tutorial), how can I retrieve the gradients for each example in the batch individually?
tf.gradients() seems to return gradients averaged over all examples in the batch. Is there a way to retrieve gradients before aggregation?
Edit: A first step towards an answer is figuring out at which point TensorFlow averages the gradients over the examples in the batch. I thought this happened in _AggregatedGrads, but that doesn't appear to be the case. Any ideas?
tf.gradients returns the gradient of the loss with respect to the variables. This means that if your loss is a sum of per-example losses, then the gradient is also the sum of per-example loss gradients.
The summing-up is implicit. For instance, if you want to minimize the sum of squared norms of Wx − y errors, the gradient with respect to W is 2(WX − Y)X', where X is the batch of observations and Y is the batch of labels. You never explicitly form the "per-example" gradients that you later sum up, so it's not a simple matter of removing some stage in the gradient pipeline.
A simple way to get k per-example loss gradients is to use batches of size 1 and do k passes, as in the sketch below. Ian Goodfellow wrote up how to get all k gradients in a single pass; for this you would need to specify the gradients explicitly and not rely on the tf.gradients method.
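A minimal TF1-style sketch of the batch-of-1 approach (the linear model and the random batch_x/batch_y arrays are hypothetical stand-ins for your network and data):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
w = tf.Variable(tf.zeros([784, 10]))
loss = tf.reduce_sum(tf.square(tf.matmul(x, w) - y))
grad_w = tf.gradients(loss, [w])[0]

batch_x = np.random.rand(32, 784).astype(np.float32)  # stand-in data
batch_y = np.random.rand(32, 10).astype(np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # One pass per example: each run sees a batch of size 1, so the
    # "aggregated" gradient is exactly that example's gradient.
    per_example_grads = [
        sess.run(grad_w, feed_dict={x: batch_x[i:i+1], y: batch_y[i:i+1]})
        for i in range(len(batch_x))
    ]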
To partly answer my own question after tinkering with this for a while: it appears that it is possible to manipulate per-example gradients while still working in batches, by doing the following:
Create a copy of tf.gradients() that accepts an extra tensor/placeholder with example-specific factors.
Create a copy of _AggregatedGrads() and add a custom aggregation method that uses the example-specific factors.
Call your custom tf.gradients function and pass your loss as a list of per-example slices:
grads = custagg_gradients(
    ys=[cross_entropy[i] for i in xrange(batch_size)],
    xs=variables.trainable_variables(),
    aggregation_method=CUSTOM,
    gradient_factors=gradient_factors)
But this will probably have the same complexity as doing individual passes per example, and I still need to check whether the gradients are correct :-).
One way of retrieving gradients before aggregation is to use the grad_ys parameter. A good discussion can be found here:
Use of grads_ys parameter in tf.gradients - TensorFlow
EDIT:
I haven't been working with Tensorflow a lot lately, but here is an open issue tracking the best way to compute unaggregated gradients:
https://github.com/tensorflow/tensorflow/issues/675
There are a lot of sample-code solutions provided by users (including myself) that you can try, based on your needs.