In a dynamic_rnn do I need to bucket samples in order to balance batches? - tensorflow

I am implementing a bidirectional dynamic rnn. Now I face the question whether I need to bucket my training samples.
My thought (and fear) is that if I don't bucket I might face the following situation: In a batch with 32 samples and maybe all but one samples being below 500 characters long and one samples being say 10.000 characters long the backprop will behave basically as if I had only a batch size of 1 and might result in NANs quickly or throw off the learned weights pretty badly every time that situation occurs.
Any experiences before I write code and check for days of training and debugging? Thx

Related

CNN: Unstable of model score vs iteration

I got my model score vs iteration graph is unstable. How can I improve it?
This is what I get
Here is my code
Code 1
Code 2
Code 3
Code 4
Code 5
Your network looks fairly stock/copy and pasted. I'm pretty sure I've seen this code before.
Without knowing much about your input data I'm not sure if you're solving a classification problem or not but try first switching it to softmax and negative log likelihood on the output.
The output activation and loss function are mainly for binary classification.
You can also get rid of the ReNormalizeL2PerLayer. That might hinder the network from learning depending on your data.
It's also hard to help without knowing much about your input data but sometimes unit mean zero variance may not be suitable for your data set. Consider switching to a zero to 1 scaling instead.
Lastly, for quick iteration times consider overfitting on a small amount of data first when testing. That will help you see if there's any signal in your data and if your network can learn.

Tensorflow / Keras: Normalize train / test / realtime Data or how to handle reality?

I started developing some LSTM-models and now have some questions about normalization.
Lets pretend I have some time series data that is roughly ranging between +500 and -500. Would it be more realistic to scale the Data from -1 to 1, or is 0 to 1 a better way, I tested it and 0 to 1 seemed to be faster. Is there a wrong way to do it? Or would it just be slower to learn?
Second question: When do I normalize the data? I split the data into training and testdata, do I have to scale / normalize this data seperately? maybe the trainingdata is only ranging between +300 to -200 and the testdata ranges from +600 to -100. Thats not very good I guess.
But on the other hand... If I scale / normalize the entire dataframe and split it after that, the data is fine for training and test, but how do I handle real new incomming data? The model is trained to scaled data, so I have to scale the new data as well, right? But what if the new Data is 1000? the normalization would turn this into something more then 1, because its a bigger number then everything else before.
To make a long story short, when do I normalize data and what happens to completely new data?
I hope I could make it clear what my problem is :D
Thank you very much!
Would like to know how to handle reality as well tbh...
On a serious note though:
1. How to normalize data
Usually, neural networks benefit from data coming from Gaussian Standard distribution (mean 0 and variance 1).
Techniques like Batch Normalization (simplifying), help neural net to have this trait throughout the whole network, so it's usually beneficial.
There are other approaches that you mentioned, to tell reliably what helps for which problem and specified architecture you just have to check and measure.
2. What about test data?
Mean to subtract and variance to divide each instance by (or any other statistic you gather by any normalization scheme mentioned previously) should be gathered from your training dataset. If you take them from test, you perform data leakage (info about test distribution is incorporated into training) and you may get false impression your algorithm performs better than in reality.
So just compute statistics over training dataset and use them on incoming/validation/test data as well.

Setting up training in Tensorflow with overlapping data?

I am trying to train a neural network to forecast using time series data. I'm trying to train a neural network to predict temperature 10 minutes into the future, and lets say I have data points of temperature every 5 minutes and I want to give it 15 minutes worth of data to use in the prediction and the data I have is this.
[1,2,3,4,5,6,7,8,9,10,11,12]
so if I were to train on the data one potential training sample is [1,2,3] as x and [5] as y (as it's 10 minute into the future (two 5 minute steps)).
I want a way to train on all possible inputs, these are as follow.
[1,2,3][5]
[2,3,4][6]
[3,4,5][7]
[4,5,6][8]
[5,6,7][9]
[6,7,8][10]
[7,8,9][11]
[8,9,10][12]
But I don't want to train by first saving each possible example to disk then training from that. This takes up more space than is necessary as the data is duplicated. I would like to do this in some kind of preprocessing of the data.
All the instructions and examples I have found of using the tensorflow input pipeline such as here https://www.tensorflow.org/guide/datasets all use "non overlapping" data, I can't find anything to deal with my scenario.
The problem I'm having is I really have no idea how to set this overlapping data scenario in tensorflow without saving massive amounts of duplicated data to disk. If anyone has any links or guides as to the best way to do this I'd very much appreciate it thank you.
You are probably looking for this transformation: https://www.tensorflow.org/api_docs/python/tf/contrib/data/sliding_window_batch
tf.contrib.data.sliding_window_batch(window_size=3, stride=1)

Inference on several inputs in order to calculate the loss function

I am modeling a perceptual process in tensorflow. In the setup I am interested in, the modeled agent is playing a resource game: it has to choose 1 out of n resouces, by relying only on the label that a classifier gives to the resource. Each resource is an ordered pair of two reals. The classifier only sees the first real, but payoffs depend on the second. There is a function taking first to second.
Anyway, ideally I'd like to train the classifier in the following way:
In each run, the classifier give labels to n resources.
The agent then gets the payoff of the resource corresponding to the highest label in some predetermined ranking (say, A > B > C > D), and randomly in case of draw.
The loss is taken to be the normalized absolute difference between the payoff thus obtained and the maximum payoff in the set of resources. I.e., (Payoff_max - Payoff) / Payoff_max
For this to work, one needs to run inference n times, once for each resource, before calculating the loss. Is there a way to do this in tensorflow? If I am tackling the problem in the wrong way feel free to say so, too.
I don't have much knowledge in ML aspects of this, but from programming point of view, I can see doing it in two ways. One is by copying your model n times. All the copies can share the same variables. The output of all of these copies would go into some function that determines the the highest label. As long as this function is differentiable, variables are shared, and n is not too large, it should work. You would need to feed all n inputs together. Note that, backprop will run through each copy and update your weights n times. This is generally not a problem, but if it is, I heart about some fancy tricks one can do by using partial_run.
Another way is to use tf.while_loop. It is pretty clever - it stores activations from each run of the loop and can do backprop through them. The only tricky part should be to accumulate the inference results before feeding them to your loss. Take a look at TensorArray for this. This question can be helpful: Using TensorArrays in the context of a while_loop to accumulate values

Getting each example exactly once

For monitoring my model's performance on my evaluation dataset, I'm using tf.train.string_input_producer for the filenames queue on .tfr files, then I feed the parsed examples to the tf.train.batch function, that produces batches of a fixed size.
Assume my evaluation dataset contains exactly 761 examples (a prime number). To read all the examples exactly once, I have to have a batch size that divides 761, but there is no such, except 1 that will be too slow and 761 that will not fit in my GPU. Any standard way for reading each example exactly once?
Actually, my dataset size is not 761, but there is no number in the reasonable range of 50-300 that divides it exactly. Also I'm working with many different datasets, and finding a number that approximately divides the number of examples in each dataset can be a hassle.
Note that using the num_epochs parameter to tf.train.string_input_producer does not solve the issue.
Thanks!
You can use reader.read_up_to as in this example. Your last batch will be smaller, so you need to make sure your network doesn't hard-wire batch-size anywhere