Keras variable input - tensorflow

I'm working through a Keras example at https://www.tensorflow.org/tutorials/text/text_generation
The model is built here:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model
During training, they always pass in a length-100 array of ints.
But during prediction, they are able to pass in input of any length, and the output is the same length as the input. I was always under the impression that the number of time steps had to be fixed. Is that not the case, and can the number of time steps of the RNN somehow change?

RNNs are sequence models, i.e. they take in a sequence of inputs and give out a sequence of outputs. The sequence length, also called the number of time steps, is the number of times the RNN cell is unrolled; at each unrolling an input is passed in and the cell, using its gates, produces an output. So in theory you can have as long a sequence as you want. Now assume you have inputs of different sizes. Since you cannot have variable-size inputs in a single batch, you have to collect inputs of the same size into a batch if you want to train using batches. You can also use a batch size of 1 and not worry about any of this, but training becomes painfully slow.
In practical situations, we divide the input into equal-size pieces while training so that training is fast. There are situations, such as language translation models, where this is not feasible.
So in theory RNNs do not have any limitation on the sequence length; however, with long sequences the network starts to lose the context from the beginning as the sequence length increases.
When predicting, you can use any sequence length you want.
In your case the output size is the same as the input size because of return_sequences=True. You can also get a single output by using return_sequences=False, in which case Keras returns only the output of the last unrolling.
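To make the point above concrete, here is a minimal sketch in plain Python rather than Keras. The function run_rnn and its scalar weights w_in and w_rec are made up for illustration; a real GRU learns weight matrices and uses gates. What it shows is that the same weights are applied at every step, so nothing in the cell depends on the sequence length.

```python
import math

def run_rnn(inputs, w_in=0.5, w_rec=0.8, return_sequences=True):
    """Apply one toy RNN cell to a sequence of any length.

    w_in and w_rec are illustrative scalar weights (assumptions, not
    values from the tutorial); they are shared across all time steps.
    """
    h = 0.0                # hidden state carried from step to step
    outputs = []
    for x in inputs:       # one unrolling of the cell per time step
        h = math.tanh(w_in * x + w_rec * h)
        outputs.append(h)
    # Mimics the Keras flag: every step's output, or only the last one.
    return outputs if return_sequences else outputs[-1]

short = run_rnn([1.0, 2.0, 3.0])
long = run_rnn([0.1] * 100)
last_only = run_rnn([1.0, 2.0, 3.0], return_sequences=False)

print(len(short), len(long))  # prints: 3 100
```

The output has one entry per input step regardless of length, and last_only equals the final entry of short, which is the difference between return_sequences=True and return_sequences=False.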

The length of the training sequences does not have to equal the length used for prediction.
An RNN works with two vectors at each step: the new word and the hidden state (accumulated from the previous words). It does not store the length of the sequence.
But to get good predictions on long sequences, you have to train the RNN on long sequences, because it has to learn to use a long context.

Related

Encoder Decoder for time series forecasting

I want to predict 7 days ahead from a training set of 55 days. I tried to apply the models given here and here, but I am getting an output value of 1 for all 7 days.
I am also confused about how to feed a time series into an encoder-decoder and about its code; this is what I tried based on my understanding.
model.add(LSTM(150, input_shape=(None, 1)))
model.add(RepeatVector(8))
model.add(LSTM(150, return_sequences=True))
model.add(TimeDistributed(Dense(1, activation='softmax')))
model.compile(loss='mse', optimizer='adam')
for i in range(7):
    x=df[i*7:(i+1)*7]
    y=df[(i+1)*7:(i+2)*7]
    x=np.array(x)
    x=np.insert(x,0,len(x))
    x=x.reshape(1,len(x),1)
    y=np.array(y)
    y=np.insert(y,0,len(y))
    y=y.reshape(1,len(y),1)
    model.fit(x, y, epochs=1, verbose=2)
After training, I predict 7 days from the entire training sequence.
Second, I tried the approach from link 2:
#functions define_models and predict_sequence same as link
for i in range(0,47):
    x1=df[i:i+7]
    print(len(x1))
    x2=df[i+1:i+8]
    print(len(x2))
    y=df[i+1:i+8]
    x1=np.array(x1)
    x1=np.insert(x1,0,len(x1))
    print(len(x1))
    x1=x1.reshape(len(x1),1,1)
    x2=np.array(x2)
    x2=np.insert(x2,0,0)
    print(len(x2))
    x2=x2.reshape(len(x2),1,1)
    y=np.array(y)
    y=np.insert(y,0,len(y))
    y=y.reshape(len(y),1,1)
    model.fit([x1,x2],y,epochs=1)
This also gives 1 as output.
I don't know exactly what x2 should be here.
Please correct me where I am wrong.
The first problem is that to train a deep network you should do the following steps:
Create a clear dataset. By a "clear dataset" I mean an instance of tf.data.Dataset. To create one, first organize your data in NumPy arrays. Keras recurrent layers expect input of shape (number of samples, sequence length, size of each record) by default, so in your case each training example in the X array should have shape (7, 1) and each label sequence in the Y array should have shape (7, 1), giving arrays of shape (N, 7, 1) for N examples.
After organizing the data in this format, you can create a tf.data.Dataset instance with tf.data.Dataset.from_tensor_slices() and group the examples with its batch() method.
Then call model.fit() on the created tf.data.Dataset instance with a suitable number of epochs, which should be more than 1. The epochs parameter specifies how many times the network iterates over the training dataset. Its value is somewhat arbitrary, so try different values to find the one that best fits your problem.
Note that using this process you do not need to make a for-loop anymore. The loop will be executed inside of the model.fit function.
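As an illustration of organizing the data up front instead of slicing inside a training loop, here is a minimal sketch in plain Python. The helper make_windows and the placeholder series are assumptions made for this example; the nested lists are arranged as (samples, time steps, features), the layout Keras recurrent layers use by default, and could be converted with np.array before being handed to model.fit or tf.data.Dataset.from_tensor_slices.

```python
def make_windows(series, n_in=7, n_out=7):
    """Split a 1-D series into (input window, target window) pairs.

    Each input is n_in consecutive values and each target is the n_out
    values that immediately follow it. Every value is wrapped in a list
    so that each time step has one feature.
    """
    xs, ys = [], []
    for start in range(len(series) - n_in - n_out + 1):
        mid = start + n_in
        xs.append([[v] for v in series[start:mid]])        # (n_in, 1)
        ys.append([[v] for v in series[mid:mid + n_out]])  # (n_out, 1)
    return xs, ys

# Placeholder data standing in for the 55 days of observations.
series = [float(day) for day in range(55)]
X, Y = make_windows(series)

print(len(X), len(X[0]), len(X[0][0]))  # prints: 42 7 1
```

With 55 days and 7-day input and target windows, this yields 42 overlapping training examples, which a single model.fit call can iterate over for several epochs.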
For more information about how to implement and train an encoder-decoder model in TensorFlow take a look at the official sample for neural machine translation.
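Separately from the data pipeline, the constant output of 1 reported in the question is consistent with the final layer Dense(1, activation='softmax'): softmax normalizes across the units of a layer, and with a single unit the result is always 1. The softmax helper below is written in plain Python for illustration only and is not the Keras implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of numbers."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# With a single unit, the output is 1.0 regardless of the input value.
single_unit_outputs = [softmax([logit]) for logit in (-3.0, 0.0, 42.0)]
print(single_unit_outputs)  # prints: [[1.0], [1.0], [1.0]]
```

For a regression target trained with an mse loss, a linear output (no activation on the final Dense layer) is the usual choice.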

Which dimension does the LSTM model treat as the data sequence?

I know that an LSTM layer expects a 3-dimensional input (samples, timesteps, features). But along which of these dimensions is the data treated as a sequence?
From reading some sites I understood that it is the timesteps dimension, so I created a simple problem to test this.
In this problem, the LSTM model needs to sum the values along the timesteps dimension. Assuming that the model takes the previous values in the timesteps dimension into account, it should return the sum of the values as its output.
I tried to fit it with 4 samples and the result was not good. Does my reasoning make sense?
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM
X = np.array([
[5.,0.,-4.,3.,2.],
[2.,-12.,1.,0.,0.],
[0.,0.,13.,0.,-13.],
[87.,-40.,2.,1.,0.]
])
X = X.reshape(4, 5, 1)
y = np.array([[6.],[-9.],[0.],[50.]])
model = Sequential()
model.add(LSTM(5, input_shape=(5, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=1000, batch_size=4, verbose=0)
print(model.predict(np.array([[[0.],[0.],[0.],[0.],[0.]]])))
print(model.predict(np.array([[[10.],[-10.],[10.],[-10.],[0.]]])))
print(model.predict(np.array([[[10.],[20.],[30.],[40.],[50.]]])))
output:
[[-2.2417212]]
[[7.384143]]
[[0.17088854]]
First of all, yes, you're right that the timesteps dimension is the one treated as the data sequence.
Next, I think there is some confusion about what you mean by this line:
"assuming that the model will consider the previous values of the timestep"
In any case, the LSTM does not take the previous input values of the time steps directly; rather, at each step it takes the output activation (the hidden state) of the previous time step.
Also, the reason your output is wrong is that you're using a very small dataset to train the model. Recall that, no matter which machine learning algorithm you use, it needs many data points. In your case, 4 data points are not enough to train the model. I used a slightly larger number of parameters and here's the sample results.
However, remember that there is a small problem here. I initialised the training data between 0 and 50, so predictions for numbers outside this range won't be accurate anymore. The farther a number is from this range, the lower the accuracy. This is because it has become more of a function mapping problem than addition. By function mapping, I mean that your model will learn to map the values in the training set to outputs (provided it is trained for enough epochs). You can learn more about it here.
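The answer's own training code is not shown, so the following is only a sketch of how a larger training set for the summing task could be generated. The helper make_sum_dataset, the sample count of 1000 and the seed are assumptions; the 0 to 50 range follows the answer. The nested lists have the (samples, timesteps, features) layout and could be wrapped in np.array before calling model.fit.

```python
import random

def make_sum_dataset(n_samples, timesteps=5, low=0.0, high=50.0, seed=0):
    """Generate random sequences and their sums as regression targets."""
    rng = random.Random(seed)  # fixed seed so the data is reproducible
    X = [[[rng.uniform(low, high)] for _ in range(timesteps)]
         for _ in range(n_samples)]
    # The label of each sample is the sum over its time steps.
    y = [[sum(step[0] for step in sample)] for sample in X]
    return X, y

X, y = make_sum_dataset(1000)
print(len(X), len(X[0]), len(X[0][0]))  # prints: 1000 5 1
```

Because every value is drawn from the same interval, a model trained on this data has only seen sums in a bounded range, which is the range limitation described above.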

Strange sequence classification performance after shuffling sequence elements

I have one million sequences I'm trying to classify as either 0 or 1. The outcome is fairly well balanced (class 0: 70%, class 1: 30%). Maximum sequence length is 50, and I've post-padded my sequences with zeroes. There are 100 unique sequence symbols. Embedding length is 30. It's an LSTM NN trained on two outputs (one is the main output node, and the other is right after the LSTM). The code is below.
As a sanity check, I ran three versions of this: one in which I randomize the outcome labels (I expected terrible performance), another in which the labels are correct but I randomize the order of events within each sequence (I also expected bad performance), and finally one where everything is left unshuffled (I expected good performance).
Instead I found the following:
Shuffled labels: Accuracy = 69.5% (Model predicts every sequence is class 0)
Shuffled sequence symbols: Accuracy = 88%!
Nothing is shuffled: Accuracy = 90%
What do you make of this? All I can think of is that there is little signal to be gained from analyzing the sequences, and maybe most of the signal is from the presence or lack of presence of symbols in the sequence. Maybe RNNs and LSTMs are overkill here?
# Input 1: event type sequences
# Take the event integer sequences, run them through an embedding layer to get float vectors, then run through LSTM
main_input = Input(shape =(max_seq_length,), dtype = 'int32', name = 'main_input')
x = Embedding(output_dim = embedding_length, input_dim = num_unique_event_symbols, input_length = max_seq_length, mask_zero=True)(main_input)
lstm_out = LSTM(32)(x)
# Auxiliary loss here from first input
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
# An arbitrary number of dense, hidden layers here
x = Dense(64, activation='relu')(lstm_out)
# The main output node
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
## Compile and fit the model
model = Model(inputs=[main_input], outputs=[main_output, auxiliary_output])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'], loss_weights=[1., 0.2])
print(model.summary())
np.random.seed(21)
model.fit([train_X1], [train_Y, train_Y], epochs=1, batch_size=200)
Assuming you've played around with the size of the LSTM, your conclusion seems reasonable. Beyond that, it's hard to say as it depends what the dataset is. For example, it could be that shorter sequences are more unpredictable, and if most of your sequences are short, then this would support the conclusion as well.
It's worth it to also try truncating your sequences in length, to say the first 25 entries.
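One way to probe the hypothesis that most of the signal comes from which symbols are present, rather than from their order, is to build features that discard order entirely and see how a simple classifier does on them. The sketch below only shows the feature step; bag_of_symbols and the example sequence are made up for illustration, and treating 0 as padding follows the question's setup.

```python
import random
from collections import Counter

def bag_of_symbols(sequence, pad=0):
    """Count how often each symbol occurs, ignoring padding and order."""
    return Counter(s for s in sequence if s != pad)

seq = [7, 3, 3, 42, 9, 0, 0, 0]    # post-padded with zeros
shuffled = seq[:5]                 # shuffle only the real events
random.Random(1).shuffle(shuffled)
shuffled += [0, 0, 0]

print(bag_of_symbols(seq) == bag_of_symbols(shuffled))  # prints: True
```

Shuffling leaves these counts unchanged, so if a model on such counts comes close to the accuracy of the unshuffled LSTM, that would support the view that order adds little here.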

Tensorflow dynamic_rnn parameters meaning

I'm struggling to understand the cryptic RNN docs. Any help with the following will be greatly appreciated.
tf.nn.dynamic_rnn(cell, inputs, sequence_length=None, initial_state=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None)
I'm struggling to understand how these parameters relate to the mathematical LSTM equations and RNN definition. Where is the cell unroll size? Is it defined by the 'max_time' dimension of the inputs? Is the batch_size only a convenience for splitting long data or it's related to minibatch SGD? Is the output state passed across batches?
tf.nn.dynamic_rnn takes in a batch (with the minibatch meaning) of unrelated sequences.
cell is the actual cell that you want to use (LSTM, GRU,...)
inputs has a shape of batch_size x max_time x input_size in which max_time is the number of steps in the longest sequence (but all sequences could be of the same length)
sequence_length is a vector of size batch_size in which each element gives the length of each sequence in the batch (leave it as default if all your sequences are of the same size). This parameter is the one that defines the cell unroll size.
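To illustrate how inputs and sequence_length relate, here is a small sketch in plain Python that pads a batch of variable-length sequences to a common max_time and records the true lengths. The helper pad_batch is hypothetical, and each time step is a single number here, so input_size is implicitly 1.

```python
def pad_batch(sequences, pad_value=0.0):
    """Pad variable-length sequences to a common max_time.

    Returns (padded, lengths): padded has batch_size rows of max_time
    values, and lengths is what would be passed as sequence_length.
    """
    lengths = [len(seq) for seq in sequences]
    max_time = max(lengths)
    padded = [list(seq) + [pad_value] * (max_time - len(seq))
              for seq in sequences]
    return padded, lengths

batch = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
padded, lengths = pad_batch(batch)

print(padded)   # prints: [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0], [5.0, 6.0, 0.0]]
print(lengths)  # prints: [3, 1, 2]
```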
Hidden state handling
The usual way of handling hidden state is to define an initial state tensor before the dynamic_rnn, like this for instance :
hidden_state_in = cell.zero_state(batch_size, tf.float32)
output, hidden_state_out = tf.nn.dynamic_rnn(cell,
                                             inputs,
                                             initial_state=hidden_state_in,
                                             ...)
In the above snippet, both hidden_state_in and hidden_state_out have the same shape [batch_size, ...] (the actual shape depends on the type of cell you use but the important thing is that the first dimension is the batch size).
This way, dynamic_rnn has an initial hidden state for each sequence. It will pass on the hidden state from time step to time step for each sequence in the inputs parameter on its own, and hidden_state_out will contain the final output state for each sequence in the batch. No hidden state is passed between sequences of the same batch, but only between time steps of the same sequence.
When do I need to feed back the hidden state manually?
Usually, when you're training, every batch is unrelated so you don't have to feed back the hidden state when doing a session.run(output).
However, if you're testing, and you need the output at each time step, (i.e. you have to do a session.run() at every time step) you'll want to evaluate and feed back the output hidden state using something like this :
output, hidden_state = sess.run([output, hidden_state_out],
                                feed_dict={hidden_state_in:hidden_state})
Otherwise TensorFlow will just use the default cell.zero_state(batch_size, tf.float32) at each time step, which equates to reinitialising the hidden state at each time step.
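The effect of feeding the state back can be seen in a toy model written in plain Python. The cell below uses made-up scalar weights and is not TensorFlow code; run() stands in for one call that processes a sequence and returns the outputs together with the final state.

```python
import math

def step(x, h, w_in=0.5, w_rec=0.8):
    """One time step of a toy RNN cell with illustrative scalar weights."""
    return math.tanh(w_in * x + w_rec * h)

def run(inputs, h=0.0):
    """Process a sequence; return per-step outputs and the final state."""
    outputs = []
    for x in inputs:
        h = step(x, h)
        outputs.append(h)
    return outputs, h

sequence = [1.0, -2.0, 0.5, 3.0]
full_outputs, _ = run(sequence)

# One call per time step, feeding the returned state back in.
h = 0.0
fed_back = []
for x in sequence:
    out, h = run([x], h)
    fed_back.append(out[0])

# One call per time step, but always starting from the zero state.
reset_each_step = [run([x])[0][0] for x in sequence]

print(fed_back == full_outputs)         # prints: True
print(reset_each_step == full_outputs)  # prints: False
```

Stepping through the sequence one call at a time reproduces the full-sequence result only when the state from the previous call is passed back in.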

Tensorflow unrolled LSTM longer than input sequence

I want to create an LSTM in tensorflow to predict time-series data. My training data is a set of input/output sequences of different lengths. Can I include multiple sequences of different lengths in the same training batch? Or do I need to pad them to equal lengths? If so, how?
Also: What will tensorflow do if the unrolled RNN is longer than the input sequence? The rnn() method contains an optional sequence_length argument which appears designed to handle this eventuality, but I'm not clear what it does.
Do you want to build the model from scratch? Otherwise you might want to look into the translate.py model. There your issue is taken care of by:
- padding the input (and output) sequences with a PAD symbol (basically a neutral "no info" symbol)
- buckets: for different groups of lengths you can create different buckets (this makes sense only if your sequence lengths vary widely from shortest to longest)
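The two ideas in the list above can be sketched together in plain Python. This is not the code from translate.py; the function bucket_and_pad, the bucket sizes and the "PAD" string are assumptions chosen for illustration, and the bucket sizes are expected in ascending order.

```python
def bucket_and_pad(sequences, bucket_sizes=(5, 10, 20), pad="PAD"):
    """Put each sequence in the smallest bucket that fits, padded to that size."""
    buckets = {size: [] for size in bucket_sizes}
    for seq in sequences:
        for size in bucket_sizes:
            if len(seq) <= size:
                buckets[size].append(list(seq) + [pad] * (size - len(seq)))
                break
        else:
            # No bucket was large enough for this sequence.
            raise ValueError(f"sequence of length {len(seq)} exceeds the largest bucket")
    return buckets

data = [["a", "b"], ["c"] * 7, ["d"] * 5, ["e"] * 12]
buckets = bucket_and_pad(data)

print({size: len(items) for size, items in buckets.items()})
# prints: {5: 2, 10: 1, 20: 1}
```

Each bucket then holds sequences of one common length, so a batch can be drawn from a single bucket without padding short sequences all the way to the longest sequence in the dataset.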
You DON'T have to put only input/output sequences of the same length into a batch. TF has a way to specify the input size: the parameter "sequence_length" controls the number of time steps a cell is unrolled. So TF will unroll your cell only up to sequence_length, not up to the full step size.
So while feeding the inputs and outputs, also feed a sequence_length array which contains the length of each input:
tf.nn.bidirectional_rnn(fwd_stacked_lstm_cells, bwd_stacked_lstm_cells,
                        reshaped_inputs,
                        sequence_length=sequence_length)
.....
feed_dict={
    model.inputs: x,
    model.targets: y,
    model.sequence_length: lengths})
where
len(lengths) == batch_size and
for all i, lengths[i] == length of input x[i] (same as the length of output y[i])
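As far as the TensorFlow 1.x RNN documentation describes it, for steps past a sequence's length the output is zero and the state is copied through unchanged. The toy cell below, written in plain Python with made-up scalar weights, mimics that behaviour; run_with_length is a hypothetical helper, not a TensorFlow function.

```python
import math

def run_with_length(inputs, length, w_in=0.5, w_rec=0.8):
    """Toy RNN over a padded sequence that stops updating after `length` steps."""
    h = 0.0
    outputs = []
    for t, x in enumerate(inputs):
        if t < length:
            h = math.tanh(w_in * x + w_rec * h)
            outputs.append(h)
        else:
            outputs.append(0.0)  # padded step: zero output, state unchanged
    return outputs, h

padded = [1.0, 2.0, 0.0, 0.0]  # real length 2, padded to 4
outputs, final_state = run_with_length(padded, length=2)
_, unpadded_state = run_with_length([1.0, 2.0], length=2)

print(outputs[2:])                    # prints: [0.0, 0.0]
print(final_state == unpadded_state)  # prints: True
```

Under this behaviour the padding does not affect the final state, which is why sequences of different lengths can share a batch once they are padded and their lengths are supplied.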