Unable to convert TensorFlow 1.0 code to TensorFlow 2.0

I have TensorFlow 1.0 code and I am unable to convert it to TensorFlow 2.0 with the syntax below.
Could you please help me out?
A)
lstm_cell = tf.keras.layers.LSTM(units=hidden_unit)
#lstm_cell = tf.compat.v1.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=self.dropout_keep_prob)
Q-1) How do I apply dropout to the lstm_cell in TF 2.0?
B)
self._initial_state = lstm_cell.zero_state(self.batch_size, tf.float32)
Q-2) When I use the syntax above, I get the error "LSTM cell does not have zero_state cell for TF2.0".
How do I initialize the LSTM cell's state?
C) How do I use tf.keras.layers.RNN with a cell in TF 2.0?

Thanks #AlexisBRENON!
Here is my code. Please let me know if I made any mistake.
lstm_cell = tf.keras.layers.LSTM(units=hidden_unit)
lstm_cell = tf.nn.RNNCellDropoutWrapper(lstm_cell, output_keep_prob=self.dropout_keep_prob)
self._initial_state = lstm_cell.get_initial_state(self.batch_size, tf.float32)
inputs = [tf.squeeze(input_, [1]) for input_ in tf.split(pooled_concat, num_or_size_splits=int(reduced), axis=1)]
outputs, state_size = tf.keras.layers.RNN(lstm_cell, inputs, initial_state=self._initial_state, return_sequences=self.real_len)
# I want to collect the appropriate last words into the variable output (dimension = batch x embedding_size)
output = outputs[0]
ERROR:
self._initial_state = lstm_cell.get_initial_state(self.batch_size, tf.float32)
ValueError: slice index 0 of dimension 0 out of bounds. for 'strided_slice' (op: 'StridedSlice') with input shapes: [0], [1], [1], [1] and with computed input tensors: input[1] = <0>, input[2] = <1>, input[3] = <1>.

For the RNN dropout, the DropoutWrapper has been moved to tf.nn.RNNCellDropoutWrapper.
I suppose that tf.keras.layers.LSTMCell.get_initial_state is the new name of zero_state.
You should be more precise about what you want to do with RNNs. tf.keras.layers.RNN is a base class for recurrent layers and should not be used as is. Instead, use one of its subclasses like SimpleRNN, GRU or LSTM, or write your own subclass. Take a look at the tutorial on recurrent neural networks.
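To make this concrete, here is a minimal sketch of both points in TF 2.x. It assumes hidden_unit, batch_size and dropout_keep_prob come from the surrounding model; the example values below are placeholders:

import tensorflow as tf

hidden_unit, batch_size, dropout_keep_prob = 64, 32, 0.5  # placeholder assumptions

# Option 1: the Keras LSTM layer has built-in dropout arguments, so no
# wrapper is needed. Note these are *drop* rates, i.e. 1 - keep_prob.
lstm_layer = tf.keras.layers.LSTM(
    units=hidden_unit,
    dropout=1.0 - dropout_keep_prob,            # dropout on the inputs
    recurrent_dropout=1.0 - dropout_keep_prob,  # dropout on the recurrent state
    return_sequences=True)

# Option 2: wrap an LSTMCell (not the LSTM layer) and hand the wrapped
# cell to tf.keras.layers.RNN.
lstm_cell = tf.keras.layers.LSTMCell(hidden_unit)
wrapped_cell = tf.nn.RNNCellDropoutWrapper(lstm_cell, output_keep_prob=dropout_keep_prob)
rnn_layer = tf.keras.layers.RNN(wrapped_cell, return_sequences=True)

# get_initial_state replaces zero_state, and batch_size/dtype are keyword
# arguments. Passing batch_size positionally binds it to the inputs
# parameter, which is what produces the strided_slice ValueError above.
initial_state = lstm_cell.get_initial_state(batch_size=batch_size, dtype=tf.float32)

x = tf.random.normal([batch_size, 10, 8])  # [batch, time, features]
outputs = rnn_layer(x, initial_state=initial_state, training=True)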

Related

Different outputs with LSTM in PyTorch vs Tensorflow

I am trying to convert a TensorFlow (1.15) model to a PyTorch model. Since I was getting very different loss values, I tried comparing the output of the LSTM in the forward pass for the same input. The declaration and initialization of the LSTM is given below:
TensorFlow code
rnn_cell_video_fw = tf.contrib.rnn.LSTMCell(
    num_units=self.options['rnn_size'],
    state_is_tuple=True,
    initializer=tf.orthogonal_initializer()
)
rnn_cell_video_fw = tf.contrib.rnn.DropoutWrapper(
    rnn_cell_video_fw,
    input_keep_prob=1.0 - rnn_drop,
    output_keep_prob=1.0 - rnn_drop
)
sequence_length = tf.expand_dims(tf.shape(video_feat_fw)[1], axis=0)
initial_state = rnn_cell_video_fw.zero_state(batch_size=batch_size, dtype=tf.float32)
rnn_outputs_fw, _ = tf.nn.dynamic_rnn(
    cell=rnn_cell_video_fw,
    inputs=video_feat_fw,
    sequence_length=sequence_length,
    initial_state=initial_state,
    dtype=tf.float32
)
PyTorch code
self.rnn_video_fw = nn.LSTM(self.options['video_feat_dim'], self.options['rnn_size'], dropout = self.options['rnn_drop'])
rnn_outputs_fw, _ = self.rnn_video_fw(video_feat_fw)
Initialization for the LSTM in train.py
def init_weight(m):
    if type(m) in [nn.LSTM]:
        for param in m.parameters():
            nn.init.orthogonal_(m.weight_hh_l0)
            nn.init.orthogonal_(m.weight_ih_l0)
[Screenshots of the TensorFlow output and the PyTorch output omitted.]
The same is pretty much the case for every data item, and my PyTorch model isn't converging. Is my suspicion correct that the difference in LSTM outputs is the reason? If so, where am I going wrong?
Link to the paper
Link to TF code
Let me know if anything else is required.
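One possible source of the mismatch, offered as a hedged observation: PyTorch's nn.LSTM dropout argument only inserts dropout between stacked layers, so for a single-layer LSTM it has no effect, while the TF code applies input and output dropout at every timestep. A minimal sketch of mirroring the TF DropoutWrapper manually (sizes and the rnn_drop value are placeholder assumptions):

import torch
import torch.nn as nn

rnn_drop = 0.2                                   # placeholder assumption
lstm = nn.LSTM(input_size=500, hidden_size=512)  # sizes are assumptions
in_drop = nn.Dropout(rnn_drop)                   # mirrors input_keep_prob
out_drop = nn.Dropout(rnn_drop)                  # mirrors output_keep_prob

video_feat_fw = torch.randn(10, 4, 500)  # [time, batch, features]
outputs, _ = lstm(in_drop(video_feat_fw))
outputs = out_drop(outputs)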

How to customize a RNN cell

I would like to implement a custom LSTM or GRU cell in TensorFlow (Python 3). For example, I want to scale the cell-state signal leaving the cell at time step T before it enters the cell at time step T+1. I've tried searching the TensorFlow documentation without success.
Could you give me a hint?
Thank you.
EDIT: Having checked the answer given by #vijay m, I created my model as follows:
def dynamic_scale_RNN(x, timescale, seqlen, weights, biases, keep_prop):
    batch_size = tf.shape(x)[0]
    # Unstack to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.unstack(x, max_seq_len, 1)
    timescale_unstack = tf.unstack(timescale, max_seq_len, 1)
    gru_cell = tf.contrib.rnn.GRUCell(n_hidden)
    # init_state has to be set to zero
    init_state = gru_cell.zero_state(batch_size, dtype=tf.float32)
    outputs = []
    # Create a loop of N GRU cells, N = time_steps.
    for i in range(len(x)):
        output, state = tf.nn.static_rnn(gru_cell, [x[i]], dtype=tf.float32, initial_state=init_state)
        # Replace init_state with the new (scaled) state
        mask = tf.tile(tf.expand_dims(timescale_unstack[i], axis=1), [1, state[0].get_shape()[-1]])
        init_state = tf.multiply(state, mask)
        # init_state = state
        outputs.append(output)
    # Transform the output to [batch_size, time_steps, vector_size]
    outputs = tf.transpose(tf.squeeze(tf.stack(outputs)), [1, 0, 2])
In the code above, timescale is a tensor of shape [batch_size, sequence_length, 1], and I want to scale the cell state using this tensor. Even though the code runs, it returns nan for the cost function.
If I uncomment the line init_state = state, it works, but then it doesn't scale the cell state.
My question, for now, is: why do I get nan values for the cost function?
I leave my answer here in case it can help someone.
The reason for the 'nan' cost value is that init_state is set too high. While I don't know the appropriate range for this value, I observed that if I scale it by a small factor like 0.1, I no longer see 'nan'.
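A tidier way to express this kind of per-step state manipulation is to wrap the cell in a small RNNCell subclass, sketched here against the TF 1.x API used above (the class name StateScalingWrapper is made up for illustration, and a fixed scale stands in for the per-timestep mask):

import tensorflow as tf

class StateScalingWrapper(tf.nn.rnn_cell.RNNCell):
    """Hypothetical wrapper that scales the incoming state before each step."""

    def __init__(self, cell, scale):
        super(StateScalingWrapper, self).__init__()
        self._cell = cell
        self._scale = scale

    @property
    def state_size(self):
        return self._cell.state_size

    @property
    def output_size(self):
        return self._cell.output_size

    def call(self, inputs, state):
        # Scale the previous state before it enters the wrapped cell.
        return self._cell(inputs, state * self._scale)

# The scaled cell then drops into static_rnn/dynamic_rnn as usual.
cell = StateScalingWrapper(tf.contrib.rnn.GRUCell(128), scale=0.1)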

Tensorflow 1.0 LSTM Cell in dynamic_rnn throws dimension error

I am trying to implement an LSTM model as a model_fn input to an Estimator. My X is just a .txt file with a time series of prices. Before going into my first hidden layer, I try to define the LSTM cell as:
def lstm_cell():
    return tf.contrib.rnn.BasicLSTMCell(
        size, forget_bias=0.0, state_is_tuple=True)

attn_cell = lstm_cell
if is_training and keep_prob < 1:
    def attn_cell():
        return tf.contrib.rnn.DropoutWrapper(
            lstm_cell(), output_keep_prob=keep_prob)

cell = tf.contrib.rnn.MultiRNNCell([attn_cell() for _ in range(num_layers)], state_is_tuple=True)
initial_state = cell.zero_state(batch_size, data_type())
inputs = tf.unstack(X, num=num_steps, axis=0)
outputs = []
outputs, state = tf.nn.dynamic_rnn(cell, inputs,
                                   initial_state=initial_state)
This then is supposed to go into:
first_hidden_layer = tf.contrib.layers.relu(outputs, 1000)
Unfortunately, it throws an error indicating "ValueError: Dimension must be 1 but is 3 for 'transpose' (op: 'Transpose') with input shapes: [1], [3]."
I gather that my problem is the "inputs" tensor. According to its description, the inputs variable is supposed to be a tensor of shape [batch_size, max_time, ...], but I have no idea how to translate this into the structure above since, through the Estimator, only input values X and target values y are fed to the system. So my question is how to create a tensor that can serve as the inputs variable to dynamic_rnn.
Thanks a lot.
I believe you don't need the line:
inputs = tf.unstack(X, num=num_steps, axis=0)
You can supply X directly to dynamic_rnn, since dynamic_rnn doesn't take a list of tensors; it takes one tensor where the time axis is dimension 0 (if time_major == True) or dimension 1 (if time_major == False).
Actually, it seems that X has only 2 dimensions, since inputs is a list of 1-dimensional tensors (as indicated by the error message), so you should replace the unstack line with:
inputs = tf.expand_dims(X, axis=2)
This will add a 3rd dimension of size 1, which dynamic_rnn needs.
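As a sanity check, a minimal sketch of that shape fix in TF 1.x (the sizes are placeholder assumptions):

import tensorflow as tf

batch_size, num_steps = 4, 10  # placeholder assumptions
X = tf.placeholder(tf.float32, [batch_size, num_steps])  # 2-D: [batch, time]

inputs = tf.expand_dims(X, axis=2)  # now [batch, time, 1]
cell = tf.contrib.rnn.BasicLSTMCell(32)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
# outputs: [batch, time, 32] -- ready to feed into the relu layer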

how to stack LSTM layers using TensorFlow

What I have is the following, which I believe is a network with one hidden LSTM layer:
# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 13
n_steps = 10
n_hidden = 512
n_classes = 13

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}
However, I am trying to build an LSTM network using TensorFlow to predict power consumption. I have been looking around for a good example, but I could not find any model with 2 hidden LSTM layers. Here's the model that I would like to build:
1 input layer,
1 output layer,
2 hidden LSTM layers (with 512 neurons in each),
time step (sequence length): 10
Could anyone guide me through building this with TensorFlow (defining weights, building the input shape, training, predicting, the use of an optimizer or cost function, etc.)? Any help would be much appreciated.
Thank you so much in advance!
Here is how I do it in a translation model with GRU cells. You can just replace the GRU with an LSTM. It is really easy: just use tf.nn.rnn_cell.MultiRNNCell with a list of the multiple cells it should wrap. In the code below I am manually unrolling it, but you can pass it to tf.nn.dynamic_rnn or tf.nn.rnn as well.
y = input_tensor
with tf.variable_scope('encoder') as scope:
    rnn_cell = rnn.MultiRNNCell([rnn.GRUCell(1024) for _ in range(3)])
    state = tf.zeros((BATCH_SIZE, rnn_cell.state_size))
    output = [None] * TIME_STEPS
    for t in reversed(range(TIME_STEPS)):
        y_t = tf.reshape(y[:, t, :], (BATCH_SIZE, -1))
        output[t], state = rnn_cell(y_t, state)
        scope.reuse_variables()
    y = tf.pack(output, 1)
First you need some placeholders to hold your training data (one batch):
x_input = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
y_output = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
An LSTM needs a state, which consists of two components, the hidden state and the cell state; there is a very good guide here: https://arxiv.org/pdf/1506.00019.pdf. For every layer in the LSTM you have one cell state and one hidden state.
The problem is that TensorFlow stores this in an LSTMStateTuple, which you cannot feed through a placeholder. So you need to store the state in a tensor and then unpack it into a tuple:
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
l = tf.unpack(state_placeholder, axis=0)
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(num_layers)]
)
Then you can use the built-in TensorFlow API to create the stacked LSTM layers:
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
# Note: newer TF versions require a fresh cell instance per layer instead of [cell]*num_layers.
cell = tf.nn.rnn_cell.MultiRNNCell([cell]*num_layers, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, x_input, initial_state=rnn_tuple_state)
From here you continue with the outputs to calculate the logits and then a loss with respect to y_output.
Then you run each batch with sess.run, using truncated backpropagation (good explanation here: http://r2rt.com/styles-of-truncated-backpropagation.html):
init_state = np.zeros((num_layers, 2, batch_size, state_size))
...current_state... = sess.run([...state...], feed_dict={x_input:batch_in, state_placeholder:current_state ...})
current_state = np.array(current_state)
You will have to convert the state to a numpy array before feeding it again.
Perhaps it is better to use a library like TFLearn or Keras instead?
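Following up on that suggestion, here is a minimal sketch of the requested architecture in Keras (2 stacked LSTM layers of 512 units, 10 time steps, 13 inputs and 13 classes, mirroring the question's parameters; the optimizer and loss are assumptions):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.LSTM(512, return_sequences=True, input_shape=(10, 13)),  # first hidden LSTM layer
    keras.layers.LSTM(512),                                               # second hidden LSTM layer
    keras.layers.Dense(13, activation='softmax')                          # output layer
])
model.compile(optimizer='adam', loss='categorical_crossentropy')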

RNN & Batches in Tensorflow

The batch approach for RNNs in TensorFlow is not clear to me. For example, tf.nn.rnn takes as input a list of tensors [BATCH_SIZE x INPUT_SIZE]. We normally feed the session batches of data, so why does it take a list of batches and not a single batch?
This leads to the next point of confusion for me:
data = []
for _ in range(0, len(train_input)):
    data.append(tf.placeholder(tf.float32, [CONST_BATCH_SIZE, CONST_INPUT_SIZE]))

lstm = tf.nn.rnn_cell.BasicLSTMCell(CONST_NUM_OF_HIDDEN_STATES)
val, state = tf.nn.rnn(lstm, data, dtype=tf.float32)
I pass a list of tensors [CONST_BATCH_SIZE x CONST_INPUT_SIZE] to tf.nn.rnn and get an output value that is a list of tensors [CONST_BATCH_SIZE x CONST_NUM_OF_HIDDEN_STATES]. Now I want to apply softmax to all the hidden-state outputs and need to calculate the weights with matmul + bias.
Should I use a for loop for the matmul:
weight = tf.Variable(tf.zeros([CONST_NUM_OF_HIDDEN_STATES, CONST_OTPUT_SIZE]))
for i in val:
    mult = tf.matmul(i, weight)
bias = tf.Variable(tf.zeros([CONST_OTPUT_SIZE]))
prediction = tf.nn.softmax(mult + bias)
Or should I create a 2D array from val and then use tf.matmul without the for loop?
This should work. output is the batched data from the RNN. For all the batch inputs, probs will hold the probabilities:
logits = tf.matmul(output, softmax_w) + softmax_b
probs = tf.nn.softmax(logits)
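To spell out the second option (one matmul instead of a Python loop), a hedged sketch in the same style; val is the list of per-timestep outputs from tf.nn.rnn above, and note that older TF versions used the reversed argument order tf.concat(1, val):

# Stack the per-timestep outputs into one 2-D tensor so a single matmul
# covers every timestep of every batch element.
output = tf.reshape(tf.concat(val, 1), [-1, CONST_NUM_OF_HIDDEN_STATES])

logits = tf.matmul(output, softmax_w) + softmax_b  # [batch * time, output_size]
probs = tf.nn.softmax(logits)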