Multilayer LSTM without tf.contrib.rnn.MultiRNNCell - tensorflow

To implement a multilayer LSTM network, I usually use the following code:
def lstm_cell():
    return tf.contrib.rnn.LayerNormBasicLSTMCell(model_settings['rnn_size'])

attn_cell = lstm_cell
def attn_cell():
    return tf.contrib.rnn.DropoutWrapper(lstm_cell(), output_keep_prob=0.7)

cell = tf.contrib.rnn.MultiRNNCell([attn_cell() for _ in range(num_layers)], state_is_tuple=True)
outputs_, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
But this way I do not have access to the outputs of the intermediate hidden layers, which I would need in order to manipulate how those outputs are arranged.
Is there any other way to make a multilayer LSTM network without using tf.contrib.rnn.MultiRNNCell?

You can simply stack several LSTM layers, for example via the Sequential module:
model = Sequential()
model.add(layers.LSTM(..., return_sequences=True, input_shape=(...)))
model.add(layers.LSTM(..., return_sequences=True))
...
model.add(layers.LSTM(...))
In this case the return_sequences keyword is crucial for the intermediate layers: each of them must emit the full output sequence so that the next LSTM layer receives a sequence as input.
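If you also need access to the intermediate hidden-layer outputs (as in the question), the Keras Functional API lets you expose them as extra model outputs. A minimal sketch, assuming a hypothetical input of shape (timesteps, features) and made-up layer sizes:

import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical shapes and sizes, for illustration only.
inputs = tf.keras.Input(shape=(10, 13))                  # (timesteps, features)
h1 = layers.LSTM(128, return_sequences=True)(inputs)     # first hidden layer, full sequence
h2 = layers.LSTM(128, return_sequences=True)(h1)         # second hidden layer, full sequence
out = layers.LSTM(64)(h2)                                # top layer, final output only

# Exposing h1 and h2 as extra outputs gives direct access to the
# intermediate hidden-layer outputs for further manipulation.
model = Model(inputs=inputs, outputs=[out, h1, h2])
model.summary()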

Related

How to extract the hidden vector (the output of the ReLU after the third encoder layer) as the image representation

I am implementing an autoencoder using the Fashion MNIST dataset. The code for the encoder:
class MNISTClassifier(Model):
    def __init__(self):
        super(MNISTClassifier, self).__init__()
        self.encoder = Sequential([
            layers.Dense(128, activation="relu"),
            layers.Dense(64, activation="relu"),
            layers.Dense(32, activation="relu")
        ])
        self.decoder = Sequential([
            layers.Dense(64, activation="relu"),
            layers.Dense(128, activation="relu"),
            layers.Dense(784, activation="relu")
        ])

    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

autoencoder = MNISTClassifier()
Now I want to train an SVM classifier on the image representations extracted from the above autoencoder. That is, once the fully-connected autoencoder is trained, for each image I want to extract the 32-dimensional hidden vector (the output of the ReLU after the third encoder layer) as the image representation, and then train a linear SVM classifier on the training images of Fashion MNIST based on these 32-dimensional features.
How do I extract this 32-dimensional hidden vector?
Thanks in advance!
I recommend using the Functional API to define multiple outputs for your model, because it leads to clearer code. However, you can also do this with a Sequential model by taking the output of any layer you want and adding it to your model's outputs.
Print model.summary() and inspect the layers to find the one you want to branch from. You can access each layer's output by its index with model.layers[index].output.
Then you can create a multi-output model of the layers you want, like this:
third_layer = model.layers[2]
last_layer = model.layers[-1]
my_model = Model(inputs=model.input, outputs=(third_layer.output, last_layer.output))
Then you can access the outputs of both layers you have defined:
third_layer_predict, last_layer_predict = my_model.predict(X_test)
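For the subclassed autoencoder in the question specifically, the encoder is already a separate Sequential whose last layer is the 32-unit ReLU, so you can call it directly. A minimal sketch, assuming x_train/x_test are flattened 784-dimensional Fashion MNIST arrays and y_train/y_test are the class labels (these variable names are assumptions):

import numpy as np
from sklearn.svm import LinearSVC

# The encoder sub-model ends with the 32-unit ReLU layer, so its output
# is the 32-dimensional image representation.
features_train = autoencoder.encoder.predict(x_train)   # shape (n_samples, 32)
features_test = autoencoder.encoder.predict(x_test)

# Train a linear SVM on the 32-dimensional features.
svm = LinearSVC()
svm.fit(features_train, y_train)
print("Test accuracy:", svm.score(features_test, y_test))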

How to merge ReLU after quantization aware training

I have a network which contains Conv2D layers followed by ReLU activations, declared as such:
x = layers.Conv2D(self.hparams['channels_count'], kernel_size=(4,1))(x)
x = layers.ReLU()(x)
And it is ported to TFLite with the following representation:
[Image: basic TFLite network without Q-aware training]
However, after performing quantization-aware training on the network and porting it again, the ReLU layers are now explicit in the graph:
[Image: TFLite network after Q-aware training]
This results in them being processed separately on the target instead of during the evaluation of the Conv2D kernel, inducing a 10% performance loss in my overall network.
Declaring the activation with the following implicit syntax does not produce the problem:
x = layers.Conv2D(self.hparams['channels_count'], kernel_size=(4,1), activation='relu')(x)
[Image: basic TFLite network with implicit ReLU activation]
[Image: TFLite network with implicit ReLU after Q-aware training]
However, this restricts the network to basic ReLU activation, whereas I would like to use ReLU6 which cannot be declared in this way.
Is this a TFLite issue? If not, is there a way to prevent the ReLU layer from being split? Or alternatively, is there a way to manually merge the ReLU layers back into the Conv2D layers after the quantization-aware training?
Edit:
QA training code:
def learn_qaware(self):
    quantize_model = tfmot.quantization.keras.quantize_model
    self.model = quantize_model(self.model)
    training_generator = SCDataGenerator(self.training_set)
    validate_generator = SCDataGenerator(self.validate_set)
    self.model.compile(
        optimizer=self.configure_optimizers(qa_learn=True),
        loss=self.get_LLP_loss(),
        metrics=self.get_metrics(),
        run_eagerly=config['eager_mode'],
    )
    self.model.fit(
        training_generator,
        epochs=self.hparams['max_epochs'],
        batch_size=1,
        shuffle=self.hparams['shuffle_curves'],
        validation_data=validate_generator,
        callbacks=self.get_callbacks(qa_learn=True),
    )
Quantized TFLite model generation code:
def tflite_convert(classifier):
    output_file = get_tflite_filename(classifier.model_path)
    # Convert the model to the TensorFlow Lite format without quantization
    saved_shape = classifier.model.input.shape.as_list()
    fixed_shape = saved_shape
    fixed_shape[0] = 1
    classifier.model.input.set_shape(fixed_shape)  # Force batch size to 1 for generation
    converter = tf.lite.TFLiteConverter.from_keras_model(classifier.model)
    classifier.model.input.set_shape(saved_shape)
    # Set the optimization flag.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # Enforce integer only quantization
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    # Provide a representative dataset to ensure we quantize correctly.
    if config['eager_mode']:
        tf.executing_eagerly()

    def representative_dataset():
        for x in classifier.validate_set.get_all_inputs():
            rs = x.reshape(1, x.shape[0], 1, 1).astype(np.float32)
            yield([rs])

    converter.representative_dataset = representative_dataset
    model_tflite = converter.convert()
    # Save the model to disk
    open(output_file, "wb").write(model_tflite)
    return TFLite_model(output_file)
I have found a workaround: instantiate a non-trained version of the model, then copy the weights over from the quantization-aware trained model before converting to TFLite.
This feels like quite a hack, so I'm still on the lookout for a cleaner solution.
Code for the workaround:
def dequantize(self):
    if not hasattr(self, 'fp_model') or not self.fp_model:
        self.fp_model = self.get_default_model()

    def find_layer_in_model(name, model):
        for layer in model.layers:
            if layer.name == name:
                return layer
        return None

    def find_weight_group_in_layer(name, layer):
        for weight_group in layer.trainable_weights:
            if weight_group.name == name:
                return weight_group
        return None

    for layer in self.fp_model.layers:
        if 'input' in layer.name or 'quantize_layer' in layer.name:
            continue
        QUANT_TAG = "quant_"
        quant_layer = find_layer_in_model(QUANT_TAG + layer.name, self.model)
        if quant_layer is None:
            raise RuntimeError('Failed to match layer ' + layer.name)
        for i, weight_group in enumerate(layer.trainable_weights):
            quant_weight_group = find_weight_group_in_layer(QUANT_TAG + weight_group.name, quant_layer)
            if quant_weight_group is None:
                quant_weight_group = find_weight_group_in_layer(weight_group.name, quant_layer)
                if quant_weight_group is None:
                    raise RuntimeError('Failed to match weight group ' + weight_group.name)
            layer.trainable_weights[i].assign(quant_weight_group)

    self.model = self.fp_model
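A possible invocation of that workaround, as a rough sketch (the method and attribute names come from the snippets above, but the exact call sequence is an assumption):

# Assumed call sequence: QAT-train, strip the quantize wrappers via the
# weight-copy workaround, then convert to an int8 TFLite model.
classifier.learn_qaware()
classifier.dequantize()
tflite_model = tflite_convert(classifier)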
You can pass activation=tf.nn.relu6 to use ReLU6 activation.
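Applied to the layer from the question, that keeps the fused-activation form while still giving ReLU6; a minimal sketch:

# ReLU6 passed as the fused activation of the Conv2D layer, so it should not
# be split into a separate ReLU op after quantization-aware training.
x = layers.Conv2D(self.hparams['channels_count'], kernel_size=(4, 1),
                  activation=tf.nn.relu6)(x)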

Tensorflow reusing of Multi-Layered LSTM Network

I am trying to use the same LSTM architecture for different inputs, and so I pass the same cells when unrolling the bidirectional LSTM over the different inputs. I am not sure whether this creates two entirely separate LSTM networks. It looks like there are two different nodes in my graph. My code and graph look something like this:
def get_multirnn_cell(self):
    cells = []
    for _ in range(config.n_layers):
        cell = tf.nn.rnn_cell.LSTMCell(config.n_hidden, initializer=tf.glorot_uniform_initializer())
        dropout_cell = tf.nn.rnn_cell.DropoutWrapper(cell=cell,
                                                     input_keep_prob=config.keep_prob,
                                                     output_keep_prob=config.keep_prob)
        cells.append(dropout_cell)
    return cells

def add_lstm_op(self):
    with tf.variable_scope('lstm'):
        cells_fw = self.get_multirnn_cell()
        cells_bw = self.get_multirnn_cell()
        cell_fw = tf.nn.rnn_cell.MultiRNNCell(cells_fw)
        cell_bw = tf.nn.rnn_cell.MultiRNNCell(cells_bw)
        (_, _), (state_one_fw, state_one_bw) = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw,
                                                                               inputs=self.question_one,
                                                                               sequence_length=self.seql_one,
                                                                               dtype=tf.float32)
        self.state_one = tf.concat([state_one_fw[-1].h, state_one_bw[-1].h], name='state_one', axis=-1)
        # self.state_one = tf.concat([state_one_fw, state_one_bw], axis=-1)
        # [batch_size, 2*hidden_size]
        (_, _), (state_two_fw, state_two_bw) = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw,
                                                                               inputs=self.question_two,
                                                                               sequence_length=self.seql_two,
                                                                               dtype=tf.float32)
        self.state_two = tf.concat([state_two_fw[-1].h, state_two_bw[-1].h], name='state_two', axis=-1)
If you want to reuse the multirnn_cell, you can pass reuse=tf.AUTO_REUSE to the variable_scope:
with tf.variable_scope('lstm', reuse=tf.AUTO_REUSE):
See the documentation for variable_scope.
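Applied to add_lstm_op from the question, only the scope line changes; a minimal sketch (the rest of the method stays as shown above):

def add_lstm_op(self):
    # AUTO_REUSE lets the second bidirectional_dynamic_rnn call share the
    # variables created by the first one instead of building a second network.
    with tf.variable_scope('lstm', reuse=tf.AUTO_REUSE):
        cells_fw = self.get_multirnn_cell()
        cells_bw = self.get_multirnn_cell()
        # ... the rest is unchanged ...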

combining DropoutWrapper and ResidualWrapper with variational_recurrent=True

I'm trying to create a MultiRNNCell of LSTM cells, each wrapped with both DropoutWrapper and ResidualWrapper. To use variational_recurrent=True, we must provide the input_size parameter to DropoutWrapper. I'm not able to figure out what input_size should be passed to each LSTM layer, since ResidualWrapper also adds skip connections that augment the input at each layer.
I'm using the following utility function to create one LSTM layer:
def create_cell(units, residual_connections, keep_prob, input_size):
    lstm_cell = tf.nn.rnn_cell.LSTMCell(units,
                                        activation=tf.nn.tanh,
                                        initializer=tf.truncated_normal_initializer(),
                                        cell_clip=5.)
    lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell,
                                              dtype=tf.float32,
                                              input_keep_prob=keep_prob,
                                              output_keep_prob=keep_prob,
                                              state_keep_prob=keep_prob,
                                              variational_recurrent=True,
                                              input_size=input_size)
    if residual_connections:
        lstm_cell = tf.nn.rnn_cell.ResidualWrapper(lstm_cell)
    return lstm_cell
And the following code to create the complete cell:
net = tf.layers.dense(inputs,
                      128,
                      activation=tf.nn.relu,
                      kernel_initializer=tf.variance_scaling_initializer())
net = tf.layers.batch_normalization(net, training=training)
cells = [create_cell(64, False, keep_prob, ??)]
for _ in range(5):
    cells.append(create_cell(64, True, keep_prob, ??))
multirnn_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
net, rnn_s1 = tf.nn.dynamic_rnn(cell=multirnn_cell, inputs=net, initial_state=rnn_s0, dtype=tf.float32)
What values should be passed to input_size for first and subsequent LSTM layers?

how to stack LSTM layers using TensorFlow

what I have is the following, which I believe is a network with one hidden LSTM layer:
# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 13
n_steps = 10
n_hidden = 512
n_classes = 13

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}
However, I am trying to build an LSTM network using TensorFlow to predict power consumption. I have been looking around to find a good example, but I could not find any model with 2 hidden LSTM layers. Here's the model that I would like to build:
1 input layer,
1 output layer,
2 hidden LSTM layers (with 512 neurons in each),
time step (sequence length): 10.
Could anyone guide me in building this with TensorFlow (defining the weights, building the input shape, training, predicting, choice of optimizer and cost function, etc.)? Any help would be much appreciated.
Thank you so much in advance!
Here is how I do it in a translation model with GRU cells. You can just replace the GRU with an LSTM. It is really easy: just use tf.nn.rnn_cell.MultiRNNCell with a list of the multiple cells it should wrap. In the code below I am manually unrolling it, but you can pass it to tf.nn.dynamic_rnn or tf.nn.rnn as well.
y = input_tensor
with tf.variable_scope('encoder') as scope:
    rnn_cell = rnn.MultiRNNCell([rnn.GRUCell(1024) for _ in range(3)])
    state = tf.zeros((BATCH_SIZE, rnn_cell.state_size))
    output = [None] * TIME_STEPS
    for t in reversed(range(TIME_STEPS)):
        y_t = tf.reshape(y[:, t, :], (BATCH_SIZE, -1))
        output[t], state = rnn_cell(y_t, state)
        scope.reuse_variables()
    y = tf.pack(output, 1)
First you need some placeholders to put your training data (one batch)
x_input = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
y_output = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
An LSTM needs a state, which consists of two components, the hidden state and the cell state; there is a very good guide here: https://arxiv.org/pdf/1506.00019.pdf. For every layer in the LSTM you have one cell state and one hidden state.
The problem is that TensorFlow stores this in an LSTMStateTuple, which you cannot feed into a placeholder. So you need to store it in a tensor and then unpack it into a tuple:
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
l = tf.unpack(state_placeholder, axis=0)
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(num_layers)]
)
Then you can use the built-in Tensorflow API to create the stacked LSTM layer.
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
cell = tf.nn.rnn_cell.MultiRNNCell([cell]*num_layers, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, x_input, initial_state=rnn_tuple_state)
From here you continue with the outputs to calculate the logits and then a loss with respect to y_output.
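A minimal sketch of that step; the single-unit linear projection and squared-error loss are assumptions, so swap in whatever output layer and loss fit your task:

# Project each time step's LSTM output to one value and compare it against
# y_output with a squared-error loss (assumed loss; adapt to your problem).
W_out = tf.Variable(tf.random_normal([state_size, 1]))
b_out = tf.Variable(tf.zeros([1]))
outputs_flat = tf.reshape(outputs, [-1, state_size])
logits = tf.reshape(tf.matmul(outputs_flat, W_out) + b_out,
                    [batch_size, truncated_series_length, 1])
loss = tf.reduce_mean(tf.square(logits - y_output))
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)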
Then you run each batch with sess.run, using truncated backpropagation (there is a good explanation here: http://r2rt.com/styles-of-truncated-backpropagation.html).
init_state = np.zeros((num_layers, 2, batch_size, state_size))
...current_state... = sess.run([...state...], feed_dict={x_input:batch_in, state_placeholder:current_state ...})
current_state = np.array(current_state)
You will have to convert the state to a numpy array before feeding it again.
Perhaps it is better to use a library like TFLearn or Keras instead?
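Following that suggestion, here is a minimal Keras sketch of the network described in the question (2 stacked LSTM layers with 512 units each, 10 time steps, 13 input features, 13 output classes); the optimizer and loss are assumptions:

import tensorflow as tf
from tensorflow.keras import layers, models

n_steps, n_input, n_hidden, n_classes = 10, 13, 512, 13

model = models.Sequential([
    # First hidden LSTM layer returns the full sequence so the second
    # LSTM layer receives a sequence as input.
    layers.LSTM(n_hidden, return_sequences=True, input_shape=(n_steps, n_input)),
    # Second hidden LSTM layer returns only its final output.
    layers.LSTM(n_hidden),
    # Output layer: one unit per class.
    layers.Dense(n_classes, activation='softmax'),
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
# model.fit(x_train, y_train, batch_size=128, epochs=10)  # x_train shape: (n_samples, 10, 13)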