What does recurrent_initializer do? - tensorflow

I am experimenting with recurrent neural network layers in tensorflow & keras and I am having a look at the recurrent_initializer. I wanted to know more about its influence on the layer, so I created a SimpleRnn layer as the follows:
rnn_layer = keras.layers.SimpleRNN(1, return_sequences=True, kernel_initializer = keras.initializers.ones, recurrent_initializer=keras.initializers.zeros, activation="linear")
Running this code, makes the addition in the recurrent net visible:
inp = np.zeros(shape=(1,1,20), dtype=np.float32)
for i in range(20):
inp[0][0][:i] = 5
#inp[0][0][i:] = 0
print(f"i:{i} {rnn_layer(inp)}"'')
output:
i:0 [[[0.]]]
i:1 [[[5.]]]
i:2 [[[10.]]]
i:3 [[[15.]]]
i:4 [[[20.]]]
i:5 [[[25.]]]
i:6 [[[30.]]]
i:7 [[[35.]]]
i:8 [[[40.]]]
i:9 [[[45.]]]
i:10 [[[50.]]]
i:11 [[[55.]]]
i:12 [[[60.]]]
i:13 [[[65.]]]
i:14 [[[70.]]]
i:15 [[[75.]]]
i:16 [[[80.]]]
i:17 [[[85.]]]
i:18 [[[90.]]]
i:19 [[[95.]]]
Now I change the recurrent_initializer to something different, like a glorot_normal distribution:
rnn_layer = keras.layers.SimpleRNN(1, return_sequences=True, kernel_initializer = keras.initializers.ones, recurrent_initializer=keras.initializers.glorot_normal(seed=0), activation="linear")
But I still get the same results. I thought it might depend on some logic, which a Rnn is missing but a LSTM has, so I tried it with an lstm, but still same results. I guess there is something about the recurrent_logic, I still miss. Can someone explain me, what the reccurent_initializers purpose is and how it affects the recurrent layer?
Thanks alot!

Your input to the RNN layer is of shape (1, 1, 20), which mean one Timestep for each batch , the default behavior of RNN is to RESET state between each batch , so you cant see the effect of the recurrent ops(the recurrent_initializers).
You have to change the length of the sequence of your input:
inp = np.ones(shape=(5 ,4,1), dtype=np.float32) # sequence length == 4
rnn_layer1 = tf.keras.layers.LSTM(1,return_state=True, return_sequences=False,
kernel_initializer = tf.keras.initializers.ones,
recurrent_initializer=tf.keras.initializers.zeros, activation="linear")
rnn_layer2 = tf.keras.layers.LSTM(1,return_state=True , return_sequences=False,
kernel_initializer = tf.keras.initializers.ones,
recurrent_initializer=tf.keras.initializers.glorot_normal(seed=0),
activation="linear")
first_sample = inp[0 : 1 , : ,: ] #shape(1,4,1)
print(rnn_layer1(first_sample )
print(rnn_layer2(first_sample )

Related

RNN LSTM network for inputting a sequence of numbers

I'm trying to use LSTM networks to input a simple dataset that has multiple different sequences of numbers that represent musical data. The data is just a bunch of numpy arrays of floating point numbers with each song being one array. The data looks like this:
Song 1: [0.00013487907, 0.0002517006, 0.00021654845, ...]
Song 2: [-0.007279772, -0.011207076, -0.010082608, ...]
Song 3: [-0.00060827745, -0.00082834775, -0.0006534484, ...]
..and so on
I have done this before for MIDI files before, but those require embeddings of the different characters, however this is more continuous data as opposed to discrete data, so I'm not sure what the input model will look like, and how the data can be loaded for this particular task. For example, for the MIDI file project the input had an embedding layer to the model:
batch_size = 16
seq_length = 64
num_epochs = 100
optimizer_ = tf.keras.optimizers.Adam()
model = Sequential()
model.add(Embedding(input_dim = num_unique_chars, output_dim = 512, batch_input_shape = (batch_size, seq_length)))
model.add(LSTM(256, return_sequences = True, stateful = True))
model.add(Dropout(0.2))
model.add(LSTM(256, return_sequences = True, stateful = True))
model.add(Dropout(0.2))
model.add(LSTM(256, return_sequences = True, stateful = True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(num_unique_chars)))
model.add(Activation("softmax"))
model.compile(loss = "categorical_crossentropy", optimizer = optimizer_, metrics = ["accuracy"])
I wanna know how to do the same without tokenization/embedding, and feed each song into the model separately, and then further be able to generate samples from it.
I've tried looking for examples of this but everything related to LSTM networks seems to be text-based. Would appreciate any help/guidance with this!
Thanks
If you already have continuous values, you will not need an Embedding-layer. Either you directly pass the data into the LSTMs or you can use a Dense layer in-between. Additionally, you can also add a Masking-layer (depending on your data).
Also you have to adjust the shape of your data to (batch_size, seq_len, 1) as you only have one feature, but the time-series has to be "recognizable".
Here is a minimum working example with a Dense-layer instead the non-functioning Embedding-layer:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import Sequential
batch_size = 16
seq_length = 64
num_epochs = 100
num_unique_chars = 55 # I just picked any number
optimizer_ = tf.keras.optimizers.Adam()
model = Sequential()
model.add(layers.Dense(256, use_bias=False))
model.add(layers.LSTM(256, return_sequences = True, stateful = True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(256, return_sequences = True, stateful = True))
model.add(layers.Dropout(0.2))
model.add(layers.LSTM(256, return_sequences = True, stateful = True))
model.add(layers.Dropout(0.2))
model.add(layers.TimeDistributed(layers.Dense(num_unique_chars)))
model.add(layers.Activation("softmax"))
model.compile(loss = "categorical_crossentropy", optimizer = optimizer_, metrics = ["accuracy"])
test_data = tf.random.normal(shape=(batch_size, seq_length, 1))
test_out = model(test_data)
print(test_out.shape)
Output: (16, 64, 55)
P. S.: With Dense layers the TimeDistributed-layer is optional. The Dense layer will just manipulate the last dimension of its input tensor.
P. P. S.: I think for your limited amount of features, three LSTM-layers with a dimension of 256 might easily result in over-fitting or some other unpleasant effects. So it might be useful to reduce the number of layers and their dimension. (Of course, this does not target your initial question)

Is sigmoid function only applicable after dense() layer?

I am making a network which is similar to SE-Net(https://github.com/titu1994/keras-squeeze-excite-network/blob/master/se.py)
using keras, but quite different with it.
Suppose that I want to make some layer sequence like :
import keras
Input = keras.model.Input((None,None,3))
x1 = keras.layers.Conv2d(filters = 32, kernel_size = (3,3))(Input)
x_gp = keras.layers.GlobalAveragePooling()(x1)
x2 = keras.layers.Conv2d(filters = 32, kernel_size = (1,1))(x_gp)
x3 = keras.layers.Conv2d(filters = 8, kernel_size = (1,1))(x2)
x2_ = keras.layers.Conv2d(filters = 32, kernel_size = (1,1))(x3)
x_se = keras.activation.sigmoid()(x2_)
I want to know that applying x_se like this is programmable. Please tell me if I am doing wrong.
you can for sure experiment sigmoid as an activation for cnn layers too but the reason why sigmoid is not used with cnn layers are:
1. Sigmoid function is monotonic but it's derivative is not therefore there is a possibility that your training can be stuck
2. Sigmoid range:[0,1]
if you are experimenting sigmoid with cnn layers then I would suggest you to use it only for few layers.
You can give swish a try.

Deep Learning model (LSTM) predicts same class label

I am trying to solve the Spoken Digit Recognition task using the LSTM model, where the audio files are converted into spectrograms and fed into an LSTM model after doing Global Average Pooling. Here is the architecture of it
tf.keras.backend.clear_session()
#input layer
input_= Input(shape = (64, 35))
lstm = LSTM(100, activation='tanh', return_sequences= True, kernel_regularizer = l2(0.000001),
recurrent_initializer = 'glorot_uniform')(input_)
lstm = GlobalAveragePooling1D(data_format='channels_first')(lstm)
dense = Dense(20, activation='relu', kernel_regularizer = l2(0.000001), kernel_initializer='glorot_uniform')(lstm)
drop = Dropout(0.8)(dense)
dense1 = Dense(25, activation='relu', kernel_regularizer = l2(0.000001), kernel_initializer= 'he_uniform')(drop)
drop = Dropout(0.95)(dense1)
output = Dense(10,activation = 'softmax', kernel_regularizer = l2(0.000001), kernel_initializer= 'glorot_uniform')(drop)
model_2 = Model(inputs = [input_], outputs = output)
model_2.summary()
Having summary as -
I need to calculate the F1 score to check the performance of the model, I have implemented a custom callback and used TensorFlow addons F1 score too. However, I won't get the correct result, for every epoch I get the constant F1 score value.
On further digging, I found out that my model predicts the same class label, for the entire epoch, whereas it is supposed to predict 10 classes in one epoch. as there are 10 class label values present.
Here is my model.compile and model.predict commands. I have used TensorFlow addon here -
from tensorflow import keras
opt = keras.optimizers.Adam(0.001, clipnorm=0.8)
model_2.compile(loss='categorical_crossentropy', optimizer=opt, metrics = metric)
hist = model_2.fit([X_train_spectrogram],
[y_train_converted],
validation_data= ([X_test_spectrogram], [y_test_converted]),
epochs = 10,
verbose =1,
callbacks=[tensorBoard_callbk2, ClearMemory()],
# steps_per_epoch = 3,
batch_size=32)
Here is what I mean by getting the same prediction, the entire array is filled with the same predicted values.
Why is the model predicting the same class label? or How to rectify it?
I have tried increasing the number of trainable parameters, increasing - decreasing batch size too, but it won't help me. If anyone knows can you please help me out?

'Channels first' training accuracy very low compared to 'channels last'

My issue:
I am trying to train a semantic segmentation model in tf.keras, in fact it works very well when I am using channels_last (WHC) mode (it reaches 96%+ val acc). I wanted to train it in channels_first (CHW) mode so the weights are compatible with TensorRT. When I do this, the ~80% training accuracy in the first few epochs dips down to around 0.020% and stays there permanently.
It is useful to know that the base of my model is a tf.keras.applications.MobileNet() model with the pre-trained 'imagenet' weights. (Model architecture at the bottom.)
The transformation process:
I used the guidelines provided and I change only a few things here:
Set tf.keras.backend.set_image_data_format() to 'channels_first'.
I change the channel order in the input tensor from: input_tensor=Input(shape=(376, 672, 3)) to: input_tensor=Input(shape=(3, 376, 672))
In my image preprocessing (using tf.data.Dataset), i use tf.transpose(img, perm=[2, 0, 1]) on both my input image and one-hot encoded mask to change the channel orders. I checked this with equality assertion to make sure its correct and it seems to be fine.
When I change these the training starts fine but as I said the training accuracy goes down to almost zero. When I revert back everything's fine again.
Possible leads:
What am I doing wrong or what could be the problematic part here? My suspicions are around these questions:
Are the pre-trained imageNet weights changed to the 'channels_first' order also when I set the backend? Is this something I should consider at all?
Could it be that the tf.transpose() function messes up the mask's one-hot encoding? (I have 3 classes represented by 3 colors: lane, opposing lane, background)
Maybe I am not seeing something obvious. I can provide further code and answers as needed.
EDIT:
08/17: This is still an ongoing issue, I have tried several things:
I checked if the image and the mask is correct after the transpose with numpy assertion, seems correct.
I suspected that the loss function calculates on the wrong axis, so I customized the loss function for the first axis (where the channels are). Here it is:
def ReverseAxisLoss(y_true, y_pred):
return K.categorical_crossentropy(y_true, y_pred, from_logits=True, axis=1)
My main suspicion is that the 'channels first' backend setting does nothing to transpose the pretrained 'imagenet' weights for the mobilenet part. Is there an updated way for TF2.x / Keras to transpose the pre-trained weights into CHW format?
Here is the architecture that I use (the skipNet() is the head network and the mobilenet is the base, and it is connected in the create_model() function)
def skipNet(encoder_output, feed1, feed2, classes):
# random initializer and regularizer
stddev = 0.01
init = RandomNormal(stddev=stddev)
weight_decay = 1e-3
reg = l2(weight_decay)
score_feed2 = Conv2D(kernel_size=(1, 1), filters=classes, padding="SAME",
kernel_initializer=init, kernel_regularizer=reg)(feed2)
score_feed2_bn = BatchNormalization()(score_feed2)
score_feed1 = Conv2D(kernel_size=(1, 1), filters=classes, padding="SAME",
kernel_initializer=init, kernel_regularizer=reg)(feed1)
score_feed1_bn = BatchNormalization()(score_feed1)
upscore2 = Conv2DTranspose(kernel_size=(4, 4), filters=classes, strides=(2, 2),
padding="SAME", kernel_initializer=init,
kernel_regularizer=reg)(encoder_output)
height_pad1 = ZeroPadding2D(padding=((1,0),(0,0)))(upscore2)
upscore2_bn = BatchNormalization()(height_pad1)
fuse_feed1 = add([score_feed1_bn, upscore2_bn])
upscore4 = Conv2DTranspose(kernel_size=(4, 4), filters=classes, strides=(2, 2),
padding="SAME", kernel_initializer=init,
kernel_regularizer=reg)(fuse_feed1)
height_pad2 = ZeroPadding2D(padding=((0,1),(0,0)))(upscore4)
upscore4_bn = BatchNormalization()(height_pad2)
fuse_feed2 = add([score_feed2_bn, upscore4_bn])
upscore8 = Conv2DTranspose(kernel_size=(16, 16), filters=classes, strides=(8, 8),
padding="SAME", kernel_initializer=init,
kernel_regularizer=reg, activation="softmax")(fuse_feed2)
return upscore8
def create_model(classes):
base_model = tf.keras.applications.MobileNet(input_tensor=Input(shape=IMG_SHAPE),
include_top=False,
weights='imagenet')
conv4_2_output = base_model.get_layer(index=43).output
conv3_2_output = base_model.get_layer(index=30).output
conv_score_output = base_model.output
head_model = skipNet(conv_score_output, conv4_2_output, conv3_2_output, classes)
for layer in base_model.layers:
layer.trainable = False
model = Model(inputs=base_model.input, outputs=head_model)
return model

how to stack LSTM layers using TensorFlow

what I have is the following, which I believe is a network with one hidden LSTM layer:
# Parameters
learning rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10
# Network Parameters
n_input = 13
n_steps = 10
n_hidden = 512
n_classes = 13
# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])
# Define weights
weights = {
'out' : tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
'out' : tf.Variable(tf.random_normal([n_classes]))
}
However, I am trying to build an LSTM network using TensorFlow to predict power consumption. I have been looking around to find a good example, but I could not find any model with 2 hidden LSTM layers. Here's the model that I would like to build:
1 input layer,
1 output layer,
2 hidden LSTM layers(with 512 neurons in each),
time step(sequence length): 10
Could anyone guide me to build this using TensorFlow? ( from defining weights, building input shape, training, predicting, use of optimizer or cost function, etc), any help would be much appreciated.
Thank you so much in advance!
Here is how I do it in a translation model with GRU cells. You can just replace the GRU with an LSTM. It is really easy just use tf.nn.rnn_cell.MultiRNNCell with a list of the multiple cells it should wrap. In the code bellow I am manually unrolling it but you can pass it to tf.nn.dynamic_rnn or tf.nn.rnn as well.
y = input_tensor
with tf.variable_scope('encoder') as scope:
rnn_cell = rnn.MultiRNNCell([rnn.GRUCell(1024) for _ in range(3)])
state = tf.zeros((BATCH_SIZE, rnn_cell.state_size))
output = [None] * TIME_STEPS
for t in reversed(range(TIME_STEPS)):
y_t = tf.reshape(y[:, t, :], (BATCH_SIZE, -1))
output[t], state = rnn_cell(y_t, state)
scope.reuse_variables()
y = tf.pack(output, 1)
First you need some placeholders to put your training data (one batch)
x_input = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
y_output = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
A LSTM need a state, which consists of two components, the hidden state and the cell state, very good guide here: https://arxiv.org/pdf/1506.00019.pdf. For every layer in the LSTM you have one cell state and one hidden state.
The problem is that Tensorflow stores this in a LSTMStateTuple which you can not send into placeholder. So you need to store it in a Tensor, and then unpack it into a tuple:
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
l = tf.unpack(state_placeholder, axis=0)
rnn_tuple_state = tuple(
[tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
for idx in range(num_layers)]
)
Then you can use the built-in Tensorflow API to create the stacked LSTM layer.
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
cell = tf.nn.rnn_cell.MultiRNNCell([cell]*num_layers, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, x_input, initial_state=rnn_tuple_state)
From here you continue with the outputs to calculate logits and then a loss with respect to the y_inputs.
Then you run each batch with the sess.run-command, with truncated backpropagation (good explanation here http://r2rt.com/styles-of-truncated-backpropagation.html)
init_state = np.zeros((num_layers, 2, batch_size, state_size))
...current_state... = sess.run([...state...], feed_dict={x_input:batch_in, state_placeholder:current_state ...})
current_state = np.array(current_state)
You will have to convert the state to a numpy array before feeding it again.
Perhaps it is better to use a librarly like Tflearn or Keras instead?