How to implement a skip-connection structure between LSTM layers - tensorflow

I recently learned about ResNet's skip connections and found that this kind of structure can improve training a lot; it is also used in convolutional networks such as U-Net. However, I don't know how to implement a similar structure in an LSTM autoencoder network. It seems I keep running into dimensional problems.
I'm implementing it with the Keras API, but I keep getting errors.
So here is the network code:
# lstm autoencoder recreate sequence
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import RepeatVector
from keras.layers import TimeDistributed
from keras.utils import plot_model
# from keras import regularizers
from keras.regularizers import l1
from keras.optimizers import Adam
import keras.backend as K
model = Sequential()
model.add(LSTM(512, activation='selu', input_shape=(n_in,1),return_sequences=True))
model.add(LSTM(256, activation='selu',return_sequences=True))
model.add(LSTM(20, activation='selu'))
model.add(RepeatVector(n_in))
model.add(LSTM(20, activation='selu',return_sequences=True))
model.add(LSTM(256, activation='selu',return_sequences=True))
model.add(LSTM(512, activation='selu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
# model.add
plot_model(model=model, show_shapes=True)
Just like the skip-connection diagrams in ResNet or U-Net, I'm trying to modify the network like this:
The output of an encoder LSTM layer is also combined (concat, or add?) with the output of the preceding layer and used as the input of a decoder LSTM layer. As the picture shows, the corresponding layers are symmetric. Is such a connection possible? I'm new to the Keras API and to skip-connection structures, so I don't know how to implement it.

First, you need to use the functional API instead of the Sequential model.
The functional API allows you to build arbitrary input and output connections for each layer, instead of only stacked networks.
Learn more about the functional API in:
https://keras.io/guides/functional_api/
As for building skip connections from LSTM layers, it is as easy as building skips for any other kind of layer. Here is some sample code:
from keras.layers import Input, LSTM, Dense, Add
from keras.models import Model

input = Input(shape=input_shape)
a = LSTM(32, return_sequences=True)(input)
x = LSTM(64, return_sequences=True)(a) # main1
a = LSTM(64, return_sequences=True)(a) # skip1
x = LSTM(64, return_sequences=True)(x) # main1
x = LSTM(64, return_sequences=True)(x) # main1
b = Add()([a,x]) # main1 + skip1
x = LSTM(128, return_sequences=True)(b) # main2
b = LSTM(128, return_sequences=True)(b) # skip2
x = LSTM(128, return_sequences=True)(x) # main2
x = LSTM(128, return_sequences=True)(x) # main2
c = Add()([b,x]) # main2 + skip2
x = LSTM(256, return_sequences=False)(c)
x = Dense(512, activation='relu')(x)
x = Dense(128, activation='relu')(x)
x = Dense(2, activation='softmax')(x)
model = Model(input, x)
This code will produce the following network:
As you can see, the Add layer receives as arguments the previous layer plus the layer from before the block (a in the first block).
Since Add requires all of its arguments to have the same shape, you must add an extra LSTM on the skip side to equalize the shapes at the start and end of each block (the same concept as in the original ResNet).
Of course you should experiment with this network, adding different kinds of layers, Dropout, regularizers, Activation, or whatever works for your case. This is only a stub network to show skip connections with LSTMs.
The rest is pretty much the same as any other network you have already trained.
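To connect this back to the autoencoder in the question: the encoder outputs can also be merged into the mirrored decoder layers with Concatenate, which, unlike Add, does not require the two tensors to have the same number of units (only the same timestep dimension). Here is a minimal sketch of that idea, not a drop-in solution (n_in is a placeholder for whatever sequence length you already use):
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense, Concatenate
from keras.models import Model

n_in = 100  # placeholder sequence length

inp = Input(shape=(n_in, 1))
enc1 = LSTM(512, activation='selu', return_sequences=True)(inp)
enc2 = LSTM(256, activation='selu', return_sequences=True)(enc1)
code = LSTM(20, activation='selu')(enc2)

x = RepeatVector(n_in)(code)
x = LSTM(20, activation='selu', return_sequences=True)(x)
x = Concatenate()([x, enc2])   # skip from the 256-unit encoder layer
x = LSTM(256, activation='selu', return_sequences=True)(x)
x = Concatenate()([x, enc1])   # skip from the 512-unit encoder layer
x = LSTM(512, activation='selu', return_sequences=True)(x)
out = TimeDistributed(Dense(1))(x)

model = Model(inp, out)
Note that such encoder-to-decoder skips let information bypass the 20-unit bottleneck, so the code layer is no longer forced to carry everything; whether that is acceptable depends on what you want the autoencoder for.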

Related

Masking zeroes in LSTM

My dataset consists of satellite observations that include a lot of zeroes, which strongly affect my final simulation results.
I have two sets of input data: dynamic ones (X_dynamic_LSTM.shape is (95931, 1, 5)) that change through the time series, and static ones (X_static_MLP.shape is (95931, 10)) that do not change. For the dynamic inputs I use an LSTM and for the static inputs an MLP; I concatenate the two and get the final results with another MLP.
Can you suggest how I should ignore these zero values in my prediction dataframe? I know about Masking and Embedding but don't know how to add them to my code.
from tensorflow.keras.layers import Input, LSTM, Dense, Concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Masking
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Embedding
lstm_input = Input(shape=(X_dynamic_LSTM.shape[1], X_dynamic_LSTM.shape[2]))
x = Masking(mask_value=0.)(lstm_input)
x = LSTM(70, activation='tanh', return_sequences=True)(x)
x = Dropout(0.3)(x)
x = LSTM(35)(x)
x = Dropout(0.3)(x)
x = Dense(1, activation='tanh')(x)
#mlp input with additional 3 variables at t=t
mlp_input=Input(shape=(X_static_MLP.shape[1]))
mlp = Dense(30, activation='relu')(mlp_input)
mlp = Dense(20, activation='relu')(mlp)
merge = Concatenate()([x, mlp])
hidden1 = Dense(5, activation='relu')(merge)
mlp_out = Dense(1, activation='relu')(hidden1)
model = Model(inputs=[lstm_input, mlp_input],outputs=mlp_out)
#compile the model
model.compile(loss='mae', optimizer='adam')
#fit the model
model.fit([X_dynamic_LSTM, X_static_MLP], y_train, batch_size=40,
          epochs=10, validation_split=0.2)
Use an Embedding layer as your first layer.
You can use this link.
>>> model = tf.keras.Sequential()
>>> model.add(tf.keras.layers.Embedding(input_dim=1000, output_dim=64, mask_zero=True))  # sizes are placeholders; mask_zero=True makes zero entries be ignored
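For completeness, the Masking layer already present in the question's code also does what is being asked: it produces a mask that the LSTM layers after it respect, and a timestep is masked only when every feature at that timestep equals mask_value. A minimal sketch of just that branch (this is standard Keras masking behaviour, not code from the linked answer):
from tensorflow.keras.layers import Input, Masking, LSTM

lstm_input = Input(shape=(1, 5))                  # matches X_dynamic_LSTM.shape[1:]
x = Masking(mask_value=0.)(lstm_input)            # timesteps whose features are all 0 get masked
x = LSTM(70, activation='tanh', return_sequences=True)(x)  # consumes the mask
x = LSTM(35)(x)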

How to subtract the loss of one branch of a Sequential model from another branch?

Say I have a model called other_model, a pre-trained model:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Concatenate
features = Concatenate()([other_model.layers[12].output, other_model.layers[11].output])
X = Dense(90, activation='relu')(features)
X = Dense(50, activation='relu')(X)
X = Dense(40, activation='relu')(X)
discriminator = Dense(n_classes, activation='softmax', name='discriminator')(X)
discriminator_full = Model(inputs=other_model.input, outputs=[discriminator] + other_model.outputs)
I first freeze other_model and train the discriminator, then unfreeze other_model and train them both.
What I struggle with is defining other_model's loss as other_model_loss = other_model_loss - discriminator_loss. Is it possible to do that with this API?
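One way a difference of losses can be expressed with this API is through loss_weights in compile(), giving the discriminator output a negative weight so that it is subtracted from the total. A minimal sketch, assuming other_model has a single output and using 'mse' only as a stand-in for its real loss:
model = Model(inputs=other_model.input,
              outputs=[other_model.output, discriminator])
model.compile(
    optimizer='adam',
    # one loss per output, in the same order as `outputs`
    loss=['mse', 'categorical_crossentropy'],
    # total loss = 1.0 * other_model_loss + (-1.0) * discriminator_loss
    loss_weights=[1.0, -1.0])
Keep in mind that minimizing a negatively weighted loss pushes that loss upward, which is usually the intent in adversarial-style setups.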

trying to add a layer for transfer learning, getting ValueError: A merge layer should be called on a list of inputs [duplicate]

I am trying to do transfer learning; for that purpose I want to remove the last two layers of the neural network and add another two layers. This is example code, which also outputs the same error.
from keras.models import Sequential
from keras.layers import Input,Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.core import Dropout, Activation
from keras.layers.pooling import GlobalAveragePooling2D
from keras.models import Model
in_img = Input(shape=(3, 32, 32))
x = Convolution2D(12, 3, 3, subsample=(2, 2), border_mode='valid', name='conv1')(in_img)
x = Activation('relu', name='relu_conv1')(x)
x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name='pool1')(x)
x = Convolution2D(3, 1, 1, border_mode='valid', name='conv2')(x)
x = Activation('relu', name='relu_conv2')(x)
x = GlobalAveragePooling2D()(x)
o = Activation('softmax', name='loss')(x)
model = Model(input=in_img, output=[o])
model.compile(loss="categorical_crossentropy", optimizer="adam")
#model.load_weights('model_weights.h5', by_name=True)
model.summary()
model.layers.pop()
model.layers.pop()
model.summary()
model.add(MaxPooling2D())
model.add(Activation('sigmoid', name='loss'))
I removed the layers using pop(), but when I try to add new ones it outputs this error:
AttributeError: 'Model' object has no attribute 'add'
I know the most probable reason for the error is improper use of model.add(). What other syntax should I use?
EDIT:
I tried to remove/add layers in Keras, but it does not allow layers to be added after loading external weights.
from keras.models import Sequential
from keras.layers import Input,Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.core import Dropout, Activation
from keras.layers.pooling import GlobalAveragePooling2D
from keras.models import Model
in_img = Input(shape=(3, 32, 32))
def gen_model():
    in_img = Input(shape=(3, 32, 32))
    x = Convolution2D(12, 3, 3, subsample=(2, 2), border_mode='valid', name='conv1')(in_img)
    x = Activation('relu', name='relu_conv1')(x)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name='pool1')(x)
    x = Convolution2D(3, 1, 1, border_mode='valid', name='conv2')(x)
    x = Activation('relu', name='relu_conv2')(x)
    x = GlobalAveragePooling2D()(x)
    o = Activation('softmax', name='loss')(x)
    model = Model(input=in_img, output=[o])
    return model
#parent model
model=gen_model()
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
#saving model weights
model.save('model_weights.h5')
#loading weights to second model
model2=gen_model()
model2.compile(loss="categorical_crossentropy", optimizer="adam")
model2.load_weights('model_weights.h5', by_name=True)
model2.layers.pop()
model2.layers.pop()
model2.summary()
#editing layers in the second model and saving as third model
x = MaxPooling2D()(model2.layers[-1].output)
o = Activation('sigmoid', name='loss')(x)
model3 = Model(input=in_img, output=[o])
It shows this error:
RuntimeError: Graph disconnected: cannot obtain value for tensor input_4 at layer "input_4". The following previous layers were accessed without issue: []
You can take the output of the last layer of the model and create a new model on top of it. The lower layers remain the same.
model.summary()
model.layers.pop()
model.layers.pop()
model.summary()
x = MaxPooling2D()(model.layers[-1].output)
o = Activation('sigmoid', name='loss')(x)
model2 = Model(inputs=in_img, outputs=[o])
model2.summary()
Check How to use models from keras.applications for transfer learning?
Update on Edit:
The new error is because you are trying to create the new model on the global in_img, which is not actually used in the previous model creation; there you define a local in_img. So the global in_img is obviously not connected to the upper layers in the symbolic graph, and it has nothing to do with loading weights.
To resolve this, you should instead use model.input to reference the input.
model3 = Model(input=model2.input, output=[o])
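Putting those pieces together, a minimal sketch (assuming model2 from the question, with its weights already loaded and the two layers popped):
x = MaxPooling2D()(model2.layers[-1].output)   # attach the new head to model2's last remaining layer
o = Activation('sigmoid', name='loss')(x)
model3 = Model(inputs=model2.input, outputs=[o])
model3.summary()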
Another way to do it
from keras.models import Model
layer_name = 'relu_conv2'
model2= Model(inputs=model1.input, outputs=model1.get_layer(layer_name).output)
As of Keras 2.3.1 and TensorFlow 2.0, model.layers.pop() is not working as intended (see issue here). They suggested two options to do this.
One option is to recreate the model and copy the layers. For instance, if you want to remove the last layer and add another one, you can do:
model = Sequential()
for layer in source_model.layers[:-1]:  # go through until last layer
    model.add(layer)
model.add(Dense(3, activation='softmax'))
model.summary()
model.compile(optimizer='adam', loss='categorical_crossentropy')
Another option is to use the functional model:
predictions = Dense(3, activation='softmax')(source_model.layers[-2].output)
model = Model(inputs=source_model.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.layers[-1].output means the last layer's output, which is the final output of the model, so in your code you actually didn't remove any layers; you added another head/path.
As an alternative to Wesam Na's answer: if you don't know the layer names, you can simply cut off the last layer via:
from keras.models import Model
model2= Model(inputs=model1.input, outputs=model1.layers[-2].output)

How to clear the entire network structure in TensorFlow

I would like to clear the memory/network every time I am done with training. I used the alternatives proposed online, but they do not seem to work, if I am interpreting my results correctly. I use tf.compat.v1.reset_default_graph() and tf.keras.backend.clear_session(), since they are the ones most commonly recommended online.
import numpy as np
import random
import tensorflow as tf
from tensorflow import keras
from tensorflow.python.keras import backend as K
upper_limit = 2
lower_limit = -2
training_input= np.random.random ([100,5])*(upper_limit - lower_limit) + lower_limit
training_output = np.random.random ([100,1]) *10*(upper_limit - lower_limit) + lower_limit
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(5,)),
    tf.keras.layers.Dense(12, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(loss="mse", optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))

for layer in model.layers:
    print("layer weights before fitting: ", layer.get_weights(), "\n")  # weights

model.fit(training_input, training_output, epochs=5, batch_size=100, verbose=0)

for layer in model.layers:
    print("layer weights after fitting: ", layer.get_weights(), "\n")  # weights
print("\n")
tf.compat.v1.reset_default_graph()
tf.keras.backend.clear_session()
print("after clear","\n")
for layer in model.layers:
    print(layer.get_weights(), "\n")  # weights
When I print the layer weights after attempting to clear the network, I get the same weight values as before clearing the session.
I think what you are looking for is to reset the weights of your model, and that is not really related to the session or the graph (with some exceptions).
Resetting the weights is currently a debated topic; you can find how to do it for most cases here, but as you can see, nobody is currently planning to implement this function.
For easy access, I post the current proposition below:
def reset_weights(model):
    for layer in model.layers:
        if isinstance(layer, tf.keras.Model):  # if you're using a model as a layer
            reset_weights(layer)  # apply function recursively
            continue

        # where are the initializers?
        if hasattr(layer, 'cell'):
            init_container = layer.cell
        else:
            init_container = layer

        for key, initializer in init_container.__dict__.items():
            if "initializer" not in key:  # is this item an initializer?
                continue  # if no, skip it

            # find the corresponding variable, like the kernel or the bias
            if key == 'recurrent_initializer':  # special case check
                var = getattr(init_container, 'recurrent_kernel')
            else:
                var = getattr(init_container, key.replace("_initializer", ""))

            var.assign(initializer(var.shape, var.dtype))
Remember that if you are not defining a seed, the weights will be different each time you call the reset.
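A minimal usage sketch for the model from the question (reset_weights is the function above; the seed line is optional and only there if you want the re-initialized weights to be reproducible across runs):
tf.random.set_seed(42)   # optional: fix the seed so re-initialization is reproducible
reset_weights(model)

for layer in model.layers:
    print("layer weights after reset: ", layer.get_weights(), "\n")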

Keras: model.add() missing 1 required positional argument: 'layer'

I'm classifying digits of the MNIST dataset using a simple feed-forward neural net with Keras, so I execute the code below.
import os
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('/tmp/data', one_hot=True)
# Path to Computation graphs
LOGDIR = './graphs_3'
# start session
sess = tf.Session()
#Hyperparameters
LEARNING_RATE = 0.01
BATCH_SIZE = 1000
EPOCHS = 10
# Layers
HL_1 = 1000
HL_2 = 500
# Other Parameters
INPUT_SIZE = 28*28
N_CLASSES = 10
model = Sequential
model.add(Dense(HL_1, input_dim=(INPUT_SIZE,), activation="relu"))
#model.add(Activation(activation="relu"))
model.add(Dense(HL_2, activation="relu"))
#model.add(Activation("relu"))
model.add(Dropout(rate=0.9))
model.add(Dense(N_CLASSES, activation="softmax"))
model.compile(
    optimizer="Adam",
    loss="categorical_crossentropy",
    metrics=['accuracy'])
# one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)
model.fit(
    x=mnist.train.images,
    y=mnist.train.labels,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE)
score = model.evaluate(
    x=mnist.test.images,
    y=mnist.test.labels)
print("score = ", score)
However, I get the following error:
model.add(Dense(1000, input_dim=(INPUT_SIZE,), activation="relu"))
TypeError: add() missing 1 required positional argument: 'layer'
The syntax is exactly as shown in the Keras docs. I am using Keras 2.0.9, so I don't think it's a versioning problem. Did I do something wrong?
It seems perfect indeed....
But I noticed you're not creating an instance of a Sequential model; you're using the class name itself instead:
#yours: model = Sequential
#correct:
model = Sequential()
Since methods in a class are always declared with self as the first argument, calling a method on the class instead of on an instance requires passing the instance explicitly as that first argument, which is why Python complains that the layer argument is missing.
The method's definition is def add(self, layer, ...):
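So the fix is just the parentheses; a minimal corrected sketch of the model construction from the question (note that input_dim expects a plain integer, and Dropout must be imported as well):
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()                                   # create an instance, not the class
model.add(Dense(HL_1, input_dim=INPUT_SIZE, activation="relu"))
model.add(Dense(HL_2, activation="relu"))
model.add(Dropout(rate=0.9))
model.add(Dense(N_CLASSES, activation="softmax"))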