How to create a Keras model with switchable input layers? - tensorflow

Current simple model:
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.optimizers import Adam
def model():
    input_A = Input(shape=(6,))
    out = Dense(64, activation="relu")(input_A)
    out = Dense(32, activation="relu")(out)
    outputs = Dense(1, activation="tanh")(out)
    model = Model(
        inputs=input_A,
        outputs=outputs,
        name="switchable_inputs_model")
    model.compile(loss="mse", optimizer=Adam(), metrics=["accuracy"])
    return model
I want to add another input layer, input_B, which will not be active all the time during training. Say we have two input layers: input_A and input_B. At any given time, only one input layer can be active. Which layer is active is decided by a binary combination of information available at execution time (during the learning stage). For instance, if it is 1 0, input layer A is used; if it is 0 1, input layer B is used.
How can I do this?

It's hard to guess from your question what you are trying to accomplish in detail, but you should carefully consider whether it is necessary.
It's common practice to have an input layer of a fixed size that matches the structure of your data, and to preprocess your data to match that shape.
In the domain of, e.g., images this might mean:
If you have images of different resolutions, you could consider cropping, padding, or resizing your inputs to a fixed size.
If there is a rationale behind this, please clarify.
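If you do need two alternative inputs, one possible sketch (an assumption on my part, not something the answer above prescribes) is to feed both inputs plus a two-element one-hot selector, project each input to the same width, and blend the branches by multiplication so the inactive branch is zeroed out:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input

# Hypothetical sketch: input shapes 6 and 4 are made up for the demo.
input_A = Input(shape=(6,))
input_B = Input(shape=(4,))
selector = Input(shape=(2,))  # [1, 0] -> use A, [0, 1] -> use B

# Project both inputs to the same width so the branches can be summed.
branch_A = Dense(64, activation="relu")(input_A)
branch_B = Dense(64, activation="relu")(input_B)

# Multiply each branch by its selector component; the inactive
# branch is multiplied by zero and contributes nothing.
gated_A = branch_A * selector[:, 0:1]
gated_B = branch_B * selector[:, 1:2]
merged = gated_A + gated_B

out = Dense(32, activation="relu")(merged)
outputs = Dense(1, activation="tanh")(out)
model = Model([input_A, input_B, selector], outputs)
model.compile(loss="mse", optimizer="adam")
```

During training you would pass a dummy (e.g. zero) array for the inactive input alongside the selector; gradients only flow through the active branch because the other is multiplied by zero.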

Related

tf.keras loss from two images in serial

I want to use the stability training approach of the paper and apply it to a very simple CNN.
The principal architecture is given by:
As shown in the figure you compute the loss based on the output f(I) for the input image I and on
the output f(I') for the perturbed image I'.
My question would be how to do this in a valid way without having two instances of the DNN,
as I'm training on large 3D images. In other words: how can I process two images in serial and compute the loss based on those two images?
I'm using tf2 with keras.
You can first write your DNN as a tf.keras Model.
After that, you can write another model which takes two image inputs, applies some Gaussian noise to one, and passes both through the shared DNN.
Then design a custom loss function which computes the proper loss from the two outputs.
Here's a demo code:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Flatten, GaussianNoise
from tensorflow.keras.models import Model

# Shared DNN: the base model with a feature-space output.
# There is only one instance of this model.
ip = Input(shape=(32, 32, 1))  # same shape as the original inputs
f0 = Flatten()(ip)
d0 = Dense(10)(f0)  # 10-dimensional feature embedding
dnn = Model(ip, d0)

# Final model with the two versions of the image and the loss
input_1 = Input(shape=(32, 32, 1))
input_2 = Input(shape=(32, 32, 1))
# Only input_2 passes through the Gaussian noise layer;
# you can design your own custom layer too.
g0 = GaussianNoise(0.5)(input_2)

# Pass the two images through the same DNN
path1 = dnn(input_1)  # no noise
path2 = dnn(g0)       # noise

model = Model([input_1, input_2], [path1, path2])

def my_loss(y_true, y_pred):
    # calculate your loss based on the two outputs path1, path2
    pass

model.compile('adam', my_loss)
model.summary()
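The my_loss stub still has to combine the two outputs somehow. One hedged way to wire in the stability term (assuming, as a sketch, that it is the squared L2 distance between the two embeddings, with a made-up weight of 0.1) is to expose the embedding difference as an extra output, so each output gets an ordinary per-output loss:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import (Input, Dense, Flatten,
                                     GaussianNoise, Subtract)
from tensorflow.keras.models import Model

# Shared feature extractor: one instance, as in the answer above
ip = Input(shape=(32, 32, 1))
dnn = Model(ip, Dense(10)(Flatten()(ip)))

input_1 = Input(shape=(32, 32, 1))
input_2 = Input(shape=(32, 32, 1))
path1 = dnn(input_1)                      # clean image
path2 = dnn(GaussianNoise(0.5)(input_2))  # perturbed image

# Expose the embedding difference as a second output so the
# stability term can be expressed as an ordinary per-output loss.
diff = Subtract()([path1, path2])
model = Model([input_1, input_2], [path1, diff])

def stability_loss(y_true, y_pred):
    # Penalizes the distance between the two embeddings;
    # y_true is a dummy tensor of zeros and is ignored.
    return tf.reduce_mean(tf.square(y_pred), axis=-1)

# The loss_weights value 0.1 is a hypothetical hyperparameter.
model.compile('adam', loss=['mse', stability_loss],
              loss_weights=[1.0, 0.1])
```

You would then train with something like model.fit([x, x], [y, np.zeros((n, 10))]), where the zeros are a dummy target for the difference output.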

How can I print the activations of each layer of a model while training the same Keras model?

The following code (based on https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer) allows us to print the output of an intermediate layer of a model, given some input that we need to provide. Specifically, in this example, I am printing the output of the layer dense, given the input to the layer input_1, which also happens to be the input layer of my model (but it does not have to be).
import numpy as np
import tensorflow as tf

def get_model():
    inp = tf.keras.layers.Input(shape=(1,))
    x = tf.keras.layers.Dense(8)(inp)
    x = tf.keras.layers.Dense(16)(x)
    out = tf.keras.layers.Dense(1)(x)
    model = tf.keras.Model(inputs=inp, outputs=out)
    model.summary()
    return model

def train():
    my_model = get_model()
    my_model.compile(optimizer="adam", loss="mse")
    get_layer_output = tf.keras.backend.function(
        [my_model.get_layer("input_1").input],
        [my_model.get_layer("dense").output])
    data_x = np.array([[1], [2], [3], [4]])
    layer_output = get_layer_output(data_x)[0]
    print(layer_output)

if __name__ == '__main__':
    train()
However, I would like to print the output of each layer, given the output of the corresponding previous layer (as defined by the model), while training the model, i.e. after each mini-batch. I tried to use a callback, which calls tf.print for printing the output of a model, but I am getting an error, which is described in this Github issue (i.e. there is a bug in TensorFlow 2.0, which is the version I am using and that I want to use).
To be clearer, I would like to debug the output of each layer while I am training the model, so that I can understand how the inputs flow through each layer and whether the activations are too high (exploding) or too small (vanishing). I could iteratively provide a batch of data to get_layer_output, but I would like to be able to print the activations of each layer while training the model with fit or fit_generator. Furthermore, I would like to understand how the values of the activations evolve from the input layer to the output layer, and not just print the activations of one layer given the input to another layer.
Here's a Github issue that asks a similar thing.
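One workaround (a sketch on my part, not from the linked issue) is a small pass-through layer that calls tf.print on its input; inserting one after each layer prints summary statistics of every activation on every batch during fit, since tf.print also works inside compiled graphs:

```python
import numpy as np
import tensorflow as tf

class PrintActivations(tf.keras.layers.Layer):
    """Pass-through layer that prints activation statistics."""
    def __init__(self, tag, **kwargs):
        super().__init__(**kwargs)
        self.tag = tag

    def call(self, inputs):
        # tf.print executes inside graphs, so it also fires during fit()
        tf.print(self.tag, "min:", tf.reduce_min(inputs),
                 "max:", tf.reduce_max(inputs),
                 "mean:", tf.reduce_mean(inputs))
        return inputs

inp = tf.keras.layers.Input(shape=(1,))
x = tf.keras.layers.Dense(8)(inp)
x = PrintActivations("dense")(x)
x = tf.keras.layers.Dense(16)(x)
x = PrintActivations("dense_1")(x)
out = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inp, out)
model.compile("adam", "mse")

data_x = np.array([[1.0], [2.0], [3.0], [4.0]])
data_y = data_x * 2
# Each mini-batch prints one line per instrumented layer
model.fit(data_x, data_y, epochs=1, batch_size=2, verbose=0)
```

Because the layer returns its input unchanged, it does not affect the model's computation or gradients.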

Unable to track record by record processing in LSTM algorithm for text classification?

We are working on multi-class text classification and following is the process which we have used.
1) We created 300-dimensional vectors with word2vec word embeddings using our own data and then passed them as weights to the LSTM embedding layer.
2) Then we used one LSTM layer and one dense layer.
Here below is my code:
input_layer = layers.Input((train_seq_x.shape[1],))
embedding_layer = layers.Embedding(len(word_index) + 1, 300, weights=[embedding_matrix], trainable=False)(input_layer)
embedding_layer = layers.SpatialDropout1D(0.3)(embedding_layer)
lstm_layer1 = layers.LSTM(300, return_sequences=True, activation="relu")(embedding_layer)
lstm_layer1 = layers.Dropout(0.5)(lstm_layer1)
flat_layer = layers.Flatten()(lstm_layer1)
output_layer = layers.Dense(33, activation="sigmoid")(flat_layer)
model = models.Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer=optimizers.Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
Please help me out with the questions below:
Q1) Why did we pass the word embedding vector (300 dimensions) as weights to the LSTM embedding layer?
Q2) How can we know the optimal number of neurons in the LSTM layer?
Q3) Can you please explain how a single record is processed in the LSTM algorithm?
Please let me know if you require more information.
Q1) Why did we pass the word embedding vector (300 dimensions) as weights to the LSTM embedding layer?
In a very simplistic way, you can think of an embedding layer as a lookup table which converts a word (represented by its index in a dictionary) to a vector. It is a trainable layer. Since you already have trained word embeddings, instead of initializing the embedding layer with random weights you initialize it with the vectors you have learned.
Embedding(len(word_index)+1, 300, weights=[embedding_matrix], trainable=False)(input_layer)
So here you are:
- creating an embedding layer, or lookup table, which can look up word indices from 0 to len(word_index);
- each looked-up word index maps to a vector of size 300;
- the lookup table is loaded with the vectors from embedding_matrix (which comes from a pretrained model);
- trainable=False freezes the weights of this layer.
You passed 300 because it is the vector size of your pretrained model (embedding_matrix).
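To illustrate the lookup-table view, here is a standalone sketch with a made-up four-word vocabulary and 3-dimensional vectors (not the asker's data); the layer simply maps integer indices to rows of its weight matrix:

```python
import numpy as np
import tensorflow as tf

# Hypothetical "pretrained" matrix: 4 words, 3 dimensions
embedding_matrix = np.array([[0.0, 0.0, 0.0],
                             [1.0, 1.0, 1.0],
                             [2.0, 2.0, 2.0],
                             [3.0, 3.0, 3.0]], dtype="float32")

layer = tf.keras.layers.Embedding(4, 3, trainable=False)
layer.build((None,))
# Equivalent to passing weights=[embedding_matrix] at construction
layer.set_weights([embedding_matrix])

# Looking up indices 1 and 3 returns rows 1 and 3 of the matrix
out = layer(np.array([[1, 3]]))
print(out.numpy().shape)  # (1, 2, 3)
```

With trainable=False the rows never change during training; with trainable=True they would be fine-tuned by backpropagation like any other weights.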
Q2) How can we know the optimal number of neurons in the LSTM layer?
You have created an LSTM layer which takes a 300-dimensional vector as input and returns a 300-dimensional vector. The output size and the number of stacked LSTMs are hyperparameters which are tuned manually (usually with K-fold CV).
Q3) Can you please explain how a single record is processed in the LSTM algorithm?
A single record/sentence is converted into indices of the vocabulary, so for every sentence you have an array of indices.
A batch of these sentences is created and fed as input to the model.
The LSTM is unrolled by passing in one index at a time as input at each timestep.
Finally, the output of the LSTM is forward-propagated through a final dense layer of size 33, so it looks like each input is mapped to one of 33 classes in your case.
Simple example
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM, Embedding
from keras.utils import to_categorical
from nltk.lm import Vocabulary

training_data = ["it was a good movie".split(), "it was a bad movie".split()]
training_target = [1, 0]

v = Vocabulary([word for s in training_data for word in s])

model = Sequential()
model.add(Embedding(len(v), 50, input_length=5))
model.add(LSTM(10, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

x = np.array([list(map(lambda w: v[w], s)) for s in training_data])
y = to_categorical(training_target)
model.fit(x, y)

Can we have a combination of embedding layers and regular layers in a neural network?

I am trying to use neural networks for a binary classification problem using Keras. I am new to the whole neural network area. What I'd like to do is to have a network that has an embedding layer for some features but a regular input layer for the other features. For example, imagine I would like to use the user ID as the input that goes to the embedding layer, and everything else goes to the regular input layer. My question is more conceptual than technical, so I am asking whether this is possible to do in Keras or in any other framework or tool for implementing neural networks.
Yes, it's possible; you have to use the functional API.
Here is an example, feel free to adapt it to your needs:
from keras.models import Model
from keras.layers import (Dense, Concatenate, Reshape, Input,
                          BatchNormalization, Activation, Embedding)

extraInput = Input((116,))
embed_input = Input((1,))

em_model = Embedding(10,
                     5,
                     input_length=1,
                     embeddings_initializer='uniform')(embed_input)
em_model = Reshape((5,))(em_model)

outputs = Concatenate(axis=1)([em_model, extraInput])
outputs = BatchNormalization(epsilon=1e-05, momentum=0.1)(outputs)
outputs = Dense(10, kernel_initializer='uniform', activation='relu')(outputs)
outputs = Dense(3, kernel_initializer='uniform', activation='relu')(outputs)
outputs = Dense(1)(outputs)
outputs = Activation('sigmoid')(outputs)

model = Model([embed_input, extraInput], outputs)
model.summary()
This will give you the following graph, where you have two different inputs: one for the embedding and a second for the continuous variables.
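To train such a model, you pass a list with one array per input. Here is a hedged sketch with random dummy data (a minimal version of the architecture above, with made-up batch size and labels):

```python
import numpy as np
from tensorflow.keras.layers import (Input, Embedding, Reshape,
                                     Concatenate, Dense)
from tensorflow.keras.models import Model

# Minimal two-input model: one embedded ID, 116 continuous features
extraInput = Input((116,))
embed_input = Input((1,))
em = Reshape((5,))(Embedding(10, 5)(embed_input))
merged = Concatenate(axis=1)([em, extraInput])
hidden = Dense(10, activation='relu')(merged)
out = Dense(1, activation='sigmoid')(hidden)
model = Model([embed_input, extraInput], out)
model.compile('adam', 'binary_crossentropy')

# Dummy data: integer IDs in [0, 10) plus 116 continuous features
ids = np.random.randint(0, 10, size=(32, 1))
feats = np.random.rand(32, 116).astype("float32")
labels = np.random.randint(0, 2, size=(32, 1))
model.fit([ids, feats], labels, epochs=1, verbose=0)
```

The order of the arrays in the list must match the order of the Input tensors passed to Model.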
Yes, it is possible. In any framework.
Basically, in such an architecture you'd have two neural components:
- a feature extractor, which looks at the raw data and produces an output with the dimensionality of your choice;
- a classifier, which takes features as inputs (features from your feature extractor, concatenated with your own engineered features) and produces a classification distribution.

Implementing a many-to-many LSTM in TensorFlow?

I am using TensorFlow to make predictions on time-series data. For example, I have 50 tags and I want to predict the next possible 5 tags.
As shown in the following picture, I want to make it like the 4th structure.
I went through the tutorial demo: Recurrent Neural Networks
However, I found it provides something like the 5th structure in the picture above, which is different.
I am wondering which model could I use? I am thinking of the seq2seq models, but not sure if it is the right way.
You are right that you can use a seq2seq model. For brevity I've written up an example of how you can do it in Keras which also has a Tensorflow backend. I've not run the example so it might need tweaking. If your tags are one-hot you need to use cross-entropy loss instead.
from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense

# The input shape is your sequence length and your token embedding size
inputs = Input(shape=(seq_len, embedding_size))
# Build an RNN encoder
encoder = LSTM(128, return_sequences=False)(inputs)
# Repeat the encoding for every timestep of the decoder
encoding_repeat = RepeatVector(5)(encoder)
# Pass your (5, 128) encoding to the decoder
decoder = LSTM(128, return_sequences=True)(encoding_repeat)
# Output each timestep through a fully connected layer
sequence_prediction = TimeDistributed(Dense(1, activation='linear'))(decoder)
model = Model(inputs, sequence_prediction)
model.compile('adam', 'mse')  # Or categorical_crossentropy
model.fit(X_train, y_train)
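To verify the shapes, here is the same sketch made runnable with made-up sizes (seq_len=50 matching the 50 tags, embedding_size=8 chosen arbitrarily) and random dummy data; the model maps a 50-step input sequence to a 5-step output sequence:

```python
import numpy as np
from tensorflow.keras.layers import (Input, LSTM, RepeatVector,
                                     TimeDistributed, Dense)
from tensorflow.keras.models import Model

seq_len, embedding_size = 50, 8  # made-up sizes for the demo

inputs = Input(shape=(seq_len, embedding_size))
encoder = LSTM(128, return_sequences=False)(inputs)
encoding_repeat = RepeatVector(5)(encoder)       # 5 decoder timesteps
decoder = LSTM(128, return_sequences=True)(encoding_repeat)
sequence_prediction = TimeDistributed(Dense(1, activation='linear'))(decoder)
model = Model(inputs, sequence_prediction)
model.compile('adam', 'mse')

# Random dummy training data, 16 samples
X_train = np.random.rand(16, seq_len, embedding_size).astype("float32")
y_train = np.random.rand(16, 5, 1).astype("float32")
model.fit(X_train, y_train, epochs=1, verbose=0)

print(model.predict(X_train[:2], verbose=0).shape)  # (2, 5, 1)
```

If the tags were one-hot encoded instead, the final Dense would have one unit per tag with a softmax activation, trained with categorical cross-entropy.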