Implementing a many-to-many LSTM in TensorFlow?

I am using TensorFlow to make predictions on time-series data. I have a sequence of 50 tags and I want to predict the next 5 tags.
As shown in the following picture, I want a model like the 4th structure.
I went through the tutorial demo: Recurrent Neural Networks
But I found that it provides something like the 5th structure in the picture, which is different.
Which model could I use? I am thinking of seq2seq models, but I am not sure if that is the right way.

You are right that you can use a seq2seq model. For brevity I've written up an example of how you can do it in Keras, which can also run on a TensorFlow backend. I've not run the example, so it might need tweaking. If your tags are one-hot encoded, you need to use a cross-entropy loss instead, and the final Dense layer then needs one unit per tag with a softmax activation.
from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
# The input shape is your sequence length and your token embedding size
# (seq_len, embedding_size, X_train and y_train are assumed to be defined by you)
inputs = Input(shape=(seq_len, embedding_size))
# Build an RNN encoder that returns only its final state
encoder = LSTM(128, return_sequences=False)(inputs)
# Repeat the encoding once per output timestep (5 predicted tags)
encoding_repeat = RepeatVector(5)(encoder)
# Pass your (5, 128) encoding to the decoder
decoder = LSTM(128, return_sequences=True)(encoding_repeat)
# Apply the same fully connected layer to each output timestep
sequence_prediction = TimeDistributed(Dense(1, activation='linear'))(decoder)
model = Model(inputs, sequence_prediction)
model.compile('adam', 'mse')  # Or categorical_crossentropy
model.fit(X_train, y_train)
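To train a model like the one above, the series first has to be cut into pairs of 50 input steps and the 5 steps that follow them. The sketch below shows one way to do that windowing in plain Python, assuming a single numeric series. `make_windows` and its parameter names are hypothetical helpers for illustration, not part of Keras.

```python
def make_windows(series, n_in=50, n_out=5):
    """Split a sequence into (input window, target window) pairs.

    Each input holds n_in consecutive values and each target holds
    the n_out values that immediately follow it.
    """
    inputs, targets = [], []
    last_start = len(series) - n_in - n_out
    for start in range(last_start + 1):
        split = start + n_in
        inputs.append(series[start:split])
        targets.append(series[split:split + n_out])
    return inputs, targets


# Toy series 0..59: long enough for six (50 -> 5) windows.
series = list(range(60))
X, y = make_windows(series)
print(len(X), len(X[0]), len(y[0]))
print(y[0])
```

Before calling model.fit, the lists would still need to be converted to arrays shaped (samples, 50, embedding_size) for the inputs and (samples, 5, 1) for the targets, to match the Input and TimeDistributed(Dense(1)) layers.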

Related

How to create a Keras model with switchable input layers?

Current simple model:
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.optimizers import Adam
def model():
    input_A = Input(shape=(6, ))
    out = Dense(64, activation="relu")(input_A)
    out = Dense(32, activation="relu")(out)
    outputs = Dense(1, activation="tanh")(out)
    model = Model(
        inputs=input_A,
        outputs=outputs,
        name="switchable_inputs_model")
    model.compile(loss="mse", optimizer=Adam(), metrics=["accuracy"])
    return model
I want to have another input layer, input_B, which will not be active all the time during learning. Say we have two input layers: input A and input B. At any given time, only one of them can be active. The active input layer is selected by a binary code that is available at execution time (the learning stage). For instance, if the code is 1 0, input layer A is used; if it is 0 1, input layer B is used.
How can I do this?
It's hard to guess from your question what you are trying to accomplish in detail, but you should carefully consider if that is necessary.
It's common practice to have an input layer of a fixed size that matches the structure of your data. You preprocess your data to match that shape.
In the domain of e.g. images this might mean:
If you have images of different resolutions, you could consider cropping, padding or resizing your inputs to a fixed size.
If there is a rationale behind this please clarify.
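The fixed-size preprocessing described above can be sketched for flat feature vectors as follows. This is only an illustration in plain Python, assuming the model keeps its single Input(shape=(6, )). `fit_to_length` and the zero pad value are hypothetical choices, and whether zero padding is appropriate depends on your data.

```python
def fit_to_length(values, target_len, pad_value=0.0):
    """Crop or right-pad a feature vector to exactly target_len entries."""
    cropped = list(values[:target_len])
    padding = [pad_value] * (target_len - len(cropped))
    return cropped + padding


# Two sources with different widths, both mapped to the model's 6 inputs.
sample_a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # too long: cropped
sample_b = [1.0, 2.0, 3.0]                            # too short: padded
batch = [fit_to_length(s, 6) for s in (sample_a, sample_b)]
print(batch)
```

With this kind of preprocessing both data sources feed the same input layer, so no switching logic is needed inside the model.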

How to use embedding models in tensorflow hub with LSTM layer?

I'm learning TensorFlow 2 by working through the text classification with TF Hub tutorial. It uses an embedding module from TF Hub. I was wondering if I could modify the model to include an LSTM layer. Here's what I've tried:
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
train_data, validation_data, test_data = tfds.load(
    name="imdb_reviews",
    split=('train[:60%]', 'train[60%:]', 'test'),
    as_supervised=True)
embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[],
                           dtype=tf.string, trainable=True)
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Embedding(10000, 50))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(1))
model.summary()
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_data.shuffle(10000).batch(512),
                    epochs=10,
                    validation_data=validation_data.batch(512),
                    verbose=1)
results = model.evaluate(test_data.batch(512), verbose=2)
for name, value in zip(model.metrics_names, results):
    print("%s: %.3f" % (name, value))
I don't know how to get the vocabulary size from the hub_layer, so I just put 10000 there. When I run it, it throws this exception:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[480,1] = -6 is not in [0, 10000)
[[node sequential/embedding/embedding_lookup (defined at .../learning/tensorflow/text_classify.py:36) ]] [Op:__inference_train_function_36284]
Errors may have originated from an input operation.
Input Source operations connected to node sequential/embedding/embedding_lookup:
sequential/embedding/embedding_lookup/34017 (defined at Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py:112)
Function call stack:
train_function
I am stuck here. My questions are:
How should I use the embedding module from TF Hub to feed an LSTM layer? It looks like the embedding lookup has some issues with this setup.
How do I get the vocabulary size from the hub layer?
Thanks
I finally figured out how to link pre-trained embeddings to an LSTM or other layers. I am posting the steps here in case anyone finds them helpful.
The embedding layer has to be the first layer in the model. (hub_layer plays the same role as an Embedding layer.) The not very intuitive part is that any text input to the hub layer is converted to a single vector of shape [embedding_dim]. You need to do sentence splitting and tokenization so that the input to the model is a sequence in the form of an array of arrays, e.g., "Let us prepare the data." should be converted to [["let"],["us"],["prepare"],["the"],["data"]]. You will also need to pad the sequences if you are using batch mode.
In addition, you will need to convert your target tokens to int if your training labels are strings. The input to the model is an array of strings with shape [batch, seq_length]; the hub embedding layer converts it to [batch, seq_length, embed_dim]. (If you add an LSTM or other RNN layer, the output from that layer is [batch, seq_length, rnn_units].) The output dense layer will output the index of the text instead of the actual text. The index of each token is stored in the downloaded tfhub directory as "tokens.txt". You can load the file and convert text to the corresponding index. Otherwise you cannot compute the loss.
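The tokenization and padding steps described above can be sketched in plain Python as follows. This is a minimal illustration, not the tutorial's own preprocessing: the regular expression, the helper names `tokenize` and `pad_batch`, and the empty-string padding token are all assumptions, and it only produces the [batch, seq_length] array of strings (applying the hub layer to each token is not shown).

```python
import re

PAD_TOKEN = ""  # hypothetical padding token; pick one that suits your embedding


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def pad_batch(token_lists, seq_length):
    """Truncate or right-pad every token list to seq_length entries."""
    padded = []
    for tokens in token_lists:
        tokens = tokens[:seq_length]
        padded.append(tokens + [PAD_TOKEN] * (seq_length - len(tokens)))
    return padded


sentences = ["Let us prepare the data.", "Pad short ones"]
batch = pad_batch([tokenize(s) for s in sentences], seq_length=5)
print(batch)
```

Every row of `batch` now has the same length, which is what batch mode requires.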

tf.keras loss from two images in serial

I want to use the stability training approach of the paper and apply it to a very simple CNN.
The principal architecture is given by:
As shown in the figure, you compute the loss based on the output f(I) for the input image I and on the output f(I') for the perturbed image I'.
My question is how to do this in a valid way without having two instances of the DNN, as I'm training on large 3D images. In other words: how can I process two images in serial and compute the loss based on those two images?
I'm using TF2 with Keras.
You can first write your DNN as a tf.keras Model.
After that, you can write another model which takes two image inputs, applies some Gaussian noise to one of them, and passes both to the same DNN.
Then design a custom loss function which computes the proper loss from the two outputs.
Here's some demo code:
from tensorflow.keras.layers import Input, Dense, Flatten, GaussianNoise
from tensorflow.keras.models import Model
# Shared DNN: the base model with a feature-space output; there is only one instance of it
ip = Input(shape=(32, 32, 1))  # same as original inputs
f0 = Flatten()(ip)
d0 = Dense(10)(f0)  # 10-dimensional feature embedding
dnn = Model(ip, d0)
# Final model with two versions of the image and the loss
input_1 = Input(shape=(32, 32, 1))
input_2 = Input(shape=(32, 32, 1))
# Only input_2 passes through the Gaussian noise layer (active during training only); you can design your own custom layer too
g0 = GaussianNoise(0.5)(input_2)
# Pass the two images to the same DNN, so the weights are shared
path1 = dnn(input_1)  # no noise
path2 = dnn(g0)  # noise
model = Model([input_1, input_2], [path1, path2])
def my_loss(y_true, y_pred):
    # Calculate your loss based on your two outputs path1, path2
    pass
model.compile('adam', my_loss)
model.summary()
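The body of my_loss is left open above. As a rough sketch of how the two outputs could be combined, the plain-Python example below assumes the objective is a task loss on the clean output plus a weighted distance between the clean and perturbed outputs. That form, the squared L2 distance, and the weight `alpha` are assumptions for illustration, so check them against the paper you are following.

```python
def squared_l2_distance(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def stability_objective(task_loss, clean_out, noisy_out, alpha=0.01):
    """Task loss on the clean input plus a weighted stability penalty.

    alpha (hypothetical default) trades off accuracy against robustness.
    """
    return task_loss + alpha * squared_l2_distance(clean_out, noisy_out)


clean = [0.2, 0.5, 0.3]  # stand-in for path1 = dnn(input_1)
noisy = [0.1, 0.7, 0.2]  # stand-in for path2 = dnn(g0)
total = stability_objective(task_loss=1.0, clean_out=clean,
                            noisy_out=noisy, alpha=0.5)
print(total)
```

Inside Keras the same arithmetic would have to be written with tensor operations so that gradients can flow; this sketch only shows how the two outputs enter the loss.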

How to apply l2 normalization to a layer in keras?

I am trying to normalize a layer in my neural network using l2 normalization. I want to divide each node/element in a specific layer by its l2 norm (the square root of the sum of squared elements), and my assumption is that keras' l2_normalize can achieve this: https://www.tensorflow.org/api_docs/python/tf/keras/backend/l2_normalize?version=stable. However, I am not sure how to actually use this since there are no examples in the documentation. I found other examples that use a lambda function along with it, for example Lambda(lambda x: K.l2_normalize(x,axis=1))(previous_layer). However, I am not sure why this needs to be done? Would appreciate help on how keras.backend.l2_normalize should be used and why a lambda function might be needed. Thanks!
Here is how I'd want to use it:
autoencoder = Sequential()
# Encoder Layer
autoencoder.add(Dense(encoded_dim, input_shape=(input_dim,),
                      activation='relu'))
# Normalization - Need help here!
# TODO: Add l2_normalize here
# Decoder Layer
# TODO: Add final output layer here
Do it as in the example you mentioned. That works.
You need a Layer for every operation in the model, and backend operations are no exception. That's the reason for the Lambda layer: it wraps the backend function so Keras can treat it like any other layer.
import keras.backend as K
from keras.layers import Lambda
autoencoder.add(Lambda(lambda x: K.l2_normalize(x, axis=1)))
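To make the operation concrete, here is a small plain-Python sketch of what L2 normalization along axis=1 does to a 2-D batch: each row (one sample) is divided by its own L2 norm. The helper name `l2_normalize_rows` and the epsilon guard are assumptions for illustration, not the Keras implementation.

```python
import math


def l2_normalize_rows(rows, epsilon=1e-12):
    """Divide every row by its L2 norm (square root of the sum of squares)."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        norm = max(norm, epsilon)  # guard against division by zero
        normalized.append([v / norm for v in row])
    return normalized


batch = [[3.0, 4.0], [1.0, 0.0]]
unit_rows = l2_normalize_rows(batch)
print(unit_rows)
```

After this step every non-zero row has unit length, which is the property the normalization layer gives the encoder output.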
If you are planning to get the encoder's output later for other things, I suggest you create the encoder and decoder as separate models:
encoder = Sequential()
#.... add encoder layers ....
#encoder.add(....)
decoder = Sequential()
#.... add decoder layers ....
#decoder.add(....)
Finally create the autoencoder as another model, for training:
from keras.layers import Input
from keras.models import Model
autoInput = Input(shape_of_the_encoder_input)
encodedData = encoder(autoInput)
decodedData = decoder(encodedData)
autoencoder = Model(autoInput, decodedData)

Can we have a combination of embedding layers and regular layers in a neural network?

I am trying to use neural networks for a binary classification problem using Keras. I am new to the whole neural network area. What I would like to do is have a network that has an embedding layer for some features but a regular input layer for the other features. For example, imagine I would like to use the user ID as the input that goes to the embedding layer, while everything else goes to the regular input layer. I know my question is more conceptual than technical, so I am asking whether this is possible in Keras or in any other framework or tool for implementing neural networks.
Yes, it's possible. You have to use the functional API.
Here is an example; feel free to adapt it to your needs:
from keras.models import Model
from keras.layers import Input, Embedding, Reshape, Concatenate, BatchNormalization, Dense, Activation
extraInput = Input((116,))
embed_input = Input((1,))
em_model = Embedding(10,
                     5,
                     input_length=1,
                     embeddings_initializer='uniform')(embed_input)
em_model = Reshape((5,))(em_model)
outputs = Concatenate(axis=1)([em_model,extraInput])
outputs = BatchNormalization(epsilon=1e-05, momentum=0.1) (outputs)
outputs = Dense(10, kernel_initializer='uniform', activation='relu')(outputs)
outputs = Dense(3, kernel_initializer='uniform', activation='relu')(outputs)
outputs = Dense(1)(outputs)
outputs = Activation('sigmoid')(outputs)
model = Model([embed_input, extraInput], outputs)
model.summary()
This will give you the following graph, where you have two different inputs: one for the embedding and a second one for the continuous variables.
Yes, it is possible. In any framework.
Basically, in such an architecture you'd have two neural components:
a feature extractor which would look at the raw data and produce an output, with the dimensionality of your choice
a classifier which would take features as inputs (features from your feature extractor, concatenated with your own engineered features) and produce a classification distribution
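The two components above can be illustrated without any framework. The sketch below uses a randomly filled table as a stand-in for a learned embedding (the feature extractor) and concatenates the looked-up vector with the regular features to form the classifier input. The sizes mirror Embedding(10, 5) and the 116 extra features from the Keras example; `build_classifier_input` is a hypothetical helper.

```python
import random

random.seed(0)

NUM_USERS, EMBED_DIM = 10, 5  # same sizes as Embedding(10, 5) above

# Stand-in for a learned embedding table: one EMBED_DIM vector per user ID.
embedding_table = [
    [random.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)]
    for _ in range(NUM_USERS)
]


def build_classifier_input(user_id, other_features):
    """Look up the user's embedding and append the regular features."""
    return embedding_table[user_id] + list(other_features)


other = [0.0] * 116  # the 116 regular features from extraInput
row = build_classifier_input(user_id=3, other_features=other)
print(len(row))
```

In a real network the table entries are trained together with the classifier weights; the lookup and concatenation are the same idea as the Embedding, Reshape and Concatenate layers in the Keras example.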