Transformer Network with keras - Does validation requires decoder input? - input

I have been studied Transformer network to apply on my dataset. I studied from a open-source as follows;
https://colab.research.google.com/drive/1CBe2VlogbyXzmIyRQGH5xzuvLwGrvjcf?usp=sharing#scrollTo=pjhneKhvtIi-
According to their code, they use 'model' from keras library, the input shape is [input,target] and the output shape is [target]. input is the input for encoder and target is the input for decoder, and also output for decoder.
This is the code they provided.
input_vocab_size = tokenizer_pt.vocab_size + 2
target_vocab_size = tokenizer_en.vocab_size + 2
x = encoder(input)
x = decoder([target, x] , mask = encoder.compute_mask(input))
x = tf.keras.layers.Dense(target_vocab_size)(x)
model = tf.keras.models.Model(inputs=[input, target], outputs=x)
When they train this, then this ML model also requires the same input shape like [input,target] to test the code out. This means we need 'X' and 'y' to predict 'y' to validate this model, which makes no sense.
Seems I need to separate the training and testing procedure, so that we train the model with [input, target],[target] and than validate with [input],[target].
What should I do if I want to validate like this?

Related

How to feed a Keras InputLayer without a model?

Let's take the following code as an example:
inputs = keras.layers.InputLayer(1).output
output = tf.random.uniform((1, )) * inputs
I want to feed inputs with value and have it propagate through the layers without using a Keras model.
How can it be done?

How to remove the last layer from trained model in Tensorflow

I want to remove the last layer of 'faster_rcnn_nas_lowproposals_coco' model which downloaded from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md.
I know I in Keras we can use model.layers.pop() to remove the last layer.
But I searched in the Internet and there are no equivalent function in tensorflow.
If there are no equivalent function in tensorflow, are there anyone can tell me how to load trained Model zoo by Keras?
You don't need to "pop" a layer, you just have to not load it:
For the example of Mobilenet (but put your downloaded model here) :
model = mobilenet.MobileNet()
x = model.layers[-2].output
The first line load the entire model, the second load the outputs of the before the last layer.
You can change layer[-x] with x being the outputs of the layer you want. So, for loading the model without the last layer, x should be equal to -2.
Then it's possible to use it like this :
x = Dense(256)(x)
predictions = Dense(15, activation = "softmax")(x)
model = Model(inputs = model.input, outputs = predictions)

How to initialize a keras tensor employed in an API model

I am trying to implemente a Memory-augmented neural network, in which the memory and the read/write/usage weight vectors are updated according to a combination of their previous values. These weigths are different from the classic weight matrices between layers that are automatically updated with the fit() function! My problem is the following: how can I correctly initialize these weights as keras tensors and use them in the model? I explain it better with the following simplified example.
My API model is something like:
input = Input(shape=(5,6))
controller = LSTM(20, activation='tanh',stateful=False, return_sequences=True)(input)
write_key = Dense(4,activation='tanh')(controller)
read_key = Dense(4,activation='tanh')(controller)
w_w = Add()([w_u, w_r]) #<---- UPDATE OF WRITE WEIGHTS
to_write = Dot()([w_w, write_key])
M = Add()([M,to_write])
cos_sim = Dot()([M,read_key])
w_r = Lambda(lambda x: softmax(x,axis=1))(cos_sim) #<---- UPDATE OF READ WEIGHTS
w_u = Add()([w_u,w_r,w_w]) #<---- UPDATE OF USAGE WEIGHTS
retrieved_memory = Dot()([w_r,M])
controller_output = concatenate([controller,retrieved_memory])
final_output = Dense(6,activation='sigmoid')(controller_output)`
You can see that, in order to compute w_w^t, I have to have first defined w_r^{t-1} and w_u^{t-1}. So, at the beginning I have to provide a valid initialization for these vectors. What is the best way to do it? The initializations I would like to have are:
M = K.variable(numpy.zeros((10,4))) # MEMORY
w_r = K.variable(numpy.zeros((1,10))) # READ WEIGHTS
w_u = K.variable(numpy.zeros((1,10))) # USAGE WEIGHTS`
But, analogously to what said in #2486(entron), these commands do not return a keras tensor with all the needed meta-data and so this returns the following error:
AttributeError: 'NoneType' object has no attribute 'inbound_nodes'
I also thought to use the old M, w_r and w_u as further inputs at each iteration and analogously get in output the same variables to complete the loop. But this means that I have to use the fit() function to train online the model having just the target as final output (Model 1), and employ the predict() function on the model with all the secondary outputs (Model 2) to get the variables to use at the next iteration. I have also to pass the weigth matrices from Model 1 to Model 2 using get_weights() and set_weights(). As you can see, it becomes a little bit messy and too slow.
Do you have any suggestions for this problem?
P.S. Please, do not focus too much on the API model above because it is a simplified (almost meaningless) version of the complete one where I skipped several key steps.

How do I share weights across different RNN cells that feed in different inputs in Tensorflow?

I'm curious if there is a good way to share weights across different RNN cells while still feeding each cell different inputs.
The graph that I am trying to build is like this:
where there are three LSTM Cells in orange which operate in parallel and between which I would like to share the weights.
I've managed to implement something similar to what I want using a placeholder (see below for code). However, using a placeholder breaks the gradient calculations of the optimizer and doesn't train anything past the point where I use the placeholder. Is it possible to do this a better way in Tensorflow?
I'm using Tensorflow 1.2 and python 3.5 in an Anaconda environment on Windows 7.
Code:
def ann_model(cls,data, act=tf.nn.relu):
with tf.name_scope('ANN'):
with tf.name_scope('ann_weights'):
ann_weights = tf.Variable(tf.random_normal([1,
cls.n_ann_nodes]))
with tf.name_scope('ann_bias'):
ann_biases = tf.Variable(tf.random_normal([1]))
out = act(tf.matmul(data,ann_weights) + ann_biases)
return out
def rnn_lower_model(cls,data):
with tf.name_scope('RNN_Model'):
data_tens = tf.split(data, cls.sequence_length,1)
for i in range(len(data_tens)):
data_tens[i] = tf.reshape(data_tens[i],[cls.batch_size,
cls.n_rnn_inputs])
rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(cls.n_rnn_nodes_lower)
outputs, states = tf.contrib.rnn.static_rnn(rnn_cell,
data_tens,
dtype=tf.float32)
with tf.name_scope('RNN_out_weights'):
out_weights = tf.Variable(
tf.random_normal([cls.n_rnn_nodes_lower,1]))
with tf.name_scope('RNN_out_biases'):
out_biases = tf.Variable(tf.random_normal([1]))
#Encode the output of the RNN into one estimate per entry in
#the input sequence
predict_list = []
for i in range(cls.sequence_length):
predict_list.append(tf.matmul(outputs[i],
out_weights)
+ out_biases)
return predict_list
def create_graph(cls,sess):
#Initializes the graph
with tf.name_scope('input'):
cls.x = tf.placeholder('float',[cls.batch_size,
cls.sequence_length,
cls.n_inputs])
with tf.name_scope('labels'):
cls.y = tf.placeholder('float',[cls.batch_size,1])
with tf.name_scope('community_id'):
cls.c = tf.placeholder('float',[cls.batch_size,1])
#Define Placeholder to provide variable input into the
#RNNs with shared weights
cls.input_place = tf.placeholder('float',[cls.batch_size,
cls.sequence_length,
cls.n_rnn_inputs])
#global step used in optimizer
global_step = tf.Variable(0,trainable = False)
#Create ANN
ann_output = cls.ann_model(cls.c)
#Combine output of ANN with other input data x
ann_out_seq = tf.reshape(tf.concat([ann_output for _ in
range(cls.sequence_length)],1),
[cls.batch_size,
cls.sequence_length,
cls.n_ann_nodes])
cls.rnn_input = tf.concat([ann_out_seq,cls.x],2)
#Create 'unrolled' RNN by creating sequence_length many RNN Cells that
#share the same weights.
with tf.variable_scope('Lower_RNNs'):
#Create RNNs
daily_prediction, daily_prediction1 =[cls.rnn_lower_model(cls.input_place)]*2
When training mini-batches are calculated in two steps:
RNNinput = sess.run(cls.rnn_input,feed_dict = {
cls.x:batch_x,
cls.y:batch_y,
cls.c:batch_c})
_ = sess.run(cls.optimizer, feed_dict={cls.input_place:RNNinput,
cls.y:batch_y,
cls.x:batch_x,
cls.c:batch_c})
Thanks for your help. Any ideas would be appreciated.
You have 3 different inputs : input_1, input_2, input_3 fed it to a LSTM model which has the parameters shared. And then you concatenate the outputs of the 3 lstm and pass it to a final LSTM layer. The code should look something like this:
# Create input placeholder for the network
input_1 = tf.placeholder(...)
input_2 = tf.placeholder(...)
input_3 = tf.placeholder(...)
# create a shared rnn layer
def shared_rnn(...):
...
rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(...)
# generate the outputs for each input
with tf.variable_scope('lower_lstm') as scope:
out_input_1 = shared_rnn(...)
scope.reuse_variables() # the variables will be reused.
out_input_2 = shared_rnn(...)
scope.reuse_variables()
out_input_3 = shared_rnn(...)
# verify whether the variables are reused
for v in tf.global_variables():
print(v.name)
# concat the three outputs
output = tf.concat...
# Pass it to the final_lstm layer and out the logits
logits = final_layer(output, ...)
train_op = ...
# train
sess.run(train_op, feed_dict{input_1: in1, input_2: in2, input_3:in3, labels: ...}
I ended up rethinking my architecture a little and came up with a more workable solution.
Instead of duplicating the middle layer of LSTM cells to create three different cells with the same weights, I chose to run the same cell three times. The results of each run were stored in a 'buffer' like tf.Variable, and then that whole variable was used as an input into the final LSTM layer.
I drew a diagram here
Implementing it this way allowed for valid outputs after 3 time steps, and didn't break tensorflows backpropagation algorithm (i.e. The nodes in the ANN could still train.)
The only tricky thing was to make sure that the buffer was in the correct sequential order for the final RNN.

access trained parameter in CNTK

With a model like this, how can one access the trained parameters like weight and bias of each layer?
model = Sequential ([
Dense(xx, activation=cntk.sigmoid),
Dense(outputs)])
z = model(features)
Thanks.
The specific mechanisms are shown in this tutorial. Here is the sample that shows how to access the parameters:
model = create_model()
print(len(model.layers))
print(model.layers[0].E.shape)
print(model.layers[2].b.value)