I'm training a DNN model on some data, and am hoping to analyze the learned weights to learn something about the true system I am studying (signaling cascades in biology). I guess one could say I am using Artificial NNs to learn about Biological NNs.
For each of my training examples, I have removed a single gene that is responsible for signaling at the top layer.
As I am modeling this signaling cascade as a NN, and removing one of the nodes in the first hidden layer, I realized that I'm doing a real-life version of dropout.
I would therefore like to use dropout to train my model. However, the implementations of dropout that I have seen online seem to drop out nodes at random. What I need is a way to specify which node to drop out for each training example.
Any advice on how to implement this? I'm open to any package, but everything I have done so far is in TensorFlow, so I'd appreciate a solution that uses that framework.
For those that prefer the details explained:
I have 10 input variables, that are fully connected to 32 relu nodes in the first layer, which are fully connected to a second layer (relu), which is fully connected to the output (linear because I am doing regression).
In addition to the 10 input variables, I also happen to know which of the 32 nodes should be dropped out.
Is there a way I can specify this when training?
Here is the code I currently use:
num_stresses = 10
num_kinase = 32
num_transcription_factors = 200
num_genes = 6692
# Build neural network
# Input variables (10)
# Which Node to dropout (32)
stress = tflearn.input_data(shape=[None, num_stresses])
kinase_deletion = tflearn.input_data(shape=[None, num_kinase])
# This is the layer that I want to perform selective dropout on,
# I should be able to specify which of the 32 nodes should output zero
# based on a 1X32 vector of ones and zeros.
kinase = tflearn.fully_connected(stress, num_kinase, activation='relu')
transcription_factor = tflearn.fully_connected(kinase, num_transcription_factors, activation='relu')
gene = tflearn.fully_connected(transcription_factor, num_genes, activation='linear')
adam = tflearn.Adam(learning_rate=0.00001, beta1=0.99)
regression = tflearn.regression(gene, optimizer=adam, loss='mean_square', metric='R2')
# Define model
model = tflearn.DNN(regression, tensorboard_verbose=1)
I would supply your input variables and an equal sized vector of all 1's except for the one you want to drop, that one is a 0.
The very first operation should then be multiplication to zero out the gene you want to drop. From there on out, it should be the exact same as what you have now.
You can either multiply (zero out your gene) before handing the data to TensorFlow, or add another placeholder and feed the mask into the graph through the feed_dict like you do your variables. The latter would probably be better.
If you need to drop a hidden node (in layer 2), it's just another vector of 1s and a 0.
Let me know if that works or if you need more help.
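Here is a minimal NumPy sketch of the masking idea, outside of any framework. The batch size, the random weights, and the deleted_node indices are made up for illustration; the point is only that one mask row per training example silences exactly the node you choose.

```python
import numpy as np

rng = np.random.default_rng(0)
num_stresses, num_kinase, batch_size = 10, 32, 4

# Hypothetical data: one row of inputs per training example.
stress = rng.normal(size=(batch_size, num_stresses))
# Hypothetical index of the hidden node knocked out in each example.
deleted_node = np.array([0, 5, 5, 31])

# One mask row per example: all ones, with a zero at the deleted node.
mask = np.ones((batch_size, num_kinase))
mask[np.arange(batch_size), deleted_node] = 0.0

# Toy first layer (random weights stand in for trained ones).
W = rng.normal(size=(num_stresses, num_kinase))
b = np.zeros(num_kinase)
kinase = np.maximum(stress @ W + b, 0.0)  # ReLU activations

# Element-wise multiplication zeroes one chosen node per example.
kinase_dropout = kinase * mask
```

Because the mask is an ordinary input, it can differ from row to row, which is what random dropout layers do not let you control.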
Edit:
OK, so I haven't worked with tflearn very much (I've mostly used plain TensorFlow), but I think you can combine the two. Basically, I added tf.multiply. You might have to add another tflearn.input_data(shape=[num_stresses]) and tflearn.input_data(shape=[num_kinase]) to give you placeholders for stresses_dropout_vector and kinase_dropout_vector. And of course, you can change the number and positions of the zeros in those two vectors.
import tensorflow as tf ###### New ######
import tflearn
num_stresses = 10
num_kinase = 32
num_transcription_factors = 200
num_genes = 6692
# desired_node_to_drop and desired_hidden_node_to_drop are integer indices that you set yourself
stresses_dropout_vector = [1] * num_stresses ###### NEW ######
stresses_dropout_vector[desired_node_to_drop] = 0 ###### NEW ######
kinase_dropout_vector = [1] * num_kinase ###### NEW ######
kinase_dropout_vector[desired_hidden_node_to_drop] = 0 ###### NEW ######
# Build neural network
# Input variables (10)
# Which Node to dropout (32)
stress = tflearn.input_data(shape=[None, num_stresses])
kinase_deletion = tflearn.input_data(shape=[None, num_kinase])
# This is the layer that I want to perform selective dropout on,
# I should be able to specify which of the 32 nodes should output zero
# based on a 1X32 vector of ones and zeros.
stress_dropout = tf.multiply(stress, stresses_dropout_vector) ###### NEW ###### Drops out an input
kinase = tflearn.fully_connected(stress_dropout, num_kinase, activation='relu') ### changed stress to stress_dropout
kinase_dropout = tf.multiply(kinase, kinase_dropout_vector) ###### NEW ###### Drops out a hidden node
transcription_factor = tflearn.fully_connected(kinase_dropout, num_transcription_factors, activation='relu') ### changed kinase to kinase_dropout
gene = tflearn.fully_connected(transcription_factor, num_genes, activation='linear')
adam = tflearn.Adam(learning_rate=0.00001, beta1=0.99)
regression = tflearn.regression(gene, optimizer=adam, loss='mean_square', metric='R2')
# Define model
model = tflearn.DNN(regression, tensorboard_verbose=1)
If adding in tensorflow doesn't work, you just have to find a regular old tflearn.multiply function that does an element wise multiplication of two given tensors/vectors.
Hope that helps.
For completeness, here is my final implementation:
import numpy as np
import pandas as pd
import tflearn
import tensorflow as tf
meta = pd.read_csv('../../input/nn/meta.csv')
experiments = meta["Unnamed: 0"]
del meta["Unnamed: 0"]
stress_one_hot = pd.get_dummies(meta["train"])
kinase_deletion = pd.get_dummies(meta["Strain"])
kinase_one_hot = 1 - kinase_deletion
expression = pd.read_csv('../../input/nn/data.csv')
genes = expression["Unnamed: 0"]
del expression["Unnamed: 0"] # This holds the gene names just so you know...
expression = expression.transpose()
# Set up data for tensorflow
# Gene expression
target = expression
target = np.array(expression, dtype='float32')
target_mean = target.mean(axis=0, keepdims=True)
target_std = target.std(axis=0, keepdims=True)
target = target - target_mean
target = target / target_std
# Stress information
data1 = stress_one_hot
data1 = np.array(data1, dtype='float32')
data_mean = data1.mean(axis=0, keepdims=True)
data_std = data1.std(axis=0, keepdims=True)
data1 = data1 - data_mean
data1 = data1 / data_std
# Kinase information
data2 = kinase_one_hot
data2 = np.array(data2, dtype='float32')
# For Reference
# data1.shape
# #(301, 10)
# data2.shape
# #(301, 29)
# Build the Neural Network
num_stresses = 10
num_kinase = 29
num_transcription_factors = 200
num_genes = 6692
# Build neural network
# Input variables (10)
# Which node to drop out (29)
stress = tflearn.input_data(shape=[None, num_stresses])
kinase_deletion = tflearn.input_data(shape=[None, num_kinase])
# This is the layer that I want to perform selective dropout on.
# kinase_deletion is a 1 x num_kinase vector of ones and zeros per example;
# a zero makes the corresponding node output zero.
kinase = tflearn.fully_connected(stress, num_kinase, activation='relu')
kinase_dropout = tf.multiply(kinase, kinase_deletion)  # tf.mul in TensorFlow versions before 1.0
transcription_factor = tflearn.fully_connected(kinase_dropout, num_transcription_factors, activation='relu')
gene = tflearn.fully_connected(transcription_factor, num_genes, activation='linear')
adam = tflearn.Adam(learning_rate=0.00001, beta1=0.99)
regression = tflearn.regression(gene, optimizer=adam, loss='mean_square', metric='R2')
# Define model
model = tflearn.DNN(regression, tensorboard_verbose=1)
# Start training (apply gradient descent algorithm)
model.fit([data1, data2], target, n_epoch=20000, show_metric=True, shuffle=True)#,validation_set=0.05)
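To make the mask construction above easier to follow, here is a small NumPy sketch of what 1 - get_dummies(...) produces. The strain labels are hypothetical stand-ins for meta["Strain"], and np.unique is used in place of pandas only to keep the example dependency-light.

```python
import numpy as np

# Hypothetical strain labels, one per experiment (stand-in for meta["Strain"]).
strains = np.array(["kinA", "kinB", "kinA", "kinC"])

# One-hot encode: column j is 1 where the experiment used strain j.
labels, codes = np.unique(strains, return_inverse=True)
one_hot = np.eye(len(labels), dtype="float32")[codes]

# Flip it so the deleted kinase is 0 and all the others are 1.
keep_mask = 1.0 - one_hot
print(keep_mask)
```

Each row of keep_mask is then fed alongside the matching row of stress inputs, so the multiplication in the graph zeroes the node for the kinase deleted in that experiment.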
Related
I need a little bit of clarification regarding my model results.
Here is my use case:
Deciding whether a review of a company in the S&P 500 is negative or positive. I used a crawled data set from Indeed. (The dataset was labelled (0 - positive, 1 - negative), tokenized and cleaned.)
Here is some important information needed to understand the model and my approach:
# Constants
NB_WORDS = 44000 # Parameter indicating the number of words we'll put in the dictionary
VAL_SIZE = 1000 # Size of the validation set
NB_START_EPOCHS = 10 # Number of epochs we usually start to train with
EPOCH_ITER = list(range(0,11)) # For stepwise evaluating the accuracy metrics for 10 epochs
BATCH_SIZE = 512 # Size of the batches used in the mini-batch gradient descent
MAX_LEN = 267 # Maximum number of words in a sequence (review)
REV_DIM = 300 # Number of dimensions of the Indeed review word embeddings --> most common choice, see Mikolov et al., 2013
# Modeling
emb_model = models.Sequential()
emb_model.add(layers.Embedding(NB_WORDS, REV_DIM, input_length=MAX_LEN))
# Embedding layer is the first hidden layer
"""
Embedding Layer (
input_dim = no. of words in vocabulary;
output_dim = dimensionality;
input_length = length of largest review
)
"""
emb_model.add(layers.Flatten())
# Flatten layers reshape the tensor to a 1-D array per sample
emb_model.add(layers.Dense(2, activation='softmax'))
# Dense is the regular, fully connected neural network layer. It is the most common and
# frequently used layer. A Dense layer performs the operation below on the input and returns the output.
# Operation := output = activation(dot(input, kernel) + bias)
# further see: https://www.tutorialspoint.com/keras/keras_dense_layer.htm#:~:text=Advertisements,input%20and%20return%20the%20output.
# Defines the output size, in our case 2, hence positive or negative (0 or 1)
emb_model.summary()
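As a rough check on what emb_model.summary() should report, the parameter counts can be worked out by hand from the constants above. This is only arithmetic, assuming the Dense layer keeps its default bias term; NUM_CLASSES is a name introduced here for the 2 output units.

```python
NB_WORDS = 44000   # vocabulary size
MAX_LEN = 267      # words per review
REV_DIM = 300      # embedding dimensions
NUM_CLASSES = 2    # output units of the Dense layer

# Embedding: one REV_DIM-sized vector per vocabulary entry.
embedding_params = NB_WORDS * REV_DIM
# Flatten has no weights; it only turns (MAX_LEN, REV_DIM) into one long vector.
flatten_units = MAX_LEN * REV_DIM
# Dense: one weight per (input unit, output unit) pair plus one bias per output.
dense_params = flatten_units * NUM_CLASSES + NUM_CLASSES

total_params = embedding_params + dense_params
print(embedding_params, flatten_units, dense_params, total_params)
# prints: 13200000 80100 160202 13360202
```

Almost all of the parameters sit in the embedding table, which is worth keeping in mind when judging overfitting on a data set of this kind.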
I already did some interpretation. But since I'm a beginner, I really need further information/interpretations/tips, especially on how and why to improve my model.
Here are my results:
I am trying to write code for this architecture (Question Answering model, paper: https://www.hindawi.com/journals/cin/2019/9543490/) and am looking for help with how to get the hidden state matrices Hq and Ha from the stacked BiLSTM layers. Could someone please advise?
# Creating Embedding Layer for Query
# Considered fixed length as 40 for both question and answer as per research paper
embedding_layer1 = layers.Embedding(vocab_size_query, 300, weights=[embedding_matrix_query], input_length =40, trainable=False)
input_text1 = Input(shape=(40,), name="input_text_query")  # layer names must be unique within one model
x = embedding_layer1(input_text1)
# Creating Bidirectional layer for Query
# Each word in the context and question should be made aware of the nearby words occurring. We use a bi-directional recurrent neural network (LSTM’s) here.
x = Bidirectional(LSTM(128,recurrent_dropout=0.5,kernel_regularizer=regularizers.l2(0.001),return_sequences=True))(x)
x = Bidirectional(LSTM(128,recurrent_dropout=0.5,kernel_regularizer=regularizers.l2(0.001),return_sequences=True))(x)
flatten_1 = Flatten()(x)
## Creating Embedding Layer for Passage
embedding_layer2 = layers.Embedding(vocab_size_answer, 300, weights=[embedding_matrix_answer], input_length =40, trainable=False)
input_text2 = Input(shape=(40,), name="input_text_answer")  # layer names must be unique within one model
x2 = embedding_layer2(input_text2)
# Creating Bidirectional layer for Passage
x2 = Bidirectional(LSTM(128,recurrent_dropout=0.5,kernel_regularizer=regularizers.l2(0.001),return_sequences=True))(x2)
x2 = Bidirectional(LSTM(128,recurrent_dropout=0.5,kernel_regularizer=regularizers.l2(0.001),return_sequences=True))(x2)
flatten_2 = Flatten()(x2)
According to the model structure and your source code, you can obtain Hq and Ha by extracting the outputs of the flatten_1 and flatten_2 layers. To extract the output of an intermediate layer, you can create a new model whose input is the original input and whose output is the output of that layer.
from tensorflow.keras.models import Model
model = ... # create the original model
layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)
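One thing to keep in mind: a Flatten output is a single long row per example, not a matrix. If Hq and Ha are meant to be 40 x 256 matrices of hidden states (40 time steps, 128 forward plus 128 backward units, assuming the default concatenating merge mode of Bidirectional), the flattened values can be reshaped back. The sketch below uses a made-up NumPy array in place of real layer output.

```python
import numpy as np

batch, timesteps, units = 2, 40, 128
hidden = 2 * units  # forward and backward states concatenated

# Stand-in for the (batch, 40, 256) output of the second BiLSTM layer.
H = np.arange(batch * timesteps * hidden, dtype="float32").reshape(
    batch, timesteps, hidden)

# What a Flatten layer hands back: one row of 40 * 256 values per example.
flat = H.reshape(batch, -1)

# Undo the flattening to get one 40 x 256 matrix per example again.
H_restored = flat.reshape(batch, timesteps, hidden)
print(flat.shape, H_restored.shape)
```

Alternatively, pointing the intermediate model at the second Bidirectional layer instead of the Flatten layer should give the matrix form directly.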
I'm curious if there is a good way to share weights across different RNN cells while still feeding each cell different inputs.
The graph that I am trying to build is like this:
where there are three LSTM Cells in orange which operate in parallel and between which I would like to share the weights.
I've managed to implement something similar to what I want using a placeholder (see below for code). However, using a placeholder breaks the gradient calculations of the optimizer and doesn't train anything past the point where I use the placeholder. Is it possible to do this a better way in Tensorflow?
I'm using Tensorflow 1.2 and python 3.5 in an Anaconda environment on Windows 7.
Code:
def ann_model(cls, data, act=tf.nn.relu):
    with tf.name_scope('ANN'):
        with tf.name_scope('ann_weights'):
            ann_weights = tf.Variable(tf.random_normal([1,
                                                        cls.n_ann_nodes]))
        with tf.name_scope('ann_bias'):
            ann_biases = tf.Variable(tf.random_normal([1]))
        out = act(tf.matmul(data, ann_weights) + ann_biases)
    return out
def rnn_lower_model(cls, data):
    with tf.name_scope('RNN_Model'):
        data_tens = tf.split(data, cls.sequence_length, 1)
        for i in range(len(data_tens)):
            data_tens[i] = tf.reshape(data_tens[i], [cls.batch_size,
                                                     cls.n_rnn_inputs])
        rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(cls.n_rnn_nodes_lower)
        outputs, states = tf.contrib.rnn.static_rnn(rnn_cell,
                                                    data_tens,
                                                    dtype=tf.float32)
        with tf.name_scope('RNN_out_weights'):
            out_weights = tf.Variable(
                tf.random_normal([cls.n_rnn_nodes_lower, 1]))
        with tf.name_scope('RNN_out_biases'):
            out_biases = tf.Variable(tf.random_normal([1]))
        #Encode the output of the RNN into one estimate per entry in
        #the input sequence
        predict_list = []
        for i in range(cls.sequence_length):
            predict_list.append(tf.matmul(outputs[i],
                                          out_weights)
                                + out_biases)
        return predict_list
def create_graph(cls, sess):
    #Initializes the graph
    with tf.name_scope('input'):
        cls.x = tf.placeholder('float', [cls.batch_size,
                                         cls.sequence_length,
                                         cls.n_inputs])
    with tf.name_scope('labels'):
        cls.y = tf.placeholder('float', [cls.batch_size, 1])
    with tf.name_scope('community_id'):
        cls.c = tf.placeholder('float', [cls.batch_size, 1])
    #Define Placeholder to provide variable input into the
    #RNNs with shared weights
    cls.input_place = tf.placeholder('float', [cls.batch_size,
                                               cls.sequence_length,
                                               cls.n_rnn_inputs])
    #global step used in optimizer
    global_step = tf.Variable(0, trainable=False)
    #Create ANN
    ann_output = cls.ann_model(cls.c)
    #Combine output of ANN with other input data x
    ann_out_seq = tf.reshape(tf.concat([ann_output for _ in
                                        range(cls.sequence_length)], 1),
                             [cls.batch_size,
                              cls.sequence_length,
                              cls.n_ann_nodes])
    cls.rnn_input = tf.concat([ann_out_seq, cls.x], 2)
    #Create 'unrolled' RNN by creating sequence_length many RNN Cells that
    #share the same weights.
    with tf.variable_scope('Lower_RNNs'):
        #Create RNNs
        daily_prediction, daily_prediction1 = [cls.rnn_lower_model(cls.input_place)]*2
When training, mini-batches are computed in two steps:
RNNinput = sess.run(cls.rnn_input,feed_dict = {
cls.x:batch_x,
cls.y:batch_y,
cls.c:batch_c})
_ = sess.run(cls.optimizer, feed_dict={cls.input_place:RNNinput,
cls.y:batch_y,
cls.x:batch_x,
cls.c:batch_c})
Thanks for your help. Any ideas would be appreciated.
You have 3 different inputs (input_1, input_2, input_3), each fed to an LSTM model whose parameters are shared. You then concatenate the outputs of the 3 LSTMs and pass the result to a final LSTM layer. The code should look something like this:
# Create input placeholder for the network
input_1 = tf.placeholder(...)
input_2 = tf.placeholder(...)
input_3 = tf.placeholder(...)
# create a shared rnn layer
def shared_rnn(...):
    ...
    rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(...)
# generate the outputs for each input
with tf.variable_scope('lower_lstm') as scope:
    out_input_1 = shared_rnn(...)
    scope.reuse_variables() # the variables will be reused.
    out_input_2 = shared_rnn(...)
    scope.reuse_variables()
    out_input_3 = shared_rnn(...)
# verify whether the variables are reused
for v in tf.global_variables():
    print(v.name)
# concat the three outputs
output = tf.concat...
# Pass it to the final_lstm layer and out the logits
logits = final_layer(output, ...)
train_op = ...
# train
sess.run(train_op, feed_dict={input_1: in1, input_2: in2, input_3: in3, labels: ...})
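The weight sharing itself can be illustrated without TensorFlow. In the NumPy sketch below, a plain tanh RNN stands in for the LSTM (a simplification, not the BasicLSTMCell used above), and all sizes and names are made up. What matters is that one set of parameters is created once and every input is run through it.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, n_in, n_hidden = 2, 5, 3, 4

# One set of parameters, created once and reused for every input.
params = {
    "W_x": rng.normal(size=(n_in, n_hidden)),
    "W_h": rng.normal(size=(n_hidden, n_hidden)),
    "b": np.zeros(n_hidden),
}

def shared_rnn(x, p):
    """Run a plain tanh RNN over x of shape (batch, steps, n_in); return the last state."""
    h = np.zeros((x.shape[0], p["W_h"].shape[0]))
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ p["W_x"] + h @ p["W_h"] + p["b"])
    return h

# Three different inputs, all processed with the same parameters.
inputs = [rng.normal(size=(batch, steps, n_in)) for _ in range(3)]
outputs = [shared_rnn(x, params) for x in inputs]

# Concatenate the three results before handing them to the final layer.
combined = np.concatenate(outputs, axis=1)
print(combined.shape)
```

In the TensorFlow version, scope.reuse_variables() plays the role of passing the same params dictionary to each call.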
I ended up rethinking my architecture a little and came up with a more workable solution.
Instead of duplicating the middle layer of LSTM cells to create three different cells with the same weights, I chose to run the same cell three times. The results of each run were stored in a buffer-like tf.Variable, and then that whole variable was used as the input to the final LSTM layer.
I drew a diagram here
Implementing it this way allowed for valid outputs after 3 time steps and didn't break TensorFlow's backpropagation (i.e. the nodes in the ANN could still train).
The only tricky thing was to make sure that the buffer was in the correct sequential order for the final RNN.
I'm trying to create a simple neural net in tensorflow that learns some simple relationship between inputs and outputs (for example, y=-x) where the inputs and outputs are floating point values (meaning, no softmax used on the output).
I feel like this should be pretty easy to do, but I must be messing up somewhere. Wondering if there are any tutorials or examples out there that do something similar. I looked through the existing tensorflow tutorials and didn't see anything like this and looked through several other sources of tensorflow examples I found by googling, but didn't see what I was looking for.
Here's a trimmed down version of what I've been trying. In this particular version, I've noticed that my weights and biases always seem to be stuck at zero. Perhaps this is due to my single input and single output?
I've had good luck altering the MNIST example for various nefarious purposes, but everything I've gotten to work successfully used softmax on the output for categorization. If I can figure out how to generate a raw floating point output from my neural net, there are several fun projects I'd like to do with it.
Anyone see what I'm missing? Thanks in advance!
- J.
# Trying to define the simplest possible neural net where the output layer of the neural net is a single
# neuron with a "continuous" (a.k.a floating point) output. I want the neural net to output a continuous
# value based off one or more continuous inputs. My real problem is more complex, but this is the simplest
# representation of it for explaining my issue. Even though I've oversimplified this to look like a simple
# linear regression problem (y=m*x), I want to apply this to more complex neural nets. But if I can't get
# it working with this simple problem, then I won't get it working for anything more complex.
import tensorflow as tf
import random
import numpy as np
INPUT_DIMENSION = 1
OUTPUT_DIMENSION = 1
TRAINING_RUNS = 100
BATCH_SIZE = 10000
VERF_SIZE = 1
# Generate two arrays, the first array being the inputs that need trained on, and the second array containing outputs.
def generate_test_point():
    x = random.uniform(-8, 8)
    # To keep it simple, output is just -x.
    out = -x
    return ( np.array([ x ]), np.array([ out ]) )
# Generate a bunch of data points and then package them up in the array format needed by
# tensorflow
def generate_batch_data( num ):
    xs = []
    ys = []
    for i in range(num):
        x, y = generate_test_point()
        xs.append( x )
        ys.append( y )
    return (np.array(xs), np.array(ys) )
# Define a single-layer neural net. Originally based off the tensorflow mnist for beginners tutorial
# Create a placeholder for our input variable
x = tf.placeholder(tf.float32, [None, INPUT_DIMENSION])
# Create variables for our neural net weights and bias
W = tf.Variable(tf.zeros([INPUT_DIMENSION, OUTPUT_DIMENSION]))
b = tf.Variable(tf.zeros([OUTPUT_DIMENSION]))
# Define the neural net. Note that since I'm not trying to classify digits as in the tensorflow mnist
# tutorial, I have removed the softmax op. My expectation is that 'net' will return a floating point
# value.
net = tf.matmul(x, W) + b
# Create a placeholder for the expected result during training
expected = tf.placeholder(tf.float32, [None, OUTPUT_DIMENSION])
# Same training as used in mnist example
cross_entropy = -tf.reduce_sum(expected*tf.log(tf.clip_by_value(net,1e-10,1.0)))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)
# Perform our training runs
for i in range( TRAINING_RUNS ):
    print "trainin run: ", i,
    batch_inputs, batch_outputs = generate_batch_data( BATCH_SIZE )
    # I've found that my weights and bias values are always zero after training, and I'm not sure why.
    sess.run( train_step, feed_dict={x: batch_inputs, expected: batch_outputs})
    # Test our accuracy as we train... I am defining my accuracy as the error between what I
    # expected and the actual output of the neural net.
    #accuracy = tf.reduce_mean(tf.sub( expected, net))
    accuracy = tf.sub( expected, net) # using just subtract since I made my verification size 1 for debug
    # Uncomment this to debug
    #import pdb; pdb.set_trace()
    batch_inputs, batch_outputs = generate_batch_data( VERF_SIZE )
    result = sess.run(accuracy, feed_dict={x: batch_inputs, expected: batch_outputs})
    print " progress: "
    print " inputs: ", batch_inputs
    print " outputs:", batch_outputs
    print " actual: ", result
Your loss should be the squared difference of output and true value:
loss = tf.reduce_mean(tf.square(expected - net))
This way the network learns to optimize this loss and make the output closer to the real result. Cross entropy should only be used for output values between 0 and 1 i.e. for classification.
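To see the squared-error loss at work without TensorFlow, here is a small NumPy sketch of the same y = -x problem trained with plain gradient descent. The sample size, step count, and learning rate of 0.01 are choices made for this sketch; with inputs as large as 8, a step size of 0.5 would overshoot on this loss. A likely reason the weights stayed at zero in the question: with W and b at zero the net outputs 0, clip_by_value clamps that to 1e-10, and a clamped value passes no gradient back.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-8, 8, size=(10000, 1))
y = -x  # target relationship

W = np.zeros((1, 1))
b = np.zeros(1)
learning_rate = 0.01

for _ in range(200):
    error = x @ W + b - y                 # net - expected
    grad_W = 2 * x.T @ error / len(x)     # d(mean squared error)/dW
    grad_b = 2 * error.mean(axis=0)       # d(mean squared error)/db
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

loss = np.mean((x @ W + b - y) ** 2)
print(W[0, 0], b[0], loss)
```

W should end up very close to -1 and b close to 0, which is the relationship the data was generated from.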
If anyone is interested, I got this example to work. Here's the code:
# Trying to define the simplest possible neural net where the output layer of the neural net is a single
# neuron with a "continuous" (a.k.a floating point) output. I want the neural net to output a continuous
# value based off one or more continuous inputs. My real problem is more complex, but this is the simplest
# representation of it for explaining my issue. Even though I've oversimplified this to look like a simple
# linear regression problem (y=m*x), I want to apply this to more complex neural nets. But if I can't get
# it working with this simple problem, then I won't get it working for anything more complex.
import tensorflow as tf
import random
import numpy as np
INPUT_DIMENSION = 1
OUTPUT_DIMENSION = 1
TRAINING_RUNS = 100
BATCH_SIZE = 10000
VERF_SIZE = 1
# Generate two arrays, the first array being the inputs that need trained on, and the second array containing outputs.
def generate_test_point():
    x = random.uniform(-8, 8)
    # To keep it simple, output is just -x.
    out = -x
    return (np.array([x]), np.array([out]))
# Generate a bunch of data points and then package them up in the array format needed by
# tensorflow
def generate_batch_data(num):
    xs = []
    ys = []
    for i in range(num):
        x, y = generate_test_point()
        xs.append(x)
        ys.append(y)
    return (np.array(xs), np.array(ys))
# Define a single-layer neural net. Originally based off the tensorflow mnist for beginners tutorial
# Create a placeholder for our input variable
x = tf.placeholder(tf.float32, [None, INPUT_DIMENSION])
# Create variables for our neural net weights and bias
W = tf.Variable(tf.zeros([INPUT_DIMENSION, OUTPUT_DIMENSION]))
b = tf.Variable(tf.zeros([OUTPUT_DIMENSION]))
# Define the neural net. Note that since I'm not trying to classify digits as in the tensorflow mnist
# tutorial, I have removed the softmax op. My expectation is that 'net' will return a floating point
# value.
net = tf.matmul(x, W) + b
# Create a placeholder for the expected result during training
expected = tf.placeholder(tf.float32, [None, OUTPUT_DIMENSION])
# Mean squared error replaces the cross entropy used in the mnist example
loss = tf.reduce_mean(tf.square(expected - net))
# cross_entropy = -tf.reduce_sum(expected*tf.log(tf.clip_by_value(net,1e-10,1.0)))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)
# Perform our training runs
for i in range(TRAINING_RUNS):
    print("trainin run: ", i, )
    batch_inputs, batch_outputs = generate_batch_data(BATCH_SIZE)
    # I've found that my weights and bias values are always zero after training, and I'm not sure why.
    sess.run(train_step, feed_dict={x: batch_inputs, expected: batch_outputs})
    # Test our accuracy as we train... I am defining my accuracy as the error between what I
    # expected and the actual output of the neural net.
    # accuracy = tf.reduce_mean(tf.sub( expected, net))
    accuracy = tf.subtract(expected, net) # using just subtract since I made my verification size 1 for debug
    # tf.subtract()
    # Uncomment this to debug
    # import pdb; pdb.set_trace()
    print("W=%f, b=%f" % (sess.run(W), sess.run(b)))
    batch_inputs, batch_outputs = generate_batch_data(VERF_SIZE)
    result = sess.run(accuracy, feed_dict={x: batch_inputs, expected: batch_outputs})
    print(" progress: ")
    print(" inputs: ", batch_inputs)
    print(" outputs:", batch_outputs)
    print(" actual: ", result)
When using the built-in, easy way of constructing the NN, I used loss=tf.keras.losses.MeanSquaredError().
I went through the code and I'm afraid I don't grasp an important point.
I can't seem to find the weight matrices of the model for the encoder and decoder, nor where they are updated. I found the target_weights, but they seem to be reinitialized at every get_batch() call, so I don't really understand what they stand for either.
My actual goal is to concatenate the hidden states of two source encoders for one decoder by applying a linear transformation with a weight matrix that I'll have to train along with the model (I'm building a many-to-one model), but I have no idea where to start because of the problem mentioned above.
This might help you start. There are a couple of models implemented in tensorflow.python.ops.seq2seq.py (with/without buckets, attention, etc.) but take a look at the definition for embedding_attention_seq2seq (which is the one called in their example model seq2seq_model.py that you seem to be referencing):
def embedding_attention_seq2seq(encoder_inputs, decoder_inputs, cell,
                                num_encoder_symbols, num_decoder_symbols,
                                num_heads=1, output_projection=None,
                                feed_previous=False, dtype=dtypes.float32,
                                scope=None, initial_state_attention=False):
    with variable_scope.variable_scope(scope or "embedding_attention_seq2seq"):
        # Encoder.
        encoder_cell = rnn_cell.EmbeddingWrapper(cell, num_encoder_symbols)
        encoder_outputs, encoder_state = rnn.rnn(
            encoder_cell, encoder_inputs, dtype=dtype)
        # First calculate a concatenation of encoder outputs to put attention on.
        top_states = [array_ops.reshape(e, [-1, 1, cell.output_size])
                      for e in encoder_outputs]
        attention_states = array_ops.concat(1, top_states)
        ....
You can see where it picks out the top layer of encoder outputs as top_states before handing them off to the decoder.
So you could implement a similar function with two encoders and concatenate those states before handing off to the decoder.
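As a shape-level sketch of that idea (not the seq2seq library's API), the two encoder states can be joined and passed through a single trainable linear map. Everything below is hypothetical: the sizes, the names state_a, state_b, W_merge and b_merge, and the random values standing in for trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, size_a, size_b, decoder_size = 3, 8, 6, 10

# Stand-ins for the final hidden states of two separate encoders.
state_a = rng.normal(size=(batch, size_a))
state_b = rng.normal(size=(batch, size_b))

# Projection from the joined state to the decoder's state size.
# In a real model these would be trainable variables.
W_merge = rng.normal(size=(size_a + size_b, decoder_size))
b_merge = np.zeros(decoder_size)

joined = np.concatenate([state_a, state_b], axis=1)
decoder_initial_state = joined @ W_merge + b_merge
print(joined.shape, decoder_initial_state.shape)
```

In TensorFlow the projection would be created as variables inside the model so that the optimizer picks it up through tf.trainable_variables() along with the encoder and decoder weights.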
The value created in the get_batch function is only used for the first iteration. Even though the weights are passed every time into the function, their value gets updated as a global variable in the Seq2Seq model class in the init function.
with tf.name_scope('Optimizer'):
    # Gradients and SGD update operation for training the model.
    params = tf.trainable_variables()
    if not forward_only:
        self.gradient_norms = []
        self.updates = []
        opt = tf.train.GradientDescentOptimizer(self.learning_rate)
        for b in range(len(buckets)):
            gradients = tf.gradients(self.losses[b], params)
            clipped_gradients, norm = tf.clip_by_global_norm(gradients,
                                                             max_gradient_norm)
            self.gradient_norms.append(norm)
            self.updates.append(opt.apply_gradients(
                zip(clipped_gradients, params), global_step=self.global_step))
self.saver = tf.train.Saver(tf.global_variables())
The weights are fed separately as a placeholder because they are built in the get_batch function to give zero weight to the PAD inputs.
# Batch decoder inputs are re-indexed decoder_inputs, we create weights.
for length_idx in range(decoder_size):
    batch_decoder_inputs.append(
        np.array([decoder_inputs[batch_idx][length_idx]
                  for batch_idx in range(self.batch_size)], dtype=np.int32))
    # Create target_weights to be 0 for targets that are padding.
    batch_weight = np.ones(self.batch_size, dtype=np.float32)
    for batch_idx in range(self.batch_size):
        # We set weight to 0 if the corresponding target is a PAD symbol.
        # The corresponding target is decoder_input shifted by 1 forward.
        if length_idx < decoder_size - 1:
            target = decoder_inputs[batch_idx][length_idx + 1]
        if length_idx == decoder_size - 1 or target == data_utils.PAD_ID:
            batch_weight[batch_idx] = 0.0
    batch_weights.append(batch_weight)
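The loop above can be read as building a padding mask. Here is a vectorized NumPy sketch of the same rule on a tiny made-up batch. PAD_ID = 0 is an assumption made for the example (the source reads it from data_utils.PAD_ID), and the array here is laid out as (batch, time), so its transpose corresponds to the per-step list built in get_batch.

```python
import numpy as np

PAD_ID = 0  # assumed padding id; the source reads it from data_utils.PAD_ID

# Hypothetical decoder inputs, shape (batch_size, decoder_size), padded with PAD_ID.
decoder_inputs = np.array([
    [1, 7, 9, 2, 0],
    [1, 4, 2, 0, 0],
])

# The target at step t is the decoder input at step t + 1;
# the final step has no target, so it is treated as padding.
targets = np.full_like(decoder_inputs, PAD_ID)
targets[:, :-1] = decoder_inputs[:, 1:]

# Weight 1.0 for real targets, 0.0 for padding and for the final step.
target_weights = (targets != PAD_ID).astype(np.float32)
print(target_weights)
# prints:
# [[1. 1. 1. 0. 0.]
#  [1. 1. 0. 0. 0.]]
```

These are loss weights, not model parameters, which is why they are rebuilt for every batch rather than trained.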