Assigning values manually to Tensors during training - tensorflow

I'm training a seq2seq model.
I want to set the hidden state of the decoder to the hidden state of the encoder in the tf.Session().
Doing something like the following just makes LSTM2's hidden state refer to LSTM1's hidden state object:
LSTM2.hidden_state = LSTM1.hidden_state
How do I copy it? I have tried using assign_op = LSTM2.hidden_state.assign(LSTM1.hidden_state) but get the error 'Tensor' object has no attribute 'assign' when I call it in sess.run()
Using tf.assign() in a similar way inside of the graph gives me the error Input 'ref' of 'Assign' Op requires l-value input
Thanks in advance.

You can "feed" the Tensor during session.run call, ie, suppose new set of values are in numpy array vals, then you can do sess.run(..., feed_dict={tensor: vals})

Related

How do I explicitly split a Dataset tuple in Tensorflow's functional API into two separate layers from just one input layer?

Context
The input to my model is a BatchDataset object called dataset_train, and it is batched to yield (training_data, label).
For some of the machinery in my model, I need to be able to split the Dataset tuple inside the model and independently access both the data and the label. This is a single input model with multiple outputs, so I am using Tensorflow's Functional API. For the sake of reproducibility, I am working with timeseries, so a toy dataset would look like this:
time = np.arange(1000)
data = np.random.randn(1000)
label = np.random.randn(1000)
training_data = np.zeros(shape=(time.size,2))
training_data[:,0] = time
training_data[:,1] = data
dataset_train = tf.keras.utils.timeseries_dataset_from_array(
data = training_data,
targets = label,
batch_size = batch_size,
sequence_length = sequence_length,
sequence_stride = 1,
)
Note: Sequence Length and batch_size are additional semi-arbitrary hyperparameters that are not important for the purposes of this question.
Question
How do I split apart the Dataset in Tensorflow's Functional API into the training data element and the label element?
Here is pseudocode of what I am looking for:
input = Single Input Layer that defines something capable of accepting dataset_train
training_data = input.element_spec[0]
label = input.element_spec[1]
After that point, my model can perform it's actions on training_data and label independently.
First Solution I tried:
I first started by trying to define two input layers and pass each element of the dataset tuple to each input layer, and the act on each input layer independently.
training_data = tf.keras.Input(shape=(sequence_length,2))
label = tf.keras.Input(shape = sequence_length)
#model machinery
model = tf.keras.Model(
inputs = [training_data, label],
outputs = [output_1, output_2]
)
#model machinery
history = model.fit(dataset_train, epochs = 500)
The first problem I had with this is that I got the following error:
ValueError: Layer "model_5" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None, 2) dtype=float64>]
This is a problem, because if I actually pass the model a dictionary of datasets (nevermind that this isn't supported) then I introduce a circular dependency where in order to use model.predict, it expects labels for the inputs to model.predict. In other words, I need the answers to get the answers. Because I need to pass it only a single Dataset to prevent introducing this circular dependency (tensorflow implicitly assumes that the second element in a Dataset is the label, and doesn't require Datasets with labels for model.predict), I decided to abandon this strategy for unpacking the Input layer directly within the functional API for the model.
Second Solution I tried:
I thought maybe I could unpack the Dataset using the .get_single_element() method in the following code excerpt
input = tf.keras.Input(shape = (sequence_length, 2))
training_dataset, label = input.get_single_element()
This gave the following error:
AttributeError: 'KerasTensor' object has no attribute 'get_single_element'
I then thought the problem was that because the symbolic tensor wasn't of type Dataset, I needed to define the input layer to expect a Dataset. After reading through the documentation and spending ~9 hours messing around, I realized that tf.keras.Input takes an argument called type_spec, which allows the user to specify exactly the type of symbolic tensor to create (I think - I'm still a little shaky on understanding exactly what's going on and I'm more than a little sleep deprived, which isn't helping). As it turns out there's a way to generate the type_spec from the dataset itself, so I did that to make sure that I wasn't making a mistake in generating it.
input = tf.keras.Input(tensor = dataset_train)
training_dataset, label = input.get_single_element()
Which gives the following error:
AttributeError: 'BatchDataset' object has no attribute 'dtype'
I'm not really sure why I get this error, but I tried to circumvent it by explicitly defining the type_spec in the Input layer
input = tf.keras.Input(type_spec: tf.data.DatasetSpec.from_value(dataset_train))
training_dataset, label = input.get_single_element()
Which gives the following error:
ValueError: KerasTensor only supports TypeSpecs that have a shape field; got DatasetSpec, which does not have a shape.
I also had tried to make the DatasetSpec manually instead of generating it using .from_value earlier and had gotten the same error. I thought then it was just because I was messing it up, but now that I've gotten this error from .from_value, I'm beginning to suspect that this line of solutions won't work because DatasetSpec implicitly is missing a shape. I might also be confused, because performing dataset_train.element_spec clearly reveals that the dataset does have a shape, so I'm not sure why Tensorflow can't infer from it.
Any help in furthering either of those non-functional solutions so that I can explicitly access the training_data and label separately from an input Dataset inside the Functional API would be much appreciated!

How to get value of a tensor from a Tensorflow Mode

I am using the following implementation of the Seq2Seq model. Now, if I want to pass some inputs and get the corresponding values of encoder's hidden state (self.encoder_last_state), how can I do it?
https://github.com/JayParks/tf-seq2seq/blob/master/seq2seq_model.py
You need to first assemble input_feed, similar to the predict routine. Once you have that, just execute sess.run over the required hidden layer.
To assmeble the input_feed:
input_feed = self.check_feeds(encoder_inputs, encoder_inputs_length, decoder_inputs=None, decoder_inputs_length=None, decode=True)
input_feed[self.keep_prob_placeholder.name] = 1.0
sess.run over self.encoder_last_state:
encoder_last_state_activations = sess.run(self.encoder_last_state, input_feed)

How do you create a dynamic_rnn with dynamic "zero_state" (Fails with Inference)

I have been working with the "dynamic_rnn" to create a model.
The model is based upon a 80 time period signal, and I want to zero the "initial_state" before each run so I have setup the following code fragment to accomplish this:
state = cell_L1.zero_state(self.BatchSize,Xinputs.dtype)
outputs, outState = rnn.dynamic_rnn(cell_L1,Xinputs,initial_state=state, dtype=tf.float32)
This works great for the training process. The problem is once I go to the inference, where my BatchSize = 1, I get an error back as the rnn "state" doesn't match the new Xinputs shape. So what I figured is I need to make "self.BatchSize" based upon the input batch size rather than hard code it. I tried many different approaches, and none of them have worked. I would rather not pass a bunch of zeros through the feed_dict as it is a constant based upon the batch size.
Here are some of my attempts. They all generally fail since the input size is unknown upon building the graph:
state = cell_L1.zero_state(Xinputs.get_shape()[0],Xinputs.dtype)
.....
state = tf.zeros([Xinputs.get_shape()[0], self.state_size], Xinputs.dtype, name="RnnInitializer")
Another approach, thinking the initializer might not get called until run-time, but still failed at graph build:
init = lambda shape, dtype: np.zeros(*shape)
state = tf.get_variable("state", shape=[Xinputs.get_shape()[0], self.state_size],initializer=init)
Is there a way to get this constant initial state to be created dynamically or do I need to reset it through the feed_dict with tensor-serving code? Is there a clever way to do this only once within the graph maybe with an tf.Variable.assign?
The solution to the problem was how to obtain the "batch_size" such that the variable is not hard coded.
This was the correct approach from the given example:
Xinputs = tf.placeholder(tf.int32, (None, self.sequence_size, self.num_params), name="input")
state = cell_L1.zero_state(Xinputs.get_shape()[0],Xinputs.dtype)
The problem is the use of "get_shape()[0]", this returns the "shape" of the tensor and takes the batch_size value at [0]. The documentation doesn't seem to be that clear, but this appears to be a constant value so when you load the graph into an inference, this value is still hard coded (maybe only evaluated at graph creation?).
Using the "tf.shape()" function, seems to do the trick. This doesn't return the shape, but a tensor. So this seems to be updated more at run-time. Using this code fragment solved the problem of a training batch of 128 and then loading the graph into TensorFlow-Service inference handling a batch of just 1.
Xinputs = tf.placeholder(tf.int32, (None, self.sequence_size, self.num_params), name="input")
batch_size = tf.shape(Xinputs)[0]
state = self.cell_L1.zero_state(batch_size,Xinputs.dtype)
Here is a good link to TensorFlow FAQ which describes this approach 'How do I build a graph that works with variable batch sizes?':
https://www.tensorflow.org/resources/faq

Tensorflow cannot initialize tf.Variable for dynamic batch size

I tried creating a tf.Variable with a dynamic shape. The following outlines the problem.
Doing this works.
init_bias = tf.random_uniform(shape=[self.config.hidden_layer_size, tf.shape(self.question_inputs)[0]])
However, when i try to do this:
init_bias = tf.Variable(init_bias)
It throws the error ValueError: initial_value must have a shape specified: Tensor("random_uniform:0", shape=(?, ?), dtype=float32)
Just come context (question input is a placeholder which dynamic batch ):
self.question_inputs = tf.placeholder(tf.int32, shape=[None, self.config.qmax])
It seems like putting a dynamic value into random uniform gives shape=(?,?) which gives an error with tf.Variable.
Thanks and appreciate any help!
This should work:
init_bias = tf.Variable(init_bias,validate_shape=False)
If validate_shape is False, tensorflow allows the variable to be initialized with a value of unknown shape.
However, what you're doing seems a little strange to me. In tensorflow, Variables are generally used to store weights of a neural net, whose shape remains fixed irrespective of the batch size. Variable batch size is handled by passing a variable length tensor into the graph (and multiplying/adding it with a fixed shape bias Variable).

TensorFlow attention_decoder with RNNCell (state_is_tuple=True)

I want to build a seq2seq model with an attention_decoder, and to use MultiRNNCell with LSTMCell as the encoder. Because the TensorFlow code suggests that "This default behaviour (state_is_tuple=False) will soon be deprecated.", I set the state_is_tuple=True for the encoder.
The problem is that, when I pass the state of encoder to attention_decoder, it reports an error:
*** AttributeError: 'LSTMStateTuple' object has no attribute 'get_shape'
This problem seems to be related to the attention() function in seq2seq.py and the _linear() function in rnn_cell.py, in which the code calls the 'get_shape()' function of the 'LSTMStateTuple' object from the initial_state generated by the encoder.
Although the error disappears when I set state_is_tuple=False for the encoder, the program gives the following warning:
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell.LSTMCell object at 0x11763dc50>: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.
I would really appreciate if someone can give any instruction about building seq2seq with RNNCell (state_is_tuple=True).
I ran into this issue also, the lstm states need to be concatenated or else _linear will complain. The shape of LSTMStateTuple depends on the kind of cell you're using. With a LSTM cell, you can concatenate the states like this:
query = tf.concat(1,[state[0], state[1]])
If you're using a MultiRNNCell, concatenate the states for each layer first:
concat_layers = [tf.concat(1,[c,h]) for c,h in state]
query = tf.concat(1, concat_layers)