TensorFlow attention_decoder with RNNCell (state_is_tuple=True) - tensorflow

I want to build a seq2seq model with an attention_decoder, and to use MultiRNNCell with LSTMCell as the encoder. Because the TensorFlow code suggests that "This default behaviour (state_is_tuple=False) will soon be deprecated.", I set state_is_tuple=True for the encoder.
The problem is that, when I pass the state of encoder to attention_decoder, it reports an error:
*** AttributeError: 'LSTMStateTuple' object has no attribute 'get_shape'
This problem seems to be related to the attention() function in seq2seq.py and the _linear() function in rnn_cell.py, in which the code calls the 'get_shape()' function of the 'LSTMStateTuple' object from the initial_state generated by the encoder.
Although the error disappears when I set state_is_tuple=False for the encoder, the program gives the following warning:
WARNING:tensorflow:<tensorflow.python.ops.rnn_cell.LSTMCell object at 0x11763dc50>: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.
I would really appreciate it if someone could give any guidance on building a seq2seq model with RNNCell (state_is_tuple=True).

I ran into this issue as well: the LSTM states need to be concatenated, or else _linear will complain. The shape of the LSTMStateTuple depends on the kind of cell you're using. With an LSTM cell, you can concatenate the states like this:
query = tf.concat(1, [state[0], state[1]])
If you're using a MultiRNNCell, concatenate the states for each layer first:
concat_layers = [tf.concat(1, [c, h]) for c, h in state]
query = tf.concat(1, concat_layers)
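To make the resulting shapes concrete, here is a plain-Python sketch (no TensorFlow; the [batch, units] matrices are modeled as nested lists, and the batch and unit sizes are made-up values) of how the (c, h) pairs get flattened along axis 1 before being handed to _linear:

```python
def concat_axis1(*mats):
    """Concatenate matrices along axis 1 (the feature dimension),
    the analogue of tf.concat(1, [...]) on [batch, units] tensors."""
    return [sum(rows, []) for rows in zip(*mats)]

batch, units = 2, 3
c = [[0.0] * units for _ in range(batch)]  # cell state,   shape [2, 3]
h = [[1.0] * units for _ in range(batch)]  # hidden state, shape [2, 3]

# Single LSTM cell: query = tf.concat(1, [c, h])  ->  shape [2, 6]
query = concat_axis1(c, h)

# MultiRNNCell with two layers: concatenate (c, h) per layer,
# then concatenate the per-layer results  ->  shape [2, 12]
state = [(c, h), (c, h)]
concat_layers = [concat_axis1(cc, hh) for cc, hh in state]
multi_query = concat_axis1(*concat_layers)
```

The point is only the flattening order: per-cell (c, h) first, then across layers, so _linear sees one ordinary rank-2 tensor.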

Related

Keras Upsampling2d -> tflite conversion results in failing shape inference and undefined output shape

The Keras Upsampling2D operation is converted with additional operations and an undefined output shape, whereas TensorFlow converts the same operation without these extra operations and with the correct shape.
This leads to an undefined overall model output shape and to errors on device. How can this be fixed?
This behavior is described here https://github.com/tensorflow/tensorflow/issues/45090
Keras by default sets dynamic batch size to true.
That means that the model input shape is [*,28,28] not [1,28,28].
The old (deprecated) converter used to ignore the dynamic batch and override it to 1, which is wrong since this is not what the original model has; you can imagine how bad it would be when you try to resize the inputs at runtime.
The current converter handles the dynamic batch size correctly, and the generated model can be resized correctly at runtime.
That's why the sequence of "Shape, StridedSlice, Pack" wasn't constant-folded: the shape depends on the shape defined at runtime.
For a single-input model this can be fixed by setting a constant shape for the Keras model before saving:
model.input.set_shape(1 + model.input.shape[1:])
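At the shape level, that one-liner just pins the leading (batch) dimension to 1 while keeping the rest of the shape. A plain-Python sketch of the transformation (with None standing in for the dynamic batch dimension, and 28x28 taken from the question's example shape):

```python
# The Keras model's input shape with a dynamic batch size: [*, 28, 28]
dynamic_shape = (None, 28, 28)

# Pin the batch dimension to 1 before saving/converting, so the
# Shape/StridedSlice/Pack sequence can be constant-folded.
fixed_shape = (1,) + dynamic_shape[1:]
```

The trade-off is that the saved model can then no longer be resized to other batch sizes at runtime.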

Tensorflow Saving Error (from Tensorflow Example)

I am trying to use the Basic Text Classification example from TensorFlow on my own dataset. Training and verification have gone well, and I have reached the point in the tutorial for exporting the model. The model compiles and works on an array of strings.
After that, I'd like to save the model in h5 format for use in other projects. At this point, the tutorial refers you to the save and load Keras models tutorial.
This second tutorial essentially says to do this:
model.save('path/saved_model.h5')
This fails with
ValueError: Weights for model sequential_X have not yet been created. Weights are created when the Model is first called on inputs or build() is called with an input_shape.
So next I attempt to do this:
model.build((None, max_features))
model.save('path/saved_model.h5')
There are several errors with this:
ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: <tf.Tensor 'Placeholder:0' shape=(None, 45000) dtype=float32>
TypeError: Input 'input' of 'StringLower' Op has type float32 that does not match expected type of string.
ValueError: You cannot build your model by calling build if your layers do not support float type inputs. Instead, in order to instantiate and build your model, call your model on real tensor data (of the correct dtype).
I think this essentially means the input I defined to pass into model.build defaults to float and needs to be string. I think I have two options:
Somehow define my input layer to be string, which I cannot see how to do. This feels like the correct thing to do.
Use model.call. However I am not sure how to 'call my model on real tensor data' because tensors can't be strings and that is the input to the network.
I've seen one other person with this issue here, with no solution other than to rebuild the model in functional style with mixed results. I am not sure of the point of rebuilding in the functional style since I don't fully understand the problem.
I'd prefer to have the TextVectorization layer built into the final model to simplify deployment. This is exactly the reason the docs give for doing this in the example in the first place. (The model will save without it.)
I am a novice with this so I might be making a simple mistake. How can I get this model to save?

Loss tensor being pruned out of graph in PopART

I’ve written a very simple PopART program using the C++ interface, but every time I try to compile it to run on an IPU device I get the following error:
terminate called after throwing an instance of 'popart::error'
what(): Could not find loss tensor 'L1:0' in main graph tensors
I’m defining the loss in my program like so:
auto loss = builder->aiGraphcoreOpset1().l1loss({outputs[0]}, 0.1f, popart::ReductionType::Sum, "l1LossVal");
Is there something wrong with my loss definition that’s resulting in it being pruned out of the graph? I’ve followed the same structure as one of the Graphcore examples here.
This error usually happens when the model protobuf you pass to the TrainingSession or InferenceSession objects doesn’t contain the loss tensor. A common reason for this is when you call builder->getModelProto() before you add the loss tensor to the graph. To ensure your loss tensor is part of the protobuf your calls should be in the following order:
...
auto loss = builder->aiGraphcoreOpset1().l1loss(...);
auto proto = builder->getModelProto();
auto session = popart::TrainingSession::createFromOnnxModel(...);
...
The key point is that the getModelProto() call should be the last call from the builder interface before setting up the PopART session.

how to create a tf.layers.Dense object

I want to create a dense layer in TensorFlow. I tried tf.layers.dense(input_placeholder, units), which directly creates the layer and returns the result, but what I want is just a "layer module", i.e. an object of the class tf.layers.Dense(units). I want to first declare these modules/layers in a class, and then have several member functions apply1(x, y), apply2(x, y) that use these layers.
But when I did in tensorflow tf.layers.Dense(units), it returned:
layer = tf.layers.Dense(100)
AttributeError: 'module' object has no attribute 'Dense'
But if I do tf.layers.dense(x, units), there's no problem.
Any help is appreciated, thanks.
tf.layers.Dense returns a callable layer object that you later apply to your input. It handles the variable definitions.
func = tf.layers.Dense(out_dim)
out = func(inputs)
tf.layers.dense performs both variable definitions and application of the dense layer to your input to calculate your output.
out = tf.layers.dense(inputs, out_dim)
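The difference between the two is the standard callable-object pattern: the Dense instance owns its weights and can be applied repeatedly (reusing those weights), while dense builds and applies in one shot. Here is a plain-Python sketch of that pattern (a hypothetical stand-in, not the real TensorFlow implementation):

```python
class DenseLayer:
    """Minimal stand-in for tf.layers.Dense: construction stores the
    configuration; calling the object applies the layer, reusing the
    same weights on every call."""
    def __init__(self, out_dim):
        self.out_dim = out_dim
        self.weights = None  # created lazily, on first call

    def __call__(self, inputs):
        if self.weights is None:
            # In TensorFlow, this is where the variables get defined.
            self.weights = [[1.0] * self.out_dim for _ in inputs[0]]
        # Plain matmul: inputs [batch, in_dim] x weights [in_dim, out_dim]
        return [[sum(x * w for x, w in zip(row, col))
                 for col in zip(*self.weights)]
                for row in inputs]

def dense(inputs, out_dim):
    """Stand-in for tf.layers.dense: defines and applies in one call."""
    return DenseLayer(out_dim)(inputs)

layer = DenseLayer(2)          # like: func = tf.layers.Dense(out_dim)
out1 = layer([[1.0, 2.0]])     # like: out = func(inputs)
out2 = layer([[3.0, 4.0]])     # second call reuses the same weights
out3 = dense([[1.0, 2.0]], 2)  # like: out = tf.layers.dense(inputs, out_dim)
```

This is why the class form suits the asker's use case: the layer objects can be declared once in the class constructor and then applied inside several member functions.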
Try to avoid the usage of placeholders; you have to feed them via feed_dict into the tf.Session, so that is probably causing this issue.
Try to use the new Estimator API to load the data and then use dense layers, as is done in TensorFlow's GitHub examples: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/layers/cnn_mnist.py
tf.layers.Dense was not exported in TensorFlow before version 1.4. You probably have version 1.3 or earlier installed. (You can check the version with python -c 'import tensorflow as tf; print(tf.__version__)'.)

Assigning values manually to Tensors during training

I'm training a seq2seq model.
I want to set the hidden state of the decoder to the hidden state of the encoder in the tf.Session().
Doing something like the following just makes LSTM2's hidden state refer to LSTM1's hidden state object:
LSTM2.hidden_state = LSTM1.hidden_state
How do I copy it? I have tried using assign_op = LSTM2.hidden_state.assign(LSTM1.hidden_state) but get the error 'Tensor' object has no attribute 'assign' when I call it in sess.run()
Using tf.assign() in a similar way inside of the graph gives me the error Input 'ref' of 'Assign' Op requires l-value input
Thanks in advance.
You can "feed" the Tensor during the session.run call. That is, suppose the new set of values is in a numpy array vals; then you can do sess.run(..., feed_dict={tensor: vals}).
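Conceptually, feed_dict just overrides a tensor's value for that one run, which is why no assign op is needed. A toy plain-Python analogy (not TensorFlow; the "graph", tensor names, and values are all made up for illustration):

```python
# Toy "graph": each tensor is a function computing its value from
# the values of other tensors evaluated so far.
graph = {
    "encoder_state": lambda values: [1.0, 2.0],
    "decoder_out":   lambda values: [v * 10 for v in values["encoder_state"]],
}

def run(fetch, feed_dict=None):
    """Evaluate `fetch`, letting feed_dict override any tensor's value
    for this run only (the analogue of sess.run(..., feed_dict=...))."""
    values = dict(feed_dict or {})
    for name, compute in graph.items():  # naive: declaration order
        if name not in values:           # a fed value wins over computation
            values[name] = compute(values)
    return values[fetch]

out_default = run("decoder_out")                                   # computed state
out_fed = run("decoder_out", feed_dict={"encoder_state": [5.0]})   # fed state wins
```

The fed value replaces the computed one only for that single run; the next run without the feed goes back to the graph-computed state, mirroring how feeding a tensor in a tf.Session works.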